Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/19441

| Title: | APPLYING DATA MINING TECHNIQUES FOR SPATIOTEMPORAL ANALYSIS OF URBAN DATA |
| Authors: | Aggarwal, Apeksha |
| Keywords: | Spatio-Temporal, Data, Urban, Anomaly, Frequent, Itemset, Clustering, Prediction, Recommendation, Mining, Environment, K-means, Air Quality, Pollution. |
| Issue Date: | Dec-2019 |
| Publisher: | IIT Roorkee |
| Abstract: | Urban areas generate enormous volumes of heterogeneous data every second. Such data include GPS records, traffic data, air quality parameters, and meteorological data, all of which can be used for the betterment of cities. Numerous data mining algorithms have been proposed to process these datasets, extract useful information from them, and apply that information to urban problems. However, processing such large heterogeneous datasets requires efficient data mining algorithms with reduced execution time, along with faster processing architectures. The dependence of these datasets on time and location makes the design task more complex. The present work proposes several data mining algorithms that take spatio-temporal aspects into account and that address challenges of urban areas using a variety of urban datasets. The work primarily addresses environmental issues, such as the accurate prediction of future urban air quality and the promotion of public transport. The urban challenges addressed in this work are discussed below. Anomalies are events of interest that occur rarely in the data. Some such rare events are confined to a particular place and time, so identifying them is beneficial for avoiding similar incidents in the future. In our first objective, a hybrid proximity-based and clustering-based anomaly detection approach is proposed to extract anomalies from urban air quality data. First, the dataset of approximately 1 million records is spatio-temporally segmented and feature vectors are generated. Second, a partitioning-based clustering method is applied because it can exhaustively partition the data into small chunks. 
K-means is used specifically because it updates the centroids at each iteration and generates a sufficient number of clusters even when the overall distance between points is not very large, unlike hierarchical and density-based methods. Further, k-means is simple and efficient, and it converges quickly on small data, which in our case are the chunks. The Gaussian distribution property of the real-world dataset is then used to separate out anomalies. The results show a twofold advantage of our approach: efficient extraction of anomalies and increased accuracy through fewer false alarms. In our second objective, an algorithm to extract the most frequently occurring patterns from various heterogeneous datasets is proposed. We aim to identify and extract frequent patterns from time- and location-aware spatio-temporal transactional data. Most existing algorithms demand enormous resources to extract frequent patterns from large databases. The present work proposes a spatio-temporal frequent pattern mining algorithm that uses hashing to reduce memory access time and storage space. A hash-based search technique speeds up memory access by directly accessing the required spatio-temporal information from the schema. Numerous hash-based search techniques could be used, but to reduce collisions this thesis focuses primarily on direct address hashing. Various approaches have been proposed to extract frequent patterns efficiently, but we suggest a generalized approach that can be applied to any numeric spatio-temporal transactional data, including air quality data. A detailed experimental evaluation is carried out on synthetically generated, benchmark, and real-world datasets. Knowledge of current and past air quality trends plays a decisive role in mitigating future air pollution levels to some extent. 
Various approaches have recently been suggested to predict future air quality levels. In the third objective, we propose an ensemble of machine learning classifiers built from spatio-temporal partitions to predict future air quality. Furthermore, with the advent of deep learning models, air quality problems have been addressed far more effectively and efficiently than with traditional machine learning models. Specifically, Long Short-Term Memory (LSTM) networks are a major breakthrough in capturing the complex sequential dependencies of time series. To address the aforementioned issues, a hybrid deep learning framework is proposed. The literature suggests that one of the primary reasons for degrading air quality is the amount of traffic on the roads. The fourth objective encourages greater use of public transport over private vehicles to save the environment, energy, and resources by suggesting improvements to the infrastructure of bus services. Transportation systems are called the lifeline of any urban area. Major transportation systems include cars, taxis, buses, and trams, which carry most of the local traffic in a city. Further, a modified clustering method is proposed in this work to select clusters intelligently. Taxi data collected from New York is used to analyze traffic in the area. Spatio-temporal data segmentation by time zone is performed to account for the dynamic patterns of urban traffic. Clustering is applied to the segmented data to form clusters for each time zone and identify areas of high traffic. Using the knowledge from the extracted clusters, public infrastructure upgrades are proposed for places with high traffic density and no bus stops. |
| URI: | http://localhost:8081/jspui/handle/123456789/19441 |
| Research Supervisor/ Guide: | Toshniwal, Durga |
| metadata.dc.type: | Thesis |
| Appears in Collections: | DOCTORAL THESES (CSE) |
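
The first objective in the abstract (segment the data, cluster each chunk with k-means, then use a Gaussian assumption to separate anomalies) can be illustrated with a minimal sketch. The abstract does not give the feature vectors, the value of k, or the cut-off rule, so everything below is an assumption made for illustration: records are hypothetical `(station, hour, value)` tuples, chunks are 6-hour windows per station, points are scored by their distance to the nearest sufficiently large cluster, and a point is flagged when its score lies more than `z_threshold` standard deviations above the chunk mean. The names `kmeans_1d`, `detect_anomalies`, `min_frac`, and `z_threshold` are hypothetical and are not taken from the thesis.

```python
import random
import statistics
from collections import defaultdict


def nearest(value, centroids):
    """Index of the centroid closest to value (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's k-means on scalar values; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        labels = [nearest(v, centroids) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return centroids, [nearest(v, centroids) for v in values]


def detect_anomalies(records, k=2, min_frac=0.1, z_threshold=3.0):
    """Flag records that lie far from every well-populated cluster in their chunk."""
    # Spatio-temporal segmentation (assumed scheme): one chunk per station
    # and 6-hour window of the day.
    chunks = defaultdict(list)
    for station, hour, value in records:
        chunks[(station, hour // 6)].append((station, hour, value))

    flagged = []
    for chunk in chunks.values():
        values = [rec[2] for rec in chunk]
        if len(values) < k:
            continue
        centroids, labels = kmeans_1d(values, k)
        # Proximity step: distance to the nearest cluster holding at least
        # min_frac of the chunk, so a lone outlier cannot hide in its own cluster.
        sizes = [labels.count(c) for c in range(k)]
        dense = [centroids[c] for c in range(k) if sizes[c] >= min_frac * len(values)]
        if not dense:
            dense = [centroids[sizes.index(max(sizes))]]
        scores = [min(abs(v - c) for c in dense) for v in values]
        # Gaussian step: treat scores as roughly normal and apply a z-score cut.
        mu = statistics.mean(scores)
        sigma = statistics.pstdev(scores)
        if sigma == 0:
            continue
        for rec, score in zip(chunk, scores):
            if (score - mu) / sigma > z_threshold:
                flagged.append(rec)
    return flagged
```

Calling `detect_anomalies` on a list of such tuples returns the flagged records unchanged, so they can be traced back to their station and hour. Real air quality data would need multi-dimensional feature vectors and a tuned threshold; the scalar version only shows the order of the steps.
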
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| APEKSHA AGGARWAL 15911013.pdf | | 8.77 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
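
The abstract's second objective relies on direct address hashing to reach spatio-temporal information without collisions. The thesis algorithm itself is not described in this record, so the following is only a sketch of the general idea under stated assumptions: locations, time slots, and items are small non-negative integer ids, each `(location, slot, item)` triple maps to its own cell of a flat table, and only single-item supports are counted. The function names and the transaction layout are hypothetical.

```python
def direct_address(loc, slot, item, n_slots, n_items):
    """Collision-free index: every (loc, slot, item) triple gets its own cell."""
    return (loc * n_slots + slot) * n_items + item


def frequent_items(transactions, n_locs, n_slots, n_items, min_support):
    """Count item support per (location, time slot) in a direct-address table.

    transactions: iterable of (loc, slot, items) with integer ids in range.
    Returns {(loc, slot, item): support} for supports >= min_support.
    """
    table = [0] * (n_locs * n_slots * n_items)
    for loc, slot, items in transactions:
        for item in set(items):  # count each item once per transaction
            table[direct_address(loc, slot, item, n_slots, n_items)] += 1

    frequent = {}
    for idx, count in enumerate(table):
        if count >= min_support:
            # Invert the address to recover the triple.
            cell, item = divmod(idx, n_items)
            loc, slot = divmod(cell, n_slots)
            frequent[(loc, slot, item)] = count
    return frequent
```

Because the address is computed arithmetically, each lookup is a single list access with no probing or chaining. The cost is a table sized to the full key space, so the approach fits best when the id ranges are bounded and reasonably dense.
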
