Please use this identifier to cite or link to this item:
http://localhost:8081/xmlui/handle/123456789/1781
Title: | TIME SERIES DATA MINING-A NEW PARADIGM |
Authors: | Toshniwal, Durga |
Keywords: | ELECTRONICS AND COMPUTER ENGINEERING;TIME SERIES DATA MINING;MODEL TIME SERIES DATA;DATA MINING |
Issue Date: | 2005 |
Abstract: | Many business and scientific domains require the collection and analysis of time series data. Typical application domains include finance, sales, biometrics, and weather forecasting. Data mining performed on time series data is called time series data mining. In the last decade, there have been several attempts to model time series data, to design query languages for it, and to develop access structures for its efficient storage and retrieval. The work presented in this thesis proposes new and efficient techniques for feature extraction from time series data, similarity search in time sequences, clustering, and association rule mining from time series data. In the first part, novel feature extraction techniques for time series data are proposed using first moments, second (time-weighted) moments, and cumulative variation in slopes. The moment-based techniques rest on the observation that a high-dimensional time sequence can be represented as a point, called the centroid (center of gravity), in 2-dimensional space. Time sequences are thus mapped to centroids in the 2-dimensional plane. The feature extraction techniques based on cumulative variation of weighted slopes use the weighted sum of the variation of slopes computed at corresponding points of the time sequences. The weight assigned to each slope depends on its location along the time axis, so the feature records where along the time axis a particular slope occurs. The proposed feature extraction schemes have been found to be faster than existing ones, as can be concluded from the time complexities of the various feature extraction techniques. Moreover, the suggested techniques are simple to understand and implement. The next part of the thesis deals with the problem of similarity search in time series data.
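As an illustration of the centroid idea described above, the following is a minimal sketch and not the formulation used in the thesis. It assumes each sample is a bar of unit width at time t = 1..n with non-negative height, and takes the center of gravity of the area under those bars. The function names `centroid` and `centroid_distance` are hypothetical, and the second (time-weighted) moment variant is not shown.

```python
import math
from typing import Sequence, Tuple


def centroid(values: Sequence[float]) -> Tuple[float, float]:
    """Map a time sequence to a 2-D point (t_c, v_c).

    Assumption (not taken from the thesis): sample i sits at time t = i
    (starting at 1) and is a bar of unit width and height values[i - 1];
    the point returned is the center of gravity of the area under the bars.
    """
    total = float(sum(values))
    if not values or min(values) < 0 or total == 0:
        raise ValueError("expects a non-empty, non-negative, non-zero sequence")
    # First moment about the amplitude axis, divided by the total area.
    t_c = sum(t * v for t, v in enumerate(values, start=1)) / total
    # First moment about the time axis: a bar of height v has its own
    # center at v / 2 and area v, contributing v * v / 2.
    v_c = sum(v * v / 2.0 for v in values) / total
    return t_c, v_c


def centroid_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the centroids of two sequences."""
    ta, va = centroid(a)
    tb, vb = centroid(b)
    return math.hypot(ta - tb, va - vb)
```

Each sequence is reduced to two numbers in one pass over its samples, which is in keeping with the linear-time flavor of the complexity claim; sequences of different lengths can still be compared because only the two centroids enter the distance.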
We have suggested new techniques for similarity search in time series data and have proved them to be superior to the competing techniques. Two of the proposed techniques for similarity search rely on moments. Centroids are computed using these moments (first or second). These techniques are based on the observation that similar time sequences have centroids close to each other. The proposed moment-based techniques for similarity search in time series data are simple to understand, easy to visualize, and more time efficient than their existing counterparts. Moreover, they can handle variable-length queries, horizontal and vertical shifts between the time sequences, and global scaling or shrinking of the time sequences along either the amplitude or the time axis. They can also handle flexible distance measures, including the weighted Euclidean distance, which is not possible with most other techniques suggested so far. Two more similarity search techniques have been suggested that employ cumulative variation in slopes or cumulative variation in time-weighted slopes for assessing similarity in time sequences. Similar time sequences have a very small cumulative variation in slopes. Ideally, for exact matches, this parameter evaluates to zero. The cumulative variation in time-weighted slopes is based on the same idea. The only difference is that the variations in slopes are assigned weights depending upon their location along the time axis. These slope-based techniques also have all the advantages mentioned earlier for the moment-based methods of similarity search. New schemes for clustering time sequences have been proposed in the later part of the thesis. These techniques are based on the concept of whole sequence clustering. The sequences are mapped to a 2-dimensional plane as points called centroids using first or second moments.
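The slope-based similarity measure can be sketched as follows. This is an illustrative reading under stated assumptions, not the thesis's definition: slopes are taken as differences between successive samples at unit time steps, the two sequences have equal length, and the time weight of the i-th slope is simply its position i. The names `slopes` and `cumulative_slope_variation` are hypothetical.

```python
from typing import List, Sequence


def slopes(values: Sequence[float]) -> List[float]:
    """Slopes between successive samples, assuming unit time steps."""
    return [b - a for a, b in zip(values, values[1:])]


def cumulative_slope_variation(
    a: Sequence[float], b: Sequence[float], time_weighted: bool = False
) -> float:
    """Sum of absolute slope differences at corresponding positions.

    With time_weighted=True the i-th difference is multiplied by i
    (an assumed weighting scheme), so late differences count for more.
    """
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length sequences only")
    total = 0.0
    for i, (sa, sb) in enumerate(zip(slopes(a), slopes(b)), start=1):
        weight = i if time_weighted else 1
        total += weight * abs(sa - sb)
    return total


if __name__ == "__main__":
    base = [1, 2, 4, 3]
    shifted = [11, 12, 14, 13]  # same shape, shifted vertically by 10
    ramp = [1, 2, 3, 4]
    print(cumulative_slope_variation(base, shifted))
    print(cumulative_slope_variation(base, ramp))
    print(cumulative_slope_variation(base, ramp, time_weighted=True))
```

Because only slopes are compared, a vertically shifted copy of a sequence scores zero, which matches the statement that exact matches evaluate to zero and is consistent with tolerance to vertical shifts.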
These points are then clustered using the k-means clustering algorithm. Another clustering technique has been proposed based on cumulative weighted slopes. The sum of weighted slopes is used for feature extraction from time sequences. These cumulative slopes are then clustered using the k-means clustering algorithm. In the last portion of the thesis, techniques for association rule mining from time series data have been proposed. They are based on discretizing the time sequences. The time series are divided into all possible subsequences using a fixed-size window. Features are then extracted from these subsequences using the method of first or second moments or cumulative weighted slopes. These features are then clustered using the k-means clustering algorithm. Each cluster represents a basic shape. The clusters (shapes) with high frequency are then used for mining association rules from time series data. We have used the Apriori algorithm to prune the candidate set of shapes. The proposed technique is capable of global as well as local rule discovery within one particular time sequence or across multiple sequences. The work also includes extensive implementation examples that have been generated synthetically, specifically for testing the suggested techniques. Real-life case data has also been used for experiments with every proposed technique. |
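The discretization step that precedes rule mining (fixed-size windows, one feature per window, k-means over the features) can be sketched as below. This is a minimal illustration under assumptions, not the thesis's implementation: the window feature is a position-weighted sum of slopes, the k-means routine is a small plain-Python version with a deterministic initialization chosen for reproducibility, and all function names are hypothetical. The Apriori stage is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sliding_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """All contiguous subsequences of the given fixed width."""
    return [list(series[i:i + width]) for i in range(len(series) - width + 1)]


def window_feature(window: Sequence[float]) -> Point:
    """Assumed feature: sum of slopes weighted by their position 1, 2, ..."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return (float(sum(i * s for i, s in enumerate(diffs, start=1))),)


def sq_dist(p: Point, q: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: List[Point], k: int, max_iter: int = 50) -> List[int]:
    """Plain k-means; returns one cluster label per point."""
    ordered = sorted(points)
    # Deterministic start: k points spread evenly over the sorted input.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def shape_sequence(series: Sequence[float], width: int, k: int) -> List[int]:
    """Replace each window by the label of its cluster (its 'basic shape')."""
    features = [window_feature(w) for w in sliding_windows(series, width)]
    return kmeans(features, k)


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    print(shape_sequence(series, width=3, k=2))
```

The resulting label sequence is the kind of symbolic input a frequent-itemset algorithm such as Apriori can consume: frequently occurring labels play the role of frequent shapes.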
URI: | http://hdl.handle.net/123456789/1781 |
Other Identifiers: | Ph.D |
Research Supervisor/ Guide: | Joshi, R. C. |
metadata.dc.type: | Doctoral Thesis |
Appears in Collections: | DOCTORAL THESES (E & C) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
TIME SERIES DATA MINING - A NEW PARADIGM.pdf | | 83.18 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.