Please use this identifier to cite or link to this item:
http://localhost:8081/xmlui/handle/123456789/8901
Title: | CLUSTERING MULTI-DIMENSIONAL DATA STREAM OBJECTS |
Authors: | Narsingh, Pardeshi Bharat |
Keywords: | ELECTRONICS AND COMPUTER ENGINEERING |
Issue Date: | 2010 |
Abstract: | Clustering is an important data mining technique: an unsupervised learning process that groups data objects meaningfully. Data streams are temporally ordered, fast-changing, high-dimensional, and potentially infinite volumes of data. Clustering data streams is a non-trivial task because of their dynamic, high-dimensional, and voluminous nature, and existing clustering algorithms cannot cluster such streams accurately. Existing data stream clustering algorithms must therefore be improved so that they can mine data stream objects as they arrive. This research work aims to develop improved data stream clustering algorithms. The major objective is to improve clustering purity while taking time complexity into account. Based on the partitioning technique, an algorithm termed Partitioning-based Improved Stream (PartIS) Clustering is proposed; it merges or splits clusters dynamically depending on the arriving data stream objects. Using the hierarchical clustering methodology, an algorithm termed Hierarchical-based Improved Stream (HIS) Clustering is proposed; by projecting data objects into a high-dimensional grid structure, it performs hierarchical clustering to obtain reasonable results. Using a density-based approach, an algorithm termed Density-based Improved Stream (DenIS) Clustering is proposed; it can discover clusters of arbitrary shape while properly discriminating outliers. Finally, the DenIS Clustering algorithm is parallelized on CUDA to achieve computational speedup. The proposed work has been implemented on the Linux platform in the C language. The parallel algorithm is implemented in NVIDIA CUDA C on a Quadro FX 3700 graphics card. All experiments are performed on an Intel(R) Xeon(R) E5420 CPU with 16 GB of RAM. |
URI: | http://hdl.handle.net/123456789/8901 |
Other Identifiers: | M.Tech |
Research Supervisor/ Guide: | Toshniwal, Durga |
metadata.dc.type: | M.Tech Dissertation |
Appears in Collections: | MASTERS' THESES (E & C) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ECD20116.pdf | | 4.11 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.