CLUSTERING MULTI-DIMENSIONAL DATA STREAM OBJECTS

Narsingh, Pardeshi Bharat

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/8901

Title:	CLUSTERING MULTI-DIMENSIONAL DATA STREAM OBJECTS
Authors:	Narsingh, Pardeshi Bharat
Keywords:	ELECTRONICS AND COMPUTER ENGINEERING
Issue Date:	2010
Abstract:	Clustering is an important data mining technique: an unsupervised learning process that groups data objects meaningfully. Data streams are temporally ordered, fast-changing, high-dimensional, and potentially infinite volumes of data. Clustering data streams is, however, a non-trivial task because of their dynamic, high-dimensional, and voluminous nature. Existing clustering algorithms are not able to cluster such data streams accurately. Thus, existing data stream clustering algorithms must be improved so that they can mine data stream objects as they arrive. The proposed research work aims at developing improved data stream clustering algorithms. The major objective of this work is to improve clustering purity while taking time complexity into account. Based on the partitioning technique, an algorithm termed Partitioning-based Improved Stream (PartIS) Clustering is proposed. This algorithm merges or splits clusters dynamically depending on the arriving data stream objects. Using the hierarchical clustering methodology, an algorithm termed Hierarchical-based Improved Stream (HIS) Clustering is proposed. By projecting data objects into a high-dimensional grid structure, this algorithm performs hierarchical clustering to obtain reasonable results. Using a density-based approach, an algorithm termed Density-based Improved Stream (DenIS) Clustering is proposed. This algorithm is able to discover clusters of arbitrary shape along with proper discrimination of outliers. Finally, the DenIS Clustering algorithm is parallelized on CUDA to achieve computational speedup. The proposed work has been implemented on the Linux platform using the C language. The parallel algorithm exploiting CUDA technology is implemented using NVIDIA CUDA C on a Quadro FX 3700 graphics card. All experiments are performed on an Intel(R) Xeon(R) E5420 CPU with 16 GB of RAM.
URI:	http://hdl.handle.net/123456789/8901
Other Identifiers:	M.Tech
Research Supervisor/ Guide:	Toshniwal, Durga
metadata.dc.type:	M.Tech Dissertation
Appears in Collections:	MASTERS' THESES (E & C)

Files in This Item:

File	Description	Size	Format
ECD20116.pdf		4.11 MB	Adobe PDF
