Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/2182
Title: DENSITY BASED CLUSTERING OF STREAMING DATA USING WEIGHTING SCHEME
Authors: Salim, Mohammad
Keywords: CLUSTERING; DATABASE; ALGORITHM; ELECTRONICS AND COMPUTER ENGINEERING
Issue Date: 2012
Abstract: Clustering has been widely researched in the database, statistics, data mining, machine learning, biology, and marketing communities. It is an important but difficult task in the domain of data streams. For the analysis of stream data, the ability to process the data in a single pass, or a small number of passes, while using little memory is crucial. Limited memory availability and real-time query response requirements pose great challenges to data stream clustering. Although a large number of clustering algorithms for data streams have been proposed, they do not offer a complete solution to the special requirements implied by data streams. Some of these algorithms can find only spherical clusters, and some need the number of clusters as an initial input. Density-based clustering algorithms can find arbitrarily shaped clusters and do not need the number of clusters in advance. However, most of these algorithms assume that all dimensions of the streaming data have equal weight and therefore treat all dimensions equally during clustering. In practice, some dimensions of a data stream may play a crucial role in clustering while others may be useless. Hence, assigning weights to the dimensions of the streaming data according to their importance may improve the clustering results. In this dissertation, a method for clustering data streams with weighted dimensions is proposed. The algorithm is based on DenStream, a density-based data stream clustering algorithm. A weight is assigned to each dimension of the data stream based on the importance of that dimension, which is estimated from the density of the data in that dimension: the higher the density of data along a dimension, the greater its importance. Distance measures and other measurements during the clustering process are then computed taking the dimension weights into account. Experimental results show that the purity of the clusters improves when the dimensional weights are considered.
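The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the density along a dimension is estimated or how the weights enter the distance, so everything below is an assumption made for illustration: the hypothetical function names `dimension_weights` and `weighted_distance`, the use of the fraction of points in the densest histogram bin as a density score, the default of 10 bins, the normalisation of weights to sum to 1, and the weighted Euclidean form. It is not the dissertation's actual method and omits DenStream's micro-clusters and fading.

```python
import math


def dimension_weights(points, bins=10):
    """Assumed scheme: weight each dimension by how concentrated its values are.

    The density score of a dimension is the fraction of points falling in its
    densest equal-width histogram bin. Scores are normalised to sum to 1.
    """
    n = len(points)
    dims = len(points[0])
    scores = []
    for j in range(dims):
        column = [p[j] for p in points]
        lo, hi = min(column), max(column)
        width = (hi - lo) / bins
        counts = [0] * bins
        for v in column:
            # A constant column has zero width; put all of its values in bin 0.
            idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        scores.append(max(counts) / n)
    total = sum(scores)
    return [s / total for s in scores]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance (an assumed form, not taken from the source)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Dimension 0 is concentrated at two values; dimension 1 is spread evenly.
sample = [(0.0 if i < 50 else 1.0, i / 99) for i in range(100)]
w = dimension_weights(sample)
d = weighted_distance((0.0, 0.0), (2.0, 2.0), [0.75, 0.25])
```

In this sketch the concentrated dimension receives the larger weight, so differences along it contribute more to the distance than differences along the evenly spread dimension.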
URI: http://hdl.handle.net/123456789/2182
Other Identifiers: M.Tech
Appears in Collections: MASTERS' DISSERTATIONS (E & C)

Files in This Item:
File: ECDG21948.pdf
Size: 2.31 MB
Format: Adobe PDF

