Title: ADAPTIVE BAYESIAN APPROACH FOR CLASSIFICATION OF DATA STREAMS
Authors: Mishra, Parthsarthi
Keywords: ELECTRONICS AND COMPUTER ENGINEERING;ADAPTIVE BAYESIAN APPROACH;DATA STREAMS;SENSOR NETWORKS
Issue Date: 2011
Abstract: Data streams are temporally ordered, fast-moving, massive, and potentially infinite. They can be generated at high rates by sources that produce measurements continuously, such as sensor networks, web logs, and computer network traffic. Storing, querying, and mining data streams are computationally challenging tasks. Classification is the problem of supervised grouping of data in order to extract meaningful patterns. Data streams are too large to fit in main memory and are typically stored on secondary storage devices; because random access to such storage is prohibitively expensive, linear scans are the only cost-effective access method. Besides running time and memory usage, another important issue in dealing with data streams is concept drift, i.e. a change in the underlying data distribution. An efficient summarization technique is therefore needed to retain the past data while leaving enough memory to process future data. The classification algorithm must be incremental so that it can account for concept drift, and there must be a mechanism to update the summary so that the classifier remains sensitive to such changes. These issues make the classification of data streams a very challenging task.
In our work, an adaptive classification model is proposed that evolves dynamically with the data stream and thereby improves classification results. The method is based on the Naive Bayesian classifier, a probabilistic technique that classifies data using Bayes' theorem. Supervised microclusters provide an efficient way to store a summary of the past data, and this summary is then used to estimate the probabilities required by the Naive Bayesian classifier. A novel class detection approach is also proposed that identifies points belonging to a new class by delaying the classification of the data points in the current chunk of the stream. This accounts for concept drift in the data stream. The empirical results show higher classification accuracy than static methods. In its present form, the proposed work applies to numerical data streams only.
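The abstract states that the microcluster summary is used to obtain the Naive Bayesian probabilities but does not give the details. The following is a minimal Python sketch of one way this could work, not the thesis's actual algorithm. It assumes that each supervised microcluster stores a class label, a point count, per-feature linear and squared sums, and a last-update timestamp; that class priors and per-feature Gaussian likelihoods are computed by pooling these sums per class; and that concept drift is handled by discarding microclusters not updated within a fixed horizon. The class `MicroClusterNaiveBayes` and the parameters `max_radius`, `horizon`, `max_clusters`, and `min_var` are hypothetical names chosen for illustration. Novel class detection is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Supervised microcluster: a class label plus additive summary statistics."""
    label: str
    n: int        # number of points absorbed
    ls: list      # per-feature linear sum
    ss: list      # per-feature sum of squares
    t_last: int   # timestamp of the most recent absorbed point

    def centroid(self):
        return [s / self.n for s in self.ls]

    def absorb(self, x, t):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        self.t_last = t


class MicroClusterNaiveBayes:
    """Hypothetical sketch: Gaussian Naive Bayes estimated from a microcluster summary."""

    def __init__(self, max_radius=3.0, horizon=1000, max_clusters=100, min_var=1e-3):
        self.max_radius = max_radius      # assumed absorption distance threshold
        self.horizon = horizon            # assumed forgetting window, in time steps
        self.max_clusters = max_clusters  # memory budget for the summary
        self.min_var = min_var            # variance floor so no feature has zero variance
        self.clusters = []

    def learn(self, x, label, t):
        # Concept drift (assumption): forget microclusters not updated within the horizon.
        self.clusters = [m for m in self.clusters if t - m.t_last <= self.horizon]
        same = [m for m in self.clusters if m.label == label]
        nearest = min(same, key=lambda m: math.dist(m.centroid(), x), default=None)
        if nearest is not None and math.dist(nearest.centroid(), x) <= self.max_radius:
            nearest.absorb(x, t)
        else:
            self.clusters.append(MicroCluster(label, 1, list(x), [v * v for v in x], t))
        # Memory budget: keep only the most recently updated microclusters.
        if len(self.clusters) > self.max_clusters:
            self.clusters.sort(key=lambda m: m.t_last, reverse=True)
            del self.clusters[self.max_clusters:]

    def _class_stats(self):
        # Pool the additive statistics of all microclusters that share a label.
        stats = {}
        for m in self.clusters:
            n, ls, ss = stats.get(m.label, (0, [0.0] * len(m.ls), [0.0] * len(m.ss)))
            stats[m.label] = (n + m.n,
                              [a + b for a, b in zip(ls, m.ls)],
                              [a + b for a, b in zip(ss, m.ss)])
        return stats

    def predict(self, x):
        stats = self._class_stats()
        total = sum(n for n, _, _ in stats.values())
        best_label, best_score = None, -math.inf
        for label, (n, ls, ss) in stats.items():
            score = math.log(n / total)  # log prior P(c)
            for v, s1, s2 in zip(x, ls, ss):
                mean = s1 / n
                var = max(s2 / n - mean * mean, 0.0) + self.min_var
                # Naive independence: add each feature's Gaussian log-density.
                score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = MicroClusterNaiveBayes()
    stream = [((0.0, 0.0), "a"), ((1.0, 0.5), "a"), ((10.0, 10.0), "b"), ((9.0, 10.5), "b")]
    for t, (x, y) in enumerate(stream):
        clf.learn(x, y, t)
    print(clf.predict((0.4, 0.3)))  # → a
```

Because the count and the sums are additive, absorbing a point costs a few additions per feature and the per-class statistics are obtained by summing microclusters, so no raw points need to be retained while the stream is scanned once.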
URI: http://hdl.handle.net/123456789/6575
Other Identifiers: M.Tech
Research Supervisor/Guide: Toshniwal, Durga
Type: M.Tech Dissertation
Appears in Collections: MASTERS' THESES (E & C)

Files in This Item:
File: ECED G21018.pdf; Size: 2.05 MB; Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.