Title: | INCREMENTAL APPROACH FOR TEXT CLASSIFICATION |
Authors: | Modi, Shweta |
Keywords: | ELECTRONICS AND COMPUTER ENGINEERING;TEXT CLASSIFICATION;STEMMING OF WORDS;CLUSTERING |
Issue Date: | 2009 |
Abstract: | Text documents are generated by various businesses, research institutions, government bodies and other organizations as they store data in digital form. Classification is the supervised grouping of data; text classification is a method of assigning one (or more) predefined categories to a particular document. In many applications, data keeps being generated over time, and under these circumstances traditional text classification methods may be unable to cope. Therefore, incremental methods are required in such cases. In this thesis, we propose an algorithm for incremental text classification. The text documents are preprocessed before classification techniques are applied to them. Preprocessing involves stopword removal and word stemming; Porter's stemmer is used for stemming. After stopword removal and stemming, the documents are converted into vectors on the basis of term frequencies and inverse document frequencies (TF-IDF). To obtain class labels for the classifier, we apply k-means clustering to the dataset. The clustering also yields the relevant terms for the dictionary. With each increment, new terms are added to the dictionary: the newly added terms are assigned unit weights, whereas the weights of the terms already in the dictionary are reduced. This process continues as more data is generated, and the weights capture the incremental evolution of the dictionary. The proposed algorithm is applied to real data collected from Google sports news over a fixed interval of one month. |
URI: | http://hdl.handle.net/123456789/11989 |
Other Identifiers: | M.Tech |
Research Supervisor/Guide: | Toshniwal, Durga |
metadata.dc.type: | M.Tech Dissertation |
Appears in Collections: | MASTERS' THESES (E & C) |
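The preprocessing, TF-IDF vectorization and incremental dictionary steps described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the thesis's implementation: the function names and the stopword list are hypothetical, Porter stemming (available, for example, as NLTK's `PorterStemmer`) and the k-means labelling step are omitted, and the decay factor `DECAY = 0.5` is an assumed value, since the abstract only says that existing weights are "reduced".

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

# Assumed decay factor: the abstract says existing weights are "reduced"
# but does not say by how much; 0.5 is an arbitrary illustrative choice.
DECAY = 0.5


def preprocess(text):
    """Lowercase, strip surrounding punctuation and drop stopwords.

    Stemming is deliberately omitted in this sketch.
    """
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def tfidf_vectors(docs, vocabulary):
    """Convert tokenized documents into TF-IDF vectors over `vocabulary`.

    Uses raw term frequency and idf = log(N / df); a vocabulary term that
    appears in no document gets idf 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in vocabulary}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocabulary])
    return vectors


def update_dictionary(dictionary, increment_terms, decay=DECAY):
    """Return a new term -> weight dictionary after one increment.

    Terms already in the dictionary have their weight multiplied by `decay`;
    terms seen for the first time in this increment get unit weight (1.0).
    """
    updated = {term: weight * decay for term, weight in dictionary.items()}
    for term in increment_terms:
        if term not in dictionary:
            updated[term] = 1.0
    return updated


if __name__ == "__main__":
    d = update_dictionary({}, ["cricket", "match"])
    print(d)  # {'cricket': 1.0, 'match': 1.0}
    d = update_dictionary(d, ["match", "tennis"])
    print(d)  # {'cricket': 0.5, 'match': 0.5, 'tennis': 1.0}
```

In this sketch a term that reappears in a later increment is treated as "already in the dictionary" and decays like any other existing term; the abstract does not say whether recurring terms are refreshed, so that choice is also an assumption.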
Files in This Item:
File | Size | Format
---|---|---
ECDG14550.pdf | 5.88 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.