Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/9886
Full metadata record
DC Field	Value	Language
dc.contributor.author	Bakshi, Neeraj	-
dc.date.accessioned	2014-11-21T04:51:21Z	-
dc.date.available	2014-11-21T04:51:21Z	-
dc.date.issued	2005	-
dc.identifier	M.Tech	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/9886	-
dc.guide	Garg, Kum Kum	-
dc.description.abstract	Text classification is the task of classifying documents into a certain number of pre-defined categories or classes. Automatic text categorizers use a corpus of labeled textual strings or documents to assign the correct label to previously unseen strings or documents. Often the given set of labeled examples, or "training set", is insufficient to solve this problem, as text classification learning algorithms require a large number of hand-labeled training examples to learn accurately. Labeled data are expensive to collect, as a human must take the time and effort to label them. In this dissertation, we present an approach to this problem wherein readily available information is incorporated into the learning process to allow for the creation of more accurate classifiers. This additional information is termed "background knowledge". A framework for the incorporation of background knowledge into three distinct text classification learners is provided. In the first approach, the background knowledge is used as a set of unlabeled examples in a generative model trained with Expectation Maximization (EM). The second approach, co-training with SVMs, builds two classifiers and adds the more confident prediction of the two to the labeled set to achieve more accurate classification. Lastly, the text classification task is cast as one of information integration using WHIRL, a tool that combines database functionality with techniques from the information-retrieval literature. The results show that text classification accuracy is improved considerably by using background knowledge. The system runs in a Linux Fedora Core 1 environment with a Pentium IV 2.40 GHz processor and 256 MB RAM. The languages used for code development are C and Perl.	en_US
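The first approach in the abstract (treating background knowledge as unlabeled examples inside an EM-trained generative model) can be sketched as semi-supervised multinomial Naive Bayes: initialize from the labeled set, then alternate an E-step (soft-label the unlabeled documents) with an M-step (re-estimate parameters from hard plus fractional counts). This is a minimal illustration only, not the thesis implementation (which was written in C and Perl); the toy spam/ham data and all function names are invented for the example.

```python
import math

def tokenize(text):
    return text.lower().split()

def m_step(labeled, unlabeled_posts, classes, vocab, alpha=1.0):
    """Re-estimate Naive Bayes parameters from hard labeled counts
    plus fractional counts weighted by the E-step posteriors."""
    cls_w = {c: 0.0 for c in classes}
    word_w = {c: {w: 0.0 for w in vocab} for c in classes}
    for toks, c in labeled:                      # hard counts, weight 1
        cls_w[c] += 1.0
        for w in toks:
            word_w[c][w] += 1.0
    for toks, post in unlabeled_posts:           # fractional counts
        for c in classes:
            cls_w[c] += post[c]
            for w in toks:
                word_w[c][w] += post[c]
    total = sum(cls_w.values())
    prior = {c: (cls_w[c] + alpha) / (total + alpha * len(classes))
             for c in classes}
    cond = {}
    for c in classes:                            # Laplace-smoothed P(w|c)
        denom = sum(word_w[c].values()) + alpha * len(vocab)
        cond[c] = {w: (word_w[c][w] + alpha) / denom for w in vocab}
    return prior, cond

def posterior(tokens, prior, cond, classes):
    """E-step: P(class | doc) under the current model, in log space."""
    logp = {c: math.log(prior[c])
            + sum(math.log(cond[c][w]) for w in tokens if w in cond[c])
            for c in classes}
    m = max(logp.values())
    unnorm = {c: math.exp(logp[c] - m) for c in classes}
    z = sum(unnorm.values())
    return {c: unnorm[c] / z for c in classes}

# Toy data (invented): a tiny labeled set plus unlabeled "background"
# documents that EM folds into the model.
labeled = [(tokenize("buy cheap pills now"), "spam"),
           (tokenize("team meeting at noon"), "ham")]
unlabeled = [tokenize(s) for s in ["cheap pills online",
                                   "meeting agenda for the team"]]
classes = ["spam", "ham"]
vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}

# Initialize from labeled data alone, then iterate E and M steps.
prior, cond = m_step(labeled, [], classes, vocab)
for _ in range(5):
    posts = [(d, posterior(d, prior, cond, classes)) for d in unlabeled]
    prior, cond = m_step(labeled, posts, classes, vocab)

def classify(text):
    p = posterior(tokenize(text), prior, cond, classes)
    return max(p, key=p.get)
```

The unlabeled documents sharpen the word distributions beyond what the two labeled examples alone provide, which is the mechanism by which background knowledge improves accuracy in the generative approach.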
dc.language.iso	en	en_US
dc.subject	ELECTRONICS AND COMPUTER ENGINEERING	en_US
dc.subject	IMPROVING TEXT CLASSIFICATION ACCURACY	en_US
dc.subject	BACKGROUND KNOWLEDGE	en_US
dc.subject	TEXT CLASSIFICATION	en_US
dc.title	IMPROVING TEXT CLASSIFICATION ACCURACY USING BACKGROUND KNOWLEDGE	en_US
dc.type	M.Tech Dissertation	en_US
dc.accession.number	G12380	en_US
Appears in Collections:MASTERS' THESES (E & C)

Files in This Item:
File	Description	Size	Format
ECDG12380.pdf		4.66 MB	Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.