DSpace Repository

FRAMEWORK FOR WEB DOCUMENT CLASSIFICATION BASED ON NAIVE BAYESIAN CLASSIFIER USING VOTING METHOD

dc.contributor.author Rajesh, G.
dc.date.accessioned 2014-11-28T06:24:08Z
dc.date.available 2014-11-28T06:24:08Z
dc.date.issued 2008
dc.identifier M.Tech en_US
dc.identifier.uri http://hdl.handle.net/123456789/11825
dc.guide Joshi, R. C.
dc.description.abstract Automatic web document classification is the process of assigning a web document to one or more predefined categories. With the continuous growth of information available on the World Wide Web (WWW), the web page classification problem becomes increasingly important. Because information flows through the WWW at high speed, it must be organized so that users can access it easily. Previously, information was generally organized manually by matching document contents to predefined classes. In that approach a human expert performs the classification task; alternatively, supervised classifiers are used to classify documents automatically. Supervised classification requires manual effort only to create training data before automatic classification takes place, which reduces human participation. In this dissertation we propose a framework for web document classification that uses both semantic and structural keywords. The proposed system is based on a Naive Bayesian (NB) classifier and applies a voting method over two different feature selection methods. The system uses both latent semantic indexing (LSI) and the structure-oriented weighting technique (SWT) for feature selection, and training and classification are performed with the Naive Bayesian classifier. The latent semantic indexing method projects terms and documents into a Boolean term-document matrix to find latent information in the document. At the same time, the structure-oriented weighting technique projects terms and documents into a weighted term-document matrix. These two feature sets are sent to the NB classifier for training and testing. Based on the output of the NB classifier, a voting method determines the most suitable class for the web page.
By using the voting method, we take advantage of both the semantic relationships between terms and documents and the structure of the HTML document to improve classifier accuracy. The proposed framework describes how the classifier is trained on two different feature vectors. These methods have been evaluated on web pages from Yahoo directories using three measures: recall, precision and F-measure. The results show that the proposed method performs significantly better than using LSI features or SWT features separately. en_US
dc.language.iso en en_US
dc.subject ELECTRONICS AND COMPUTER ENGINEERING en_US
dc.subject WEB DOCUMENT CLASSIFICATION en_US
dc.subject NAIVE BAYESIAN CLASSIFIER en_US
dc.subject VOTING METHOD en_US
dc.title FRAMEWORK FOR WEB DOCUMENT CLASSIFICATION BASED ON NAIVE BAYESIAN CLASSIFIER USING VOTING METHOD en_US
dc.type M.Tech Dissertation en_US
dc.accession.number G13919 en_US
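
The pipeline summarized in the abstract (two feature views of the same page, one Naive Bayesian classifier per view, and a vote over their outputs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the Boolean view is only a stand-in for the LSI features (no singular value decomposition is performed), the title and body weights in structure_features are hypothetical, the vote is a simple sum of the two posteriors, and the four training pages are toy data.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over non-negative feature weights (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.weight = {c: defaultdict(float) for c in self.classes}
        for d, c in zip(docs, labels):
            for t, w in d.items():
                self.weight[c][t] += w
        self.total = {c: sum(self.weight[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for t, w in doc.items():
                if t in self.vocab:  # terms never seen in training are ignored
                    p = (self.weight[c][t] + 1.0) / (self.total[c] + len(self.vocab))
                    lp += w * math.log(p)
            log_post[c] = lp
        # Normalize in log space to avoid underflow.
        m = max(log_post.values())
        exp = {c: math.exp(v - m) for c, v in log_post.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def boolean_features(page):
    """View A (assumed stand-in for LSI input): Boolean term presence."""
    return {t: 1.0 for t in page["title"].split() + page["body"].split()}


def structure_features(page, title_weight=3.0, body_weight=1.0):
    """View B: term counts weighted by HTML structure (weights are hypothetical)."""
    feats = defaultdict(float)
    for t in page["title"].split():
        feats[t] += title_weight
    for t in page["body"].split():
        feats[t] += body_weight
    return dict(feats)


def vote(proba_a, proba_b):
    """Assumed voting rule: sum the two posteriors and pick the largest."""
    combined = {c: proba_a[c] + proba_b[c] for c in proba_a}
    return max(combined, key=combined.get)


# Toy training pages (invented for this sketch).
train = [
    ({"title": "football scores", "body": "league match goal team"}, "sports"),
    ({"title": "tennis open", "body": "match player serve"}, "sports"),
    ({"title": "stock market", "body": "shares price bank trade"}, "finance"),
    ({"title": "bank rates", "body": "interest loan price"}, "finance"),
]
pages = [p for p, _ in train]
labels = [c for _, c in train]
nb_a = NaiveBayes().fit([boolean_features(p) for p in pages], labels)
nb_b = NaiveBayes().fit([structure_features(p) for p in pages], labels)


def classify(page):
    return vote(nb_a.predict_proba(boolean_features(page)),
                nb_b.predict_proba(structure_features(page)))
```

Calling classify({"title": "match report", "body": "team goal"}) returns "sports" on this toy data. Summing posteriors is only one possible voting rule; the abstract does not state which rule the dissertation uses.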

