Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/12501
Full metadata record
DC Field | Value | Language
dc.contributor.author | Agarwal, Nikhil | -
dc.date.accessioned | 2014-12-01T07:29:31Z | -
dc.date.available | 2014-12-01T07:29:31Z | -
dc.date.issued | 2011 | -
dc.identifier | M.Tech | en_US
dc.identifier.uri | http://hdl.handle.net/123456789/12501 | -
dc.guide | Toshniwal, D. | -
dc.description.abstract | Text classification is the task of assigning a given text document to one of a set of predefined categories based on its contents. It has found applications in fields as diverse as medicine, financial markets, and information retrieval. Naive Bayes is one of the most widely used classification algorithms; however, it is slow on large collections because of the volume of computation involved, so there is a need to parallelise it to reduce classification time. The algorithm could be parallelised using grid computing, clusters, CPU threads, or GPUs. Modern Graphics Processing Units (GPUs) have enabled high-performance computing for general-purpose applications, and GPUs are increasingly used as co-processors to achieve high overall throughput. The CUDA programming model provides a C-like API, making it simpler to program for the GPU. In this dissertation, a CUDA-based parallel implementation of Naive Bayes text classification is proposed. The classification step has been parallelised on the GPU using different approaches, each trying to exploit some property of the GPU, for example the use of shared memory rather than global memory, and memory coalescing. The performance of the GPU implementation of Naive Bayes text classification has been compared with an efficient implementation of the same on a CPU. The semantic information in unstructured text can be used to improve classification accuracy; WordNet and POS tagging have been used in this dissertation to capture it. The dataset used for the experiments is Reuters-21578, a collection of news articles that appeared on the Reuters newswire in 1987. The proposed parallel Naive Bayes algorithm has been implemented on an Nvidia GTS 250 card with 128 processors and 512 MB of GDDR3 RAM. The CPU used for the serial implementation is a Pentium 4 processor operating at 3 GHz with 4 GB of DDR3 RAM. Experimental results show that the parallel implementation on GPUs is faster than the serial implementation. | en_US
dc.language.iso | en | en_US
dc.subject | ELECTRONICS AND COMPUTER ENGINEERING | en_US
dc.subject | PARALLELISATION | en_US
dc.subject | NAIVE BAYES | en_US
dc.subject | TEXT DOCUMENTS | en_US
dc.title | PARALLELISATION OF NAIVE BAYES CLASSIFICATION FOR UNSTRUCTURED TEXT DOCUMENTS | en_US
dc.type | M.Tech Dissertation | en_US
dc.accession.number | G21053 | en_US
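The classification step summarised in the abstract reduces, per document, to an argmax over per-class sums of log-probabilities; that sum is the data-parallel core that such a GPU implementation maps onto threads (one score per document/class pair). Below is a minimal NumPy sketch of the same computation, given only as an illustration of the technique: it is not the author's CUDA code, and all function and variable names are assumed.

```python
import numpy as np

def classify(doc_word_counts, log_priors, log_likelihoods):
    """Naive Bayes classification step (illustrative sketch).

    doc_word_counts: (n_docs, vocab) term-count matrix
    log_priors:      (n_classes,) log P(c)
    log_likelihoods: (n_classes, vocab) log P(w|c)

    The matrix product computes, for every (doc, class) pair,
    sum_w count(doc, w) * log P(w|c); adding log P(c) gives the
    class score, and argmax picks the predicted category.
    """
    scores = doc_word_counts @ log_likelihoods.T + log_priors
    return scores.argmax(axis=1)

# Tiny illustrative run: two documents over a two-word vocabulary.
counts = np.array([[3, 0], [0, 2]])
priors = np.log([0.5, 0.5])
likelihoods = np.log([[0.9, 0.1], [0.2, 0.8]])
print(classify(counts, priors, likelihoods))  # [0 1]
```

Each (document, class) score is independent of every other, which is what makes the step a natural fit for one-thread-per-score GPU parallelisation.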
Appears in Collections:MASTERS' THESES (E & C)

Files in This Item:
File | Description | Size | Format
ECDG21053.pdf | | 3.88 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.