DSpace Repository

EXTRACTION AND SEGMENTATION OF TEXT FROM IMAGE DOCUMENTS

dc.contributor.author Kumar, Vijay
dc.date.accessioned 2014-11-30T06:21:51Z
dc.date.available 2014-11-30T06:21:51Z
dc.date.issued 2010
dc.identifier M.Tech en_US
dc.identifier.uri http://hdl.handle.net/123456789/12216
dc.guide Sarje, A. K.
dc.description.abstract Document images are often obtained by digitizing paper documents such as books or manuscripts, and document image analysis systems are becoming increasingly visible in everyday life. The accuracy of any Optical Character Recognition (OCR) system depends heavily on segmenting the text from the document image and on segmenting that text into lines, words, and characters. In this dissertation we study and propose a new method for text segmentation from document images using the Daubechies wavelet and 2-means classification. We also apply morphological operations such as dilation and erosion: dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels from object boundaries. The method achieves good accuracy compared with other text segmentation methods such as Haar wavelets, the Naive Bayes classifier, and decision trees. We used the same input image for all of these methods and illustrate the corresponding output images. The proposed text segmentation method has been implemented in MATLAB. We have also studied and modified the algorithm for segmenting text into lines, words, and characters for the Devanagari and Gurmukhi scripts: we describe line, word, character, and top character segmentation for printed Hindi text in Devanagari script, and line and word segmentation for printed text in Gurmukhi script. Performance improved at various segmentation levels. Segmentation performance was evaluated on five documents in Devanagari script and five documents in Gurmukhi script. en_US
dc.language.iso en en_US
dc.subject ELECTRONICS AND COMPUTER ENGINEERING en_US
dc.subject EXTRACTION en_US
dc.subject SEGMENTATION en_US
dc.subject IMAGE DOCUMENTS en_US
dc.title EXTRACTION AND SEGMENTATION OF TEXT FROM IMAGE DOCUMENTS en_US
dc.type M.Tech Dissertation en_US
dc.accession.number G20196 en_US
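
The abstract notes that dilation adds pixels to object boundaries while erosion removes them. As a minimal sketch of those two operations (not the dissertation's MATLAB implementation), the Python below applies them to a binary image stored as a list of lists. The function names `dilate` and `erode`, the square 3x3 structuring element, and the choice to treat pixels outside the image as background are all assumptions made for illustration.

```python
def dilate(image, size=3):
    """Binary dilation with a size x size square structuring element (assumed shape)."""
    rows, cols = len(image), len(image[0])
    r = size // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # A pixel becomes foreground if ANY in-bounds neighbour is foreground.
            out[y][x] = int(any(
                image[ny][nx]
                for ny in range(max(0, y - r), min(rows, y + r + 1))
                for nx in range(max(0, x - r), min(cols, x + r + 1))
            ))
    return out


def erode(image, size=3):
    """Binary erosion with a size x size square structuring element (assumed shape)."""
    rows, cols = len(image), len(image[0])
    r = size // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # A pixel stays foreground only if ALL neighbours are foreground;
            # positions outside the image count as background (an assumption).
            out[y][x] = int(all(
                0 <= ny < rows and 0 <= nx < cols and image[ny][nx]
                for ny in range(y - r, y + r + 1)
                for nx in range(x - r, x + r + 1)
            ))
    return out
```

Under these assumptions, dilating an image that contains a single foreground pixel grows it into a 3x3 block, and eroding that block shrinks it back to the single pixel.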

