Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/12216
Authors: Kumar, Vijay
Issue Date: 2010
Abstract: Document images are often obtained by digitizing paper documents such as books or manuscripts, and document image analysis systems are becoming increasingly visible in everyday life. The accuracy of any Optical Character Recognition (OCR) system depends heavily on segmenting text from the document image and on segmenting that text into lines, words, and characters. In this dissertation, we study and propose a new method for text segmentation from document images using Daubechies wavelets and 2-means classification. We also use morphological operations such as dilation and erosion. Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels from object boundaries. We obtained good accuracy compared to other text segmentation methods, namely Haar wavelets, the Naive Bayes classifier method, and the decision tree method. We used the same input image for all of these methods and illustrate the corresponding output images. The proposed text segmentation method has been implemented in MATLAB. We have also studied and modified the proposed algorithm for segmenting text into lines, words, and characters for the Devanagari and Gurmukhi scripts. We describe line, word, character, and top character segmentation for printed Hindi text in Devanagari script, and line and word segmentation for printed text in Gurmukhi script. Performance improvements were obtained at the various segmentation levels. We evaluated segmentation performance on five documents in Devanagari script and five documents in Gurmukhi script.
Other Identifiers: M.Tech
Research Supervisor/ Guide: Sarje, A. K.
Type: M.Tech Dissertation
Appears in Collections: MASTERS' THESES (E & C)
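
The abstract describes dilation as adding pixels to object boundaries and erosion as removing them. The following is a minimal illustrative sketch of those two operations on a binary image, written in plain Python rather than the MATLAB used in the dissertation. The 3x3 square structuring element, the border handling, and the function names `dilate` and `erode` are assumptions made for illustration; the record does not state which structuring element or parameters the author used.

```python
# Illustrative sketch only: binary dilation and erosion with an assumed
# 3x3 square structuring element. This is not the dissertation's code.

def _window(img, r, c):
    """Yield pixel values in the 3x3 window centred on (r, c).

    Cells that fall outside the image are skipped (an assumed border rule).
    """
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield img[rr][cc]


def dilate(img):
    """Set a pixel to 1 if any pixel in its 3x3 window is 1 (grows objects)."""
    return [
        [1 if any(_window(img, r, c)) else 0 for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def erode(img):
    """Set a pixel to 1 only if every pixel in its 3x3 window is 1 (shrinks objects)."""
    return [
        [1 if all(_window(img, r, c)) else 0 for c in range(len(img[0]))]
        for r in range(len(img))
    ]


if __name__ == "__main__":
    # A 5x5 image containing a single foreground pixel in the centre.
    image = [[0] * 5 for _ in range(5)]
    image[2][2] = 1

    grown = dilate(image)    # the single pixel becomes a 3x3 block
    shrunk = erode(grown)    # eroding the block leaves only its centre

    for row in grown:
        print("".join(str(v) for v in row))
```

In this toy case, dilating the single pixel yields a 3x3 block, and eroding that block returns the original image, which shows how the two operations act in opposite directions on object boundaries.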

Files in This Item:
File: ECDG20196.pdf
Size: 2.05 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.