Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/13750
Title: IMAGE PROCESSOR CUM DOCUMENT ANALYZER FOR OPTICAL CHARACTER RECOGNITION SYSTEM FOR INDIAN LANGUAGES
Authors: Kumar, Vivek
Keywords: CDAC
IMAGE PROCESSOR
OPTICAL CHARACTER RECOGNITION SYSTEM
INDIAN LANGUAGES
Issue Date: 2003
Abstract: This work is an attempt to develop software that carries out the pre-processing operations on the scanned image of a Devanagari text document that are essential for Optical Character Recognition (OCR). An OCR system takes the scanned image of a text document as input and produces the text in a processable format, which can be edited and otherwise manipulated. Several pre-processing operations need to be carried out on the scanned image before an OCR engine can operate on it: skew correction, extraction of the image array, segmentation of the text lines and text components in the image, connected component analysis, detection of tables, and detection of multiple text columns and other elements that make up the content of the image. In addition, a small amount of user interaction may be required during these pre-processing steps in order to get the best possible output from the OCR engine. For example, the user may need to carry out basic operations such as rotating the scanned image by right angles, flipping the image about the rectangular axes, and cut, copy and paste operations. The user should also be able to view the results of pre-processing, such as the segmented lines and components, their bounding boxes, the connected components, and the tables and columns detected in the image. The software developed tries to meet these requirements and accomplishes a variety of pre-processing operations. The interface provides the user with the basic image editing operations and also presents the results of the pre-processing in an easy-to-interpret manner. The software has been developed as one of the modules of 'Chitraksharika', an Optical Character Recognition system for Devanagari script being developed at ER&DCI, Noida.
OCR technology has long been used by libraries and government agencies to make lengthy documents quickly available electronically [4], and 'Chitraksharika' will hopefully meet the OCR requirements of organizations dealing with documents in Devanagari script.
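Two of the pre-processing steps named in the abstract, text line segmentation and connected component analysis with bounding boxes, can be illustrated with a minimal sketch. The abstract does not say which algorithms the software uses, so the approach here is an assumption: lines are found from rows that contain ink (a horizontal projection profile), and components are found by a breadth-first flood fill with 8-connectivity. The function names `segment_lines` and `connected_components` and the 0/1 list-of-rows image format are hypothetical and chosen only for illustration.

```python
from collections import deque


def segment_lines(image):
    """Return inclusive (top, bottom) row ranges of text lines.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    Assumption: a text line is a maximal run of rows containing any ink.
    """
    lines, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(image) - 1))
    return lines


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

A sketch like this presumes the page has already been binarized and deskewed. For Devanagari in particular, the headline (shirorekha) joins the characters of a word, so a connected component found this way would often correspond to a whole word rather than a single character.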
URI: http://hdl.handle.net/123456789/13750
Other Identifiers: M.Tech
Appears in Collections: Dissertation (C.Dec.)

Files in This Item:
File: ERDCIG11103.pdf
Size: 3.39 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
