IMAGE PROCESSOR CUM DOCUMENT ANALYZER FOR OPTICAL CHARACTER RECOGNITION SYSTEM FOR INDIAN LANGUAGES

Kumar, Vivek

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/13750

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kumar, Vivek	-
dc.date.accessioned	2014-12-09T05:51:54Z	-
dc.date.available	2014-12-09T05:51:54Z	-
dc.date.issued	2003	-
dc.identifier	M.Tech	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/13750	-
dc.guide	Shukla, V. N.	-
dc.description.abstract	This work is an attempt to develop software for carrying out the various pre-processing operations on the scanned image of a Devanagari text document that are essential for Optical Character Recognition (OCR). An OCR system takes as input the scanned image of a text document and produces the text in a processable format, which can be edited and otherwise manipulated. Several pre-processing operations need to be carried out on the scanned image before an OCR engine can operate upon it: skew correction, extraction of the image array, segmentation of the text lines and text components in the image, connected component analysis, and detection of tables, multiple text columns and other elements that comprise the content of the image. Besides, a small amount of user interaction may be required while performing these pre-processing steps in order to get the best possible output from the OCR engine. For example, the user may need to carry out basic operations such as rotating the scanned image at right angles, flipping the image about the rectangular axes, and cut, copy and paste operations. The user should also be able to view the results of pre-processing, such as the segmented lines and components, their bounding boxes, the connected components, and the tables and columns detected in the image. The software developed tries to take care of the above requirements and accomplishes a variety of pre-processing operations. The interface provides the user with the basic image editing operations and also presents the results of the pre-processing in an easy-to-interpret manner. The software has been developed as one of the modules of 'Chitraksharika', an Optical Character Recognition system for Devanagari script being developed at ER&DCI, Noida.
OCR technology has long been used by libraries and government agencies to make lengthy documents quickly available electronically [4], and 'Chitraksharika' will hopefully meet the OCR requirements of organizations dealing with documents in Devanagari script.	en_US
dc.language.iso	en	en_US
dc.subject	CDAC	en_US
dc.subject	IMAGE PROCESSOR	en_US
dc.subject	OPTICAL CHARACTER RECOGNITION SYSTEM	en_US
dc.subject	INDIAN LANGUAGES	en_US
dc.title	IMAGE PROCESSOR CUM DOCUMENT ANALYZER FOR OPTICAL CHARACTER RECOGNITION SYSTEM FOR INDIAN LANGUAGES	en_US
dc.type	M.Tech Dissertation	en_US
dc.accession.number	G11103	en_US
Appears in Collections:	MASTERS' THESES (C.Dec.)
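
The abstract lists segmentation of text lines among the pre-processing operations, but this record does not say how the thesis implements it. As an illustration only, the sketch below uses a horizontal projection profile, a common approach for line segmentation, and should not be read as the author's method. The function name segment_text_lines, the min_ink parameter and the toy page array are hypothetical, and the input is assumed to be an already binarized, skew-corrected image (1 = ink, 0 = background).

```python
from typing import List, Tuple


def segment_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    Hypothetical helper: `image` is a binarized page where 1 marks an ink
    pixel and 0 marks background. Rows with fewer than `min_ink` ink pixels
    are treated as gaps between text lines.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines: List[Tuple[int, int]] = []
    start = None  # top row of the text line currently being scanned
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # leaving a text line
            start = None
    if start is not None:
        # The last text line runs to the bottom edge of the image.
        lines.append((start, len(profile)))
    return lines


# Toy 6x4 page (an assumption for illustration): two text lines and blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(segment_text_lines(page))  # [(1, 3), (4, 5)]
```

Each returned pair gives the vertical extent of a line's bounding box; applying the same idea to column sums within one line would give a rough split into text components. For Devanagari, the headline (shirorekha) joining the characters of a word would likely need separate handling.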

Files in This Item:

File	Description	Size	Format
ERDCIG11103.pdf		3.39 MB	Adobe PDF
