Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/20447
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ali, Tofik | -
dc.date.accessioned | 2026-04-20T06:33:08Z | -
dc.date.available | 2026-04-20T06:33:08Z | -
dc.date.issued | 2024-07 | -
dc.identifier.uri | http://localhost:8081/jspui/handle/123456789/20447 | -
dc.guide | Roy, Partha Pratim | en_US
dc.description.abstract | Information retrieval and localization in document images involve extracting, identifying, and making accessible the relevant information within digital or digitized visual representations of traditional paper-based documents. This process is crucial for managing a vast and diverse range of documents encountered in various sectors such as legal, medical, academic, and corporate environments. Document images, which preserve the content, format, and sometimes the texture of the original documents, play a significant role in maintaining the integrity and authenticity of information. Effective retrieval and localization of information from these images require sophisticated techniques in image processing, machine learning, and deep learning to address challenges such as varying image quality, diverse document formats, and the need for accurate and efficient text recognition and interpretation. The goal is to transform static document images into dynamic, actionable data sources, enhancing their utility and accessibility in real-world applications. The digital transformation has revolutionized how information is stored, accessed, and managed across various sectors. Digitized documents offer numerous advantages over their physical counterparts, including ease of access, improved storage efficiency, and enhanced security. However, the real challenge lies in making this digitized information accessible and intelligible to users. Advanced technologies are required to bridge the gap between digitized information and its practical utility, necessitating the development of robust models and algorithms for efficient processing and interpretation. This research addresses the challenges inherent in document image analysis, such as variability in image quality, diverse document formats, and complex layouts.
It aims to develop advanced computational models for document image analysis to improve the accuracy and efficiency of character recognition, text segmentation, and image understanding. The study focuses on employing multi-task pre-training strategies to enhance the accuracy and efficiency of these technologies. The research methodology involves breaking down the problem into manageable components and systematically addressing each challenge using convolutional neural networks (CNNs), advanced text segmentation and recognition algorithms, and image understanding techniques. Key contributions of this research include the development of high-accuracy character recognition systems, particularly for handwritten scripts, leveraging advanced CNNs; the introduction of the Gated Multiscale Input Feature Fusion (GMIF) scheme for scale-invariant text detection; the development of Fast&Focused-Net (FFN) for small object feature encoding using the Volume-wise Dot Product (VDP) layer; and the introduction of a multi-task pre-training approach that combines text, image, and layout information to enhance document information analysis. The proposed models and techniques have been evaluated on various datasets, demonstrating significant improvements in the accuracy and efficiency of document image analysis tasks. The real-world applications of these advanced technologies are vast and varied, spanning academic institutions, corporate environments, legal industries, and the medical field. This research contributes to transforming static document images into dynamic, actionable data sources, supporting automated workflows, facilitating decision-making, and promoting knowledge discovery. 
Keywords: Document Image Analysis, Information Retrieval, Text Localization, Machine Learning, Deep Learning, Convolutional Neural Networks (CNNs), Multi-Task Pre-Training, Image Processing, Text Segmentation, Character Recognition, Gated Multiscale Input Feature Fusion (GMIF), Fast&Focused-Net (FFN), Volume-wise Dot Product (VDP) Layer, Entity Recognition, Relationship Extraction, Layout Analysis. | en_US
dc.language.iso | en | en_US
dc.publisher | IIT Roorkee | en_US
dc.title | INFORMATION RETRIEVAL AND LOCALIZATION IN DOCUMENT IMAGE | en_US
dc.type | Thesis | en_US
Appears in Collections: DOCTORAL THESES (CSE)

Files in This Item:
File | Description | Size | Format
15911017_TOFIK ALI.pdf | | 26.45 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.