Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/11843
Title: A NOVEL APPROACH TO INFORMATION EXTRACTION AND SENTENCE ORDERING IN MULTI DOCUMENT
Authors: Gusain, Amit
Keywords: ELECTRONICS AND COMPUTER ENGINEERING;INFORMATION EXTRACTION AND SENTENCE ORDERING;MDS SYSTEM;TEXT SUMMARIZATION
Issue Date: 2008
Abstract: With the rapid growth of the World Wide Web and electronic information services, the amount of available information is growing at an incredible rate. One problem that arises from this exponential growth is information overload. No one has time to read everything, yet we often have to make critical decisions based on what we are able to assimilate. With summaries, we can make effective decisions in less time. The technology of automatic text summarization is therefore becoming essential for dealing with information overload. Text summarization is the process of extracting the most important information from a single document or from a set of documents to produce a short, information-rich summary for a particular user or task. Multi-document summarization (MDS) is an automatic procedure for extracting information from multiple texts written about the same topic. Most MDS systems are based on an extraction method, which identifies key textual segments (e.g., sentences or paragraphs) in the source documents and selects them for the summary. It is important for such MDS systems to determine a coherent arrangement (ordering) of the textual segments extracted from the source documents in order to reconstruct the text structure for summarization. In this dissertation work we focus on two key tasks of summarization: information extraction and sentence ordering. A multi-document summarization method based on the frequency of bi-grams (windows of two words) is used for the information extraction task. Because sentences are selected from the documents according to their importance, they lose the cohesion and ordering that the information had in the source, which reduces the readability of the summary. To deal with this problem, we propose a new method for sentence ordering based on the types of the sentences. Our results show that the proposed multi-document summarization approach is effective at extracting important content units and improves the readability of the summary.
URI: http://hdl.handle.net/123456789/11843
Other Identifiers: M.Tech
Research Supervisor/ Guide: Joshi, R. C.
metadata.dc.type: M.Tech Dissertation
Appears in Collections: MASTERS' DISSERTATIONS (E & C)

Files in This Item:
File            Description   Size      Format
ECDG13921.pdf                 3.41 MB   Adobe PDF
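
The abstract describes an extraction step driven by bi-gram frequency. The record does not give the exact scoring formula, so the following is only a minimal sketch under stated assumptions: sentences are split naively on end punctuation, each bi-gram is counted once per sentence across all documents, and a sentence is scored by the average count of its bi-grams. The function names (`sentences`, `bigrams`, `summarize`) and the `max_sentences` parameter are hypothetical and are not taken from the dissertation. The sentence-ordering method is not sketched here because the abstract does not define the sentence types it relies on.

```python
import re
from collections import Counter


def sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bigrams(sentence):
    # Lowercase word tokens, then pair each word with its successor
    # (a window of two words).
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return list(zip(words, words[1:]))


def summarize(documents, max_sentences=2):
    all_sentences = [s for doc in documents for s in sentences(doc)]

    # Count each bi-gram once per sentence, over every sentence of every document.
    counts = Counter(bg for s in all_sentences for bg in set(bigrams(s)))

    def score(sentence):
        # Average frequency of the sentence's distinct bi-grams (assumed scoring rule).
        bgs = set(bigrams(sentence))
        return sum(counts[bg] for bg in bgs) / len(bgs) if bgs else 0.0

    # Drop exact duplicate sentences while keeping first-seen order, then rank.
    unique = list(dict.fromkeys(all_sentences))
    ranked = sorted(unique, key=score, reverse=True)
    return ranked[:max_sentences]


if __name__ == "__main__":
    docs = [
        "The flood hit the city on Monday. Officials opened shelters.",
        "The flood hit the city hard. Schools were closed.",
    ]
    for line in summarize(docs, max_sentences=2):
        print(line)
```

Averaging rather than summing the bi-gram counts is a design choice made here so that long sentences are not favoured merely for their length; a real system would also need redundancy handling beyond exact-duplicate removal.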


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.