Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/11730
Title: SELECTION AND ORDERING OF INFORMATION FOR MULTI-DOCUMENT SUMMARIZATION
Authors: Pesala, Poornachandra Rao P.
Keywords: ELECTRONICS AND COMPUTER ENGINEERING;SELECTION-ORDERING-INFORMATION;DOCUMENT SUMMARIZATION;AUTOMATIC SUMMARIZATION
Issue Date: 2007
Abstract: In recent years there has been strong interest in news aggregation. As the number and variety of online news sources increase rapidly, it is difficult for people to track the news concerning even a single event. Redundancy makes such tracking very time-consuming, because multiple news articles on the same event tend to contain similar information. A summary of such articles can present the important information in a short text and dramatically reduce reading time. The development and evaluation of automatic summarization systems has therefore become not only a research problem but also a very practical challenge. In this dissertation, we describe a general, modular automatic summarizer that achieves state-of-the-art performance, and we propose a new technique for information ordering. Automatic summarization is the distillation of important information from a source into an abridged form. Many current systems summarize texts by selecting sentences with important content based on various features in an ad hoc manner. No specific study of the contribution of the frequency of content units has been made in the past. Our investigations have helped us understand the importance of the frequency of content units in summarization. We describe a summarizer with state-of-the-art performance that is based on the frequencies of content units and uses the singular value decomposition (SVD) technique to calculate their importance. Because sentences are selected from the documents according to their importance, the summary loses the cohesion and ordering of the information, which reduces its readability. We therefore propose a new technique for information ordering using Hidden Markov Models (HMM), a supervised machine learning algorithm. The HMM learns the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear, and uses it to order the sentences in new, unseen summaries. Our research shows that HMM-based models can significantly improve the readability of automatic summaries.
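The frequency-and-SVD selection step described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the tokenizer, the use of single words as content units, the scoring rule (the length of each sentence vector in the space of the top singular directions), and the function names svd_sentence_scores and select_sentences are all assumptions made for illustration.

```python
import re
import numpy as np


def tokenize(sentence):
    # Assumption: content units are approximated by lowercase words.
    return re.findall(r"[a-z]+", sentence.lower())


def svd_sentence_scores(sentences, rank=2):
    """Score sentences from a term-by-sentence frequency matrix via SVD."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    index = {w: i for i, w in enumerate(vocab)}

    # A[i, j] = frequency of term i in sentence j.
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in tokenize(s):
            A[index[w], j] += 1.0

    # A = U * diag(sigma) * Vt; columns of Vt describe sentences.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(sigma))

    # Hypothetical scoring rule: norm of each sentence's coordinates in the
    # top-k singular directions, weighted by the singular values.
    weighted = sigma[:k, None] * Vt[:k, :]
    return np.sqrt((weighted ** 2).sum(axis=0))


def select_sentences(sentences, n=2, rank=2):
    """Return the n highest-scoring sentences in their original order."""
    scores = svd_sentence_scores(sentences, rank)
    top = sorted(np.argsort(-scores)[:n])
    return [sentences[i] for i in top]


if __name__ == "__main__":
    docs = [
        "The quake hit the city",
        "Rescue teams reached the city",
        "Markets were calm",
    ]
    print(select_sentences(docs, n=2))
```

The selected sentences are returned in source order only; the HMM-based ordering that the abstract proposes is a separate step and is not sketched here.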
URI: http://hdl.handle.net/123456789/11730
Other Identifiers: M.Tech
Research Supervisor/ Guide: Sarje, A. K.
metadata.dc.type: M.Tech Dissertation
Appears in Collections: MASTERS' THESES (E & C)

Files in This Item:
File: ECDG13412.pdf
Size: 4.04 MB
Format: Adobe PDF

