Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/14439
Title: KEYPHRASE EXTRACTION AND ENRICHMENT FOR NEWS MEDIA
Authors: Jain, Nikita
Keywords: Keyphrase Extraction; Keyphrase Enrichment; Automated News Summarization; Keyphrase Ranking; Natural Language Processing
Issue Date: 2016
Publisher: Department of Computer Science and Engineering, IITR.
Abstract: As newswire data grows continuously and rapidly, there is an emerging need for techniques that present news information in a concise, instantly digestible format. The goal of this dissertation is to develop models that automatically extract summarized and interesting news information, with the aim of addressing the low engagement time of news audiences and several other problems in news journalism. There has been great progress in the automatic extraction and generation of facts, trivia, and other interesting information from news media data, through tasks such as trivia generation, event detection, headline generation, sentiment analysis, and question-answering systems. However, in spite of these approaches, news audience engagement time is still low. Moreover, these solutions are often based on different learning models. My goal is to develop general and scalable algorithms that can work over any language, any domain, and any media format with textual content. The model (E3) presented in this thesis addresses these shortcomings. It provides effective and efficient keyphrases for multilingual and multi-format news data, along with a set of features for ranking those keyphrases. Furthermore, a method is provided to enrich the extracted keyphrases by finding their types and information related to the input query, such as the role played by a person entity. This kind of information is very helpful when many people, organizations, and locations are mentioned, because it is difficult for a reader to keep track of all the mentioned entities. As a result, readers often lose interest in the news item and network traffic is lost. We specifically chose keyphrase-based summaries because they provide a high-level overview of news data in a short span of time with little effort. We evaluated our unsupervised system E3 on varying input queries, from general topics (e.g., Election) to specific topics (e.g., Bihar Election), to demonstrate the efficiency and effectiveness of our keyphrase extraction and keyphrase enrichment methods over the existing state of the art. Our experimental results show that E3 performs significantly better than the defined baselines on seven different parameters. We also investigated the effect of linguistic and syntactic features on keyphrase extraction through a user case study, and found that our system is fairly robust.
URI: http://hdl.handle.net/123456789/14439
metadata.dc.type: Other
Appears in Collections: DOCTORAL THESES (E & C)

Files in This Item:
File: G26000-NIKITA_D.pdf
Size: 7.32 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.