Abstract:
As newswire data grows continuously at a very fast pace, there is an emerging
need for techniques that present news information in a concise, instantly
digestible format. The goal of my dissertation research is to develop
models that can automatically extract summarized and interesting news
information, with the aim of addressing the low engagement time of news
audiences and several other problems in news journalism.
There has been great progress in the automatic extraction and generation of
facts, trivia, and other interesting information from news media data, through
tasks such as trivia generation, event detection, headline generation, sentiment
analysis, and question-answering systems. However, in spite of these approaches,
news audience engagement time is still low. Also, these solutions are often based on different
learning models. My goal is to develop general and scalable algorithms that can
work over any language, any domain and any media format having textual content.
The model (E3) presented in this thesis addresses these shortcomings. It extracts
keyphrases effectively and efficiently from multilingual and multi-format news data, and provides
a set of features to rank the extracted keyphrases. Furthermore, a method is provided
to enrich the extracted keyphrases by finding their types and information related
to the input query, such as the role played by a person entity. This kind of
information is very helpful in cases where many people, organizations, and locations
are mentioned, as it is very difficult for a reader to keep track of all the mentioned
entities. As a result, readers often lose interest in the news story and the
network traffic gets lost. We have specifically chosen keyphrase-based
summaries because they provide a high-level overview of news data in a short span of
time with little effort.
We have evaluated our unsupervised system E3 on varying input queries, from
general topics (e.g., Election) to specific topics (e.g., Bihar Election), to demonstrate
the efficiency and effectiveness of our keyphrase extraction and keyphrase enrichment
methods over the existing state of the art. Our experimental results show
that E3 performs significantly better than the defined baselines on seven different
parameters. We also investigate the effect of using linguistic and syntactic
features in keyphrase extraction through a user case study, and find that our
system is fairly robust.