A TECHNIQUE FOR DETECTING PARAPHRASES USING HYBRID SIMILARITY MEASURES

Deepak, T. Sai

Title:	A TECHNIQUE FOR DETECTING PARAPHRASES USING HYBRID SIMILARITY MEASURES
Authors:	Deepak, T. Sai
Keywords:	ELECTRONICS AND COMPUTER ENGINEERING;SIMILARITY MEASURES;PARAPHRASES;HYBRID
Issue Date:	2009
Abstract:	Parsing, processing, and understanding natural languages such as English has always been challenging in computational linguistics. The main reason is that natural languages have many irregularities in their grammar. Words can also be combined in many different ways to yield a meaning: the same situation can be expressed using different grammatical structures, different words, or different word groups. Sets of words or word groups that represent similar meanings are known as paraphrases. Detecting paraphrases plays a key role in many natural language processing applications, such as question answering, machine translation, and multi-text summarization. Though a large number of techniques have been proposed and implemented for detecting paraphrases, a complete framework that considers all aspects, such as lexical and semantic similarity measures, is missing. Most existing techniques work independently, and little research has been done on the effect of combining them. In this thesis, we propose a technique for detecting paraphrases using hybrid similarity measures. First, we propose a technique for unsupervised detection of paraphrases based on word-to-word similarities. Second, we develop a technique for supervised detection of paraphrases using semantic similarities. Finally, we propose a hybrid technique for detecting semantic relatedness between two sentences that uses both the supervised and the unsupervised similarity techniques. We also explore the feasibility of using a fact-based similarity metric to detect paraphrases. We tested all of the proposed metrics on a standard dataset, the Microsoft Research Paraphrase Corpus. To obtain the semantics of words, we used WordNet, a lexical database of English, as background knowledge, and Wiktionary as the back-end database for calculating fact-based similarity. In our experiments, the proposed schemes outperform their existing counterparts.
URI:	http://hdl.handle.net/123456789/11974
Other Identifiers:	M.Tech
Research Supervisor/ Guide:	Toshniwal, Durga
Type:	M.Tech Dissertation
Appears in Collections:	MASTERS' THESES (E & C)

Files in This Item:

File	Description	Size	Format
ECDG14521.pdf		6.06 MB	Adobe PDF
