DSpace Repository

SPOKEN LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS

dc.contributor.author Yadav, Manish Kumar
dc.date.accessioned 2025-05-11T15:03:24Z
dc.date.available 2025-05-11T15:03:24Z
dc.date.issued 2018-06
dc.identifier.uri http://localhost:8081/jspui/handle/123456789/16182
dc.description.abstract This thesis studies the use of deep neural networks (DNNs) for automatic language identification (LID). The recent success of DNNs in speech processing and pattern recognition has motivated us to apply them to language identification using MFCC, delta, and double-delta MFCC features. DNN architectures have properties that make them suitable for difficult tasks, among which automatic language identification can be highlighted. Their capability to model complex functions in high-dimensional spaces and to obtain a good representation of the input data makes these architectures and algorithms well suited to processing complex signals. This thesis studies various approaches that combine the fields of deep learning and automatic language recognition, with the aim of improving the LID task by obtaining a better representation of voice signals for classification, so that the language used in a given voice signal can be identified. To this end, DNN, SVM, and KNN LID systems have been studied thoroughly and implemented experimentally. For this purpose, a completely new dataset of eight Indian and five South Asian languages has been collected, since a formal speech corpus is not available for these languages. The total speech data collected is about 51.24 hours. In this thesis, four major improvements are proposed over the state-of-the-art i-vector mechanism with GMM. First, we replace the GMM-based LID classifier with a five-layer DNN. Second, we use three acoustic features (MFCC, delta, and double-delta MFCC), which reduces computing costs. Third, we propose a direct approach that uses the DNN for both feature extraction and classification. Finally, we use long-term speech sequences of 10 to 15 seconds to improve accuracy, although the number of frames is reduced to 21 to avoid latency. When compared with the SVM and KNN classifiers on the same dataset, the DNN outperforms both. en_US
dc.description.sponsorship INDIAN INSTITUTE OF TECHNOLOGY ROORKEE en_US
dc.language.iso en en_US
dc.publisher I I T ROORKEE en_US
dc.subject Deep Neural Networks en_US
dc.subject Language Identification en_US
dc.subject Indian en_US
dc.subject Five South Asian en_US
dc.title SPOKEN LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS en_US
dc.type Other en_US
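
The abstract names MFCC, delta, and double-delta MFCC as the acoustic features and a 21-frame input. As a rough illustration only, the sketch below shows one common way to derive delta and double-delta coefficients from an MFCC matrix and stack the three feature sets. It is not the thesis's implementation: the regression-based delta formula, the window width of 2, the 13 coefficients per frame, and the function names `delta` and `stack_features` are all assumptions introduced here.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis.

    features has shape (frames, coeffs). The window width is an
    assumption; the abstract does not state how deltas were computed.
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_features(mfcc: np.ndarray) -> np.ndarray:
    """Concatenate MFCC, delta, and double-delta per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.hstack([mfcc, d1, d2])


# Hypothetical input: 21 frames of 13 MFCCs (13 is an assumed value).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((21, 13))
stacked = stack_features(mfcc)   # shape (21, 39)
dnn_input = stacked.reshape(-1)  # one flat vector per 21-frame window
```

Under these assumptions each 21-frame window becomes a single fixed-length vector, which is the kind of input a feed-forward DNN classifier expects.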