SPOKEN LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS

Yadav, Manish Kumar

URI: http://localhost:8081/jspui/handle/123456789/16182

Date: 2018-06

Abstract:

This thesis studies the use of deep neural networks (DNNs) for automatic language identification (LID). The recent success of DNNs in speech processing and pattern recognition motivated us to apply them to language identification using MFCC, delta, and double-delta MFCC features. DNN architectures have properties that make them suitable for difficult tasks, among which automatic language identification stands out. Their capability to model complex functions in high-dimensional spaces and to learn a good representation of the input data makes these architectures and algorithms well suited to processing complex signals. This thesis studies various approaches that combine the fields of deep learning and automatic language recognition, with the aim of improving the LID task by obtaining a better representation of voice signals for classification, so that the language spoken in a given voice signal can be identified. To this end, DNN, SVM, and KNN LID systems have been studied thoroughly and implemented experimentally. For this purpose, a completely new dataset of eight Indian and five South Asian languages has been collected, since a formal speech corpus is not available for these languages. The total speech data collected is about 51.24 hours. In this thesis, four major improvements are proposed over the state-of-the-art i-vector mechanism with GMM. First, we replace the GMM-based LID classifier with a five-layer DNN. Second, we use three acoustic features (MFCC, delta, and double-delta MFCC), which reduces computing costs. Third, we propose a direct approach that uses the DNN for both feature extraction and classification. Finally, we use long-term speech sequences of 10 to 15 seconds to improve accuracy, although the number of frames is reduced to only 21 to avoid latency. When compared with the SVM and KNN classifiers on the same dataset, the DNN outperforms both.
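As an illustration of the feature pipeline named in the abstract, the following minimal sketch shows one common way to derive delta and double-delta coefficients from a sequence of MFCC frames and to group frames into fixed-size windows. It is not taken from the thesis: the regression formula is the standard one used in speech toolkits, and the function names, the regression width of 2, the 13 coefficients per frame, and the reading of "21 frames" as a sliding context window are all assumptions made for this example.

```python
from typing import List

Frame = List[float]  # one frame of static coefficients (e.g. MFCCs)


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta coefficients.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Edge frames are repeated as padding (an assumption of this sketch).
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(mfcc: List[Frame], width: int = 2) -> List[Frame]:
    """Concatenate static, delta, and double-delta coefficients per frame."""
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)  # double delta = delta of the delta
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


def stack_context(features: List[Frame], context: int = 21) -> List[Frame]:
    """Flatten each run of `context` consecutive frames into one input vector.

    Hypothetical reading of the abstract's "21 frames"; the thesis may
    select or group frames differently.
    """
    return [
        [v for frame in features[i:i + context] for v in frame]
        for i in range(len(features) - context + 1)
    ]
```

Under these assumptions, 13 static coefficients per frame become 39 values per frame after `add_deltas`, and each 21-frame window flattens to a vector of 819 values that a classifier such as a DNN, SVM, or KNN could take as input.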
