Abstract:
This thesis studies the use of deep neural networks (DNNs) for automatic language
identification (LID). The recent success of DNNs in speech processing and pattern
recognition has motivated us to apply them to language identification using MFCC,
delta, and double-delta MFCC features. DNN architectures have properties that make
them suitable for difficult tasks, among which automatic language identification
stands out. Their capability to model complex functions in high-dimensional spaces
and to learn a good representation of the input data makes these architectures and
algorithms well suited to processing complex signals.
This thesis studies various approaches that combine deep learning and automatic
language recognition, with the aim of improving the LID task by obtaining a better
representation of voice signals for classification, so that the language spoken in a
given voice signal can be identified. To this end, DNN, SVM, and KNN LID systems have
been studied thoroughly and implemented experimentally. For this purpose, a completely
new dataset of 8 Indian and five South Asian languages has been collected, since no
formal speech corpus is available for these languages. The total speech data collected
is about 51.24 hours.
In this thesis, four major improvements are proposed over the state-of-the-art i-vector
mechanism with GMM. First, we replace the GMM-based LID classifier with a five-layer
DNN. Second, we use three acoustic features (MFCC, delta, and double-delta MFCC),
which reduces computing costs. Third, we propose a direct approach that uses the DNN
for both feature extraction and classification. Finally, we use long-term speech
sequences of 10 to 15 seconds to improve accuracy; however, the number of frames is
reduced to 21 to avoid latency. When the DNN is compared with SVM and KNN classifiers
on the same dataset, the DNN outperforms both.
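As an illustration of the feature set named above, the following minimal sketch shows
one common way to derive delta and double-delta coefficients from a sequence of MFCC
frames and to stack the three into a single vector per frame. It is an assumption-laden
sketch, not the thesis implementation: the regression formula, the window half-width
`width=2`, the edge handling, and the function names `delta` and `stack_features` are
all hypothetical choices made for this example.

```python
def delta(frames, width=2):
    """Regression-based delta of a list of per-frame coefficient lists.

    Assumed formula (a common convention, not taken from the thesis):
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Assumption: clamp indices at the edges (repeat first/last frame).
                nxt = frames[min(t + n, num_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc, width=2):
    """Concatenate MFCC, delta, and double-delta coefficients frame by frame."""
    d1 = delta(mfcc, width)   # first-order (delta) coefficients
    d2 = delta(d1, width)     # second-order (double-delta) coefficients
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]


if __name__ == "__main__":
    # Toy input: 6 frames with 2 coefficients each (real MFCC frames have more).
    toy_mfcc = [[float(t), 2.0 * t] for t in range(6)]
    features = stack_features(toy_mfcc)
    print(len(features), len(features[0]))
```

Each output frame is three times as long as the input frame, since the static, delta,
and double-delta coefficients are concatenated. A fixed-length block of such frames
could then be passed to a classifier such as the DNN, SVM, or KNN systems compared here.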