Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/13153
Title: SPEAKER DEPENDANT ISOLATED HINDI WORD RECOGNITION
Authors: D, Shankar Babu
Keywords: ELECTRICAL ENGINEERING;SPEAKER DEPENDANT ISOLATED HINDI WORD RECOGNITION;SPEECH RECOGNITION;HIDDEN MARKOV MODELS
Issue Date: 2005
Abstract: Although speech recognition products are already on the market, their development is mainly based on statistical techniques that work under very specific assumptions. This thesis examines how artificial neural networks can benefit a speaker dependent isolated speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions; they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism, and they can readily be applied to acoustic modeling. A neural network has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters. These advantages are confirmed experimentally by a NN that we developed, based on speaker dependent isolated Hindi words on the Resource Management database. Speech recognition involves recording the input speech signal, extracting the key features of the speech, converting the features into codes, and finally classifying the codes. A speech recognizer system comprises two distinct blocks, a Feature Extractor and a Recognizer. The Feature Extractor block uses Mel-frequency cepstral analysis, which translates the incoming speech into feature vectors, and the Recognizer block uses a neural network. In the course of developing this system, we explored two different ways to use neural networks for audio modeling: prediction and classification. We found that predictive networks yield poor results because of a lack of discrimination, but classification networks gave excellent results.
Finally, this thesis reports how we optimized the accuracy of our system with many natural techniques, such as expanding the input window size, normalizing the inputs, increasing the number of hidden units, converting the network's output activations to log likelihoods, optimizing the learning rate schedule by automatic search, backpropagating error from word level outputs, and using gender dependent networks.
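Three of the techniques named in the abstract (normalizing the inputs, expanding the input window, and converting output activations to log likelihoods) can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the edge-padding choice, and the use of class priors to scale posteriors are all assumptions made for illustration, and the feature frames stand in for Mel-frequency cepstral vectors.

```python
import math


def normalize(frames):
    """Scale each coefficient to zero mean and unit variance across frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # A constant coefficient has zero deviation; fall back to 1.0.
        stds.append(math.sqrt(var) or 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def stack_context(frames, left=1, right=1):
    """Widen the input window by concatenating neighboring frames.

    Assumption: edge frames are repeated as padding.
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - left, t + right + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert class posteriors to log likelihoods up to a constant.

    Assumption: by Bayes' rule, log p(x|c) = log P(c|x) - log P(c) + const,
    where P(c) is estimated from class frequencies in the training data.
    """
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, priors)
    ]
```

In such a setup, the stacked and normalized frames would be the network's input, and the scaled log likelihoods would replace the raw output activations when scoring a word.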
URI: http://hdl.handle.net/123456789/13153
Other Identifiers: M.Tech
Research Supervisor/ Guide: Anand, R. S.
Type: M.Tech Dissertation
Appears in Collections: MASTERS' THESES (Electrical Engg)

Files in This Item:
File: G12332.pdf
Size: 4.42 MB
Format: Adobe PDF

