Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/19772
| Title: | DEVELOPMENT OF SPEECH ENHANCEMENT ALGORITHMS FOR REAL TIME APPLICATION |
| Authors: | Chiluveru, Samba Raju |
| Issue Date: | Aug-2021 |
| Publisher: | IIT Roorkee |
| Abstract: | Speech is the most common way for humans to communicate information and convey feelings. In real-world environments it is often mixed with nonstationary noise. Speech perception under low Signal-to-Noise Ratio (SNR) conditions is difficult for both normal-hearing and hearing-impaired listeners. Moreover, voice-based technologies such as hearing aids, human-machine communication, and robust automatic speech recognition have recently gained popularity. These applications perform poorly in low SNR environments, which necessitates good sound quality for optimal performance. Single-channel Speech Enhancement Algorithms (SSEAs) can help improve sound quality. Additionally, the applications mentioned above require SSEA implementation on hardware with good sound quality. We are therefore motivated to develop SSEAs that improve the quality and intelligibility of noisy speech in low SNR environments, and then to implement a high-performance SSEA on hardware. This thesis is divided into two parts: the first part develops SSEAs for low SNR environments, and the second focuses on SSEA hardware implementation. SSEAs can be implemented using either conventional/unsupervised or supervised algorithms. The traditional/unsupervised algorithms are simple to implement; however, their performance is limited in low SNR nonstationary noise environments. In contrast, supervised speech enhancement algorithms employ a Deep Neural Network (DNN) as a learning machine and achieve considerably better performance. They can be developed as mapping-based or mask-based algorithms. Most mapping-based algorithms use the DFT-based absolute amplitude to train the DNN model. Speech resynthesis is then performed using the estimated amplitude together with the noisy input phase. The mask-based algorithms, in turn, use the DFT values to prepare the target ratio mask and the input feature set.
However, the DFT produces complex values with a fixed frequency resolution for the audio input. A high-performance SSEA often requires two time-frequency targets, such as absolute amplitude and phase, or real and imaginary values. As a result, sound quality suffers from the overall estimation error. Moreover, an SSEA employs a highly nonlinear regression model (i.e., a DNN) built on nonlinear activation functions. The inference model, i.e., a well-trained DNN model, is found to be appropriate for SSEA deployment on hardware. Consequently, to maintain SSEA performance, the inference model requires an accurate approximation of the nonlinear activation function, which significantly impacts both the inference model's performance and the hardware complexity. In the first part of the thesis, SSEAs have been developed. Initially, a mapping-based SSEA was developed using two DNN models trained on absolute amplitude and phase. The estimated amplitude and phase values are used for speech signal resynthesis. We found that the enhanced phase improves sound quality in low SNR environments. After that, we developed a mask-based SSEA based on the stationary wavelet transform, which uses a nonlinear ratio mask and a separate feature set. Experiments show that the proposed algorithm outperforms existing SSEAs on low SNR nonstationary noises in terms of speech quality and intelligibility. In the second part of the thesis, the hardware implementation of the DNN-based SSEA is presented. Initially, an accurate piecewise approximation scheme is presented for the nonlinear activation function. It uses a precision-controlled recursive algorithm to predict sub-ranges and applies the Remez algorithm to find the corresponding approximation function on each sub-range. The inference model is then built with the approximated activation function, and its performance is measured in terms of speech quality and intelligibility.
Finally, the hardware implementation is carried out with the Synopsys Design Compiler using a TSMC 0.18-µm library. Experiments show that the proposed hardware implementation preserves speech quality and intelligibility while requiring less area, a lower gate count, shorter delay, and less power. |
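The mapping-based pipeline summarized in the abstract — estimate a magnitude spectrum, then resynthesize with the noisy input phase — can be sketched as follows. This is a toy NumPy illustration, not the thesis's trained models: the DNN is replaced by an identity pass-through on the magnitude, and the frame length, hop, and window are arbitrary assumed choices.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Frame the signal with a Hann window and take the real DFT of each frame."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=256, hop=128):
    """Overlap-add inverse of the analysis above (no synthesis window)."""
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame_len] += f
    return out

# Toy "noisy" signal: a 440 Hz tone at 16 kHz plus white noise.
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000) \
        + 0.3 * rng.standard_normal(4096)

spec = stft(noisy)
noisy_mag, noisy_phase = np.abs(spec), np.angle(spec)

# Stand-in for the DNN: identity mapping. In the thesis a trained model
# would map the noisy magnitude toward the clean magnitude.
estimated_mag = noisy_mag

# Resynthesis: estimated magnitude recombined with the noisy phase.
enhanced = istft(estimated_mag * np.exp(1j * noisy_phase))
```

With the identity "model", the interior of `enhanced` closely reproduces `noisy` (Hann analysis at 50% overlap sums to nearly unity), which makes the resynthesis path easy to sanity-check before a real estimator is dropped in.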
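The precision-controlled recursive sub-range splitting for the activation-function approximation can be sketched roughly as below. This is an assumption-laden illustration, not the thesis's scheme: near-minimax Chebyshev interpolation stands in for the Remez exchange step, tanh is an assumed target activation, and the polynomial degree, error budget, and saturation point are arbitrary; fixed-point hardware concerns are ignored.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def fit_subranges(f, lo, hi, degree=2, tol=1e-3):
    """Recursively split [lo, hi] until a low-degree fit meets the error
    budget on every piece. Chebyshev interpolation is used here as a
    near-minimax stand-in for the Remez exchange algorithm."""
    poly = C.Chebyshev.interpolate(f, degree, domain=[lo, hi])
    xs = np.linspace(lo, hi, 512)
    err = np.max(np.abs(f(xs) - poly(xs)))
    if err <= tol:
        return [(lo, hi, poly)]          # this piece is accurate enough
    mid = 0.5 * (lo + hi)                # otherwise bisect and recurse
    return (fit_subranges(f, lo, mid, degree, tol)
            + fit_subranges(f, mid, hi, degree, tol))

def approx_tanh(pieces, x):
    """Evaluate the piecewise approximation. tanh is odd, so only x >= 0
    is tabulated and the sign is folded back in; inputs beyond the last
    breakpoint saturate to 1."""
    ax = abs(x)
    for lo, hi, poly in pieces:
        if lo <= ax <= hi:
            return np.sign(x) * poly(ax)
    return np.sign(x) * 1.0

# Build the piecewise table once; each entry maps to one hardware sub-range.
pieces = fit_subranges(np.tanh, 0.0, 4.0, degree=2, tol=1e-3)
```

The appeal of this shape for hardware is that inference reduces to a range check plus one low-degree polynomial evaluation per input, while the recursion lets the designer trade table size against the worst-case approximation error.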
| URI: | http://localhost:8081/jspui/handle/123456789/19772 |
| Research Supervisor/ Guide: | Tripathy, Manoj |
| metadata.dc.type: | Thesis |
| Appears in Collections: | DOCTORAL THESES (Electrical Engg) |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| SAMBA RAJU CHILUVERU.pdf | | 7.01 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
