Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/19629

| Title: | INDIAN SIGN LANGUAGE RECOGNITION |
| Authors: | Sharma, Prachi |
| Issue Date: | Oct-2021 |
| Publisher: | IIT Roorkee |
| Abstract: | Sign language is the visual language of Deaf people, consisting of signs and gestures performed mainly with the hands, unlike spoken language. Every country has its own sign language: American Sign Language (ASL) is used in the US and Canada, and Indian Sign Language (ISL) is the language of the Deaf people of India. According to the World Health Organization (as of April 1st, 2021), there are more than 400 million Deaf people worldwide, and 80% of them reside in developing countries. According to the Census of India 2011, there are approximately 5 million Deaf and 2 million mute people in the country. Much work has been done on recognizing prominent sign languages such as American and Chinese, and many datasets for these languages exist in the literature. However, ISL datasets are scarce, indicating that comparatively little work has been done on recognizing Indian signs. The ISL recognition works reported in the literature mainly deal with uniform backgrounds and signs without complex finger motion, and they constrain the signer not to wear certain clothes and accessories in order to ease recognition. Thus, the vast number of Deaf people worldwide, especially in India, and the lack of significant work on recognizing complex Indian signs against cluttered backgrounds motivated us to develop a sign language recognition system that converts sign language into text or speech understandable by everyone and bridges the communication gap between the Deaf and hearing communities. The first work of the thesis was carried out using two public ASL finger-spelling datasets already present in the literature. This work developed a novel algorithm to segment the hand palm and fingers from a depth image, improved and fused four hand-crafted features, and used four Bayesian-optimized multi-class support vector machine (SVM) kernels to recognize American signs using only depth data.
The four hand-crafted features used were geometrical features, local binary patterns (LBP), and the distances between the hand-palm centre and the fingertips and valleys concatenated with the number of fingers raised in a gesture ([Distance, Num]). The four SVM kernels evaluated in this work were linear, quadratic, cubic and Gaussian. Experimental results showed that the feature vectors [Distance, Num] and [Geometrical, Distance, Num] outperformed the other feature combinations with a recognition accuracy of 99% on Dataset 1, while [Geometrical, LBP, Distance, Num] outperformed the other feature fusions with 95.7% recognition accuracy on Dataset 2. Both results were achieved with the Gaussian SVM kernel. This work on ASL recognition using a single input informed the ISL recognition work later in the thesis. The second work of this thesis addressed the paucity of ISL datasets in the literature showcasing real-world signing. A new dynamic isolated ISL dataset, named BharatDSL, was developed; it consists of 107 signs performed by 17 signers, with each sign repeated 5-20 times. Microsoft's Kinect camera captured RGB, depth and skeleton data for the BharatDSL dataset in a well-lit room in front of a non-uniform, cluttered wall, without any restrictions on the signers. The developed dataset has approximately 53k video sequences of Indian signs. The dataset is scalable at any point in time with more sign categories and more signers, and can thus help the research community recognize Indian signs in new ways. Dynamic signs are more challenging to classify than static ones due to complex hand shapes and motions. Thus, the third and final work of this thesis deals with recognizing complex signs against a non-uniform, cluttered background, performed by signers with no restrictions on clothing or hand accessories.
Of the RGB, depth and skeleton data of the newly developed BharatDSL dataset, the RGB data was utilized for segmentation, feature extraction and classification of signs. A novel methodology was developed for dynamic ISL recognition and for validation of the newly developed BharatDSL dataset. The methodology converted the raw RGB video sequences into three recent input representations: detected hands, star RGB and star mod RGB. The detected hands input is a novel input representation, star RGB is a recent representation already existing in the literature, and star mod RGB is a modified version of star RGB. Five recent deep models (long short-term memory (LSTM), multi-layer perceptron (MLP), long-term recurrent convolutional network (LRCN), 3D convolutional neural network (C3D) and a light version of C3D (Conv 3D)) were used to classify the Indian signs. The first two models took as input a feature sequence extracted by a 2D-CNN from the input representations, while the remaining deep models took the raw input representations directly and were trained from scratch. The features extracted by three recent pre-trained 2D-CNNs (InceptionV3, ResNet152V2 and EfficientNetB7) were evaluated in this work. The performance of the 2D-CNNs, input representations and classifiers was reported on the RGB data of 43 single-handed Indian signs from the newly developed BharatDSL dataset. The experimental results showed that the LRCN model outperformed the other models with a recognition accuracy of 88% on single-handed Indian signs using the detected hands input. EfficientNetB7-LSTM and EfficientNetB7-MLP achieved the highest recognition accuracies of 46% and 53.6% with the star RGB and star mod RGB inputs, respectively. Of the three input representations, the detected hands input helped the 2D-CNNs extract significant spatial and temporal features from the video sequences, thereby providing better recognition accuracy with all the deep models.
The star mod RGB input was found to perform better than the star RGB input across all model combinations on the single-handed Indian signs. The significant takeaways from the work of this thesis can be summarized as follows: (i) it addressed the paucity of publicly available ISL datasets; (ii) it demonstrated acceptable performance using only a single input for sign language recognition, thus encouraging a single-input, single-camera system that reduces hardware complexity; and (iii) it illustrated the utility of EfficientNet and 3D-CNNs for feature extraction and recognition of Indian signs with different types of inputs. |
| URI: | http://localhost:8081/jspui/handle/123456789/19629 |
| Research Supervisor/ Guide: | Anand, R.S. |
| metadata.dc.type: | Thesis |
| Appears in Collections: | DOCTORAL THESES (Electrical Engg) |
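The abstract describes a [Distance, Num] feature: distances from the hand-palm centre to detected fingertips and valleys, concatenated with the count of raised fingers. A minimal sketch of how such a feature vector might be assembled is shown below; the function name, the point representation as (x, y) tuples, and the fixed-length padding (5 fingertips + 4 valleys for an open hand) are illustrative assumptions, not the thesis's exact implementation.

```python
import math

def distance_num_feature(palm_centre, fingertips, valleys, num_fingers_raised):
    """Sketch of the [Distance, Num] feature from the abstract:
    Euclidean distances from the palm centre to each fingertip and
    valley point, concatenated with the raised-finger count.
    Padding scheme and point format are assumptions."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    feature = [dist(palm_centre, p) for p in fingertips]
    feature += [dist(palm_centre, p) for p in valleys]
    # Pad to a fixed length so every gesture yields the same dimensionality
    # (assuming at most 5 fingertips + 4 valleys are detected).
    feature += [0.0] * (9 - len(feature))
    feature.append(float(num_fingers_raised))
    return feature
```

A vector built this way could then be concatenated with the geometrical and LBP features before being fed to the SVM classifier, as in the feature fusions the abstract reports.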
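The best-performing classifier in the first work was a multi-class SVM with the Gaussian (RBF) kernel. For readers unfamiliar with it, the kernel itself is just a similarity measure between two feature vectors; the sketch below shows its standard form. The gamma value here is an arbitrary illustration (the thesis tunes kernel hyperparameters via Bayesian optimization).

```python
import math

def gaussian_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) SVM kernel: k(u, v) = exp(-gamma * ||u - v||^2).
    gamma is an illustrative value, not a tuned hyperparameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)
```

Identical feature vectors give k = 1, and similarity decays smoothly toward 0 as the fused feature vectors of two gestures move apart, which is why this kernel handles non-linearly separable sign classes well.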
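The LSTM and MLP models in the third work consume a sequence of per-frame 2D-CNN features (e.g. EfficientNetB7 embeddings) and classify the whole video. A minimal NumPy sketch of that sequence-classification step is given below; the weights are random stand-ins rather than trained parameters, and the single-layer LSTM with a softmax over the final hidden state is an assumption about the architecture, not the thesis's exact model.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a per-frame feature vector x (e.g. the
    2D-CNN embedding of one video frame). Gate order: i, f, o, g."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)   # update cell state
    h = o * np.tanh(c)           # emit hidden state
    return h, c

def classify_sequence(frame_features, W, U, b, W_out):
    """Run the LSTM over the frame-feature sequence and classify the
    video from the final hidden state (softmax over sign classes)."""
    hidden = W.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in frame_features:
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The LRCN model the abstract reports as best (88% accuracy with detected hands input) couples the same two stages end to end: convolutional feature extraction per frame followed by a recurrent layer over time.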
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| PRACHI SHARMA 15914021.pdf | | 30.11 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
