Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/19726
| Title: | MULTIMODAL EMOTION ANALYSIS USING DEEP LEARNING TECHNIQUES |
| Authors: | Kumar, Puneet |
| Issue Date: | Oct-2022 |
| Publisher: | IIT Roorkee |
| Abstract: | The need to develop computational systems that can recognize the emotions portrayed in various modalities such as speech, text, and image is rapidly increasing. The experience of emotion, feeling, cognition, and behavioral processes is known as ‘Affect.’ Affective Computing uses three fundamental methods to analyze affect: self-feedback-based analysis, behavior observation, and physiological studies. Affect analysis approaches are further divided into two types: the intangible (directly observable) approach, which uses computer vision, natural language processing, and speech processing techniques, and the tangible (not directly observable but perceptible by touch) approach, which uses sensors and other physiological monitoring tools. This thesis analyses intangibly expressed emotions through behavior observation, predominantly associated with the speech, text, and image modalities. The thesis starts by introducing emotion analysis, its representations, modalities, applications, and the need for multimodal emotion analysis. It then surveys the research on multimodal emotion analysis, affective response generation, explainability, and interpretability. Moreover, techniques for deep neural networks’ explainability and hyperparameter tuning are proposed and used in the subsequent chapters. Toward end-to-end emotion recognition, the thesis develops a speech emotion recognition system using deep neural networks, residual learning, and triplet loss; the emotion-related information is learned from a labeled emotional speech dataset as embeddings and used for emotion recognition. Further, it proposes a novel text emotion recognition system and develops a cross-lingual, translation-based method for Sanskrit text sentiment analysis. A deep-learning-based facial emotion recognition system is proposed, which is further adapted to perform image emotion recognition using domain adaptation. 
The main and adapted models are trained simultaneously using a discrepancy loss, which enables the adapted model to learn the distribution of the image emotion recognition datasets along with that of the facial emotion recognition datasets. To combine complementary information from multiple modalities, insights from unimodal emotion recognition are used for multimodal emotion recognition. Further, a novel interpretable multimodal emotion recognition system is proposed to classify an input containing the image, speech, and text modalities into discrete emotion classes. The proposed system reports the importance of each modality and of its features in the classification of a particular emotion class. Four emotion classes, i.e., ‘happy,’ ‘sad,’ ‘hate,’ and ‘anger,’ have been considered for emotion analysis because these are the classes common to various existing methods and datasets for unimodal and multimodal emotion analysis. Besides recognizing the affects portrayed by multimodal data, emotion analysis also aims to generate affect according to the user’s emotional state. In that context, a novel task has been defined to synthesize contextually relevant feedback as a new modality from two given modalities, i.e., image and text. We have proposed a novel affective feedback synthesis system and compiled a new dataset containing images, text, Twitter user comments, and the number of likes (upvotes) for each comment. Finally, the conclusions of the thesis are presented along with the future scope for multimodal emotion analysis and affective content synthesis. Keywords: Affective computing, Emotion analysis, Deep learning, Speech emotion recognition, Text emotion recognition, Text sentiment analysis, Multimodal information fusion, Facial emotion recognition, Image emotion recognition, Explainability, Interpretability, Feedback synthesis, Dataset construction, Hyperparameter tuning. |
| URI: | http://localhost:8081/jspui/handle/123456789/19726 |
| Research Supervisor/ Guide: | Raman, Balasubramanian |
| metadata.dc.type: | Thesis |
| Appears in Collections: | DOCTORAL THESES (CSE) |
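The abstract above describes learning emotion embeddings from labeled emotional speech with a triplet loss. A minimal sketch of that objective follows; the embedding dimension, margin value, and function names are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pulls same-emotion embeddings together and pushes
    different-emotion embeddings at least `margin` farther apart.
    anchor/positive share an emotion label; negative has a different one."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # distance to same-class sample
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # distance to other-class sample
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# toy usage: batches of 8 utterance embeddings of dimension 16
rng = np.random.default_rng(42)
anchor, positive, negative = (rng.normal(size=(8, 16)) for _ in range(3))
loss = triplet_loss(anchor, positive, negative)
```

Once trained this way, nearby embeddings tend to share an emotion label, so the embeddings can feed a downstream classifier, as the abstract indicates.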
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| PUNEET KUMAR 18911007.pdf | | 29.89 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
