HUMAN ACTION LOCALIZATION, TRACKING AND RECOGNITION USING DEEP LEARNING

Kumar, Naresh

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/15060

Title:	HUMAN ACTION LOCALIZATION, TRACKING AND RECOGNITION USING DEEP LEARNING
Authors:	Kumar, Naresh
Keywords:	Dominating Nature;Preserving and Propagating;Behavior;Human Action Localization
Issue Date:	Jun-2019
Publisher:	I.I.T Roorkee
Abstract:	Humans' dominant role in preserving and propagating important information makes the human body a subject of considerable study. Human nature remains a very complex topic, and behavior in any arbitrary situation is difficult to describe. This thesis aims to develop an intelligent human action analysis model by applying deep learning methods to videos. The actions performed by human beings are the outcome of very complex phenomena of perceptual vision and neurology. We consider only the issues of a vision-based system for human action analysis. The main phases of human action analysis are tracking the human body and localizing it spatiotemporally, along with recognizing the actions being performed in a given video sequence. Each of the research sub-problems (localization, tracking, and recognition of action instances in a given video sequence under uncontrolled conditions) itself pertains to a wide range of applications. The research problem was defined before the 18th century in the form of a biological understanding of human nature, but that effort was not very successful because little information was available from the human brain. From the vision perspective, plenty of information can be obtained to gain a thorough understanding of human nature. In terms of computer vision research, the stated problem is part of video understanding. Developing a system that can automatically retrieve desired information from any video is a challenging computer vision problem. Handling large-scale data analytics during network training is a further overhead when processing video samples from unconstrained media. Advances in deep neural networks have yielded satisfactory solutions in several research domains that deal with large-scale data analytics. The thesis, entitled Human Action Localization, Tracking and Recognition using Deep Learning, is organized into six chapters.
Chapter 1, Introduction, presents a general introduction to video understanding for human action analysis, along with the motivations and challenges of the stated research problem. The required experimental setup is also highlighted, together with the research objective and the author's contributions. Chapter 2, Preliminaries, describes the basics of deep network architectures, which must be understood before developing a deep network for human action analysis. Chapter 3, Weakly Supervised Deep Network Model for Spatiotemporal Human Action Localization with CNN and LSTM, presents a human action localization model in the spatiotemporal domain. The architecture uses a convolutional neural network (CNN) to capture spatial information, whereas temporal information is retrieved by a long short-term memory (LSTM) network. A CNN is a widely used deep network model built on convolution operations applied at large scale. An LSTM is a specific type of recurrent neural network that can recover information from a video sequence over both long and short durations. Chapter 4, A Cascaded CNN Model for Multiple Human Tracking and Re-localization in Complex Video Sequences, presents a human body tracking and re-localization system. Three CNNs are used in a cascade to improve the overall performance of the human tracking system. Chapter 5, Spatiotemporal Attention based Deep Network Model for Human Action Recognition with CNN and RNN, presents a deep network model for human action recognition. The proposed model uses spatiotemporal attention and localizes the action in the local neighborhood of the initial frames. It is built on a deep neural network framework that receives information from 3D-CNN, LSTM, and spatiotemporal attention based blocks in a cascaded fashion. In addition, the LSTM is used to capture more extensive temporal information.
Comparative studies against existing state-of-the-art methods indicate that the proposed model improves the performance of human action analysis. Chapter 6, Conclusions and Future Scope, presents the summary of the thesis, the experimental results, and the future scope. The thesis may help guide the development of an intelligent system that can simultaneously localize, track, and recognize objects or actions in real time.
URI:	http://localhost:8081/xmlui/handle/123456789/15060
Research Supervisor/ Guide:	Sukavanam, N.
metadata.dc.type:	Thesis
Appears in Collections:	DOCTORAL THESES (Maths)

Files in This Item:

File	Description	Size	Format
G28801.pdf		13.37 MB	Adobe PDF
