UNSUPERVISED HUMAN ACTIVITY RECOGNITION AND GENERATION

Buckchash, Himanshu

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/19566

Title:	UNSUPERVISED HUMAN ACTIVITY RECOGNITION AND GENERATION
Authors:	Buckchash, Himanshu
Keywords:	unsupervised learning, action recognition, Siamese network, self-supervised learning, unsupervised pose-sequence recognition, motion generation, 3D motion prediction, pose-manifold, Grassmann manifold, recurrent neural networks.
Issue Date:	Oct-2021
Publisher:	IIT Roorkee
Abstract:	This thesis proposes techniques for unsupervised human activity recognition (HAR) and generation (HAG). It first presents methods for unsupervised and self-supervised action recognition. It then explores pose-based human motion generation for both single-person and two-person scenarios. Activity or Action Recognition (AR) is the problem of automatically labeling the activities present in spatio-temporal data. In this work, the term "activities" refers to activities related to or performed by humans. AR is an important area of research with a wide range of applications in machine perception, such as visual surveillance, human-machine interaction, elderly health care, motion retargeting, content-based video search, and summarization. Labeling videos or pose-sequences is a demanding task because of the human effort, cost, and lack of scalability involved. Moreover, there are many open-set problems, like video anomaly recognition, that naturally require a label-independent approach because exhaustive labeling of human activities is not possible. Given the potential of pose-based activity recognition and the scarcity of labeled data in this domain, it is imperative to have good unsupervised learning methods for wider applicability. Pose-based human motion generation is the task of continually predicting future poses from a given pose by leveraging the pose-sequence history. It involves a generative understanding of pose dynamics and has multiple applications that require anticipation of interpersonal interactions, such as human-robot interaction, human behavior analysis, health care, surveillance and security, cybernetics, and augmented and virtual reality. A critical drawback of current motion generation methods is error accretion during long-term motion generation. The main challenges are error inflation and efficient regulation of stochasticity. Human pose has several kinematic constraints that are affected by error accretion.
State-of-the-art motion generation models use an autoregressive approach. Autoregressive models work by feeding the current output back to the model as the next input. This feedback results in constant accretion of error (noise). This monotonic positive or negative growth over time leads to issues such as pose stagnation, stretching or shrinking of the pose, or a combination of these. This thesis addresses these challenges by making the generated motions last longer and by enforcing kinematic constraints to ensure the consistency of human pose. The work presented in the thesis is divided into eight chapters. To address the challenges in unsupervised human activity recognition and generation, five methods are proposed in this thesis. Chapters three, four, and five describe techniques for human activity recognition, whereas chapters six and seven present techniques for motion generation. The thesis starts with an introduction in chapter one. It provides a general introduction to the problem, the motivation behind this research, research challenges, thesis objectives, and a brief overview of the thesis. Chapter two provides a comprehensive literature review. Chapter three describes methods for unsupervised activity recognition and their application to anomaly recognition. It uses Grassmann manifolds for decomposing and clustering the spatio-temporal data. It also proposes an online unsupervised active learning approach for clustering the anomalous events. The results are compared with state-of-the-art deep models and unsupervised approaches for anomaly recognition. Chapter four presents a manifold-based pose-sequence recognition (PSR) framework. The work describes an unsupervised approach for PSR that decomposes the pose-sequence data into fragments. This allows the sample to be defined with a less rigid structure, which is easier to project onto Euclidean space. A dynamic view-invariant representation for pose-sequences is also presented in this chapter.
Extensive empirical evaluation and ablations on several challenging datasets under three categories confirm the superiority of the proposed approach over current methods. Chapter five describes a method for self-supervised representation learning for activity recognition. It focuses on the unique temporal structure in the spatio-temporal data to extract representations that are used on downstream tasks for action recognition. Siamese networks are used for learning similarities between image pairs. The role of various motion descriptors is studied to simultaneously capture appearance and motion information. The results are compared with previous approaches on the HMDB51 and UCF101 datasets. Chapters six and seven propose data-driven generative methods for human motion generation. Chapter six presents a block-based RNN model with residual connections, along with variable conditioning length, under a variational setup. It also describes the collection and processing of the in-house duet dance dataset. The chapter shows how the variational setup helps in capturing non-linear interdependence between different random vectors derived from the pose-sequence in an autoregressive recurrence relation. Chapter seven presents a mixture density-based model for generating two-person human interactions. It studies the role of different parameters in building a sound motion generation system. Chapter eight concludes the work done in the preceding chapters by highlighting the advantages, limitations, and suggestions for future research.
URI:	http://localhost:8081/jspui/handle/123456789/19566
Research Supervisor/ Guide:	Raman, Balasubramanian
Type:	Thesis
Appears in Collections:	DOCTORAL THESES (CSE)

Files in This Item:

File	Description	Size	Format
HIMANSHU BUCKCHASH 15911014.pdf		14.39 MB	Adobe PDF
