Abstract:
Action recognition seems effortless for humans, but for computer systems it requires substantial pattern recognition and information processing. Here, we devise a new approach to action recognition for intelligent systems by fusing shallow and deep features extracted from the data. Shallow feature extraction first identifies the motion-salient pixels, thereby eliminating unwanted information, and then extracts improved trajectory features from them. To obtain the deep features, we make use of a Convolutional Neural Network (CNN). Separate classifiers are trained on the deep and the shallow features, and their outputs are fused to produce an efficient classifier for action recognition. We use the HMDB-51 [1] video dataset, one of the most challenging datasets for action recognition. It consists of actions of different kinds, such as clap, run, walk, and box, taken from sources including YouTube, movies, and Google videos, under varying illumination, occlusion, camera angles, and poses.
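For illustration only, the score-level fusion of the shallow and deep classifiers mentioned above could take a form like the following minimal Python sketch; the function name, the weighting parameter alpha, and the example numbers are our own assumptions and are not taken from the paper.

import numpy as np

def fuse_scores(shallow_scores, deep_scores, alpha=0.5):
    """Weighted average of per-class scores from the two classifiers.

    alpha controls the contribution of the shallow-feature classifier;
    (1 - alpha) weights the deep-feature (CNN) classifier.
    """
    shallow_scores = np.asarray(shallow_scores, dtype=float)
    deep_scores = np.asarray(deep_scores, dtype=float)
    return alpha * shallow_scores + (1.0 - alpha) * deep_scores

# Hypothetical per-class scores for three actions (clap, run, walk):
shallow = [0.2, 0.5, 0.3]   # e.g. classifier on improved-trajectory features
deep = [0.1, 0.3, 0.6]      # e.g. softmax output of the CNN classifier
fused = fuse_scores(shallow, deep, alpha=0.4)
print("fused scores:", fused, "predicted class index:", int(np.argmax(fused)))

Other fusion strategies (e.g. concatenating features before a single classifier, or learning the fusion weights) are equally plausible; the sketch above only shows the simplest weighted-average variant.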