Abstract:
Object detection and tracking are significant and challenging tasks in many computer
vision applications such as surveillance, urban planning and navigation systems.
They are used to monitor security-sensitive settings such as banks, traffic monitoring
systems, departmental stores, crowded public places and defense applications. Objects
exhibit complex interactions such as partial and full occlusions, splitting, non-rigidness
and surrounding of objects. Therefore, a visual tracking system should function in all of
these different situations. In this report, an approach is verified for multi-object
detection and tracking in dynamic scenarios, with full occlusion handling, in stationary
and moving camera videos. The algorithm comprises several steps and is mainly tested on
the detection of objects in the current frame and the prediction of the new location of
each existing track in the upcoming frame.
Various detection techniques such as the Adaptive Gaussian Mixture Model (AGMM), Frame
Differencing, Background Subtraction, etc. are analyzed according to the environment.
These detection techniques yield a binary image that contains white foreground pixels and
black background pixels. Adaptive background subtraction works well with gradually
changing surroundings. However, stationary objects not present in the reference
background are considered foreground objects. Frame differencing is highly adaptive to
the surroundings but leaves holes in the object region. It overcomes this problem of
adaptive background subtraction. The Adaptive Gaussian Mixture Model with mean shift (MS)
segmentation is an efficient method to extract moving objects in stationary camera video
sequences, in gradually changing indoor and outdoor scenes and in the presence of periodic
motion in the background. The entry of new objects into the field of view, the exit of
older objects from the field of view, and the splitting and merging of objects are
recognized by blob analysis, which is used to find the statistical properties of connected
foreground pixels in the binary image.
The objects are tracked using a mean-shift method with AGMM-based detection.
AGMM employs a Gaussian mixture representation of state and noise densities. Further, a
CNN-based detection technique followed by Kalman filter tracking is implemented to
avoid the drawbacks of AGMM. To represent data using convolutional layers, region of
interest (ROI) pooling is applied to the outputs of each layer on the object candidate
regions generated by object proposal generation, and the pooled features are then used by
the fully connected (FC) networks for the classification of objects. Experiments on the
PETS 2009 and MOTChallenge 2015 2D benchmark datasets were carried out successfully, and
the comparative study of the results verified that our method performs favorably against
the state-of-the-art methods
in both single-camera and multi-camera multi-target tracking, while achieving close to
real-time running efficiency.
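
As a small illustration of the contrast drawn above between frame differencing and
background subtraction, the following is a minimal sketch in plain Python. It is not the
implementation used in this report: the function names, the threshold of 25, the update
rate alpha and the one-row toy frames are all assumptions chosen for the example. Each
function returns a binary mask with 1 for foreground and 0 for background.

```python
# Illustrative sketch only: names, threshold and alpha are assumed values.

def frame_difference(prev, curr, thresh=25):
    """Mask is 1 where two consecutive frames differ by more than thresh."""
    return [[1 if abs(c - p) > thresh else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def background_subtract(background, curr, thresh=25):
    """Mask is 1 where the frame differs from the reference background."""
    return [[1 if abs(c - b) > thresh else 0 for b, c in zip(br, cr)]
            for br, cr in zip(background, curr)]

def update_background(background, curr, alpha=0.05):
    """Running-average update, so the reference adapts to gradual change."""
    return [[(1 - alpha) * b + alpha * c for b, c in zip(br, cr)]
            for br, cr in zip(background, curr)]

# Toy one-row frames: a bright object 4 pixels wide moves right by one pixel.
bg = [[0, 0, 0, 0, 0, 0, 0, 0]]
f1 = [[0, 200, 200, 200, 200, 0, 0, 0]]
f2 = [[0, 0, 200, 200, 200, 200, 0, 0]]

diff_mask = frame_difference(f1, f2)    # [[0, 1, 0, 0, 0, 1, 0, 0]]
bg_mask = background_subtract(bg, f2)   # [[0, 0, 1, 1, 1, 1, 0, 0]]
new_bg = update_background(bg, f2)      # object pixels drift toward 200
```

In this toy case frame differencing marks only the trailing and leading edges of the
object and leaves a hole in its interior, whereas background subtraction recovers the
whole object as long as the reference background is accurate.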