Abstract:
Owing to the rapid growth in computer vision applications, new algorithms are continually being developed.
The aim of every algorithm is to surpass human-level performance in practical applications.
Object detection has been a central problem in computer vision over the past few
years owing to demand from industrial applications such as self-driving cars, which are
the single largest driver of recent developments in object detection, recognition, and
tracking algorithms. With autonomous vehicles as the target application of this
implementation, the primary goal is to improve detection speed without loss of accuracy.
Algorithms that can detect and process information from an image or video
feed in real time are critical for this work.
Many algorithms achieve near-real-time speeds for object
detection, but they differ from one another in architecture, which ultimately
determines their classification and localization accuracy and their detection speed.
An extensive study of a variety of these algorithms, such as VGG-16, Single Shot Detector
(SSD), DeepMask, SharpMask, AlexNet, Zeiler-Fergus Nets (ZF-Nets), Feature Pyramid
Networks (FPN), Residual Networks (ResNets), GoogLeNet, Generative Adversarial
Networks (GANs), and Spatial Transformer Networks, is carried out, and their merits and
demerits are analyzed in the literature review for this work.
This approach focuses on modifying the existing Mask R-CNN architecture with an
FPN and ResNet backbone to increase detection speed. For a more generalized
approach, the bounding boxes are generated on the fly, and the bounding boxes
supplied with the dataset are not used. Each bounding box is defined as the smallest box that
encloses all the pixels of the generated mask. This simplifies the implementation and
makes it easier to apply image augmentations for a more robust network.
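
As an illustration of the bounding-box rule described above, the following is a minimal sketch of how the smallest enclosing box could be computed from a binary mask using NumPy. The function name `mask_to_bbox`, the `(y1, x1, y2, x2)` coordinate order, the exclusive lower-right corner, and the all-zero box for an empty mask are assumptions made for this example and are not taken from this work.

```python
import numpy as np


def mask_to_bbox(mask):
    """Smallest axis-aligned box enclosing all nonzero pixels of a 2-D mask.

    Returns (y1, x1, y2, x2), with y2 and x2 exclusive (assumed convention).
    An empty mask yields (0, 0, 0, 0) (assumed convention).
    """
    mask = np.asarray(mask).astype(bool)
    # Indices of the rows and columns that contain at least one mask pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return (0, 0, 0, 0)
    y1, y2 = rows[0], rows[-1] + 1
    x1, x2 = cols[0], cols[-1] + 1
    return (int(y1), int(x1), int(y2), int(x2))


# Hypothetical 6x8 mask with an object occupying rows 2..4 and columns 3..6.
example_mask = np.zeros((6, 8), dtype=np.uint8)
example_mask[2:5, 3:7] = 1
box = mask_to_bbox(example_mask)

# After an augmentation such as a horizontal flip, the box is simply
# recomputed from the flipped mask; no separate box transform is needed.
flipped_box = mask_to_bbox(np.fliplr(example_mask))
```

Because the box is derived from the mask, any geometric augmentation only has to be applied to the image and its mask; the box follows from the augmented mask.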