REAL TIME OBJECT DETECTION

Yadav, Dinesh Kumar

URI: http://localhost:8081/jspui/handle/123456789/16041

Date: 2019-06

Abstract:

The rapid growth of computer vision applications has driven the development of new algorithms, each aiming to match or surpass human-level performance in practical settings. Object detection has been a central problem in computer vision in recent years because of its demand in industrial applications such as self-driving cars, which are the single largest driver of recent progress in object detection, recognition, and tracking algorithms. With autonomous vehicles as the target application of this implementation, the primary goal is to improve detection speed without loss of accuracy. Algorithms that can detect objects and process information from an image or video feed in real time are critical for this work. Many algorithms achieve near-real-time object detection speeds, but they differ in architecture, which ultimately determines their classification and localization accuracy and their speed. An extensive study of a variety of these algorithms, such as VGG-16, Single Shot Detector (SSD), DeepMask, SharpMask, AlexNet, Zeiler-Fergus Nets (ZF-Nets), Feature Pyramid Networks (FPN), Residual Networks (ResNets), GoogLeNet, Generative Adversarial Networks (GANs), and Spatial Transformer Networks, is carried out, and their merits and demerits are analyzed in the literature review for this work. The approach focuses on modifying the existing architecture of Mask R-CNN with an FPN and ResNet backbone to increase detection speed. For a more generalized approach, the bounding boxes are generated on the fly, and the bounding boxes supplied with the dataset are not used. Each bounding box is taken to be the smallest box that encloses all the pixels of the generated mask. This simplifies the implementation and makes it easier to apply image augmentations for a more robust network.
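
The mask-derived bounding box described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `mask_to_bbox`, the `(y1, x1, y2, x2)` ordering, and the exclusive lower-right corner are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Assumed convention: (y1, x1, y2, x2), with y2 and x2 exclusive.
Box = Tuple[int, int, int, int]


def mask_to_bbox(mask: Sequence[Sequence[int]]) -> Optional[Box]:
    """Return the smallest box enclosing every nonzero mask pixel.

    `mask` is a 2D grid (rows of pixels) where nonzero means "object".
    Returns None when the mask contains no object pixels.
    """
    # Indices of rows that contain at least one object pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every object pixel across all rows.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return rows[0], min(cols), rows[-1] + 1, max(cols) + 1


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_bbox(example_mask))  # (1, 1, 3, 3)
```

Because the box is recomputed from the mask, an augmentation such as a flip or crop only needs to be applied to the image and its mask; the box can then be derived again from the transformed mask.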
