Abstract:
Software fault prediction is the task of predicting whether a software module is faulty by applying learning models to historical data, which is collected from previous versions of the software. The performance of a classifier depends on several factors, one of which is the quality of the dataset. Real-world datasets often contain noise, which degrades classifier performance. To reduce the effect of this noise, we propose a two-stage pre-processing approach that combines rough set theory and oversampling with a denoising autoencoder to extract a noise-robust version of the original dataset. In the first stage, we use rough set theory to select the certain instances from the dataset and then apply oversampling to handle class imbalance. In the second stage, we use a denoising autoencoder to extract a noise-robust version of the original dataset. The proposed approach is evaluated on the NASA MDP and Eclipse datasets to demonstrate its effectiveness. This work also studies various denoising techniques reported in the literature.
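
As an illustration of the first stage only, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that attributes are already discretized, that "certain instances" means instances in the rough-set lower approximation of their class (instances whose indiscernibility class carries a single label), and that oversampling is plain random duplication of minority instances. The function names `certain_instances` and `random_oversample` and the toy data are hypothetical. The second stage (denoising autoencoder) is not shown.

```python
import random
from collections import Counter, defaultdict


def certain_instances(rows, labels):
    """Keep rows lying in the rough-set lower approximation of their class.

    Rows with identical (already discretized) attribute values form one
    indiscernibility class; the class is "certain" if all its labels agree.
    """
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row)].append(idx)
    keep = []
    for members in classes.values():
        if len({labels[i] for i in members}) == 1:
            keep.extend(members)
    keep.sort()  # preserve the original instance order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size.

    Assumption: simple random oversampling; the abstract does not name a method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up module metrics, discretized as (size_bin, complexity_bin).
rows = [(0, 0), (0, 0), (1, 0), (1, 0), (1, 1), (0, 1), (0, 1)]
labels = [0, 0, 0, 1, 1, 0, 0]  # 1 = faulty, 0 = not faulty

certain_rows, certain_labels = certain_instances(rows, labels)
balanced_rows, balanced_labels = random_oversample(certain_rows, certain_labels)
```

In this toy example the two `(1, 0)` modules carry conflicting labels, so they are treated as uncertain and dropped before the single remaining faulty module is duplicated to balance the classes.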