Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/18560
| Title: | GENERATING AND CLASSIFYING DNA SEQUENCES WITH DEEP LEARNING & GENERATIVE MODELS |
| Authors: | Jain, Siddhant |
| Issue Date: | Jun-2024 |
| Publisher: | IIT, Roorkee |
| Abstract: | Grouping DNA sequences is crucial because it directly influences the identification of viruses, the containment of outbreaks, and the development of new medicines. Conventional machine learning methods have achieved success in this task despite challenges such as feature selection in high-dimensional datasets and the absence of explicit features in DNA sequences. Recent breakthroughs in deep learning (DL) have enabled the automatic extraction of intricate features from input data, revolutionizing data analysis. This study investigates the use of deep learning models, especially Convolutional Neural Networks (CNNs) and gated neural network architectures, to classify DNA sequences. Classification of DNA (nucleic acid) sequences uses the K-mer encoding technique, in which a window of length K is slid over the whole DNA sequence to extract features. DNA sequences carry meaningful information about an organism, as well as genetic information about its ancestors and any genetic mutations. Using a Generative Adversarial Network (GAN) model to generate nucleic acid sequences with specified properties, with the influenza virus as a case study, helps us understand how sequences are made. Furthermore, the application of deep learning models in genomics has shown promising results in predicting gene functions and identifying genetic variations. Combining GAN models with CNN and Long Short-Term Memory (LSTM) architectures might make DNA sequence classification more accurate and faster. The GAN model we deployed combines a predictor and an evaluator to optimize synthetic sequences for realistic characteristics, using a discriminator to eliminate undesirable outputs. The evaluation metrics, including latent interpolation, latent complementation, and motif-matching, play a crucial role in evaluating the model’s performance and effectiveness. This high correlation percentage of 93.2 % |
| URI: | http://localhost:8081/jspui/handle/123456789/18560 |
| Research Supervisor/ Guide: | Toshniwal, Durga |
| metadata.dc.type: | Dissertations |
| Appears in Collections: | MASTERS' THESES (CSE) |
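
The K-mer encoding mentioned in the abstract (sliding a window of length K over the sequence) can be illustrated with a minimal Python sketch. The function names `kmers` and `kmer_counts`, the stride of one base, and the example sequence are assumptions made for illustration; the record does not state how the thesis implements this step.

```python
from collections import Counter


def kmers(sequence: str, k: int) -> list[str]:
    """Return every length-k substring, sliding the window one base at a time.

    Assumption: a stride of 1 and uppercase normalisation; the thesis may differ.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count how often each K-mer occurs, giving a simple bag-of-K-mers feature."""
    return Counter(kmers(sequence, k))


if __name__ == "__main__":
    example = "ATGCGATG"  # hypothetical sequence, not from the thesis data
    print(kmers(example, 3))
    # ['ATG', 'TGC', 'GCG', 'CGA', 'GAT', 'ATG']
    print(kmer_counts(example, 3)["ATG"])
    # 2
```

A sequence shorter than K yields an empty list, so callers may want to filter or pad such sequences before building feature vectors.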
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 22535031_SIDDHANT JAIN.pdf | | 1.31 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
