Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/20359
| Title: | RAINFALL PREDICTION: A COMPARATIVE ANALYSIS OF MACHINE LEARNING CLASSIFIERS |
| Authors: | Mishra, Rajat |
| Issue Date: | Apr-2022 |
| Publisher: | IIT, Roorkee |
| Abstract: | Rainfall prediction has gained popularity and attention because of its complexity and its wide range of applications, such as flood forecasting, drought forecasting, and monitoring of pollutant concentrations. The study area is Idukki, one of the most flood- and landslide-prone areas in Kerala. Two weather datasets were collected: daily data (1982-2021) and hourly data (2002-2021). Both datasets are labeled; the daily data is used for binary classification (0: No Rain, 1: Rain), and the hourly data is used for multi-class classification (0: No Rain, 1: Light Rain, 2: Moderate Rain, 3: High Rain). Both datasets were found to be imbalanced, so data balancing is applied. Scaling is done after balancing so that every feature is on the same scale. Feature selection is an essential step after data pre-processing: correlation is checked among all the features, and the features whose correlation meets a specified threshold are selected. ML algorithms such as DT, LR, and SVM, and EML techniques such as Voting, Bagging, and Boosting, are used. To compare the models, the evaluation metrics used are accuracy, recall, precision, and F1 score. In binary classification, RF performs best, leading in 4 out of 6 evaluation metrics, and LightGBM performs worst, trailing in 4 out of 6 evaluation metrics. In multi-class classification, for predicting class 3 (High Rain), GNB is the worst model and RF is the best. Across the ML models, class 2 (Moderate Rain) is the most poorly predicted of all classes. As future work, a pipeline for real-time rainfall prediction could be built using these models. Deep learning algorithms could also be applied: as the amount of data grows, the accuracy of ML models tends to plateau, whereas the accuracy of DL models tends to keep increasing. The approach used in this research demonstrates how reanalysis data from global mathematical models may be used to create regional models that are less computationally expensive. |
| URI: | http://localhost:8081/jspui/handle/123456789/20359 |
| Research Supervisor/ Guide: | Arya, Dhyan S. |
| Type: | Dissertations |
| Appears in Collections: | MASTERS' THESES (Hydrology) |
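
The pre-processing and evaluation steps named in the abstract (balancing, scaling, correlation-threshold feature selection, then accuracy, precision, recall, and F1) can be illustrated with a minimal standard-library sketch. It is not the thesis code. The abstract does not name the balancing or scaling method, so random oversampling and min-max scaling are assumptions here, as is measuring correlation against the class label. All function names, the threshold value, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Random oversampling (assumed method): duplicate minority-class rows
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def min_max_scale(rows):
    """Min-max scaling (assumed method): map each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(rows, labels, threshold):
    """Keep the indices of features whose absolute correlation with the
    label is at least `threshold` (assumed reading of the selection rule)."""
    cols = list(zip(*rows))
    return [j for j, col in enumerate(cols)
            if abs(pearson(col, labels)) >= threshold]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: columns are [humidity, unrelated reading];
# labels follow the abstract's binary scheme (0: No Rain, 1: Rain).
toy_rows = [[90, 5], [85, 1], [40, 4], [35, 2], [30, 5], [45, 1]]
toy_labels = [1, 1, 0, 0, 0, 0]

balanced_rows, balanced_labels = oversample(toy_rows, toy_labels)
scaled_rows = min_max_scale(balanced_rows)
kept = select_features(scaled_rows, balanced_labels, threshold=0.5)
print("class counts after balancing:", Counter(balanced_labels))
print("selected feature indices:", kept)
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

The classifiers themselves (DT, LR, SVM, RF, and the ensemble methods) are left out of the sketch; any of them would be trained on the selected, scaled features and scored with `binary_metrics`, computed per class in the multi-class case.
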
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 20537018_Rajat Mishra.pdf | | 4.06 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
