Please use this identifier to cite or link to this item:
http://localhost:8081/jspui/handle/123456789/18945
| Title: | GANGA WATER-QUALITY DATA IMPUTATION, ASSESSMENT OF POLLUTION LEVELS AND FACTORS: THE MACHINE LEARNING APPROACH |
| Authors: | Tekile, Ararso Beshea |
| Issue Date: | May-2024 |
| Publisher: | IIT, Roorkee |
| Abstract: | River water quality monitoring, modeling, and assessment of environmental factors are essential for developing effective strategies to reduce river pollution. However, water quality research is limited globally, especially in developing countries such as India, because water quality parameter (WQP) data are scarce, incomplete, and highly variable with environmental conditions. In this study, we addressed these challenges by evaluating several imputation methods (including R MICE, KNNImputer, and IterativeImputer) and machine learning models integrated with SHAP (SHapley Additive exPlanations) for imputing and modeling 21 years of WQP data from the Ganga River, collected from 2001 to 2021 at 13 monitoring stations. Each imputation method was evaluated on its imputation quality and on its impact on downstream ML predictions of the Overall Index of Pollution (OIP) using AutoML. The OIP, a key indicator in this study, combines various WQPs to assess pollution levels in the Ganga River. Our results indicated that MICE CART imputation combined with an XGBoost regressor showed satisfactory performance (R² scores of 0.66 on training data and 0.52 on test data). Analysis of 21 years of OIP values revealed varying pollution levels along the river. Prior to 2017, some stations exhibited better water quality; since then, almost all stations have shown a general trend of slight to moderate pollution, except the first station (Rishikesh), where forest cover predominates and water quality remains better. SHAP analysis showed that, while other factors maintained a consistent impact on river pollution, agricultural activities, increasing precipitation, and river flow raised the overall pollution level of the river between 2001 and 2021. This is reflected in the increase in OIP from 2.675 to 3.736, which is positively correlated with the SHAP importance values of these factors over the same period.
In addition, the approach proposed in this study will help water resources researchers improve water quality data, enabling policymakers and environmentalists to make informed decisions to preserve the health and ecosystem of the Ganga River. |
| URI: | http://localhost:8081/jspui/handle/123456789/18945 |
| Research Supervisor/ Guide: | Kasiviswanathan, K.S. |
| Type: | Dissertations |
| Appears in Collections: | MASTERS' THESES (WRDM) |
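
The abstract describes comparing imputation methods by their imputation quality. One common way to do that, sketched below, is to hide a fraction of the known values, impute them, and score the result against the hidden truth. This is an illustrative sketch only, not the thesis's code: the data are synthetic, the helper names (`mask_values`, `impute_mean`, `impute_knn`, `rmse`) are hypothetical, and the two simple imputers are hand-written stand-ins for the library methods named in the abstract (R MICE, KNNImputer, IterativeImputer).

```python
import math
import random


def mask_values(rows, frac, rng):
    """Hide a fraction of the known cells; return the masked copy and the hidden cells."""
    cells = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is not None]
    hidden = rng.sample(cells, max(1, int(frac * len(cells))))
    masked = [list(r) for r in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_mean(rows):
    """Fill each gap with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        obs = [r[j] for r in rows if r[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def impute_knn(rows, k=3):
    """Fill each gap with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over columns observed in both
    rows. Real data with mixed units would need standardizing first.
    """
    fallback = impute_mean(rows)
    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(r, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((math.sqrt(sum(shared) / len(shared)), other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            # Fall back to the column mean when no comparable row exists.
            out[i][j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
    return out


def rmse(truth, filled, hidden):
    """Root-mean-square error over the cells that were hidden."""
    return math.sqrt(sum((truth[i][j] - filled[i][j]) ** 2 for i, j in hidden) / len(hidden))


# Synthetic table of three correlated columns (NOT Ganga data).
rng = random.Random(0)
truth = []
for _ in range(60):
    base = rng.uniform(0.0, 10.0)
    truth.append([base + rng.gauss(0, 0.3),
                  2.0 * base + rng.gauss(0, 0.3),
                  10.0 - base + rng.gauss(0, 0.3)])

masked, hidden = mask_values(truth, 0.15, rng)
scores = {name: rmse(truth, fn(masked), hidden)
          for name, fn in [("mean", impute_mean), ("knn", impute_knn)]}
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
```

A lower RMSE on the hidden cells indicates better imputation quality under this protocol. The abstract also evaluates each method by its effect on downstream OIP prediction, which this sketch does not cover.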
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 22571002_ARARSO BESHEA TEKILE.pdf | | 2.4 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
