ADVANCED DATA ANALYSIS AND MACHINE LEARNING TECHNIQUES FOR SOLAR PV FORECASTING

Gupta, Priya

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/20420

Full metadata record

DC Field	Value	Language
dc.contributor.author	Gupta, Priya	-
dc.date.accessioned	2026-04-13T06:36:48Z	-
dc.date.available	2026-04-13T06:36:48Z	-
dc.date.issued	2024-05	-
dc.identifier.uri	http://localhost:8081/jspui/handle/123456789/20420	-
dc.guide	Singh, Rhythm	en_US
dc.description.abstract	This thesis draws on developments in machine learning (ML) techniques and modern data analysis or decomposition methods for solar PV forecasting. It focuses on developing decomposition-based hybrid models that balance forecast accuracy against time complexity. The primary objective of integrating decomposition techniques with ML models is to simplify their learning process by dividing complex time series data into simpler subseries. One of the challenges associated with decomposition techniques is that they are time-consuming. Following a detailed study of the relevant literature, this thesis presents various solutions in terms of the selection of ML techniques (i.e., distance-based ML, kernel-based ML, tree-based ensembles, traditional neural networks, and deep learning (DL)), the use of dimensionality reduction methods (such as Principal Component Analysis (PCA)), and the choice of an appropriate decomposition approach (i.e., Empirical Mode Decomposition (EMD) and its updated variants). The work involves univariate and multivariate Global Horizontal Irradiance (GHI) forecasting, spanning hour-ahead forecasting, hourly day-ahead forecasting, and forecasting applications in microgrids. The performance of the developed models has been tested for different Indian locations lying in four distinct climatic zones: hot-dry, composite, cold, and warm-humid. Within this scope, a univariate GHI forecasting model for a forecast horizon of 1-11 h has first been developed, comparing (i) traditional EMD with its updated univariate variant, Ensemble Empirical Mode Decomposition (EEMD), and (ii) tree-based ensembles with kernel-based ML. For a forecast horizon of 1 h, both EMD and EEMD proved suitable; however, the latter outperformed the former with an average error reduction of 25.35 % when combined with the best-performing ML model.
In contrast, as the forecast horizon increased from 1 to 11 h, EMD did not fit well, while EEMD remained superior. Next, multivariate GHI forecasting has been explored in this thesis. For this purpose, univariate EMD and EEMD have been replaced by their multivariate versions, viz., Multivariate Empirical Mode Decomposition (MEMD) and Noise-assisted Multivariate Empirical Mode Decomposition (NA-MEMD). MEMD has been combined with a stack of simple ML models (model 1) and with a combination of Principal Component Analysis (PCA) and a Gated Recurrent Unit (model 2). For hour-ahead GHI prediction (single-step forecasting), average root mean square errors (RMSE) of 41.83 W/m² and 36.85 W/m² have been obtained for model 1 and model 2, respectively, across the four studied locations. However, the reduced forecast error with model 2 compared to model 1 is achieved at the expense of high computational time complexity (model 1: 535 sec; model 2: 326 sec). This indicates a trade-off between forecast accuracy and computational time complexity for these two models. Further, the potential of decomposition-based ML/DL forecasters for multi-step (hourly day-ahead) PV power forecasting has been investigated following their assessment in single-step forecasting. This analysis demonstrated the superiority of Long Short-Term Memory (LSTM) (a temporal feature extraction-based DL model) over Convolutional Neural Network (CNN) (a spatial feature extraction-based DL model) and Extreme Gradient Boosting (XGBoost) (a boosting ensemble model). An average RMSE of 65.08 W/m² is found for hourly day-ahead PV power forecasting with the proposed NA-MEMD-LSTM model. This study also suggests replacing MEMD with NA-MEMD, as the latter takes less time to decompose the data while giving higher forecast accuracy. As mentioned earlier, this thesis considers the time complexity of the forecasting models.
A computer with a 64-bit operating system, 16 GB RAM, and an Intel Core i7-2600 CPU @ 3.40 GHz processor is used to run all the models. With these specifications and a data size of 20000 samples, the considered data analysis techniques can be arranged in ascending order of decomposition time as follows: EMD (≈ 20 sec) < EEMD (≈ 200 sec) < NA-MEMD (≈ 250 sec) < MEMD (≈ 300 sec). This thesis also examines the impact of the disparity between predicted and actual PV power generation on microgrid frequency. For the considered combinations of two forecasting models and three secondary controllers, the standard deviation (SD) of frequency is lowest for the LSTM forecaster with the Particle Swarm Optimization-Proportional Integral Derivative (PSO-PID) controller. The corresponding reduction in SD after replacing the Persistence and PSO-PID combination with the LSTM and PSO-PID combination is 28.43 % (clear day) and 32.12 % (cloudy day) for overshoot frequency deviation, and 11.87 % (clear day) and 18.36 % (cloudy day) for undershoot frequency deviation.	en_US
dc.language.iso	en	en_US
dc.publisher	IIT Roorkee	en_US
dc.subject	Global horizontal irradiance, Solar PV, Time complexity, Machine learning, Deep learning, Data decomposition, Frequency control	en_US
dc.title	ADVANCED DATA ANALYSIS AND MACHINE LEARNING TECHNIQUES FOR SOLAR PV FORECASTING	en_US
dc.type	Thesis	en_US
Appears in Collections:	DOCTORAL THESES (HRED)
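
The abstract describes hybrid models that split a series into simpler subseries, forecast each subseries separately, and recombine the results. The following is only a minimal sketch of that structure under stated assumptions, not the thesis's method: a trailing moving average stands in for EMD and its variants, and persistence and seasonal-naive rules stand in for the ML/DL learners. All function names, the 24-sample window, and the 24-sample period are hypothetical choices made for illustration.

```python
from math import sqrt

def decompose(series, window=24):
    """Stand-in decomposition (NOT EMD): trailing moving-average trend plus residual."""
    trend = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        trend.append(sum(chunk) / len(chunk))
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual

def persistence(sub):
    """Stand-in forecaster: the next value equals the last observed value."""
    return sub[-1]

def seasonal_naive(sub, period=24):
    """Stand-in forecaster: the next value equals the value one period earlier."""
    return sub[-period] if len(sub) >= period else sub[-1]

def hybrid_forecast(series, window=24, period=24):
    """One-step-ahead forecast: model each subseries separately, then sum."""
    trend, residual = decompose(series, window)
    return persistence(trend) + seasonal_naive(residual, period)

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Synthetic, exactly periodic "hourly GHI" profile in W/m^2 (made-up values).
day = [0] * 6 + [100, 300, 500, 700, 800, 850, 850, 800, 700, 500, 300, 100] + [0] * 6
history = day * 4
next_hour = hybrid_forecast(history)
```

In this sketch the subseries forecasts are simply added because the stand-in decomposition is additive; a real pipeline would substitute an actual decomposition routine and trained learners for the stand-ins.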

Files in This Item:

File	Description	Size	Format
2024_19901003_PRIYA GUPTA.pdf		9.77 MB	Adobe PDF
