CDLSTM: A Novel Model for Climate Change Forecasting

Abstract: Water received as rainfall is a crucial natural resource for agriculture, the hydrological cycle, and municipal use. Changing rainfall patterns are an essential aspect of assessing the impact of climate change on water resources planning and management. Climate change has affected the entire world, and especially India's fragile Himalayan mountain region, which is highly significant as a climatic indicator. The water coming from Himalayan rivers is essential for 1.4 billion people living downstream. Earlier studies modeled either temperature or rainfall for the Himalayan area; however, the combined influence of both had not been examined in a long-term analysis using Deep Learning (DL). The present investigation analyzed the time series and correlation of temperature (1796–2013) and rainfall (1901–2015) changes over the Himalayan states in India. The Climate Deep Long Short-Term Memory (CDLSTM) model was developed and optimized to forecast temperature and rainfall for all Himalayan states. Facebook's Prophet (FB-Prophet) model was implemented to forecast the same variables and to benchmark the performance of the developed CDLSTM model. Both models were assessed using various performance metrics and showed high accuracy and low error rates.

very fragile and sensitive to climate change parameters such as temperature and rainfall. The critical importance of rainfall analysis is evident from the June 2013 Uttarakhand disaster: Uttarakhand (UK) received 847% excess rainfall in the third week of June, resulting in the Kedarnath flood disaster, which took more than 5000 lives.
Machine learning (ML) approaches now have a significant presence in almost every area, and the successful use of deep learning (DL) has opened new dimensions for efficient time series forecasting. ML/DL has been used extensively for climate analysis and forecasting [4][5][6][7][8][9]. A study by [10] provides a comprehensive report on climate change in the Himalayas and recommends modeling future climate, especially for the Himalayan region. [11,12] applied DL techniques such as DNN and RNN for weather forecasting; however, the scope and data of their investigations were limited. Significant gaps in these studies include the lack of rigorous parameter tuning, the lack of cross-validation of the model on different data, and limited attention to computational efficiency. Additionally, while previous studies have addressed annual climate issues for the whole of India, very few have explicitly focused on all the Himalayan states because of the complexity of data for the Himalayan region [13][14][15]. Understanding climate variability for the Himalayan states on a monthly temporal scale is crucial for hydrological and climatic models [16]. The present study therefore focuses on all the Himalayan states to provide an understanding of changing temperature and rainfall patterns. The present investigation makes three significant contributions: detecting rainfall and temperature trends, analyzing the correlation between temperature and rainfall, and forecasting temperature and rainfall using a novel CDLSTM model and the Facebook Prophet (FB-Prophet) model.

LSTM
LSTM networks are a type of Recurrent Neural Network (RNN) that use special units (memory cells) in addition to standard units to overcome the limitations of traditional RNNs [17][18][19][20]. Each LSTM cell contains three gates: the input gate, the forget gate, and the output gate. The LSTM network is described by the input node, the three gates, the cell state, the hidden state, and the output layer, as given in Eqs. (1)-(7) [20,21].
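Input node: $$g_t = \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right) \quad (1)$$
Input gate: $$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \quad (2)$$
Forget gate: $$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \quad (3)$$
Output gate: $$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \quad (4)$$
Cell state: $$c_t = f_t \odot c_{t-1} + i_t \odot g_t \quad (5)$$
Hidden state: $$h_t = o_t \odot \tanh\left(c_t\right) \quad (6)$$
Output layer: $$y_t = W_y h_t + b_y \quad (7)$$
where x_t is the input at time t, h_{t-1} is the previous hidden state, σ is the logistic sigmoid, ⊙ denotes element-wise multiplication, and W, U, and b are the weight matrices and bias vectors of each component.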
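To illustrate the LSTM gate equations (Eqs. (1)-(7)), the following minimal NumPy sketch computes the standard LSTM recurrence over a short sequence and applies a linear output layer. The weights are random. The dict-per-gate layout and the sizes (one input, four hidden units) are assumptions chosen for readability. This is a sketch of the textbook formulation, not the CDLSTM implementation, which used Keras layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One standard LSTM time step.

    W, U, b are dicts keyed by 'g' (input node), 'i' (input gate),
    'f' (forget gate) and 'o' (output gate); this layout is illustrative.
    """
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # input node
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c = f * c_prev + i * g                                  # cell state
    h = o * np.tanh(c)                                      # hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4                     # assumed sizes for the example
W = {k: rng.normal(size=(n_hidden, n_in)) for k in "gifo"}
U = {k: rng.normal(size=(n_hidden, n_hidden)) for k in "gifo"}
b = {k: np.zeros(n_hidden) for k in "gifo"}
W_y, b_y = rng.normal(size=(1, n_hidden)), np.zeros(1)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.1, -0.3, 0.5]:                # a short, already-scaled input sequence
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
y = W_y @ h + b_y                         # output layer
```

Because the hidden state is an output gate in (0, 1) multiplied by a tanh, every component of `h` stays strictly between -1 and 1.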

Facebook's Prophet Model
Facebook's Prophet is an open-source forecasting tool based on a decomposable additive model, similar to a generalized additive model (GAM). Prophet can fit nonlinear time series with seasonality. The Prophet forecast model can be expressed as Eq. (8):
$$F(t) = L(t) + S(t) + X(t) + E(t) \quad (8)$$
where F(t) is the forecast, L(t) is the long-term trend, S(t) is the short-term (seasonal) component, X(t) is any other variable influencing the forecast, and E(t) is the error.
Prophet offers two trend models: the logistic growth model (LGM) and the piecewise linear (PWL) model. The choice between them depends on the time series. The LGM is suitable when the series is non-linear and saturating, showing no change after it reaches the saturation point. If the series exhibits a linear tendency with past periods of decline and growth, PWL is the better option. The LGM can be expressed as Eq. (9):
$$L(t) = \frac{CC}{1 + \exp\left(-g\,(t - o)\right)} \quad (9)$$
where CC is the carrying capacity, g is the growth rate, and o is an offset parameter. The PWL can be expressed as Eq. (10).
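The following NumPy sketch contrasts the two trend shapes. The logistic trend uses the symbols of Eq. (9). The piecewise-linear trend follows the changepoint formulation described by Prophet's authors: a base rate k and offset m, plus rate changes `deltas` at times `changepoints`, with offset shifts that keep the line continuous. Those parameter names are assumptions for illustration, because Eq. (10) is not reproduced here.

```python
import numpy as np

def logistic_trend(t, cc, g, o):
    """Logistic growth trend: carrying capacity cc, growth rate g, offset o."""
    return cc / (1.0 + np.exp(-g * (np.asarray(t, dtype=float) - o)))

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Changepoint-based piecewise-linear trend (hypothetical parameter names).

    k: initial growth rate, m: initial offset,
    changepoints: times s_j at which the rate changes,
    deltas: rate change delta_j applied from s_j onwards.
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    a = (t[:, None] >= s[None, :]).astype(float)  # a_j(t) = 1 once t reaches s_j
    gamma = -s * d                                  # offset shifts keep the trend continuous
    return (k + a @ d) * t + (m + a @ gamma)
```

For example, `piecewise_linear_trend([0, 1, 2, 3, 4], k=1, m=0, changepoints=[2], deltas=[-1])` rises with slope 1 until t = 2 and is flat afterwards. The logistic trend reaches half the carrying capacity at t = o.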

Rainfall
The monthly rainfall dataset was compiled from more than 3000 rain-gauge stations across India and covers 115 years (Jan 1901-Dec 2015). The dataset was released by the India Meteorological Department (IMD) (https://www.imdpune.gov.in/).

Temperature
The Berkeley Earth monthly average temperature data from Jan 1796 to Aug 2013 were obtained from https://guides.lib.berkeley.edu/publichealth/healthstatistics/rawdata. The dataset was generated from a variety of sources, including bias-corrected station data and regional data, with quality control applied, and the monthly averages were computed from daily data. The common temporal observation period of Jan 1901 to Aug 2013 was used to study the relationship between temperature and rainfall.

Data Preprocessing
For an in-depth analysis of the seasonal patterns of temperature and rainfall, the data were divided into four seasons according to Indian meteorological and international standards: December-February as winter, March-May as spring, June-September as monsoon, and October-November as post-monsoon (autumn).
Because both temperature and rainfall data were used, the term monsoon rather than summer was used for the rainfall analysis. Data transformation is crucial before implementing any ML model, and three transformations were applied in the current investigation. The first transformation replaced missing values with the average value of the respective record. The second reframed the time series as input-output pairs, so that the values of previous time steps serve as input for forecasting the value of the current time step. As described earlier, the common time series covered 1352 monthly values. For the LSTM model, the first 980 months for all Himalayan states were used for training, 240 months for testing, and 120 months for validation; the remaining twelve months were kept separate from the training process for an unbiased external validation of the LSTM prediction. The third transformation scaled the time series to the range -1 to 1. These transformations were inverted after the prediction step to return the values to their original scale, so that the uncertainty could be properly assessed.
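The following minimal NumPy sketch covers the season definitions, the three transformations, and the 980/240/120/12 chronological split described above. A synthetic series stands in for the real data. Several choices here are assumptions not confirmed by the study: the function names, placing the test block before the validation block, and fitting the scaling bounds on the training block only.

```python
import numpy as np

# Month number -> season, following the season definitions above.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon",
           10: "post-monsoon", 11: "post-monsoon"}

def impute_mean(x):
    """Transformation 1: replace NaNs with the mean of the non-missing values."""
    x = np.asarray(x, dtype=float).copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x

def make_windows(x, window=12):
    """Transformation 2: frame the series as (previous `window` months -> next month)."""
    X = np.stack([x[i:i + window] for i in range(len(x) - window)])
    return X, x[window:]

def scale_minus1_1(x, lo, hi):
    """Transformation 3: linear scaling of [lo, hi] onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def unscale(z, lo, hi):
    """Inverse of scale_minus1_1, used to return forecasts to original units."""
    return (z + 1.0) * (hi - lo) / 2.0 + lo

def chronological_split(x, sizes=(980, 240, 120, 12)):
    """Split in time order: train / test / validation / held-out months."""
    bounds = np.cumsum(sizes)
    return np.split(x[:bounds[-1]], bounds[:-1])

# Synthetic stand-in for the 1352 common monthly values (Jan 1901-Aug 2013).
series = np.sin(np.arange(1352) / 6.0) * 10 + 20
series[[5, 100]] = np.nan
series = impute_mean(series)
train, test, val, holdout = chronological_split(series)
lo, hi = train.min(), train.max()          # assumption: bounds fitted on training data
X_train, y_train = make_windows(scale_minus1_1(train, lo, hi))
```

Fitting `lo` and `hi` on the training block alone prevents later months from influencing training. `unscale` applies the inverse mapping to the model's scaled forecasts.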

Trend Analysis
Mann-Kendall tests [22,23] were carried out to detect trends and changes in temperature and rainfall over the analysis period. Sen's slope estimator [24] was used to quantify the magnitude of the temperature and rainfall trends for all Himalayan states from Jan 1901 to Dec 2015. The Mann-Kendall statistic S [22,23] was computed for the chronologically ordered observations in the time series, Eq. (11), and its variance VAR(S) was estimated as per Eq. (13). The standardized test statistic Z, Eq. (14) [25], was also computed for the statistical analysis.
Here, X_i and X_j are chronologically ordered values of the variable in the time series, n is the total number of observations, t_p is the number of ties for the pth value, and q is the number of tied groups. A positive Z indicates an increasing trend in the climatic variable, and a negative Z indicates a decreasing trend.
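The Mann-Kendall statistic S, its tie-corrected variance, the standardized Z with continuity correction, and Sen's slope can be sketched in plain Python. The implementation follows the standard definitions of these tests. The function names and the two-sided p-value helper are illustrative additions, not the study's code. The pairwise loops are O(n²), which is acceptable for monthly series of roughly 1,400 values.

```python
import math
from collections import Counter
from itertools import combinations

def mann_kendall(x):
    """Return (S, VAR(S), Z) for the Mann-Kendall trend test with tie correction."""
    n = len(x)
    # S: sum of sign(X_j - X_i) over all pairs i < j.
    s = sum((xj > xi) - (xj < xi) for xi, xj in combinations(x, 2))
    # VAR(S), corrected for each group of t_p tied values.
    ties = [t for t in Counter(x).values() if t > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    # Standardized Z with the usual continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z

def two_sided_p(z):
    """Two-sided p-value of Z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def sens_slope(x):
    """Sen's slope: median of pairwise slopes (X_j - X_i) / (j - i), i < j."""
    slopes = sorted((x[j] - x[i]) / (j - i)
                    for i, j in combinations(range(len(x)), 2))
    m = len(slopes)
    mid = m // 2
    return slopes[mid] if m % 2 else 0.5 * (slopes[mid - 1] + slopes[mid])
```

A trend is significant at the 90% confidence level when `two_sided_p(z) < 0.10`, that is, when |Z| exceeds about 1.645. The sign of Z and of Sen's slope gives the direction of the trend.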

Correlation Analysis Between Variables
The correlation between temperature and rainfall was studied for all twelve Himalayan states from Jan 1901 to Aug 2013, the period covered by the common temporal dataset, using the Moment Correlation Coefficient (MCC).
The MCC summarizes the direction and degree of the linear relationship between two datasets, such as actual and modeled values. The correlation coefficient ranges from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation). The MCC is computed as in Eq. (15):
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \quad (15)$$
Here, N is the number of data pairs, and X and Y are the two paired variables.
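The product-moment correlation can be computed directly from the N paired values using the sums in the standard computational formula. The following is a minimal plain-Python sketch; the function name is illustrative.

```python
import math

def moment_correlation(x, y):
    """Product-moment correlation r from N paired values (computational form)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
```

This sum-based form can lose precision when the values are large relative to their spread. For long series, a mean-centred implementation such as `numpy.corrcoef` gives the same coefficient with better numerical behavior.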

Development and Tuning of CDLSTM Model
The LSTM models in the current study were developed in Python using the Keras library with TensorFlow. The other libraries used were Plotly, NumPy, Seaborn, Pandas, Matplotlib, and scikit-learn. A four-step procedure was applied to develop the LSTM model.
The first step was to define the LSTM network. The network used in the current investigation comprised eight layers, of which four were dense layers and three were dropout layers (see Fig. 1). A dropout layer randomly "drops out" inputs to a layer, which may be input variables or the outputs of a previous layer. A dropout rate of 0.5 was chosen for two of the dropout layers. The second step was compiling the network, which requires several parameters, such as the optimization algorithm used to train the network and the loss function used to evaluate it. Several optimizers were tested and compared on their performance. The third step was fitting the LSTM model. The fourth and essential step was prediction with the LSTM model. The output was forecast step by step for the test data: the model fed the current forecast back into the input window and moved the window one step forward to forecast the next step, following the moving-forward window technique [26]. A moving-forward window of size 12 was used. The average temperature was forecast individually for all Himalayan states from Sep 2012 to Aug 2013 using one-step-ahead regression based on this window.
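The moving-forward window procedure can be sketched independently of any particular model. In the sketch below, `predict_one` is a hypothetical callable that maps the last 12 values to the next one. With a trained Keras model it could be a wrapper such as `lambda w: model.predict(w.reshape(1, 12, 1), verbose=0)[0, 0]`, assuming a model that takes input of shape (samples, 12, 1). This illustrates the technique described above and is not the study's code.

```python
import numpy as np

def recursive_forecast(predict_one, history, steps=12, window=12):
    """Forecast `steps` values one at a time with a moving-forward window.

    predict_one: hypothetical callable mapping a (window,) array to the next value.
    Each forecast is appended to the buffer, so it becomes part of the input
    window for the following step.
    """
    buf = list(np.asarray(history, dtype=float)[-window:])
    out = []
    for _ in range(steps):
        nxt = float(predict_one(np.array(buf[-window:])))
        out.append(nxt)
        buf.append(nxt)  # feed the forecast back as input
    return np.array(out)
```

Because every step after the first consumes earlier forecasts, errors can accumulate over the 12-month horizon. This accumulation is the source of the uncertainty discussed in the forecasting results.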
Several hyper-parameters, such as the optimizer, number of units, learning rate, momentum, and activation functions, must be chosen a priori and then tuned based on the RMSE values. Tuning the hyper-parameters of a neural network is essential for the performance and stability of a DL model. The first configuration tuned was the number of nodes, which affects the learning capability of the LSTM model. A higher number of nodes gives greater learning ability for complex data, but it increases computation time and can cause overfitting. Different numbers of nodes (2, 4, 6, 8, and 10) were tested for different configurations. A lower average RMSE (1.4) and the lowest variance over 20 experimental runs were obtained with four nodes. Because this result could indicate overfitting, dropout was applied: neurons were randomly selected and ignored during training. The number of epochs (10, 20, 30, 40, and 50), the optimization algorithm (Adam, RMSProp, Adagrad, and SGD), and the learning rate (1e-2 to 1e-6) were also rigorously tuned. The tuned LSTM model with eight nodes, trained for 20 epochs with the Adam optimizer at a learning rate of 1e-2, showed the best performance in terms of RMSE and computational efficiency in the current investigation.
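The tuning procedure described above (several candidate values per hyper-parameter, repeated runs, selection by average RMSE and variance) can be sketched as an exhaustive grid search. `train_and_score` is a hypothetical callable that trains one model and returns its validation RMSE. The learning-rate grid enumerates the powers of ten from 1e-2 to 1e-6, which is an assumption about how that range was sampled.

```python
import itertools
import statistics

def grid_search(train_and_score, grid, n_runs=20):
    """Evaluate every hyper-parameter combination n_runs times.

    train_and_score(config, seed): hypothetical callable returning validation RMSE.
    Returns the best (mean RMSE, variance, config), preferring the lowest mean RMSE
    and breaking ties by the lowest variance, plus the full list of results.
    """
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        scores = [train_and_score(config, seed) for seed in range(n_runs)]
        results.append((statistics.mean(scores), statistics.pvariance(scores), config))
    best = min(results, key=lambda r: (r[0], r[1]))
    return best, results

# Candidate values taken from the text; 5 * 5 * 4 * 5 = 500 combinations.
GRID = {
    "units": [2, 4, 6, 8, 10],
    "epochs": [10, 20, 30, 40, 50],
    "optimizer": ["adam", "rmsprop", "adagrad", "sgd"],
    "learning_rate": [1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
}
```

A full grid with 20 runs per combination means 10,000 training runs. In practice, parameters are often tuned one group at a time, as the text describes for the number of nodes, which keeps the cost manageable.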

Facebook's Prophet Model
The Prophet forecast model looks straightforward; however, its computation can be complex because of parameter selection. The choice of the LGM or PWL model depends on the time series. The LGM was applied to the rainfall data based on the characteristics of that series, whereas the PWL model was applied to temperature forecasting because the temperature series exhibits a linear tendency.
The uncertainty in the forecast values can be estimated by simulating the fitted generative model forward in time, as expressed in Eq. (16).
where ∂ is a rate scale parameter.
The FB-Prophet model was imported and fitted to the training data, and a forecast was produced for 12 periods with month start (MS) as the frequency.
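The Prophet workflow described above can be sketched as follows, assuming the current `prophet` package (older releases were published as `fbprophet`). Prophet expects a DataFrame with a `ds` date column and a `y` value column. `growth="linear"` selects the piecewise-linear trend, and `growth="logistic"` requires a `cap` column in both the training and future frames. The helper names and the fixed `cap` value are assumptions. Only the data-preparation helper runs without Prophet installed.

```python
import pandas as pd

def to_prophet_frame(values, start="1901-01-01"):
    """Build the ds/y frame Prophet expects, with month-start (MS) dates."""
    ds = pd.date_range(start=start, periods=len(values), freq="MS")
    return pd.DataFrame({"ds": ds, "y": values})

def fit_and_forecast(train_df, growth="linear", cap=None, periods=12):
    """Fit Prophet and forecast `periods` month-start steps beyond the training data.

    growth="linear": piecewise-linear trend; growth="logistic": logistic trend,
    which needs an (assumed) carrying capacity `cap` in both frames.
    """
    from prophet import Prophet  # optional dependency, imported only when used
    df = train_df.copy()
    if growth == "logistic":
        df["cap"] = cap
    model = Prophet(growth=growth)
    model.fit(df)
    future = model.make_future_dataframe(periods=periods, freq="MS")
    if growth == "logistic":
        future["cap"] = cap
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

The `yhat_lower` and `yhat_upper` columns are Prophet's uncertainty interval. They are obtained by simulating the trend forward, which corresponds to the uncertainty estimate discussed above.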

Uncertainty Assessment of the LSTM Model
The MCC, Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Nash-Sutcliffe coefficient (NSE) were utilized to evaluate the uncertainty of the LSTM model output. The Mean Absolute Deviation (MAD) was also considered to analyze the LSTM model's accuracy between measured and predicted values.
RMSE quantifies the error of a predictive model as the square root of the mean squared difference between the predicted and original values; the individual differences are the residuals [27]. It is calculated as in Eq. (17):
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \quad (17)$$
Here, P_i is the ith LSTM predicted value, O_i is the ith original value, and n is the number of observations.
The MAPE was used to calculate the prediction accuracy of the LSTM forecast. For each observation, the absolute difference between the original value and the LSTM forecast was divided by the original value. These ratios were averaged over the number of observations and multiplied by 100 to obtain the percentage error, Eq. (18) [28]:
$$MAPE = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \quad (18)$$
Here, A_t is the actual value and F_t is the forecasted (predicted) value. MAD was used to calculate the dispersion of the LSTM forecasted values, as per Eq. (19). A lower MAD indicates that the forecasted values are closely concentrated.
$$MAD = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - \hat{c}\right| \quad (19)$$
where P_i is the ith data value, ĉ is the mean value, and n is the number of samples.
The NSE, or efficiency coefficient, compares the residual variance with the variance of the actual data. Its value ranges from -∞ to 1, see Eq. (20). A value near one indicates high model quality and reliability, while a value below zero indicates an unreliable model. The NSE test has been used to evaluate LSTM and FB-Prophet forecasting models [29].
$$NSE = 1 - \frac{\sum_{t}\left(y_t - y_{f,t}\right)^2}{\sum_{t}\left(y_t - \bar{y}\right)^2} \quad (20)$$
where y, ȳ, and y_f are the actual time series, the mean of the actual time series, and the forecasting series, respectively.
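The error metrics in Eqs. (17)-(20) can be written as short NumPy functions. All functions take arguments in the order (actual, predicted). The function names are illustrative. As in Eq. (19), `mad` measures the dispersion of a single series about its own mean.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error, Eq. (17)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((p - a) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, Eq. (18); undefined if any actual value is 0."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))

def mad(values):
    """Mean absolute deviation of values about their mean, Eq. (19)."""
    v = np.asarray(values, float)
    return float(np.mean(np.abs(v - v.mean())))

def nse(actual, predicted):
    """Nash-Sutcliffe efficiency, Eq. (20): 1 is a perfect fit, < 0 is worse than the mean."""
    y, yf = np.asarray(actual, float), np.asarray(predicted, float)
    return float(1.0 - np.sum((y - yf) ** 2) / np.sum((y - y.mean()) ** 2))
```

A forecast equal to the mean of the observations scores an NSE of exactly 0. This is why NSE values below zero are read as an unreliable model.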

Results and Discussions
Trend Analysis of Temperature and Rainfall
The average monthly precipitation in the Himalayan states decreased by 20 mm over the period 1901-2015, from 180 mm to 160 mm. The decrease occurred after July 1995. The highest monthly precipitation (742 mm) was received in July 1948. Feb 2005 had the lowest average temperature for UK, Him, J&K, and A&M. The highest average temperature was 28.28 °C, recorded for A&M in June 2013, followed by NMMT (27.87 °C) in the same month. The highest rainfall among all Himalayan states (1347.2 mm) occurred in NMMT in Aug 1969, followed by A&M (995.2 mm) in July 1984. Temperatures increased after 2000, and the occurrence of high rainfall decreased after the 1990s. These results indicate that the climate is changing rapidly, especially in the Himalayan states. Results at a confidence level of at least 90% indicate a significant trend in the rainfall averaged over the Himalayan states (see Tab. 1 and Fig. 2).
The mean annual temperature of the Himalayan states increased by around 1.07 °C between 1796 and 2013, whereas it increased by only 0.98 °C for India as a whole over the same period. The temperature of the Himalayan states is therefore increasing faster than in the rest of the country. The average winter temperature rose by 1.27 °C over the past century, while the post-monsoon temperature increased by 1.03 °C (see Tab. 2). The sharpest increase in temperature occurred after July 1998. Monsoon and spring temperatures did not show a significant difference for West Bengal from 1796 to 2013.

Correlation Analysis Between Temperature and Rainfall
The MCC, Eq. (15), was computed to understand the relationship between the temperatures of the twelve Himalayan states from Jan 1901 to Aug 2013, the period covered by the common temporal dataset. There was a strong correlation (0.98) between the average temperatures of all Himalayan states (see Fig. 3). Because the average temperature showed an increasing trend in all Himalayan states, it was necessary to understand the influence of temperature on rainfall. The correlation coefficient between temperature and rainfall was strong for the northeastern Himalayan states A&M (0.80), WB&S (0.78), NMMT (0.76), and Arun (0.62); however, it was weak for the northwestern Himalayan states UK (0.5), Him (0.39), and J&K (0.18). The stronger correlation in the northeastern states is due to an increase in temperature accompanied by a decrease in rainfall, whereas northwestern states such as J&K and UK showed increases in both rainfall and temperature in an inconsistent pattern. The primary reason for the increase in rainfall is the complex interaction between the monsoon and the westerlies in the northwestern Himalayan region.

Temperature Forecasting Based on CDLSTM and FB-Prophet
When predicting future values, the outputs of the LSTM and FB-Prophet models may become uncertain because each model's output is fed back into it as input. Therefore, we forecasted the temperature from

Rainfall Forecasting Based on CDLSTM
The developed CDLSTM model was used to forecast rainfall for all Himalayan states. The training and testing performance of the CDLSTM model is shown in Fig. 6, which presents the forecasted rainfall for all 12 Himalayan states in subplots (a) J&K; (b) Him; (c) UK; (d) A&M; (e) WB&S; (f) Arun; and (g) NMMT. The CDLSTM-based forecasted rainfall agreed well with the actual rainfall for all Himalayan states except J&K and Him. For J&K and Him, the forecasts showed less coherence with the actual rainfall because of the highly complex snowfall and rainfall pattern produced by the Indian monsoon in summer and the westerlies in the winter months. The CDLSTM model forecasted the rainfall for the entire time series, as training and testing data, from Jan 1901 to Dec 2015, and the forecasts were compared with the actual values using the coefficient of determination, RMSE, MSE, MAPE, MAD, and NSE (see Tab. 5). The CDLSTM rainfall forecasts were least accurate for J&K, which may be due to the inconsistent rainfall pattern of that state. The best forecasts were obtained for A&M on all performance metrics (see Tab. 5). Notably, the developed CDLSTM model performed significantly better for temperature forecasting than for rainfall forecasting, primarily because rainfall data fluctuate more than temperature data.

Comparison with Other Studies
The forecasting performance of the current study was compared with other benchmark studies based on R² and RMSE values (see Tab. 6). On these metrics, the present study's models showed better results than previous studies. The FB-Prophet model implemented in the present investigation with the PWL algorithm performed remarkably well on the accuracy metrics (see Tabs. 3 and 4). Based on the available literature, the accuracy achieved by the FB-Prophet model for temperature and rainfall forecasting in the current investigation is the highest reported. The developed CDLSTM model is less accurate than the FB-Prophet model; however, it performed better than the models applied in previous studies. In the present investigation, the seasons in the Himalayan area were defined according to IMD and international standards, because assigning months correctly to their respective seasons is an important criterion. [4] defined summer as Jan and Feb, which may be an error. In India, the monsoon season overlaps with summer; therefore, studies focusing on rainfall trends and forecasting should use monsoon rather than summer rainfall to avoid ambiguity. ML-based models require intensive hyperparameter tuning to achieve good performance together with model stability. Without proper optimization, ML models may appear to provide higher accuracy, but this accuracy may be illusory and unstable. The study by [4] lacked hyperparameter tuning. The present investigation performed rigorous hyperparameter optimization to ensure efficient performance and stability of the developed CDLSTM model. Additionally, the CDLSTM model developed on the temperature dataset was applied to and assessed on a different dataset, i.e., the rainfall dataset. To evaluate a model properly, it is imperative to compare it with a different model for an independent evaluation.
The present investigation compared the developed CDLSTM model with the popular FB-Prophet model and showed strong performance; in contrast, [4] did not compare their ANN model with another model. Data preprocessing steps such as removing missing values and data transformation are vital to building an efficient ML forecasting model. The study [4] reported that the rainfall data for 1901-2015 had no missing values; however, the present investigation found that the same dataset for the whole of India had 1036 missing values. The present investigation therefore replaced the missing values with the mean value of the respective parameter.
Figure 6: Performance of the CDLSTM model for rainfall forecasting: rainfall, training-forecasted rainfall, and testing-forecasted rainfall are shown in red, green, and blue, respectively.

Computational Efficiency of the Present Investigation
After optimization, each CDLSTM training epoch took 45 s (21 ms/step), and training for 20 epochs took a total of 907 seconds (15 min 12 sec). Optimization reduces computational cost by selecting the best parameters, including the number of epochs: the optimized model needed only 40% of the 40 minutes required for 50 epochs. The imported FB-Prophet model took three minutes to produce its results, only 20% of the CDLSTM computational time.

Conclusions
The present investigation provides an understanding of the long-term historical and forecasted data of temperature and rainfall for India's Himalayan states. A DL-based LSTM model was developed with rigorous hyper-parameter tuning to forecast temperature and rainfall. The correlation coefficient, MSE, RMSE, MAPE, NSE, and MAD were computed to evaluate the performance of the CDLSTM model. All twelve Himalayan states showed increasing temperatures after 2000 and decreasing rainfall after 1990. Arun and NMMT showed decreasing rainfall trends, whereas rainfall over J&K, UK, and WB&S showed increasing trends. Arun had the highest average rainfall among the Himalayan states, and J&K had the lowest. The mean annual temperature of the Himalayan states increased by around 1.07 °C over the last two centuries, compared with 0.98 °C for India as a whole over the same period. The Himalayan states are therefore experiencing more severe impacts of global warming. The present investigation found a strong correlation (0.98) between the average temperature trends of all the Himalayan states. The correlation coefficient between temperature and rainfall was strong for the northeastern Himalayan states A&M (0.80), WB&S (0.78), NMMT (0.76), and Arun (0.62); however, it was weak for the northwestern Himalayan states UK (0.5), Him (0.39), and J&K (0.18).
The present investigation developed the CDLSTM model with eight layers, of which four were dense layers and three were dropout layers. The CDLSTM model was optimized through rigorous parameter tuning and showed promising performance on metrics such as R², MSE, RMSE, MAPE, MAD, and NSE. Given this reliability, the developed CDLSTM model is likely to estimate future temperature and rainfall values accurately. The FB-Prophet model implemented with the PWL algorithm performed remarkably well on the accuracy metrics; based on the available literature, its accuracy for temperature and rainfall forecasting is the highest reported. The developed CDLSTM model is less accurate than the FB-Prophet model; however, it performed better than the models applied in previous studies. Both the CDLSTM and FB-Prophet models produced good forecasts for all months, including Jan 2013, when temperatures were low during the peak winter season. One direction for future work is to add data on snow retreat, glacier melt, agricultural yield, and demographics to assess the complete cycle of climate change for the Himalayan region. Another is to implement and integrate the latest state-of-the-art models for climate modeling and forecasting [30][31][32][33][34].

Limitations and Learning Points of the Present Investigation
The significant limitations of the present study include the following: (1) Although the developed CDLSTM model performed significantly better than the models of previous studies, the imported FB-Prophet model with the PWL algorithm performed better than the developed CDLSTM model. (2) Training the tuned CDLSTM model took 15 minutes for 20 epochs, so its computational efficiency needs improvement. (3) LSTM was chosen in the present investigation for its ability to deal with the vanishing gradient problem and for its better control, flexibility, and performance compared with traditional RNNs. (4) The LSTM model has limitations such as a high memory-bandwidth requirement due to its linear layers; it is also prone to overfitting, and dropout is complex to apply to it. (5) The effect of the weakening Gulf Stream on climate change and agricultural productivity is left for future work, as parts of the US and Europe are influenced by the Gulf Stream.