Deep Learning Based Modeling of Groundwater Storage Change

: The understanding of water resource changes and a proper projec-tion of their future availability are necessary elements of sustainable water planning. Monitoring GWS change and future water resource availability are crucial, especially under changing climatic conditions. Traditional methods for in situ groundwater well measurement are a significant challenge due to data unavailability. The present investigation utilized the Long Short Term Memory (LSTM) networks to monitor and forecast Terrestrial Water Storage Change (TWSC) and Ground Water Storage Change (GWSC) based on Gravity Recovery and Climate Experiment (GRACE) datasets from 2003– 2025 for five basins of Saudi Arabia. An attempt has been made to assess the effects of rainfall, water used, and net budget modeling of groundwater. Analysis of GRACE-derived TWSC and GWSC estimates indicates that all five basins show depletion of water from 2003–2020 with a rate ranging from − 5.88 ± 1.2 mm/year to − 14.12 ± 1.2 mm/year and − 3.5 ± 1.5 to − 10.7 ± 1.5, respectively. Forecasting based on the developed LSTM model indicates that the investigated basins are likely to experience serious water depletion at rates ranging from − 7.78 ± 1.2 to − 15.6 ± 1.2 for TWSC and − 4.97 ± 1.5 to − 12.21 ± 1.5 for GWSC from 2020–2025. An interesting observation was a minor increase in rainfall during the study period for three basins.


Introduction
The arid and semi-arid regions of the world have historically suffered from depleting freshwater resources, including TWS (Terrestrial Water Storage) and GWS (Ground Water Storage), apart from low rainfall and rising water demands. The depletion of these water resources primarily depends on climatic parameters (precipitation and temperature) and the rising water demand for municipal, agricultural and industrial purposes. The population of Saudi Arabia increased by 15.48% from 27.4 million in 2010 to 32.4 million in 2016, and the volume of extracted groundwater also rose by 27% from 15.8 × 109 cubic meters in 2010 to 21.6 × 109 cubic meters in 2016 [1]. Monitoring GWS change and future water resource availability are crucial, especially in  [23][24][25][26]. There are three gates, which are contained by a cell in LSTM. The first gate is the input gate, the second is termed as the forget gate, whereas the third is the output gate. The LSTM network composition function's description is based on the input node, and the three gates are contained by a cell, cell state and output layer. Eqs. (1)- (7) are as follows [27,28].

Rainfall
The monthly rainfall dataset of Saudi Arabia from January 2009 to December 2018 was procured from the General Authority for Statistics [1]. This dataset contained rainfall (in mm) from 26 PME MET Stations without significant gaps. The Tropical Rainfall Measuring Mission (TRMM) data was procured from January 2009 until December 2014, and Global Precipitation Measurement (GPM) data was procured from January 2015 to November 2020 from NASA's Precipitation Processing System (https://pps.gsfc.nasa.gov/). The precipitation data of MET stations and precipitation data from TRMM and GPM's satellite mission have been abbreviated as METRain and SatRain, respectively.

GRACE TWS and GWS Estimation
TWS was estimated using GRACE and GRACE-FO satellites from Jan 2003 to June 2020 at 0.5 • spatial and monthly temporal resolutions. GRACE, on its own, is incapable of separating anomalies from several elements of TWS (e.g., surface water storage, canopy water, and soil moisture content). Therefore, it is essential to subtract non-groundwater components. When this process is carried out for TWS data, the GWS can be obtained. The current study area being the arid and semi-arid part of Saudi Arabia, the contribution of surface water storage and canopy water was likely to be insignificant in the overall calculation. However, the soil moisture content (SMC) had to be subtracted from the change in TWS (TWSC) to get the change in GWS (GWSC), see Eq. (8).
In the equation, GWSC and SMCC showed changes in groundwater storage and changes in soil moisture content, respectively. Information related to soil moisture was obtained using the Global Land Data Assimilation System (GLDAS) [44,45]. The GRACE TWS data was re-gridded to 1 • to make the resolutions identical to GLDAS-modeled data based on the approach of [46]. The GLDAS-modeled data were chosen over other LSMs since it provides practical approximation related to the soil moisture content in arid areas [17]. Since the difference between GLDAS models can cause uncertainty in GWS estimates, the ensemble mean of GWS was estimated based on the three LSMs for our analysis.

GRACE TWSC and GWSC Uncertainty Estimation
The calculation of total TWSC uncertainties (∂TWSC) was performed using the methodology described by [17,47]. TWSC annual and semi-annual trends and first residual (r1) were calculated. The lag value of 13 months was used to remove the annual trend from the time series and second residual (r2), and its standard deviation (r3) was calculated. The values of r1, r2, and r3 were added to get the total uncertainty from the TWSC. The uncertainty in SMCC (∂SMCC) was estimated as the mean monthly standard deviation from the three GLDAS models [21,46,47]. The total uncertainty in the GWSC (∂GWSC) was calculated using quadratic addition related to TWSC and SMCC values, see Eq. (9).

Trend Analysis
Mann-Kendall tests [48,49] were carried out for trend analysis to detect trends and changes in GWSC over the years of analysis. Sen's slope values [50] were used to understand the trend of GWSC change for five sub-basins of KSA from Jan 2003 to June 2020. Statistics of Mann-Kendall S value [48,49] were evaluated for chronologically placed observations in the time series Eq. (10). The variance of the observations VAR(S) in the time series was also estimated as per Eq. (12). Standardized test Z Eq. (13) [51] for the statistical analyses was also performed.
Here, X i and X j are chronologically placed values of variables in the time series, n represents the total count of observations, ties for pth value is shown as tp, and tied values number is shown as q. When Z is positive, it means an increasing trend in GWSC, and vice versa.

Correlation Analysis Between Variables
An attempt was made to study the correlation analysis among GWSC, TWSC, SatRain and METRain for five basins from Jan 2009 to Dec 2016 as per the availability of a common temporal dataset [15]. compared the mean values of rainfall each month, obtained using records of rain gauges and global weather data, and reported correlation coefficients ranging from 0.72 to 0.85. We have used SatRain and METRain for correlation analysis for all five basins. A significant association between the records of SatRain and METRain cannot be anticipated as SatRain corresponds to measurements over significantly wide-ranging areas (0.5 o × 0.5 o ).

Data Pre-Processing for LSTM-Based Forecasting
Data transformation is crucial before implementing the LSTM model. Three data transformations were applied in the current investigation. First, a lag variable of 1 was applied to remove the decreasing trend in the dataset. The second step was the transformation of time series data into input and output so that the output of a step becomes the input of the next step to forecast the value of the current time step. As described earlier, total data in the time series were 186 monthly values. The first 150 months' dataset for all five basins were taken for the training (126 months) and testing (24 months) of the LSTM model; the remaining 36 months of data (July 2015-June 2020, data gaps of 2016 and 2017) were kept separate from the training process for the unbiased external validation of the LSTM prediction. The third transformation was the scaling of time series data from −1 to 1. All these three transformations were inverted after the prediction step to get the values at the original scale so that the uncertainty calculation could be adequately assessed.

Implementation of LSTM Model
Keras library version 2.3.0 with TensorFlow version 2.0 at the backend and Python version 3.8.0 was used to build the LSTM models in the current study. The libraries used in the current investigation were NumPy, Pandas, Matplotlib, and scikit-learn. A 4 step procedure was applied to implement LSTM.
The first step was to define the LSTM network, which is organized as a sequence of layers contained by a sequential model. LSTM layer requires that the number of inputs in a threedimensional shape consists of samples, time steps and features; therefore, reshape function was used to convert our data into a three-dimensional shape based on 130 samples of training, monthly time-step, and features represented by five basins. Two LSTM layers were used in the current investigation. In the first LSTM layer, the return_sequences were set to true, and it indicated that the output of each neuron's hidden state was used as an input to the next LSTM layer. Several hyper-parameters such as optimizer, learning rate, number of units, momentum, and activation functions have to be chosen a priori. Activation functions are required for functional mapping of an input value to an output signal; now, this output becomes the input in the next layer. The dropout layer was added between two LSTM layers, and outputs of the prior layer were fed to the subsequent layer to prevent overfitting. It works by "dropping out" or probabilistically removing inputs to a layer, which may be input variables from a previous layer. The reason to choose dropout over L1 and L2 regularization was that dropout provides regularization, including robustness to the network, allowing it to evaluate different networks. It has a float value between 0 and 1 with a default value of 1. A value of 0.5 was chosen with two dropout layers.
The second step was compiling the network. It required several parameters, such as an optimization algorithm to train the network and the loss function to evaluate the network. Several optimizers were tested based on their performances. The third step was the fitting of the LSTM model. The objective of model fitting is to adapt the weights based on a training dataset. It requires the training data to be specified for inputs and corresponding outputs. The initial value of epochs was given as 100 with a validation split value of 0.2 (i.e., 30 values for internal validation of the model) and input and output values.
The fourth and essential step was the prediction using the LSTM model. We forecasted the output step by step for the test data, and the model fed the current forecasted value back into the input window by moving it one step forward to aid forecasting at the next time step using the moving-forward window technique [23]. Here we used a moving forward window of size 24, which means we used the first 24 data inputs to forecast the 25th data point. Next, we used the window between 1 to 25 data inputs to forecast the 26th data point and so on. We used the pandas shift function that shifts the entire column by the specified number for moving the window. In this, we shifted the column up by one and then concatenated that to the actual data. After fixing the window's size, the 25th column in the table becomes the target y, and the first 24 columns become our input x1, x2,. . ., x24 features. Using this method, we forecasted the GWSC individually for 36 months from July 2015-June 2020 (data gaps of 2016 and 2017), or three years. As mentioned earlier, the model prediction's external evaluation was based on a separate dataset of 36-time steps data that had been kept separate from the training and testing months.

Tuning the LSTM Model
Tuning the hyper-parameters of any neural network model is essential for evaluating and appraising the model. The major hyper-parameters of LSTM tuned in the present investigation were (1) number of nodes, (2) number of training epochs, (3) choice of optimizer, and (4) learning rate. A walk-forward model evaluation using the hyper-parameters mentioned above was attempted for different configuration values. The prediction on the test dataset was evaluated based on root mean squared error (RMSE).
The first configuration tuned was based on the number of nodes, which influences the learning capability of the LSTM model. Generally, more nodes can learn more complex mapping at the cost of training time and sometimes cause overfitting. Different nodes (1, 2, 4, 6, and 8) were tested for different configurations. By using four numbers of nodes, a lesser average RMSE value of 1.4 and the least variance based on 20 experimental runs were obtained. However, since it could be an indication of overfitting, dropout was applied to prevent overfitting. The box and whisker plot ( Fig. 2A) suggested that eight nodes showed an average RMSE value of 1.85 and the highest variance.
The optimization algorithm was the third tuned hyper-parameter of the present LSTM model. Various optimization algorithms such as Stochastic Gradient Descent (SDG), Adagrad, Adam, AdaDelta, and RMSProp, were tested. The Adam (Adaptive Moment Estimation) algorithm is an extension of SGD that calculates each parameter's learning rate, such as alpha, beta1, beta2 and epsilon [52,53]. The experimental cases based on ten runs showed a better average RMSE value of 0.95 while using the ADAM optimizer's default parameters in Keras 2.3. AdaDelta showed an average RMSE value of 1.76 with high variance (Fig. 2C).
Learning rate (LR) was the fourth tuned parameter in the current study. The effects of different learning rates for the ADAM optimizer were evaluated; the default value of parameter LR in Keras is 0.001. Similarly, the default value for β1 is 0.9. In the same way, β2 is specified to be 0.999 and is specified to be 1e-07. An attempt was made to tune the learning rate with values of 0.1, 0.01, 0.001, 0.0001, and 0.00001. The box plot indicated that LR of 0.1, 0.01, and 0.001 showed good RMSE values of 0.9 and less variance without any significant difference. On the contrary, LR of 0.0001 and 0.00001 showed slightly lower RMSE values with higher variances (Fig. 2D). It was observed that the tuned LSTM model with four nodes, trained for 2000 epochs with ADAM optimizer having lr of 0.1, showed promising performance based on RMSE and computational efficiency in the current investigation.

Autoregression Model
An AutoRegression (AR) based model was used in the present investigation to compare the forecasting performance of the LSTM model (Fig. 3). Autoregression worked on the dataset without trend and seasonality, and therefore, the time series data preprocessed in Section 3.5 was utilized for autoregression. The AR package was used from the statsmodel library using Python to develop Model 3. The tuning of n, trend parameter, and the seasonal parameter was crucial to obtain the optimized forecasting model. The k values ranging from 1 to 24 months were given to model using a loop to obtain the optimized autoregression model. The optimized value for k was 12 steps or previous months value. The trend parameter and seasonal parameter were n and True, respectively. The average RMSE and R 2 values for the autoregression model were 0.65 and 0.76, respectively for all the 5 basins, which was significantly good.  The MCC summarizes the direction and degree of linear relations between actual and modeled datasets. The correlation coefficient can take values between −1 (perfectly negative correlation) through 0 (no correlation) to +1 (perfectly positive correlation). The MCC formula to compute the correlation coefficient is given in Eq. (14).
Here, N represents the number of pairs of data. The terms X and Y are parameters.
RMSE is a method to calculate the error or accuracy in predicting models based on standard deviation Eq. (15). The final output is given in a standard deviation of the error's magnitude, as per Eq. (15); the individual calculations are outputted as residuals based on [15].
Here, P_i is the i th LSTM predicted value, and Oi is the i th original value.
The MAPE method was used to calculate the prediction accuracy of the LSTM forecast. The calculation was based on the difference between the original values and values forecasted by the LSTM and dividing the original value difference. It was then multiplied by the number of observations and 100 to obtain the percentage error, Eq. (16) [19].
Here, A t represents actual value. Similarly, symbol Ft represents the forecasted value or the predicted value. MAD was used to calculate the dispersion of LSTM forecasted values, as per Eq. (17). A lower value of MAD indicates that the forecasted data values are concentrated closely together.
where Pi is the ith data value,P is the mean value, and n is the number of samples.
The NSE or efficiency coefficient test determines the magnitude between the variance of residual time series and variance of actual data, and its value ranges from -∞ to 1, see Eq. (18). An output near to one indicates higher model quality and reliability. A value below zero suggests that the model is not reliable. NSE test has been utilized in various GWS forecasting models [35,36,38,39]. (18) where y,ȳ, and y f are the actual time series, mean of the actual time series and forecasting series, respectively.

Trend Analysis of GWSC, TWSC and Rainfall
As described earlier, the total uncertainty based on the first residual, second residual and the standard deviation of the second residual was computed for TWSC (±1.2 mm/year). The uncertainty in SMC based on the mean monthly standard deviation from the three GLDAS models was computed as ±0.2 (mm/year). Therefore, the total uncertainty for GWSC was 1.5 (mm/year). Results with a confidence factor ≥ of 90% indicate decreasing annual TWSC and GWSC change for all the five sub-basins (Tab. 1). The analysis of rainfall data from 26 MET stations from 2009 to 2018 showed an interesting trend. Areas 1, 3 and 5 showed a slight increase in rainfall from 2009-2018; however, areas 2 and 4 showed no rainfall change for the same observation years. The rainfall period showed two peaks, one in winter [November-January] and another in summer [March-April]. A positive increasing trend of 0.2 mm/year was observed with a confidence factor ≥ of 80% when combined areas were taken into account. Based on METRain data, for Saudi Arabia, the average annual rainfall value increased from 52 mm for the year 2000 to 60 mm for the year 2015 [15] and 73.34 mm for the year 2018 in the present investigation. A positive trend in the long-term forecasted rainfall was also reported by [22,54].

Correlation Analysis Between SATRain, METRain, TWSC and GWSC
The MCC was performed to understand the relationship between GWSC, TWSC, SatRain and METRain for the five basins from January 2009 to December 2016 as per the common temporal dataset availability (Eq. (14)). There was a strong correlation between TWSC and GWSC; however, there was no correlation between TWSC and rainfall (SatRain and METRain) or GWSC and rainfall (SatRain and METRain). It is essential to mention that a significant association between the magnitudes of GWSC and rainfall cannot be expected. This is because the rainfall might be reallocated as surface runoff and evapotranspiration, thus altering the allocation of rainfall in terms of space and time and, consequently, the groundwater storage as the allocated volume [21,35]. The correlation coefficient between METRain and SatRain was significantly suitable for basin 1(0.72), basin 3(0.85) and basin 5(0.65); however, it was weak for basin 2(0.41) and basin 4(0.46).

Comparison with Other Studies
We have processed GRACE TWSC and GWSC for the KSA; therefore, we have compared the current investigation results with other studies (Tab. 2). It was evident based on comparison with other studies [15,17,18,21] that the TWSC and GWSC values observed in the present investigation were in suitable conformity with outcomes from other investigations regarding the scale of calculated uncertainties. The forecasting performance of the current study was compared with other benchmarks studies based on R2 and RMSE values [34][35][36][39][40][41] (Tab. 3). It was observed based on performance metrices that the present study showed better results.  [17] Current study [21] Current study [15] Current study [18] Current study

LSTM-Based Forecasting of GWSC and TWSC from Jul 2020 -Dec 2025
There was a possibility while predicting the future values that the LSTM model output may be uncertain as the model's output was fed back into it as input. Therefore, we initially forecasted TWS and GWS from July 2015 to June 2020 and compared them with the actual values based on the values of coefficient of determination, RMSE, MSE, MAPE, and MAD (Tab. 4). The developed LSTM model was likely to accurately estimate the possible future values of GWS and TWS from July 2020 to Dec. 2025, given its reliability (Fig. 4). The area of the basins with the respective volume of TWSC and GWSC from January 2003 to June 2020 (Tab. 5) shows that basin 1 has shown the maximum TWS volume withdrawn and GWS volume withdrawn at −5.387 (km 3 /year) and −3.206 (km 3 /year), respectively, from 2003 to 2020. While comparing the rate of historical (2003-2020) extracted groundwater (GWS) with the forecasted (2020-2025) rate of extracted groundwater (GWS), it was observed that basin three and basin one had shown higher rates −29.68% and −29.58%, while the other three basins had shown rates ranging from −8.89% to 13.66%.
In the absence of the data on groundwater wells in Saudi Arabia, the annual extracted groundwater dataset [55] was used to validate the GRACE-derived GWS for Saudi Arabia (2010-2016). The selection of this annual data for Saudi Arabia was justified by its distinctiveness and accessibility within the analysis duration. The correlation of GRACE GWS was performed with net extracted water based on MCC (see Eq. (14)). It was observed that the GWS GRACE estimations were significantly correlated (R = 0.87) with extracted groundwater data from [55], see Tab. 6.   However, the present investigation found that the depletion rate did not show steady or increasing trends until June 2020. LSTM-based forecasting until 2025 also suggested that there is a requirement for sustainable water resources planning (Tab. 7). Another interesting observation was that when the volume of extracted groundwater based on GRACE GWS data of Jan-June (2019) was compared with GRACE GWS data of Jan-June (2020), a slight decrease of 0.5% was observed in groundwater extraction. In other words, groundwater extraction was lesser in 2020 than in 2019 (Tab. 8). It might be related to the slowdown or lockdown effect due to the COVID-19 pandemic.

Computational Efficiency of the Present Investigation
During the model tuning, there were clear differences between the number of iterations and associated computational time required. It was often found that the best suited LSTM model in the present investigation used 3 sec/epochs, whereas the LSTM model developed by [35] consumed 85 sec/epoch. The optimized model took only 25% computational time compared to the model with 8000 epochs in 400 min.

Conclusions
The present investigation provides an understanding of the historical and projected GWS changes at a large-scale using GRACE data for Saudi Arabia. Deep Learning LSTM models were developed based on rigorous hyper-parameters tuning to forecast the water storage changes based on GRACE-derived GWS. The major hyper-parameters of LSTM tuned in the present investigation were (1) number of nodes, (2) number of training epochs, (3) choice of optimizer, and (4) learning rate. A walk-forward model evaluation using the hyper-parameters mentioned above was attempted for different configuration values. The optimized model took 25% computational time compared to the model, which took 8000 epochs in 400 min. The correlation coefficient, MAPE, MAD, NSE and RMSE values were applied to test the LSTM models' outcomes. GRACE-derived GWS indicated that all the five investigated basins were experiencing groundwater depletion rates of 18.17 km 3 /year. Future forecasting based on the developed LSTM model indicates that, from July 2020 to Dec 2025, the investigated five basins are likely to experience depletion of water at rates ranging from −7.78 ± 1.2 to −15.6 ± 1.2 for TWSC and −4.97 ± 1.5 to −12.21 ± 1.5 for GWS. The future scope of the present investigation is to add in-situ data when available to assess and improve the model performance.

Limitations and Learning Points of the Present Investigation
The significant limitations of the present study can be discussed with respect to the following points: (1) Data continuity and temporal availability of different datasets. GRACE data was available from 2002 to 2020 with data gaps in 2002 and 2017-2018 due to no data or no coverage.
(2) Different resolutions of the dataset, e.g., GLDAS resolution was 1.0 o , and GRACE-derived GWS and TWS were gridded to 1.0 o . Similarly, SatRain resolutions were 0.25 o and 0.10 o for TRMM and GPM, respectively. (3) GWS and TWS were the change rate values, whereas SatRain and METRain were the time series data. (4) In the absence of data about groundwater wells for entire Saudi Arabia, the annual extracted groundwater dataset [1] were used to validate the GRACE-derived GWS for the study area, which was available only from 2010 to 2016. (5) The computation of the LSTM best fit model requires 100 min for 2000 number of epochs, and an improvement to use a lower number of epochs is required in the future so that model computational efficiency can become better. (6) The reasons to choose LSTM in the present investigation are its capability to deal with the vanishing gradient problem and better control, flexibility and performance than traditional RNN. (7) The LSTM model has limitations such as requirement of high memory-bandwidth due to linear layers, more prone to overfitting, and too complex to apply dropout.