Rough set theory has been widely studied for time series prediction problems such as rainfall runoff forecasting. Accurate forecasting of rainfall runoff is a long-standing and still important problem for water resource planning and management and for reservoir and river regulation. Most research focuses on constructing better models to improve prediction accuracy. In this paper, a rainfall runoff forecast model based on the variable-precision fuzzy neighborhood rough set (VPFNRS) is constructed to predict watershed runoff. The fuzzy neighborhood rough set defines the fuzzy decision of a sample by using the concept of a fuzzy neighborhood. With variable precision, the model can remove redundant attributes, and the resulting reduced data improve the predictive capability of the model. VPFNRS handles numerical data directly and also deals well with noisy data. In the proposed approach, VPFNRS is used to remove superfluous attributes from the original data, and the compact data are then used to predict the rainfall runoff. The method is evaluated on data from the Luo River Basin in Guangdong, China, and its prediction accuracy is compared with that of support vector machines (SVM) and long short-term memory (LSTM) networks. The experiments show that the proposed method achieves higher predictive performance.

Accurate rainfall runoff prediction is of great significance for the protection and management of water resources. Because the hydrological process is nonlinear and uncertain, exact rainfall runoff prediction is extremely difficult. Much current work on rainfall runoff prediction still concentrates on establishing feasible and accurate models. Many machine learning methods have been exploited for time series forecasting, such as artificial neural networks [

Rough set theory, proposed by Pawlak [

Preserving neighborhood structure and order structure is very important for feature extraction and knowledge discovery. For numerical data processing, such as rainfall runoff prediction, discretization treats each value in isolation and ignores the internal relationships among the data. The neighborhood rough set (NRS) takes the neighborhood information contained in the data into account and so extends the application scope of rough sets. Hydrological rainfall runoff, the special kind of time series studied in this paper, is characterized by a long time span, incomplete records and long duration, which makes it difficult to investigate. In this paper, NRS is applied to rainfall runoff prediction for the first time, achieving a tradeoff between prediction accuracy and prediction efficiency.

Both feature selection and pattern discovery depend on the scope of the data that are exploited. Small-scope data contain local features or patterns, while large-scope data contain their global equivalents. Whether local or global data are selected depends on the application requirements. Rainfall runoff prediction should satisfy two requirements: the forecast should be consistent with the historical data, and it should be accurate in future periods. These two requirements are interrelated. Consistency with historical data improves the prediction of future trends, and accurate future prediction in turn enriches the processing of historical data. Specifically, the rainfall runoff forecast in this paper predicts the trend over multiple future periods and uses the local data of multiple past periods to do so.

The contributions of the paper are as follows:

The remainder of the paper is organized as follows: in Section 2 the relevant theories are discussed, then a novel rainfall runoff prediction model based on VPFNRS is proposed, next the experimental results and analysis are given, while the last section contains the conclusion and future work proposals.

Rough set theory has been widely used in time series prediction [

Time series forecasting plays an important role in the field of hydrology. Recent trends of time series forecasting are based on data-driven techniques such as Artificial Neural Networks and rough sets. Reference [

The discrete rough set classifier was used to ascertain the threshold of each attribute contributing to landslide occurrence, based upon the knowledge database [

Rough set theory provides another important method of data preprocessing. However, its application to rainfall runoff forecasting has not been widely studied. In addition, classical rough set theory is based on the equivalence relation, so it applies only to data sets with symbolic attributes. In practice, many data sets are numerical, so the numerical data must first be discretized. Discretization can discard a large amount of information and thereby weaken the ability to discover knowledge. The neighborhood rough set model and the fuzzy rough set model are two important methods for resolving this problem, and each has its own advantages in rough approximation. On this basis, Cheng et al. [

This paper discusses a rainfall runoff prediction method based on the variable-precision fuzzy neighborhood rough set. To verify the performance of the model, SVM and LSTM models are introduced for comparison.

Rough set theory introduced by Pawlak [

For

where

According the _{i}

The fuzzy decision _{i}

Obviously,

where

For

Also,

where

It can be deduced that the following expression holds. If

_{i}

where

_{i}

Apparently,

The Long Short-Term Memory (LSTM) network is a special type of recurrent neural network (RNN). LSTM compensates for the vanishing and exploding gradient problems of the plain RNN and thereby alleviates its inability to retain long-term dependencies. The LSTM model replaces the RNN cells in the hidden layer with LSTM cells, which maintain a long-term memory in their cell state. The structure of a standard LSTM is shown as

In this paper, LSTM is used for prediction. The first step in LSTM is to determine which information should be discarded from the cell state. This task is accomplished by the forget gate layer. The forget gate reads the output of the previous cell $h_{t-1}$ and the input of the current cell $x_t$, and outputs a value between 0 and 1 for each element of the previous cell state $C_{t-1}$, where “1” stands for complete retention and “0” for complete abandonment.

where _{t}_{f}

where _{t}_{t}

Finally, output values are achieved based on the current cell state. The following equations represent this step.

where _{t} determines which parts of the cell state are exported by running a sigmoid layer. _{t}_{t} and a
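The gate equations did not survive in this copy of the text, so the following is only a minimal NumPy sketch of one forward step of a textbook LSTM cell, not the authors' implementation. The function name `lstm_step`, the dictionary layout of the weights `W` and biases `b`, and the choice to concatenate the previous output with the current input are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell (illustrative sketch).

    x_t    : input at time t, shape (n_in,)
    h_prev : output of the previous cell, shape (n_hid,)
    c_prev : previous cell state, shape (n_hid,)
    W, b   : hypothetical dicts keyed by 'f', 'i', 'c', 'o' holding
             weights of shape (n_hid, n_hid + n_in) and biases of shape (n_hid,)
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate: what to discard
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate: what to update
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat         # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate: what to export
    h_t = o_t * np.tanh(c_t)                 # new cell output
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is simply half of the previous one; this is a convenient sanity check for the wiring.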

SVM has been introduced as a classification method of solving linear and non-linear problems in [_{i}_{i}

where

The classification ability of SVM is decided by the training error and classification boundary. SVR achieves the minimization of objective regression function, as

where

Subject to

Non-linear SVM regression need estimate

The main study area (see ^{2}. The experimental data come from 4 hydrological control stations along the Nangao reservoir in the Luo River Basin. The original data included daily rainfall, evapotranspiration and runoff observed at four hydrological stations between 1994 and 2003. The original data spanning 10 years were divided into 2 groups. The data from the first 8 years were used as the training sample set, and the data from the latter 2 years were the test sample set. Annual average rainfall is about 2330 mm (during the flood season from April to September, rainfall is about 1890 mm, 81% of the total annual precipitation). The variance of annual precipitation is about 1090 ^{2}. The average streamflow into the Nangao Reservoir is 8.76 ^{3}/

The main goal of this paper is to develop a rainfall runoff prediction model for forecasting the streamflow of the Nangao Reservoir. Appropriate input variables carry important information about the complex autocorrelation within the data set. In general, rainfall (precipitation), previous flows, evaporation, temperature, etc. are associated with the rainfall runoff model, and most studies use rainfall and previous flow as inputs. In this study, precipitation, previous flows and evaporation are selected as input variables, and the discharge

In this paper,

To ensure that all variables receive equal weight during training, the raw data (precipitation) must be normalized to a common interval, typically from −1 to 1 or from 0 to 1. In this study, the data are normalized to the interval between 0.1 and 0.9; the presented method processes the scaled data, and the output data are returned to their original scale. The scaling and reverse scaling equations are as follows:

where
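The scaling equations themselves are missing from this copy, so the sketch below assumes the common linear min-max form that maps the observed minimum to 0.1 and the observed maximum to 0.9. The function names `scale` and `unscale` and their parameters are hypothetical.

```python
import numpy as np

def scale(x, x_min, x_max, lo=0.1, hi=0.9):
    """Linearly map values from [x_min, x_max] to [lo, hi] (assumed form)."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def unscale(y, x_min, x_max, lo=0.1, hi=0.9):
    """Inverse of `scale`: map values from [lo, hi] back to the original units."""
    y = np.asarray(y, dtype=float)
    return x_min + (y - lo) * (x_max - x_min) / (hi - lo)
```

In a forecasting setting it is usually safer to take `x_min` and `x_max` from the training period only and to reuse them for the test period, so that no information from the test years leaks into the model.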

In this study, a VPFNRS model is developed to simulate the streamflow at the Nangao Reservoir. The streamflow responds to the precipitation and runoff from the rainfall-runoff process. The spatial distribution of precipitation is not considered. The average rainfall data were calculated using the Thiessen polygon method and were used as the input data of the VPFNRS model. In the VPFNRS model, we use a concept,

where the function

The presented model works in two stages. First, variable-precision fuzzy neighborhood rough set theory is used to reduce the input data of the model, which simplifies the input and improves efficiency. Second, taking the reduced attribute set as input, decision rules are extracted based on fuzzy decision making, from which the rainfall runoff prediction is obtained. The processing workflow of the VPFNRS prediction model is sketched as pseudocode in Algorithm 1.
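To make the first stage more concrete, the following sketch scores an attribute subset by a variable-precision dependency degree. Because the formal definitions are not legible in this copy, everything here is an assumption based on one common formulation of fuzzy neighborhood rough sets: the similarity `1 - |a(x) - a(y)|` on attributes scaled to [0, 1], the minimum across attributes, a neighborhood threshold `lam`, the inclusion-degree threshold `beta`, and all function names. It also assumes the runoff target has already been discretized into integer class labels.

```python
import numpy as np

def fuzzy_neighborhood(X, lam):
    """Fuzzy neighborhood matrix R, where R[x, y] is the similarity of x and y.

    X is (n_samples, n_attributes) with values in [0, 1]; similarities
    below the threshold `lam` are cut to 0.
    """
    diff = np.abs(X[:, None, :] - X[None, :, :])
    R = (1.0 - diff).min(axis=2)      # combine attributes with the minimum
    R[R < lam] = 0.0
    return R

def fuzzy_decision(R, labels):
    """Membership of each sample in each decision class, shape (n, k)."""
    classes = np.unique(labels)
    onehot = (labels[:, None] == classes[None, :]).astype(float)
    return (R @ onehot) / R.sum(axis=1, keepdims=True)

def vp_dependency(X, labels, lam=0.8, beta=0.9):
    """Fraction of samples in the variable-precision positive region."""
    labels = np.asarray(labels)
    R = fuzzy_neighborhood(np.asarray(X, dtype=float), lam)
    D = fuzzy_decision(R, labels)
    idx = np.searchsorted(np.unique(labels), labels)   # class index of each x
    Dc = D[:, idx].T          # Dc[x, y]: membership of y in the class of x
    inclusion = (R <= Dc).mean(axis=1)   # inclusion degree of x's neighborhood
    return float((inclusion >= beta).mean())
```

Under these assumptions, an attribute reduction could proceed greedily: start from the empty set, repeatedly add the attribute that raises `vp_dependency` the most, and stop when no attribute gives a meaningful gain. A single attribute that separates the classes cleanly scores 1.0, while a constant attribute scores 0.0.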

In this section, the rainfall runoff prediction model based on VPFNRS is developed in four stages: data preprocessing, training, calibration and testing. The implementation process is shown in Algorithm 1. The code of the VPFNRS model is written in Python, and the model is trained on 8 years of data (1994–2001) for every case. The training data are divided into a training set and a validation set in a ratio of 7:3. The original data are normalized according to Equations (22)–(23) before the model is trained. Then, all data are discretized into n intervals. Referring to literature [

Obtaining the optimal prediction depends on having suitable model parameters. The “

N | RMSE | R | MBE | CE
---|---|---|---|---
2 | 7.238 | 0.9850 | −0.4852 | 0.9695
3 | 8.9315 | 0.9767 | −0.3713 | 0.9536
5 | 3.4328 | 0.9974 | 0.2731 | 0.9931
7 | 6.2035 | 0.9889 | −0.6037 | 0.9776
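The definitions of the four scores are not legible in this copy, so the sketch below uses their usual hydrological meanings as an assumption: root mean square error (RMSE), Pearson correlation coefficient (R), mean bias error (MBE, taken here as simulated minus observed) and the Nash-Sutcliffe coefficient of efficiency (CE). The function name `evaluate` is hypothetical, and the sign convention of MBE in the paper may differ.

```python
import numpy as np

def evaluate(observed, simulated):
    """Return RMSE, R, MBE and CE for a pair of runoff series (assumed definitions)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    err = sim - obs
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R": float(np.corrcoef(obs, sim)[0, 1]),
        "MBE": float(np.mean(err)),
        # Nash-Sutcliffe efficiency: 1 is a perfect fit, values at or below 0
        # mean the model is no better than predicting the observed mean.
        "CE": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }
```

A constant offset illustrates why several scores are reported together: a simulation that is always one unit too high still has R equal to 1, yet its RMSE, MBE and CE all reveal the bias.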

To test the performance of the model, this experiment simulates the rainfall-runoff process from DOY (day of year) 206 (July 25, 2002) to DOY 288 (October 15, 2002) using the VPFNRS model, the SVM model and the LSTM rainfall runoff model. The simulated results of the rainfall-runoff process for the three models are compared, and the advantages and disadvantages of the three models are analyzed.

Method | RMSE | R | MBE | CE
---|---|---|---|---
LSTM | 20.8218 | 0.9283 | 10.9724 | 0.6307
SVM | 30.6097 | 0.8764 | 7.2916 | 0.2019
VPFNRS | 10.1161 | 0.9722 | 2.5036 | 0.9128

^{2}. In addition, in DOY 216, the precipitation is 107.7 ^{2}, and the absolute errors of the three models are 168.9 (LSTM), 117.6 (SVM) and 28 (VPFNRS). Both days were days of heavy rainfall. In other words, the errors of all three models are relatively large on days of heavy rainfall, but the error of the VPFNRS model is still the smallest. The increased error may arise because the underlying surface is not considered: runoff forms only when precipitation falls on the underlying surface of the basin, so differences in the underlying surface directly affect the streamflow. Therefore, underlying surface and climate factors will be considered in later research to improve the prediction efficiency of the model.

Accurate rainfall runoff prediction is a critical issue in hydrological information processing. In this paper, the fuzzy neighborhood rough set is introduced for predicting rainfall runoff. The proposed method forecasts future rainfall runoff by modeling the essential hydrological factors. The experiments show that the presented approach predicts the rainfall runoff accurately.

Rainfall and runoff data of different historical lengths are exploited to predict the runoff over future periods of variable length. The algorithm therefore has adjustable predictability: within one framework, the same historical data are employed in multiple ways, and the streamflow at different future times can be accurately predicted.

The rainfall runoff prediction method can be extended to similar climatic zones. For different hydrological conditions, it needs to be adapted before it can predict the runoff in the new zone. Additionally, the span of historical data that is useful for prediction is limited: in the experiment, using more than 5 days of history led to a deterioration in predictive performance.