Atmospheric temperature forecasting plays an important role in weather forecasting and has a significant impact on daily life and the economy. However, because the atmospheric system is complex and uncertain, improving the accuracy of meteorological prediction through more advanced forecasting methods remains an open research topic. With the continuous improvement of computer performance and data acquisition technology, the volume of meteorological data has grown explosively, which provides the hardware and data conditions needed for more accurate weather forecasts. Exploiting these conditions requires forecasting methods that suit such hardware. Therefore, this paper proposes a deep learning model called BL-FC, based on the Bidirectional Long Short-Term Memory (Bi-LSTM) network, for temperature modeling and forecasting; the model is suitable for big data processing. BL-FC consists of four layers: the first is a Bi-LSTM layer, which learns features from continuous temperature data in the forward and backward directions; the other three are fully connected layers, of which the second and third further extract data features and the last maps them to the final temperature prediction. The data set is built with the sliding window method from 198112 consecutive hours of meteorological data provided by Belmalit Mayo Weather Station in Mayo County, Ireland. Compared with three other deep learning models, namely Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN), the BL-FC model achieves higher short-term temperature prediction accuracy, especially for abnormal temperatures.
In meteorology, temperature is a physical quantity that expresses how hot or cold the air is. Temperature statistics and prediction are essential in agriculture [
Traditional meteorological prediction research mainly focuses on establishing temperature prediction models based on several specific meteorological factors. For example, Wang et al. [
Meteorological data are typical sequential data, so time series prediction methods have been applied to many weather prediction problems, such as wind speed [
Deep learning networks are well suited to learning from big data because of their iterative, deep structure. There are many kinds of deep learning networks, such as RNN, GRU, and LSTM. RNN uses a loop structure to capture the features of sequential data, and this loop structure can retain information from earlier in the sequence. GRU and LSTM are improved versions of RNN that use a gated structure to overcome the shortcomings of RNN. In recent years, with growing computing power and improvements in network architecture, LSTM and its variants have achieved major breakthroughs in time series prediction and can handle a wide range of tasks. For example, Chen et al. [
Although LSTM overcomes the defects of RNN, it processes time series only in the forward direction. Temperature fluctuates slowly over time, so the temperature at a given time point is correlated with both earlier and later temperatures. Therefore, this paper adopts a model based on Bi-LSTM, whose structure allows it to process data in both the forward and backward directions. The experiments were carried out on the data provided by the Mayo Weather Station, and four time-series prediction models were compared. The results indicate that the Bi-LSTM-based model outperforms the other three models on every metric we adopted.
The data used in this paper are 198112 consecutive hours of meteorological data provided by the Mayo Weather Station. We selected 15 meteorological features as input data: latitude, longitude, precipitation amount, temperature, wet bulb temperature, dew point temperature, vapour pressure, relative humidity, mean sea level pressure, mean hourly wind speed, predominant hourly wind direction, sunshine duration, visibility, cloud ceiling height, and cloud amount.
Data preprocessing is an essential step for deep learning models. Proper preprocessing not only improves prediction accuracy but also speeds up training. Visibility, cloud ceiling height, and cloud amount have missing values, so we fill each missing value with the mean of the corresponding feature. In addition, different features have different magnitudes, which makes model training more difficult. In this paper, we use normalization to solve this problem. The formula of normalization is shown in
After normalization, standardization is used to rescale each feature to zero mean and unit variance, which can accelerate gradient descent during backpropagation. The specific formula is shown in
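The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the formulas are not reproduced in this text, so min-max scaling is assumed for the normalization and the z-score for the standardization, and the function names `fill_missing_with_mean`, `min_max_normalize`, and `standardize` are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance (z-score)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up visibility readings with two gaps, for illustration only.
visibility = [10.0, None, 30.0, 20.0, None]
filled = fill_missing_with_mean(visibility)
scaled = standardize(min_max_normalize(filled))
```

In practice the mean, minimum, maximum, and standard deviation would usually be computed on the training portion only and then reused for the validation and test portions; the source does not state which convention was followed.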
Bi-LSTM is a variant of LSTM, so its input must be organized as time series. The data provided by Mayo Weather Station are individual hourly records and cannot be fed into the model directly. We therefore adopt a sliding window method to transform the records into time series samples. Two windows are used: the first for sample data and the second for target data. The lengths of the windows can be adjusted as needed. Generally speaking, a longer sample sequence tends to give a more accurate prediction. In this paper, we uniformly use a 4:1 sample-to-target ratio to construct the datasets. The step length of the sliding window is set to 1, which means that after each sample and its target are constructed, the window moves forward by one hour. This not only yields a sufficiently large dataset but also preserves the continuity of the time series. The structure of two sliding window: sample data window
In
Using the sliding window method above, we can transform the raw data into different datasets according to different sample-to-target ratios.
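The sliding window construction can be illustrated with a short sketch. The function name `sliding_windows` and its parameters are hypothetical, and a plain list of integers stands in for the hourly records; in the actual dataset each record would be a vector of the 15 features.

```python
def sliding_windows(series, sample_len, target_len, step=1):
    """Split an hourly series into (sample, target) pairs.

    Each sample holds `sample_len` consecutive hours and each target holds
    the `target_len` hours that immediately follow it. The window then
    moves forward by `step` hours.
    """
    pairs = []
    last_start = len(series) - sample_len - target_len
    for start in range(0, last_start + 1, step):
        split = start + sample_len
        pairs.append((series[start:split], series[split:split + target_len]))
    return pairs


hours = list(range(24))  # stand-in for 24 consecutive hourly records
pairs = sliding_windows(hours, sample_len=16, target_len=4)
print(len(pairs))   # 24 - 16 - 4 + 1 = 5 windows
print(pairs[0][1])  # [16, 17, 18, 19]
```

With a step of 1, a series of N hours yields N - (sample length + target length) + 1 pairs, which is why the step size keeps the dataset large.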
The dataset provided by Mayo Weather Station contains 198112 consecutive hours of weather and temperature data. Two visualizations were made from this dataset.
From
According to
In line with the temperature characteristics analyzed above, we propose a short-term prediction model called BL-FC based on Bi-LSTM. Common LSTM-Based [
To better understand the BL-FC model, we briefly introduce LSTM and its improved version, Bi-LSTM.
Traditional RNN [
Based on RNN, LSTM uses a special gate structure to solve the problem of vanishing or exploding gradient. The structure of an LSTM cell is shown in
where
From the formula above, we can conclude that by using these three gates, LSTM can preserve or discard preceding data as needed. However, LSTM preserves information only in the forward direction, so the features of sequence data can be extracted and learned in one direction only. This ignores important information from the backward direction, and as a result LSTM-based models often perform poorly when faced with abnormal temperatures. Bi-LSTM combines two LSTMs that run in opposite directions. It preserves information about the time series in both directions and enables additional training by traversing the input data twice. The final output of Bi-LSTM combines the outputs of the forward and backward LSTMs, which more fully reflects the correlation within the time series. Additional training and bidirectional feature extraction give Bi-LSTM better performance in predicting fluctuating temperatures.
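For reference, the LSTM cell is commonly written in the following standard form. This is the usual textbook formulation, given here as an assumption because the paper's own equations are not reproduced in this text; the symbols ($W$ and $b$ for weights and biases, $\sigma$ for the sigmoid function, $\odot$ for element-wise multiplication) may differ from the authors' notation.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}
```

A Bi-LSTM runs one such cell forward over the sequence and a second one backward, and a common way to combine them is to concatenate the two hidden states, $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$. Whether BL-FC concatenates or merges the two directions in another way is not stated in the source.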
The training and prediction procedure of BL-FC is described as follows:
Training input: Normalized and standardized data set
Step 1: Use the sliding window method to construct the input data set into sample data set like matrix
Step 2: Determine parameters such as the number of units, learning rate, number of epochs, batch size, optimization algorithm, and the lengths of the sample and target sequences, then build the model in Keras.
Step 3: Train the model on the training set and validate it on the validation set.
Step 4: Evaluate the prediction result of the model.
Training output: A trained temperature prediction model.
Prediction input: Test sample
Prediction output: The predicted value
In this section, we introduce our four experiments based on different time series prediction models and explain our performance metrics and parameter settings.
Model evaluation is the process of using performance metrics to assess models according to the specific situation. Appropriate performance metrics reveal how accurate the prediction results are, which guides the construction and refinement of the model. Common performance metrics are RMSE, MSE, and MAE. This paper uses MSE and MAE as performance metrics.
Note the mean value of the values to be fitted is
Mean Squared Error (MSE) is the average of the squared error. The specific formula is shown in
Mean Absolute Error (MAE) is the average of all absolute errors. The formula is shown in
In simple terms, MSE is easy to calculate but penalizes large errors heavily because the errors are squared, whereas MAE is more robust to abnormal points.
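The difference between the two metrics can be seen in a small worked example. The helper names `mse` and `mae` and the temperature values are made up for illustration; the definitions follow the descriptions above (mean of squared errors and mean of absolute errors).

```python
def mse(actual, predicted):
    """Mean of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 12.0, 11.0, 9.0]
close = [10.5, 11.5, 11.5, 8.5]  # every forecast is off by 0.5
spiky = [10.0, 12.0, 11.0, 7.0]  # a single forecast is off by 2.0

print(mse(actual, close), mae(actual, close))  # 0.25 0.5
print(mse(actual, spiky), mae(actual, spiky))  # 1.0 0.5
```

Both forecasts have the same MAE, but the one with a single large miss has four times the MSE, which is why MAE is regarded as less sensitive to abnormal points.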
As discussed for the experiment dataset, the time series length and the number of forecast hours are determined by the sample-to-target ratio. In this paper, we uniformly adopt a 4:1 ratio. We designed four sets of experiments by varying the time series length for each of the four time series prediction models. The specific combinations of time series length and forecast hours are 16h:4h, 24h:6h, 32h:8h, and 40h:10h.
The learning rate is a hyperparameter that controls how much the model changes in response to the estimated error each time the model weights are updated. An appropriate learning rate is crucial for finding the optimal weights of the model. A small learning rate may result in a long training process, whereas a large learning rate may lead to an unstable training process. In this paper, the initial learning rate is set to 0.01 and is adjusted automatically during training.
Several other parameters of the BL-FC model also need to be determined. The number of units in the Bi-LSTM module and in the fully connected modules determines the dimension of each layer's output. Through experiments, the numbers of units in the first three layers are set to 30, 20, and 10, respectively.
The activation function is what allows an artificial neural network to extract and learn complex features. These functions bring non-linear properties to the network, which allow models to fit many kinds of data. ReLU and tanh are two of the most commonly used activation functions. In general, ReLU has a wider range of application and better performance. In our experiments, however, the model performs best when using tanh in the Bi-LSTM layer and ReLU in the fully connected layers.
Model optimization is the process of adjusting model parameters with an optimization algorithm in order to minimize the cost function. A good optimization algorithm can speed up training and can even produce a better result. In our experiment, the Adam algorithm with the fastest convergence speed [
Other parameters are determined through experiments. The number of epochs is set to 30, the batch size is 100, and the ratio of the training set, validation set, and test set is 8:1:1.
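Putting the stated settings together, a BL-FC-like network could be assembled in Keras roughly as follows. This is a sketch under several assumptions, not the authors' implementation: the function name `build_bl_fc` is hypothetical, the output layer is assumed to have one linear unit per forecast hour, MSE is assumed as the training loss, and `ReduceLROnPlateau` with the values shown is only one possible reading of the automatically adjusted learning rate.

```python
from tensorflow import keras


def build_bl_fc(sample_len=16, n_features=15, target_len=4):
    """Bi-LSTM layer followed by three fully connected layers."""
    model = keras.Sequential([
        keras.Input(shape=(sample_len, n_features)),
        # 30 units per direction with tanh, as stated in the text.
        keras.layers.Bidirectional(keras.layers.LSTM(30, activation="tanh")),
        keras.layers.Dense(20, activation="relu"),
        keras.layers.Dense(10, activation="relu"),
        # Assumed: one linear output per forecast hour.
        keras.layers.Dense(target_len),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.01),
        loss="mse",        # assumed training loss
        metrics=["mae"],
    )
    return model


model = build_bl_fc()

# Assumed stand-in for the automatic learning rate adjustment.
lr_schedule = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=3
)

# Training call, where x_train, y_train, x_val, y_val are hypothetical arrays
# produced by the sliding window step:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=30, batch_size=100, callbacks=[lr_schedule])
```

The defaults correspond to the 16h:4h experiment; the other three settings would change only `sample_len` and `target_len`.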
To compare our model with the baselines, four experiments with different time series lengths and forecast hours are carried out on the RNN, GRU, LSTM, and BL-FC models.
The prediction results of the four experiments are shown in
Model | MSE (16:4) | MSE (24:6) | MSE (32:8) | MSE (40:10) | MAE (16:4) | MAE (24:6) | MAE (32:8) | MAE (40:10) |
---|---|---|---|---|---|---|---|---|
BL-FC | 0.9761 | 1.0560 | 1.3078 | 1.4875 | 0.7147 | 0.7489 | 0.8372 | 0.8956 |
LSTM | 1.0762 | 1.1301 | 1.3447 | 1.4975 | 0.7535 | 0.7777 | 0.8430 | 0.8967 |
GRU | 1.2365 | 1.1327 | 1.3944 | 1.5317 | 0.7958 | 0.7777 | 0.8707 | 0.9123 |
RNN | 1.8058 | 1.7205 | 2.0098 | 2.5963 | 1.0108 | 0.9730 | 1.0605 | 1.2028 |
As analyzed earlier, RNN handles long-term dependencies poorly, and it performs the worst among the four temperature prediction models. When the ratio of time series length to forecast hours is 40:10, the MSE and MAE of its temperature prediction reach 2.5963 and 1.2028 respectively, far worse than the 1.4875 and 0.8956 achieved by BL-FC. This indicates that the RNN-based model is not suitable for temperature prediction.
In contrast to the RNN-based model, our BL-FC model outperforms the other experimental models on every metric we adopt, followed by LSTM and GRU. When the time series length and forecast hours are set to 16:4, the MSE and MAE of BL-FC are 0.9761 and 0.7147 respectively, which is the best prediction result among all the experiments. As the time series length and forecast hours increase, the advantage gradually decreases, but the BL-FC model still performs better than the RNN, LSTM, and GRU-based models.
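The shrinking advantage can be quantified directly from the table. The short calculation below uses only the reported MSE values of BL-FC and LSTM; the variable names are illustrative.

```python
ratios = ["16:4", "24:6", "32:8", "40:10"]
mse_bl_fc = [0.9761, 1.0560, 1.3078, 1.4875]
mse_lstm = [1.0762, 1.1301, 1.3447, 1.4975]

# Relative MSE reduction of BL-FC with respect to LSTM, in percent.
reductions = [
    round(100 * (lstm - bl_fc) / lstm, 2)
    for bl_fc, lstm in zip(mse_bl_fc, mse_lstm)
]

for ratio, reduction in zip(ratios, reductions):
    print(f"{ratio}: BL-FC MSE is {reduction}% lower than LSTM")
```

The reduction falls from roughly 9.3% at 16:4 to under 1% at 40:10, so the margin over LSTM is small at the longest horizon.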
To display the prediction results of the models intuitively, we randomly select 500 h of predicted values and compare them with the actual temperature.
From
This paper proposes BL-FC, a deep learning method for short-term local temperature modeling in complex meteorological systems, which can make full use of massive meteorological data to build a more accurate model. To obtain the required sample data, the sliding window method is used to process 198112 consecutive hours of data on 15 meteorological factors provided by Mayo Weather Station. The processed data are then divided into training, validation, and test sets in the ratio 8:1:1. Based on these data, four deep learning models are studied: BL-FC, LSTM, GRU, and RNN. The experimental results show that BL-FC achieves lower MSE and MAE than the other three models in all four experiments. In addition, a comparison of the predictions with 500 randomly selected consecutive hours of actual temperature data shows the advantages of the BL-FC model: it responds faster to temperature changes and predicts abnormal temperatures more accurately. Once an appropriate time series length and number of prediction hours are chosen, the BL-FC model produces accurate predictions and responds quickly when the temperature changes. Therefore, BL-FC can use meteorological big data to build a more accurate local temperature prediction model and can be further improved through continued learning from data, which gives it good application prospects in regional temperature prediction.
The longest prediction sequence considered in this study is 10 hours. In future work, an attention mechanism will be introduced into the BL-FC model to further improve its accuracy and to predict regional long-term temperature. Moreover, the method of this paper can be applied to the prediction of other meteorological factors, so that deep learning can be better applied in the meteorological field.