Time series forecasting plays a significant role in numerous applications, including industrial planning, water consumption, medical domains, exchange rates, and the consumer price index. The main problem is insufficient forecasting accuracy. The present study proposes a hybrid forecasting method to address this need. The proposed method includes three models. The first model is based on the autoregressive integrated moving average (ARIMA) statistical model; the second is a back propagation neural network (BPNN) with adaptive slope and momentum parameters; and the third is a hybridization of ARIMA with BPNN (ARIMA/BPNN) and of ARIMA with artificial neural networks (ARIMA/ANN), which gains the benefits of both linear and nonlinear modeling. The proposed forecasting models are used to predict the consumer price index (CPI) and the expected number of cancer patients in the Ibb Province in Yemen. The statistical measures used to evaluate the proposed method are (i) mean square error, (ii) mean absolute error, (iii) root mean square error, and (iv) mean absolute percentage error. Based on the computational results, the improvement rate in forecasting the CPI dataset was 5%, 71%, and 4% for the ARIMA/BPNN, ARIMA/ANN, and BPNN models, respectively, while for the cancer patients’ dataset it was 7%, 200%, and 19% for the ARIMA/BPNN, ARIMA/ANN, and BPNN models, respectively. These results indicate that the proposed method reduces the effects of randomness, of changes affecting the time series, and of data non-linearity. The ARIMA/ANN model outperformed each of its components applied separately, increasing forecasting accuracy and decreasing the overall forecasting errors.

Forecasting is the process of examining the past behavior of a particular phenomenon in order to predict its present and future behavior, based on events from the past and present [

Time-series prediction is among the critical areas where artificial neural networks (ANNs) and convolutional neural networks (CNNs) are used heavily as a substitute for the statistical methods traditionally applied to time-series prediction [

Recently, scholarly attention has focused on time-series prediction using many statistical models [

Generally, artificial neural networks (ANNs) are an important method in Artificial Intelligence (AI), particularly in machine learning [

In many cases, it has been found that it is appropriate to apply hybrid models to deal with the linear and non-linear qualities. In [

Among economic indicators, the CPI can serve as a means of regulating income. Predicting its future index values accurately helps to reflect the purchasing patterns of consumers in the Yemeni market. Cancer is a dangerous, malignant disease and one of the main causes of death worldwide, with approximately 14 million cancer-related deaths in 2014 [

In this research, a novel time series forecasting method based on a hybrid of BPNN and statistical models is proposed. The proposed method is applied to predict the consumer price indices in the Republic of Yemen. The prices cover the period from January 01, 2005 until December 01, 2014. The proposed method is also used to predict the number of people diagnosed with cancer in Ibb governorate, Yemen, during the period January 01, 2010 to December 01, 2016.

Many studies have used statistical methods and ANNs for forecasting time series [

A popular hybrid model is presented in [

For sales forecasting, hybridization between ARIMA and BPNN models is proposed in [

Another hybrid-forecasting model for a short-term prediction is presented in [

In the current study, a forecasting method based on ARIMA/BPNN and ARIMA/ANN models is proposed. Momentum and adaptive slope parameters are added to the basic BPNN to accelerate learning when updating the weights. The main contribution is the hybridization of the ANN model and the ARIMA model. This yields a considerable improvement in forecasting accuracy because the network's output is fed back to the input of the neural network along with the actual output values. The inputs come from the ANN and the ARIMA models, and a parallel architecture is used for all the inputs, which is the basis for training neural networks. This paper is organized as follows. The proposed forecasting method is given in Section 2, the computational results and the comparative study are outlined in Sections 3 and 4, respectively, and the conclusion and future work are discussed in Section 5.

This section outlines the details of the proposed hybrid forecasting method. The hybridization between BPNN and ARIMA models is presented in the following sub-sections.

The number of layers, the number of neurons in each layer, and the weighted connections between neurons determine the neural network topology. Determining the topology is among the most important steps in developing a model for any given problem [

The network architecture consists of three layers: (i) the input layer, (ii) the hidden layer, and (iii) the output layer. These layers are fully connected through weighted links. The proposed architecture of the BPNN model is determined by testing several different configurations and comparing them using several statistical measures computed between inputs and outputs, including mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE), as illustrated in

BPNN architecture | MAE | MSE | RMSE | MAPE |
---|---|---|---|---|
3-2-1 | 0.8163305 | 0.0088731 | 0.0941974 | 0.0268588 |
3-3-1 | 1.3150517 | 0.0142940 | 0.1195576 | 0.0713349 |
3-4-1 | 1.9474456 | 0.0211678 | 0.1454918 | 0.12322 |
3-5-1 | 1.3173369 | 0.0143188 | 0.1196615 | 0.08127 |
3-6-1 | 1.3285108 | 0.0144403 | 0.1201679 | 0.05837 |
3-7-1 | 1.3066810 | 0.014203 | 0.1191765 | 0.0552605 |
3-8-1 | 1.9075543 | 0.0207342 | 0.1439940 | 0.10835 |
3-9-1 | 2.1373260 | 0.0232318 | 0.1524198 | 0.13091 |
4-2-1 | 1.5412934 | 0.0167531 | 0.1294341 | 0.08254 |
4-3-1 | 1.2795217 | 0.0139078 | 0.1179315 | 0.06858 |
4-4-1 | 0.5090978 | 0.0055336 | 0.0743886 | 0.03057 |
4-5-1 | 0.3515621 | 0.0038213 | 0.0618168 | 0.0144807 |
4-6-1 | 0.7515189 | 0.0081686 | 0.0903807 | 0.0435242 |
4-7-1 | 0.6480978 | 0.0070445 | 0.0839317 | 0.03322 |
4-8-1 | 1.3743152 | 0.0149382 | 0.1222219 | 0.08069 |
4-9-1 | 1.6005158 | 0.0173969 | 0.1318973 | 0.0706621 |
5-2-1 | 1.0009172 | 0.0108795 | 0.1043050 | 0.0369448 |
5-3-1 | 1.1786630 | 0.0128115 | 0.1131881 | 0.049 |
5-4-1 | 0.4794347 | 0.0052112 | 0.0721889 | 0.01573 |
5-5-1 | 0.2924130 | 0.0031784 | 0.0563773 | 0.01193 |
5-6-1 | 1.0031652 | 0.0109039 | 0.1044220 | 0.0356168 |
5-7-1 | 1.3945108 | 0.0151577 | 0.1231167 | 0.08307 |
5-8-1 | 2.0428695 | 0.0222051 | 0.1490137 | 0.13295 |
5-9-1 | 1.3861925 | 0.0150673 | 0.1227489 | 0.071249 |

The proposed network includes five neurons in the input layer and one neuron in the output layer. A_{k−4}, A_{k−3}, A_{k−2}, A_{k−1}, and A_{k} denote the inputs of the network, and A_{k+1} denotes its output. Repeated statistical experimentation indicated that five neurons is an appropriate size for the hidden layer, giving a 5-5-1 architecture.
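The input/output arrangement described above (five consecutive observations as inputs, the next observation as the target) can be sketched as a sliding window over the series. The following Python fragment is only an illustration under that reading; the function name `make_lagged_samples` and the toy series are assumptions introduced here, not part of the original study.

```python
def make_lagged_samples(series, n_lags=5):
    """Build (inputs, targets) pairs for one-step-ahead forecasting.

    Each input row holds n_lags consecutive observations
    (A_{k-4}, ..., A_{k} when n_lags is 5) and the matching target
    is the next observation A_{k+1}.
    """
    inputs, targets = [], []
    for k in range(n_lags, len(series)):
        inputs.append(list(series[k - n_lags:k]))  # the n_lags values before position k
        targets.append(series[k])                  # the value to be predicted
    return inputs, targets


# Toy series (hypothetical values, for illustration only).
toy_series = [1, 2, 3, 4, 5, 6, 7]
X, y = make_lagged_samples(toy_series, n_lags=5)
print(X)
print(y)
```

A series of length n yields n − 5 training pairs, so short series leave few samples for training the network.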

The Box-Jenkins method is a popular time series forecasting method. It is also called the ARIMA model [

It is flexible because it includes both autoregressive and moving average terms.

Based on the Wold decomposition theorem, the ARIMA model can approximate a stationary process.

In practice, however, finding this approximation may not be an easy task.

In addition, constructing an ARIMA model requires more expertise than statistical methods such as regression.

Box-Jenkins analysis indicates a methodical process of identifying, fitting, checking, and using ARIMA time series models [

Residuals, or prediction errors, are the differences between the actual values and the estimated values; for an adequate model, they should form a white noise series. The SPSS statistical package is used to identify a suitable model for the data, based on the autocorrelation function (ACF) and the partial autocorrelation function (PACF). ARIMA (0,1,0) is the identified model for this data because it passed the parameter significance test as well as the residual analysis test.

Ljung-Box Q(18) statistic | Ljung-Box DF | Ljung-Box Sig. | Normalized BIC | R-squared | RMSE | MAPE | MAE | Stationary R-squared |
---|---|---|---|---|---|---|---|---|
23.202 | 18 | 0.183 | 0.898 | 0.998 | 1.533 | 1.019 | 1.104 | 1.000E-013 |
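To make the identified model concrete: an ARIMA (0,1,0) model treats the first differences of the series as white noise, so its forecasts behave like a random walk. The minimal Python sketch below illustrates that idea with an optional drift term. It is not the SPSS procedure used in the study; the function names and the `with_drift` option are assumptions introduced here, and the source does not state whether the fitted model included a constant.

```python
def difference(series):
    """First differences: d_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def random_walk_forecast(series, steps, with_drift=True):
    """Forecast `steps` values ahead under an ARIMA(0,1,0)-style model.

    With drift, the h-step forecast is the last observation plus h times
    the mean first difference; without drift it is the last observation.
    """
    diffs = difference(series)
    drift = sum(diffs) / len(diffs) if with_drift else 0.0
    last = series[-1]
    return [last + drift * h for h in range(1, steps + 1)]


# Toy series (hypothetical values, for illustration only).
toy_series = [1.0, 2.0, 4.0]
print(random_walk_forecast(toy_series, steps=2))
print(random_walk_forecast(toy_series, steps=2, with_drift=False))
```

Because such a model has no autoregressive or moving average terms, any structure left in the differenced series ends up in the residuals, which is what the hybrid schemes try to exploit.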

For the cancer patients’ data, the ARIMA (1,0,0) model achieved the lowest values of the model fit statistics; the lower these metrics are, the more accurate the model's predictions. The sample ACF and PACF of the residuals for the cancer patients series show that the residuals follow a white noise pattern, since the autocorrelation and partial autocorrelation coefficients of the residuals lie within the 95% confidence interval. This means that the residuals are independent and normally distributed with a mean of 0 and a variance of σ².

Ljung-Box Q(18) statistic | Ljung-Box DF | Ljung-Box Sig. | Normalized BIC | R-squared | RMSE | MAPE | MAE | Stationary R-squared |
---|---|---|---|---|---|---|---|---|
24.952 | 16 | 0.071 | 3.783 | 0.197 | 6.122 | 27.450 | 4.630 | 0.675 |
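The Ljung-Box Q statistic reported in the model fit statistics tests whether the residual autocorrelations up to a chosen lag are jointly zero, which is the white noise check described above. A minimal sketch of the standard formula, Q = n(n+2) Σ r_k² / (n − k) for k = 1..m, is given below. The function names are hypothetical, and the significance value (Sig.) is not computed because it requires a chi-square distribution function that the Python standard library does not provide.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation r_k of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    weighted_sum = sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * weighted_sum


# Toy residuals (hypothetical values): strong alternation gives a large lag-1 autocorrelation.
toy_residuals = [1.0, -1.0, 1.0, -1.0]
print(ljung_box_q(toy_residuals, max_lag=1))
```

A large Q relative to the chi-square reference distribution indicates remaining autocorrelation, while a non-significant Q is consistent with white noise residuals.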

A hybrid model is any combination of two or more independent models. The purpose of hybridization is to raise the prediction accuracy of the model. The Box-Jenkins model deals with linear characteristics of the time series, while neural networks deal with the nonlinear characteristics. The ARIMA/BPNN hybrid model is used to find an efficient way of predicting and defined as in

Since the reason for constructing a hybrid model is to obtain better forecasts, the main question is how to combine independent models to produce the best possible results. The proposed model is divided into two hybrid model schemes as follows:

Both the ARIMA and the BPNN models have proven successful in their respective linear and nonlinear domains. Nevertheless, neither of them is a universal model suitable for all circumstances [

The proposed methodology includes two main phases. In the first phase, the linear part of the problem is analyzed by feeding the time series data into the ARIMA model. Because the ARIMA model cannot capture the nonlinear structure of the data, the residuals of the linear model contain the nonlinear information. Therefore, in the second phase, the BPNN model is developed, and its inputs are products of the constructed ARIMA model. These products can include residuals, output estimates, or predictions. The BPNN model produces the final output of the hybrid model.
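The two phases can be sketched in a few dozen lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: a random walk with drift stands in for the fitted ARIMA model, the network is a small tanh multilayer perceptron trained by backpropagation with momentum (the adaptive slope is omitted), and every function name, hyperparameter, and the synthetic series are hypothetical.

```python
import math
import random


def linear_phase(series):
    """Phase 1 stand-in: one-step fits of a random walk with drift, plus residuals."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    drift = sum(diffs) / len(diffs)
    fitted = [prev + drift for prev in series[:-1]]  # fitted[t] estimates series[t + 1]
    residuals = [y - f for y, f in zip(series[1:], fitted)]
    return fitted, residuals


def train_bpnn(X, y, hidden=5, lr=0.05, momentum=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer network with online backpropagation and momentum."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # Each weight row ends with a bias weight.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    v1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
    v2 = [0.0] * (hidden + 1)

    def forward(x):
        xb = list(x) + [1.0]
        hb = [math.tanh(sum(w * xi for w, xi in zip(row, xb))) for row in w1] + [1.0]
        return xb, hb, sum(w * hi for w, hi in zip(w2, hb))

    for _ in range(epochs):
        for x, target in zip(X, y):
            xb, hb, out = forward(x)
            err = out - target
            # Hidden deltas use the output weights before they are updated.
            deltas = [err * w2[j] * (1.0 - hb[j] ** 2) for j in range(hidden)]
            for j in range(hidden + 1):
                v2[j] = momentum * v2[j] - lr * err * hb[j]
                w2[j] += v2[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    v1[j][i] = momentum * v1[j][i] - lr * deltas[j] * xb[i]
                    w1[j][i] += v1[j][i]
    return lambda x: forward(x)[2]


def fit_hybrid(series):
    """Phase 2: the network maps linear-phase products to the final forecast."""
    fitted, residuals = linear_phase(series)
    low, high = min(series), max(series)
    residual_scale = max(abs(r) for r in residuals) or 1.0

    def scale(value):
        return (value - low) / (high - low)

    # Inputs for each target: its linear estimate and the previous linear residual.
    X = [[scale(fitted[t]), residuals[t - 1] / residual_scale]
         for t in range(1, len(fitted))]
    y = [scale(v) for v in series[2:]]
    network = train_bpnn(X, y)
    return [network(x) * (high - low) + low for x in X]


# Synthetic series (hypothetical): linear trend plus a nonlinear component.
series = [100 + 0.5 * t + 3 * math.sin(t / 3) for t in range(60)]
hybrid_fit = fit_hybrid(series)
print(len(hybrid_fit), hybrid_fit[:3])
```

Feeding both the linear estimate and the lagged residual to the network is one of several possible choices; as the text notes, residuals, output estimates, or predictions can each serve as inputs.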

In the hybrid ARIMA/ANN model scheme, the inputs of both the ANN model and the ARIMA model are the time-series data. The outputs of these two models then enter a new hybrid ANN model [3-5-1], together with a feedback input (y-1) taken from the output (y). The final output of this hybrid scheme is produced by the new hybrid ANN model, as shown in

Z_{1} represents the input that comes from the ANN model, Z_{2} represents the second input, which comes from the ARIMA model, and y-1 represents the feedback data.
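The feedback arrangement can be illustrated with a short loop in which each output of the combining network becomes one of its inputs at the next step. The sketch below assumes the three inputs are ordered as (Z1, Z2, previous output); the function name `run_with_feedback`, the initial feedback value, and the averaging combiner are hypothetical stand-ins for the trained 3-5-1 network.

```python
def run_with_feedback(combiner, z1_values, z2_values, initial_feedback):
    """Apply `combiner` step by step, feeding each output back as the third input.

    z1_values: outputs of the standalone ANN model
    z2_values: outputs of the ARIMA model
    initial_feedback: value used as y-1 for the first step (an assumption)
    """
    outputs = []
    feedback = initial_feedback
    for z1, z2 in zip(z1_values, z2_values):
        y = combiner([z1, z2, feedback])
        outputs.append(y)
        feedback = y  # the output becomes the feedback input of the next step
    return outputs


def average_combiner(inputs):
    """Placeholder combiner: a plain average instead of a trained network."""
    return sum(inputs) / len(inputs)


print(run_with_feedback(average_combiner, [3.0, 6.0], [3.0, 3.0], 3.0))
```

During training, the feedback input could instead be taken from the actual previous value; the source mentions both the network output and the actual output values, so the exact arrangement used by the authors is not specified here.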

This section describes the datasets used for conducting the experiments. Several experiments and comparative analyses were performed to evaluate the performance of the proposed forecasting method. The obtained results and related discussions are presented below.

Two datasets are used to demonstrate the effectiveness of the proposed forecasting methods. The first is the CPI dataset in Yemen from January 01, 2005 until December 01, 2014. The second is a newly collected cancer patients’ dataset from different hospitals in Ibb governorate, Yemen, from January 01, 2010 to December 01, 2016. The two time series have different statistical characteristics.

No. of observations | Coefficient of variation | Variability | Std deviation | Mean | Minimum | Maximum |
---|---|---|---|---|---|---|
120 | 0.318464 | 1134.584 | 38.12723 | 119.7222 | 66.532 | 194.181 |

No. of observations | Coefficient of variation | Variability | Std deviation | Mean | Minimum | Maximum |
---|---|---|---|---|---|---|
84 | 0.403483 | 84.87698 | 11.34164 | 22.8333 | 6 | 54 |

To assess the forecasting performance of the different models, each dataset is divided into a training sample and a testing sample. The input and output data are real-valued, and the initial weights are chosen randomly.
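For time series, such a division is normally chronological, with the most recent observations held out for testing. The short Python sketch below shows one way to do this; the function name `chronological_split` and the placeholder data are assumptions made for illustration, and the hold-out length is a free parameter rather than a value prescribed by the method.

```python
def chronological_split(series, n_test):
    """Split a series into training and testing parts without shuffling.

    The last n_test observations form the testing sample, so the model
    is always evaluated on data that come after its training period.
    """
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# Placeholder data: 120 monthly observations with the last 23 held out.
observations = list(range(120))
train, test = chronological_split(observations, n_test=23)
print(len(train), len(test))
```

Keeping the temporal order intact avoids training on observations that lie in the future of the test period.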

The proposed hybrid method is built as two hybrid model schemes. The results obtained using the first hybrid model for both training and testing are presented in

| Phase type | BPNN architecture | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|---|
| | [5-5-1] | 1.17421 | 0.0114 | 0.10677 | 1.03845 |
| | [5-5-1] | 0.83383 | 0.15839 | 0.39799 | 0.00463 |

The actual values and the forecast values for the CPI dataset are compared to each other as illustrated in

| Phase type | BPNN architecture | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|---|
| | [5-5-1] | 0.8202 | 0.0089 | 0.0944 | 0.7681 |
| | [5-5-1] | 0.6340 | 0.094 | 0.3067 | 0.0035 |

The results for the prediction are shown in

The results obtained in both the training and testing phases for the cancer patients’ dataset are given in

| Phase type | BPNN architecture | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|---|
| | [5-5-1] | 1.33303 | 0.14811 | 0.38486 | 0.06669 |
| | [5-5-1] | 1.6956 | 0.2992 | 0.547 | 0.05756 |

In addition, the prediction results produced by the hybrid ARIMA/BPNN model of the cancer patients’ data set are given in

The results obtained using the hybrid ARIMA/ANN model scheme for the cancer patients’ dataset for both the training and the testing are presented in

| Phase type | BPNN architecture | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|---|
| | [5-5-1] | 0.6483 | 0.01440 | 0.12003 | 0.0280 |
| | [5-5-1] | 1.11182 | 0.2058 | 0.4537 | 0.040369 |

Prediction results of the cancer patients’ dataset using the hybrid ARIMA/ANN model scheme are shown in

Comparative analysis of individual models was performed to demonstrate the efficiency of the proposed models. The MAE, MSE, RMSE, and MAPE are selected to be the measures for the accuracy of forecasting. CPI data for the period from February 01, 2013 to December 01, 2014 and cancer patients’ data for the period January 01, 2015 to December 01, 2016 are used in this study.

Date | Actual values | Proposed BPNN | ARIMA (0,1,0) | Hybrid ARIMA/BPNN model scheme | Hybrid ARIMA/ANN |
---|---|---|---|---|---|
01/02/2013 | 165.711 | 165.002 | 165.42 | 165.201 | 165.319 |
01/03/2013 | 166.580 | 165.900 | 167 | 166.379 | 166.779 |
01/04/2013 | 167.547 | 167.106 | 168.59 | 167.592 | 168.094 |
01/05/2013 | 168.106 | 168.321 | 170.19 | 168.617 | 169.123 |
01/06/2013 | 169.515 | 169.545 | 171.81 | 169.628 | 169.892 |
01/07/2013 | 170.056 | 170.468 | 173.45 | 170.550 | 170.728 |
01/08/2013 | 171.263 | 171.397 | 175.1 | 171.432 | 171.633 |
01/09/2013 | 172.364 | 172.330 | 176.76 | 172.258 | 172.554 |
01/10/2013 | 172.802 | 173.269 | 178.45 | 173.046 | 173.603 |
01/11/2013 | 174.036 | 174.212 | 180.14 | 173.794 | 174.437 |
01/12/2013 | 175.932 | 175.161 | 181.86 | 174.507 | 175.286 |
01/01/2014 | 176.050 | 176.115 | 183.59 | 175.190 | 176.555 |
01/02/2014 | 176.818 | 177.381 | 177.656 | 177.105 | 177.354 |
01/03/2014 | 179.171 | 179.581 | 179.277 | 179.267 | 179.108 |
01/04/2014 | 179.956 | 180.740 | 180.913 | 180.383 | 180.425 |
01/05/2014 | 177.905 | 180.838 | 182.564 | 181.789 | 180.171 |
01/06/2014 | 180.301 | 181.923 | 184.23 | 182.888 | 179.960 |
01/07/2014 | 185.196 | 184.008 | 185.911 | 184.228 | 184.172 |
01/08/2014 | 188.336 | 186.071 | 187.607 | 185.452 | 187.317 |
01/09/2014 | 186.994 | 186.120 | 189.319 | 186.623 | 186.623 |
01/10/2014 | 187.468 | 187.159 | 191.047 | 187.8823 | 186.588 |
01/11/2014 | 190.558 | 188.182 | 192.790 | 189.051 | 189.694 |
01/12/2014 | 194.181 | 190.194 | 194.549 | 191.26 | 191.379 |
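The four accuracy measures used in this comparison can be computed directly from paired actual and forecast values. The sketch below uses the textbook definitions, with MAPE expressed in percent; the function name `forecast_errors` is hypothetical, and the scaling used for the values reported in the paper may differ. The sample inputs are the first three rows of the CPI comparison (actual values and the proposed BPNN forecasts).

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE, RMSE, and MAPE (in percent) for paired values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the actual values, so they must be non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# First three CPI rows: actual values and proposed BPNN forecasts.
actual_cpi = [165.711, 166.580, 167.547]
bpnn_cpi = [165.002, 165.900, 167.106]
for name, value in forecast_errors(actual_cpi, bpnn_cpi).items():
    print(f"{name}: {value:.4f}")
```

Applying the same function to each forecast column gives a like-for-like comparison of the models over the same test period.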

The results indicate that the BPNN model, applied alone, forecasts more accurately than the ARIMA model because it captures more of the patterns in the data. The results also show that a hybrid model combining the two models can reduce the forecasting errors significantly. More precisely, the hybrid ARIMA/ANN model scheme outperforms the other three models, with the lowest forecasting errors. Similarly, the comparison results for the cancer patients’ data are given in

Date | Actual values | Proposed BPNN | ARIMA (0,1,0) | Hybrid ARIMA/BPNN model scheme | Hybrid ARIMA/ANN |
---|---|---|---|---|---|
01/01/2015 | 25 | 25.337 | 21 | 22.977 | 21.901 |
01/02/2015 | 16 | 13.466 | 20 | 19.585 | 22.951 |
01/03/2015 | 27 | 27.071 | 20 | 32.314 | 28.281 |
01/04/2015 | 24 | 26.375 | 19 | 23.414 | 24.125 |
01/05/2015 | 38 | 27.861 | 19 | 20.942 | 32.531 |
01/06/2015 | 41 | 35.567 | 19 | 20.051 | 30.019 |
01/07/2015 | 31 | 24.449 | 19 | 35.317 | 31.867 |
01/08/2015 | 33 | 24.542 | 19 | 38.743 | 36.460 |
01/09/2015 | 28 | 26.217 | 19 | 35.563 | 27.498 |
01/10/2015 | 27 | 33.626 | 25 | 28.006 | 28.046 |
01/11/2015 | 30 | 31.767 | 23 | 30.882 | 29.974 |
01/12/2015 | 26 | 32.553 | 22 | 30.303 | 25.879 |
01/01/2016 | 54 | 30.800 | 21 | 49.158 | 52.173 |
01/02/2016 | 36 | 37.249 | 21 | 38.220 | 37.489 |
01/03/2016 | 34 | 34.168 | 21 | 34.944 | 35.175 |
01/04/2016 | 24 | 22.561 | 21 | 25.5499 | 25.689 |
01/05/2016 | 33 | 30.045 | 21 | 32.930 | 32.142 |
01/06/2016 | 31 | 31.796 | 21 | 31.783 | 33.498 |
01/07/2016 | 31 | 33.654 | 20 | 32.331 | 31.022 |
01/08/2016 | 25 | 32.344 | 20 | 27.029 | 24.471 |
01/09/2016 | 29 | 28.878 | 20 | 29.454 | 29.005 |
01/10/2016 | 43 | 31.701 | 20 | 39.726 | 39.777 |
01/11/2016 | 41 | 35.737 | 20 | 40.972 | 41.317 |
01/12/2016 | 37 | 35.365 | 20 | 38.189 | 37.120 |

The hybrid model combines the strengths of the ARIMA and BPNN models in linear and nonlinear modelling, respectively. In these experiments, hybridization improved forecasting performance. The results show that the ARIMA/ANN model scheme outperformed the other three models used in this research. The more changes occur in a time series, the less effective the forecasting models are when used in isolation, compared with the hybrid models.

For many decision-makers, the accuracy of time series forecasting is fundamentally important. In this research, two hybrid models were proposed to increase forecasting accuracy: the ARIMA/BPNN and the ARIMA/ANN models. A new dataset of cancer patients collected from Yemeni hospitals in Ibb province was used to evaluate the proposed models, in addition to the CPI dataset. The proposed models combine linear and nonlinear modeling in order to capture different relationship patterns in time series data. For each model, the results were reported and analyzed using standard statistical measures, namely MAE, MSE, RMSE, and MAPE. The results revealed that the hybrid prediction models reduced the effects of randomness, of changes affecting the time series, and of data non-linearity. The results on two real datasets confirmed the strength of the ARIMA/ANN model over the other hybrid and single models introduced in this research. The ARIMA/ANN model outperformed each of its component models used separately, increasing forecasting accuracy and decreasing the overall errors. On the other hand, modeling time series with a BPNN requires many experiments, since a BPNN has a large number of parameters to set. These include the learning rate, the number of hidden layers, the number of input neurons, the number of iterations, the sizes of the training and validation sets, and the weight update settings.

For future work, we recommend applying the proposed methods to other, larger real-world datasets. In that case, more sophisticated techniques, such as deep neural networks, should be explored.