The construction industry relies on various key economic indicators, one of which is material prices. Cost is a key concern in all operations of the construction industry, and material cost is one of the main components of the overall cost of construction. Under uncertain conditions, reliable cost forecasts become an important source of information. Cost overrun is also a common problem in the construction industry, where nine out of ten construction projects face cost overrun. To carry out a successful cost management strategy and prevent cost overruns, it is very important to find reliable methods for estimating construction material prices. Material prices are time dependent. To increase the foreseeability of the costs of construction materials, this study focuses on the estimation of construction material indices through time series analysis. Two types of analysis are implemented to estimate the future values of construction material indices. The first is the Autoregressive Integrated Moving Average (ARIMA) method, which is known to be successful in estimating time series of a linear nature. The second is the Non-Linear Autoregressive Neural Network (NARNET), which is known to be successful in modeling and estimating series with non-linear components. The results show that, depending on the nature of the series, both methods can successfully and accurately estimate the future values of the indices. In addition, we found that optimal NARNET architectures, which provide better accuracy in the estimation of the series, can be identified through a grid search on NARNET hyperparameters.
The construction industry relies on different economic indicators, ranging from construction material prices to sales volumes and prices. Nearly all economic indicators are temporal (time-dependent) in nature. The stakeholders in the industry, whether employers/investors or contractors/suppliers, keep a close eye on economic indicators to decide whether to start a new project, to complete a project in the planned or a longer time period, or to abandon a project completely (to prevent bankruptcy). As the economic indicators related to the construction industry change over time, forecasts of these indicators are made using different econometric models, mostly with time series analysis. For instance, indicators such as cost indices, price indices and sales volumes can be estimated using time series analysis [
To increase the foreseeability of the costs of construction materials, this study focuses on the estimation of construction material indices. Time series analysis was chosen as the estimation approach because the indicators of the indices are time dependent. Two types of analysis are implemented to estimate the future values of the indicators of the material indices. The first is the Box-Jenkins (ARIMA) method, which is known to be successful in estimating time series of a linear nature. The second is the Non-Linear Autoregressive Neural Network (NARNET), which is known to be successful in modeling and estimating series with non-linear components alongside a linear nature. In addition, we have developed a grid search algorithm (for identifying the best hyperparameters for the network) and an accompanying software tool to explore optimal NARNET architectures that give the most accurate estimates. Following the background section on the use of time series analysis techniques in the construction industry, an exploratory analysis of the data is provided. This is followed by an elaboration of the ARIMA- and NARNET-based estimation procedures. In the final sections, the results of the analysis are presented and discussed.
According to the literature, ARIMA models were used in many fields for different purposes including economics, to model and predict the exchange rates [
The Association of Turkish Construction Material Producers (IMSAD), founded in 1984, acts as an organization that represents the Construction Materials Industry both in Turkey and internationally. IMSAD has 85 industrial (company) and 52 industry association members. The association follows developments in the domestic market closely, and also keeps close track of foreign markets to sustain success in material exports. IMSAD is well known for its Construction Material Industry Indices, which are published on a monthly basis. The main index is known as the Compound Index and is composed of 3 main index groups: the Activity Index Group, the Expectation Index Group, and the Trust Index Group. The Activity Index Group is composed of 6 indicators (A1.Domestic Sales, A2.Production, A3.Exports, A4.Endorsement, A5.Collection Rate, A6.International Sales Price), the Expectation Index Group is composed of 7 indicators focusing on expectations regarding the next 3 months (E1.Expectation from Economy in General, E2.Expectation from Construction Material Industry, E3.Expectation of Domestic Orders, E4.Expectation of Export Orders, E5.Expectation of Production, E6.Expectation of New Production Capacity Investments, E7.Expectation for Employment), and the Trust Index Group is composed of 5 indicators (T1.General Course of Economy, T2.General Course of Construction Industry, T3.General Trend in Construction Materials Industry, T4.General Trend in Domestic Markets, T5.General Trend in Export Markets). The value of each indicator is determined on a monthly basis from the responses to indicator questions, which are sent to the members of the association each month. The values of the indicators for a specific month are calculated by taking 100 as the reference (base) value, which refers to the indicator value of August 2013 (base year/month).
In this study, we collected monthly data covering the period from 2013:8 to 2021:3 for all indicators A(1–6), E(1–7) and T(1–5).
Following the examination of the time series plots, Autocorrelation Plots of all indicators were generated and examined in order to assess the stationarity of the time series at level (i.e., diff = 0). The autocorrelation plots indicated that all series show signs of non-stationarity at level, as the autocorrelations do not decay to zero quickly (e.g., within 3–4 lags) in any of the plots (
In the following phases of the research, the applicability of the Box-Jenkins (ARIMA) method and of optimized NARNETs in making future predictions of the indicators was tested. The tests started with a proof-of-concept exercise to demonstrate the applicability of the Box-Jenkins method for a single selected indicator. Later, a toolbox was developed and tested to facilitate future predictions of all indicators by exploring, finding, and utilizing optimized NARNETs.
At the start of the modeling process, the data were divided into training and test sets so that the results could be validated efficiently. The training set covered the period 08.2013–06.2019 (71 obs.) and the test set covered the period 07.2019–03.2021 (21 obs.). The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were used as the performance metrics of the models in all training and validation stages.
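The chronological split and the two error metrics described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function names `split_series`, `rmse` and `mae` are hypothetical, and the series is a placeholder of 92 monthly values rather than real IMSAD data.

```python
import math

def split_series(series, n_train):
    """Split a series chronologically: the first n_train values train, the rest test."""
    return series[:n_train], series[n_train:]

def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Placeholder series: 92 monthly values (08.2013 to 03.2021), not real index data.
series = [100.0 + 0.1 * i for i in range(92)]
train, test = split_series(series, 71)
print(len(train), len(test))  # 71 21
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same forecasts and penalises large misses more heavily, which is why reporting both gives a fuller picture of forecast quality.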
Following the exploratory analysis of the data we employed the Box-Jenkins methodology for forecasting the values of a selected (solo) indicator. The indicator to be estimated is determined as E1, as a result of examination of Autocorrelation Function (ACF) (
ARIMA models are used to model linear relationships in data. In practice, however, most time series are characterized by high variation and rapid transient periods, so a nonlinear approach should be used to model these types of time series. As indicated in [
The NAR model can be written as

$$y(t) = h\big(y(t-1), y(t-2), \ldots, y(t-p)\big) + \varepsilon(t)$$

The equation explains how a NARNET can be utilized to predict the value of series y at time t, y(t), using the p past values of the series. The function h(·) is unknown in advance, and the training of the NARNET aims to approximate the function by optimizing the weights and biases. The error term ε(t) stands for the error of the approximation of the series y at time t. NARNET is a multilayer, recurrent, dynamic network with feedback connections. In a NARNET the terms y(t − 1), y(t − 2), …, y(t − p) are known as feedback delays (
The most commonly used learning rule for the NAR Network (NARNET) is the Levenberg-Marquardt backpropagation procedure (LMBP) [
In this study we propose and implement a grid search algorithm to find the most accurate NARNET (the NARNET with minimum errors) by optimizing its hyperparameters. LMBP was chosen as the training function, as it is the most commonly used function in the training of NARNETs. The hyperparameters used as the input of our objective function were i.) the size of the hidden layer and ii.) the number of feedback delays. The training can be repeated N times to achieve better accuracy, and the number of epochs in each training run is determined automatically by the LMBP-based training of the NARNET in MATLAB. The accuracy of the model in our algorithm is calculated on the basis of the test set. The pseudocode of the grid search algorithm is given in Listing 1.
Listing 1. Grid-search algorithm
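best_rmse ← high_positive_value
best_mae ← high_positive_value
bestnet, best_record, best_rsqr ← empty
for each hidden_size in hidden_layer_sizes
    for delay = 1 to max_feedback_delay
        for run = 1 to N
            [training_record, trained_net] = train_network (trainset, hidden_size, 1:delay)
            predictions = trained_net.predict (testset)
            [rmse, mae, rsqr] = calculate_metrics (testset, predictions)
            if rmse < best_rmse and mae < best_mae
                bestnet ← trained_net
                best_record ← training_record
                best_rmse ← rmse
                best_mae ← mae
                best_rsqr ← rsqr
            end if
        end for
    end for
end for
return bestnet, best_record, best_rmse, best_mae, best_rsqr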
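The search over hidden layer sizes, feedback delays and repeated runs can also be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: `grid_search`, `evaluate` and `toy_train` are hypothetical names, the rule that a candidate replaces the incumbent only when both test RMSE and test MAE improve is an assumption, and `toy_train` is a stand-in that draws random weights instead of training a NARNET with LMBP.

```python
import itertools
import math
import random

def evaluate(model, train, test, delay):
    """One-step-ahead test forecasts; each uses the `delay` actual past values."""
    history = train[-delay:] + test
    preds = [model(history[i:i + delay]) for i in range(len(test))]
    n = len(test)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(test, preds)) / n)
    mae = sum(abs(a - p) for a, p in zip(test, preds)) / n
    return rmse, mae

def grid_search(train, test, hidden_sizes, max_delay, n_runs, train_fn, seed=0):
    """Try every (hidden size, delay 1..max_delay) pair n_runs times; keep the best."""
    rng = random.Random(seed)
    best = None
    grid = itertools.product(hidden_sizes, range(1, max_delay + 1), range(n_runs))
    for hidden, delay, _run in grid:
        model = train_fn(train, hidden, delay, rng)
        rmse, mae = evaluate(model, train, test, delay)
        # Assumed rule: replace the incumbent only if both test errors improve.
        if best is None or (rmse < best["rmse"] and mae < best["mae"]):
            best = {"model": model, "hidden": hidden, "delay": delay,
                    "rmse": rmse, "mae": mae}
    return best

def toy_train(train, hidden, delay, rng):
    """Stand-in trainer (NOT a NARNET): random convex weights over the window.

    `hidden` is accepted only to mirror the search space and is ignored here.
    """
    raw = [rng.random() for _ in range(delay)]
    weights = [w / sum(raw) for w in raw]
    return lambda window: sum(w * x for w, x in zip(weights, window))

series = [100 + 5 * math.sin(i / 3) for i in range(92)]  # placeholder data
best = grid_search(series[:71], series[71:], [15, 20, 25], 3, 5, toy_train)
print(best["hidden"], best["delay"], round(best["rmse"], 3))
```

Passing the trainer in as `train_fn` keeps the search loop independent of the model, so the same loop could wrap any routine that returns a one-step predictor. Repeating each configuration matters because every run starts from different random weights and can therefore end with a different test error.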
The grid search algorithm has been implemented in MATLAB and embedded in a tool developed by the authors. The tool developed is a MATLAB App and provides a Graphical User Interface (GUI) which can be used in data preparation, entry of hyperparameter options, training, and visualization of the results. The tool consists of 3 parts. The first set of parameters (Prepare Data Tab) focuses on data preparation for optimal network search. The data can be loaded in form of an Excel or CSV file by using the “Load Data” command from the toolbar menu. Once loaded, the unprocessed/raw data is visualized in a Data Table at the lower part of the window. The column in focus (i.e., the series in focus when there are multiple series) can be determined as data column using this interface, and once determined the data column is illustrated with blue color. A cut-point row can be determined also here. The cut point row, once determined, indicates where to separate the data into train and test sets. The last row of the training set to be prepared is colored in red, while the first row of the test set to be prepared is colored in yellow. Graph View switch is used to provide a time series plot of the data. Once this switch is On, any click on the data column would provide a time series plot of the data as an image. Differencing (Δ) can be applied to the data if the data has trend or seasonality to remove these effects. The data preparation interface allows the user to apply first (Δ1) or second (Δ2) order differencing to the dataset. (
The second set of parameters is for entering the hyperparameter boundaries used in the grid search for the optimal NARNET, in line with the algorithm provided in Listing 1. In Listing 1, two hyperparameters form the search space (the number of feedback delays and the hidden layer size), and each combination of these two hyperparameters can be run N times. The user can enter the Feedback Delays parameter as an integer value (f), which indicates that the grid search will be conducted between 1 and f feedback delays for all hidden layer sizes. The user can also provide an array of “Hidden Layer Sizes”, which indicates that the grid search will consider each hidden layer size in this array. For instance, if this array is [5 10 15], the grid search will take single-layer NARNETs with 5, 10, and 15 neurons into account (for each feedback delay) during the search for the optimal model. A training configuration (a single combination of a feedback delay and a hidden layer size, e.g., [1, 5] or [5, 10]) can be run N times in order to repeat the LMBP-based training process of the NARNET N times. These runs result in networks with the same architecture (number of neurons, feedback delays) but with different weights and biases determined in each run, and thus with different accuracies (RMSE, MAE scores). Running a training configuration multiple times increases the chance of finding the optimal (most accurate) NARNET for each configuration. Thus, this step contributes to the grid search not as a hyperparameter, but by searching more thoroughly for the best weight-bias combination for each hyperparameter configuration. The user can enter the number of times each training configuration will be run, i.e., the runs of each config, among the second set of parameters (
The third set of parameters provides the accuracy metrics calculated for the optimal NARNET. These metrics are RMSE and MAE (and R², the supplementary measure). They are calculated for each iteration using the test set during the optimal model search (see Listing 1). Once the training is complete and the optimal (most accurate) model is determined, its accuracy metrics are displayed in the third parameter set (Results Tab). At the completion of the training, the training record is logged by MATLAB and the optimal network is saved, both as MATLAB environment variables. The saved optimal network can later be used in further validation studies with different training and test sets to further assess its performance (
Once all training and tests were complete, we evaluated the results achieved through the Box-Jenkins (ARIMA) and optimal NARNET search methods. The next section of the paper elaborates on the numerical results and discusses the applicability of these techniques for the estimation of Construction Material Indices.
As a result of the 32 rounds of analysis with EViews 10 explained previously, the estimated coefficients of the best-fit ARIMA model are shown in
Dependent variable: E1

Variable | Coefficient | Standard error | t-statistic | Probability |
---|---|---|---|---|
Constant | −0.59227 | 0.08802 | −6.72825 | 0.0000 |
AR(1) | 1.24341 | 0.11351 | 10.95343 | 0.0000 |
AR(2) | −0.34803 | 0.10451 | −3.32988 | 0.0014 |
MA(1) | −0.99979 | 0.00023 | −4218.626 | 0.0000 |
Following this, we estimated i.) the Training Set through Static (in-sample) and Dynamic (out-of-sample) estimation methods, and ii.) the Test Set through Static (in-sample) and Dynamic (out-of-sample) estimation methods. The accuracy metrics for all these estimations are provided in
Date range | Estimated set | No. of obs. | Forecast type | RMSE | MAE |
---|---|---|---|---|---|
09.2013–06.2019 | Training | 70 | Static | 1.1256 | 0.8531 |
09.2013–06.2019 | Training | 70 | Dynamic | 1.2108 | 0.9083 |
07.2019–03.2021 | Test | 21 | Static | 1.0157 | 0.8917 |
07.2019–03.2021 | Test | 21 | Dynamic | 0.8826 | 0.7264 |
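The difference between the two forecast types in the table can be illustrated with a small sketch. It assumes the usual meaning of the terms in econometric software: a static forecast predicts each period one step ahead from the actual lagged values, while a dynamic forecast feeds its own earlier forecasts back in as lags. The pure autoregressive predictor, the function names, and the coefficient used below are illustrative assumptions and are not the fitted model of this study.

```python
def static_forecast(series, start, phi, const=0.0):
    """One-step-ahead forecasts from index `start`, each using actual past values."""
    p = len(phi)
    return [const + sum(phi[k] * series[t - 1 - k] for k in range(p))
            for t in range(start, len(series))]

def dynamic_forecast(series, start, phi, const=0.0):
    """Multi-step forecasts from index `start`; earlier forecasts replace actuals."""
    p = len(phi)
    path = list(series[:start])  # actual values are known only before `start`
    for t in range(start, len(series)):
        path.append(const + sum(phi[k] * path[t - 1 - k] for k in range(p)))
    return path[start:]

# Illustrative AR(1) with coefficient 1.0, i.e. "next value = previous value".
series = [1.0, 2.0, 3.0, 5.0, 4.0]
print(static_forecast(series, 2, [1.0]))   # [2.0, 3.0, 5.0]
print(dynamic_forecast(series, 2, [1.0]))  # [2.0, 2.0, 2.0]
```

In the static case every forecast is corrected by the latest observation, whereas in the dynamic case errors can accumulate along the forecast path, so the two types generally yield different RMSE and MAE values on the same date range.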
Following the Box-Jenkins (ARIMA) modeling process of E1, all indicators have been modeled and estimated with the optimal NARNETs (discovered by the MATLAB App developed during this study). As the Autocorrelation Plots of Indicators at Level (
Estimated set | Optimal network hidden layer size | Optimal network feedback delays | RMSE | MAE | σ test | RMSE/σ | MAE/σ |
---|---|---|---|---|---|---|---|
E1-test | 15 | 1 | 0.710 | 0.585 | 0.806 | 0.881 | 0.726 |
E2-test | 15 | 1:2 | 0.929 | 0.638 | 0.983 | 0.945 | 0.649 |
E3-test | 15 | 1:2 | 1.461 | 1.066 | 1.507 | 0.969 | 0.707 |
E4-test | 15 | 1 | 3.311 | 2.336 | 3.396 | 0.975 | 0.688 |
E5-test | 15 | 1 | 5.675 | 3.746 | 5.399 | 1.051 | 0.694 |
E6-test | 20 | 1 | 1.327 | 0.983 | 1.614 | 0.822 | 0.609 |
E7-test | 15 | 1 | 1.917 | 1.586 | 2.205 | 0.869 | 0.719 |
A1-test | 15 | 1:2 | 4.102 | 3.350 | 4.559 | 0.900 | 0.735 |
A2-test | 20 | 1 | 5.068 | 4.185 | 5.160 | 0.982 | 0.811 |
A3-test | 15 | 1 | 3.543 | 2.712 | 3.796 | 0.933 | 0.714 |
A4-test | 25 | 1:3 | 5.533 | 4.606 | 5.687 | 0.973 | 0.810 |
A5-test | 20 | 1 | 0.522 | 0.416 | 0.483 | 1.081 | 0.861 |
A6-test | 15 | 1 | 1.107 | 0.890 | 2.006 | 0.552 | 0.444 |
T1-test | 25 | 1 | 0.902 | 0.647 | 0.812 | 1.111 | 0.797 |
T2-test | 20 | 1:2 | 1.036 | 0.750 | 0.905 | 1.145 | 0.829 |
T3-test | 15 | 1 | 1.153 | 0.905 | 1.028 | 1.122 | 0.880 |
T4-test | 20 | 1 | 0.858 | 0.582 | 0.836 | 1.026 | 0.696 |
T5-test | 15 | 1 | 2.727 | 1.687 | 3.088 | 0.883 | 0.546 |
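The last two columns of the table are the error metrics divided by the standard deviation of the test set (σ), which makes the indicators comparable despite their different scales. The short sketch below reproduces these ratios for two rows of the table; the function name `normalized_error` is hypothetical.

```python
def normalized_error(error, sigma):
    """Express an error metric as a multiple of the test-set standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return error / sigma

# (RMSE, MAE, sigma of the test set) taken from the table rows for E1 and A6.
rows = {"E1": (0.710, 0.585, 0.806), "A6": (1.107, 0.890, 2.006)}
for name, (rmse, mae, sigma) in rows.items():
    print(name, round(normalized_error(rmse, sigma), 3),
          round(normalized_error(mae, sigma), 3))
# E1 0.881 0.726
# A6 0.552 0.444
```

A ratio below 1 indicates an error smaller than one standard deviation of the test data. For reference, a constant forecast equal to the test-set mean would give an RMSE ratio of about 1.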
In order to compare the performance of ARIMA and NARNET models, the estimation accuracies for E1 were checked. For the test set of E1, the static forecast of the ARIMA model results in RMSE:1.0157, and MAE: 0.8917, while the static forecast of the NARNET model results in RMSE:0.710 and MAE:0.585. The NARNET model has significantly lower error rates when compared with ARIMA model, RMSE: 0.881 σ
When all estimations with NARNET models were complete, the model accuracies were observed to range between 0.552 σ and 1.145 σ in terms of RMSE, and between 0.444 σ and 0.880 σ in terms of MAE. The results indicate that when the optimal network used more than one feedback delay (i.e., 1:2 or 1:3), the series tended to have a relatively high variance (e.g., A1, A4, E2, E3), but not all series with high variance (e.g., E5, A2) were modelled optimally by networks with more than one delay. Thus, it is not possible to argue that a correlation exists in the optimal models between series variance and the number of network delays. In the majority of the optimal networks, the hidden layer had 15 neurons (the lowest alternative evaluated). This might be related to the size of the training sets (70 obs.), as low-complexity networks tend to provide better estimation results on small datasets (following the law of parsimony). The results also demonstrated that, regardless of the nature and complexity of the time series data, NARNETs are able to model the time-dependent relationship in the data successfully, which would not be the case for the solo use of linear models such as ARIMA.
Cost is a key concern in the operations of the construction industry, and material costs have a huge impact on the overall cost of construction. In order to foresee the trends in the cost of construction materials, this study concentrated on the estimation of construction material indices. The literature indicates that there is no one-size-fits-all solution or model for the time-dependent indicators of the construction industry, and that different modeling techniques have to be employed and tested for series of different nature. In our study, the estimation of construction material indices is accomplished through ARIMA and Non-Linear Autoregressive Neural Network (NARNET) methods. The estimation with the Box-Jenkins (ARIMA) methodology was done for the E1 indicator, where an ARMA model was fitted to the first difference of the E1 series. The static (in-sample) estimation of the test set of E1 resulted in an RMSE of 1.26 σ and an MAE of 1.106 σ, which can be considered a good accuracy. Following this, optimal NARNET architectures for all indicators were identified through a grid search algorithm (developed for identifying the best hyperparameters for the network) and an accompanying software tool. The optimal NARNET models provided better accuracies; for instance, the static (in-sample) estimation of the test set of E1 resulted in an RMSE of 0.881 σ and an MAE of 0.726 σ. The algorithm and the accompanying MATLAB App demonstrated that grid search can be used efficiently to find NARNETs with optimal hyperparameters. The study results have demonstrated that, depending on the nature of the series, both methods can successfully and accurately estimate the future values. The proposed approach presents a new direction and method in the estimation of construction material price indicators.
The developed tool can be used by construction industry professionals and cost managers to estimate trends in material prices efficiently, which in turn would enhance the foreseeability of material costs in the construction industry.