Open Access

ARTICLE

Interpretable AI Hybrid Model for Electricity Demand Forecasting: Combining TFT and XGBoost in Smart Grid Data

Sobhan Manjili1, Saeid Jafarzadeh Ghoushchi1, Mohammad Reza Maghami2,*, Mazlan Mohamed3,*

1 Faculty of Industrial Engineering, Urmia University of Technology, Urmia, Iran
2 Strategic Research Institute (SRI), Asia Pacific University of Technology & Innovation (APU), Technology Park Malaysia, Bukit Jalil, Kuala Lumpur, Malaysia
3 Faculty of Artificial Intelligence and Cyber Security (FAIX), University Technical Malaysia Melaka, Melaka, Malaysia

* Corresponding Authors: Mohammad Reza Maghami. Email: email; Mazlan Mohamed. Email: email

Computer Modeling in Engineering & Sciences 2026, 147(1), 23. https://doi.org/10.32604/cmes.2026.076217

Abstract

Accurate electricity load forecasting is crucial for optimizing power distribution networks, especially in rapidly growing cities like Tabriz (annual consumption growth of 7.2%). This study presents a hybrid AI framework integrating the Temporal Fusion Transformer (TFT) and XGBoost for residual error correction. The model is trained and evaluated using actual consumption data from Tabriz’s distribution network (2021–2023). Compared to a baseline TFT model, the proposed framework demonstrates an 11.2% reduction in RMSE (from 0.1249 to 0.1109) and a 10.7% decrease in MAE (from 0.0998 to 0.0891). Attention mechanism analysis reveals temperature (importance coefficient = 0.32), weekly patterns (0.18), and industrial activity (0.21) as key factors influencing electricity consumption in Tabriz. Achieving a MAPE of 4.2%, the framework provides actionable insights into consumption drivers. This research demonstrates the effectiveness of the proposed model in managing load fluctuations characteristic of medium-sized cities and offers potential for adaptation to similar urban contexts. The dual capability for accurate prediction and interpretable feature analysis establishes a new benchmark for smart grid analytics in emerging smart city environments.

Keywords

Electricity load forecasting; hybrid AI models; Temporal Fusion Transformer (TFT); XGBoost; interpretable machine learning; smart grid analytics; urban energy management; attention mechanisms

1  Introduction

Currently, traditional energy sources such as oil and coal remain the primary suppliers of energy worldwide. With the increasing consumption of energy, the levels of carbon dioxide emissions have also risen significantly, leading to serious environmental consequences, climate change, and threats to human security. On the other hand, energy consumption in buildings (regardless of their function) accounts for approximately 30% of the world’s total electricity consumption and continues to grow at a notable rate [1]. The continuous rise in building energy consumption, due to its detrimental effects on the environment, has garnered widespread attention globally [2]. In 2015, residential and commercial buildings were responsible for nearly 73% of electricity consumption and 41% of primary energy consumption in the United States, and these figures are projected to increase over the next 20 years [3]. There is a growing emphasis on the development and implementation of smart grids and smart buildings to address these electricity needs in a cost-effective manner while minimizing greenhouse gas emissions [4,5]. In this context, given the importance of energy consumption management, short-term hourly or half-hourly forecasting of Building Energy Consumption (BEC) has become a key tool in regulating real-time demand and optimizing the energy efficiency of buildings. This process not only helps reduce energy consumption but also plays a significant role in the efficient management of distribution networks and the sustainable utilization of energy resources [6].

Understanding the behavioral patterns of occupants is crucial for achieving high productivity and reducing energy consumption in buildings, whether in commercial or residential sectors. Research has demonstrated that household energy consumption is significantly influenced by the behavior of residents, and this phenomenon is global in nature, not confined to any specific geographical location [7]. One of the primary research areas in power system management and planning is energy consumption analysis. Data analysis and extraction play a vital role in providing insights into electricity usage [8]. Several studies have concluded that household behavior can significantly influence building energy consumption. Menezes et al. argue that energy-saving practices in households have the potential to reduce greenhouse gas emissions in the United States, with minimal or no impact on household well-being. Furthermore, behavior prediction-based efficiency programs have proven to be one of the most cost-effective energy efficiency strategies in the building industry. Research has focused on the impact of occupant behavior on building energy consumption, aiming to predict consumption patterns in building energy simulation tools to enhance their accuracy [9]. Forecasting production and consumption loads enhances the optimal management of energy networks by maintaining a balance between supply and demand, reducing production costs, estimating fair energy pricing, and scheduling capacity and future planning. These forecasts are critically important for energy providers and other stakeholders in power systems, as well as for industries involved in electricity generation, transmission, and distribution [10]. Forecasting consumption patterns emphasizes reducing energy consumption through savings and efficiency, thereby lowering costs while delivering better services. 
Accurate prediction of consumption patterns (consumer behavior) plays a pivotal role in this process, as it contributes to the sustainability and security of the system by optimizing energy management, reducing reliance on reserves, enhancing grid efficiency, and facilitating trade in energy markets [11–13].

Based on the forecasting time horizon, Mocanu et al. classified electricity demand forecasting into three categories: (a) short-term forecasts ranging from one hour to one week, (b) medium-term forecasts spanning one week to one year, and (c) long-term forecasts with a time frame exceeding one year [14]. Short-term forecasts are generally useful for capacity planning and short-term maintenance, assessing short-term energy storage consumption, as well as real-time control of building energy systems and optimizing fuel procurement strategies [15,16]. On the other hand, medium- and long-term forecasts are used for decision-making regarding the installation of new distributed generation and storage systems [17], as well as for developing appropriate demand response strategies [14]. At the regional level, forecasting total electricity consumption over medium- to long-term horizons can be beneficial for planning and trading in electricity markets [18]. Approaches to forecasting electricity demand in buildings can be broadly categorized into two main types: physics-based and data-driven [19,20]. Physics-based or deterministic models typically rely on the formulation and solution of thermal and mass balance equations, which establish connections between various building zones, air transport systems, and internal equipment [21]. However, these physics-based models often fail to account for the complex and multifaceted nature of energy consumption behavior in buildings. Additionally, determining the required input parameters for these models poses significant practical challenges. These limitations lead to substantial approximation errors. As a result, such models are primarily utilized as comparative analytical tools rather than precise predictors of building energy consumption [19]. 
Data-driven forecasting techniques can be classified into statistical models [22,23] and artificial intelligence-based models [24,25], although hybrid models are also present in the related literature [26,27]. Table 1 provides an overview of selected studies conducted between 2020 and 2023 in the field of predicting electricity consumer behavior using machine learning and artificial intelligence techniques.

[Table 1: Overview of selected studies (2020–2023) on predicting electricity consumer behavior using machine learning and AI techniques]

Statistical models and machine learning (ML) methods have been proposed as effective alternatives to physics-based modelling approaches [19,20]. Numerous researchers have employed statistical models for predictive purposes. Representative examples include regression models, Autoregressive Integrated Moving Average (ARIMA) models, Autoregressive Moving Average (ARMA) models, and exponential smoothing methods [13,28,43]. Statistical models demonstrate satisfactory performance in solving linear problems as classical approaches. However, when addressing nonlinear problems, modern AI-based methods significantly enhance the accuracy of data-driven models through their capability to identify complex relationships and patterns within datasets. In this context, researchers have employed a hybrid approach combining statistical, mathematical, and artificial intelligence techniques to achieve more precise electric load demand forecasting, thereby optimizing energy supply and distribution management. Empirical findings indicate that deep neural networks (DNNs) and their integration with machine learning algorithms or metaheuristic optimization techniques outperform conventional methods, with notable advantages in three key aspects: (1) Handling complex nonlinear relationships, (2) Substantially reducing model complexity, (3) Significantly improving prediction accuracy [6].

Recent advancements in hybrid deep learning approaches and interpretable transformer-based load forecasting have also been noteworthy. For instance, hybrid LSTM-XGBoost models have demonstrated remarkable accuracy in load forecasting for energy communities [44] and smart grids [45]. Additionally, the integration of XGBoost with CNN-GRU [46] and hybrid Autoformer frameworks [47] has yielded improvements in electricity demand prediction. In terms of interpretability, the application of the Temporal Fusion Transformer in substation load forecasting [48] and grid hierarchies [49], along with attention mechanisms for prediction under specific conditions such as heatwaves [50], underscores the emphasis on interpretable capabilities. The present work complements these advancements by focusing on the integration of TFT and XGBoost in the context of developing cities.

Despite these capabilities and advancements, a significant gap persists between predicted and actual electricity consumption. This discrepancy is commonly referred to as the “performance gap” in energy forecasting literature [51]. This research employs a hybrid combination of Temporal Fusion Transformer (TFT) and XGBoost models to address this challenge and enhance power distribution network performance. The TFT model, as an advanced transformer-based architecture, demonstrates superior capability in processing high-dimensional time-series data while effectively extracting complex inter-variable relationships and calculating the importance of both static and dynamic features. The XGBoost model, as a powerful decision tree-based algorithm, is utilized to improve final predictions and reduce errors. This hybrid approach leverages both TFT’s capacity for learning complex temporal dependencies and XGBoost’s ability to identify nonlinear patterns and optimize the final model, resulting in improved electricity consumption forecasting accuracy compared to traditional methods. This paper is organized into four sections. The proposed methodology is introduced in Section 2. Section 3 presents a case study and implementation of the proposed methodology to demonstrate its practical applicability and feasibility. In Section 4, conclusions and recommendations for future research are presented.

2  Methodology

In this research, a hybrid approach based on the Temporal Fusion Transformer (TFT) model and the XGBoost algorithm has been employed for accurate electricity consumption forecasting. The TFT model, developed on a transformer architecture, is capable of processing high-dimensional time-series data. By leveraging the self-attention mechanism, it can effectively identify complex relationships and long-term dependencies among variables. These features enhance the modelling precision of influential electricity consumption patterns, leading to improved predictive accuracy.

In the next stage, the extracted outputs from the TFT model are fed into the XGBoost algorithm, which is a gradient-boosted decision trees (GBDT) method. XGBoost plays a key role in enhancing predictive performance and reducing model error by efficiently processing nonlinear features, mitigating overfitting through regularization parameter tuning, and optimizing computational efficiency.

The primary innovation of the proposed method lies in the simultaneous exploitation of the interpretability capabilities of the Temporal Fusion Transformer (TFT) model and the superior error-correction performance of the XGBoost model. Unlike many existing hybrid approaches, which merely use the initial predictions of one model as input for correction by a second model, the present framework leverages the interpretable outputs of TFT—including the initial predictions, attention weights, and variable importance scores extracted from the model’s internal layers. These features are normalized using min-max scaling and subsequently fed into the XGBoost model to predict residual errors. Hyperparameter tuning for XGBoost was performed using Grid Search with a Rolling/Walk-Forward validation on the validation set. This approach proves particularly effective in the context of developing cities characterised by heterogeneous consumption patterns (such as Tabriz), as the features extracted by TFT provide meaningful, context-aware information that enhances the accuracy of error correction.
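The pipeline just described can be sketched in a few lines of Python. The snippet below is an illustrative sketch only: synthetic arrays stand in for the TFT's predictions, attention weights, and variable importance scores, and scikit-learn's GradientBoostingRegressor stands in for XGBoost; the feature shapes and the parameter grid are assumptions, not the paper's actual configuration. It shows the min-max scaling of TFT-derived features and a grid search with rolling (walk-forward) validation via TimeSeriesSplit, which keeps every validation fold strictly after its training fold.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-ins for the interpretable TFT outputs described above:
# initial predictions, attention weights, and variable-importance scores.
tft_pred = rng.normal(1.0, 0.3, size=n)
attn_w = rng.uniform(0, 1, size=(n, 4))
var_imp = rng.uniform(0, 1, size=(n, 3))
y_true = tft_pred + 0.2 * np.sin(5 * tft_pred) + rng.normal(0, 0.05, size=n)

residual = y_true - tft_pred            # target for the residual-error model

# Min-max scale the TFT-derived features before feeding the corrector.
X = MinMaxScaler().fit_transform(np.column_stack([tft_pred, attn_w, var_imp]))

# Grid search with rolling / walk-forward validation: TimeSeriesSplit never
# lets future samples leak into the training folds.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),  # stand-in for XGBoost here
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, residual)

final_pred = tft_pred + grid.predict(X)  # TFT forecast + residual correction
```

In a real deployment the stand-in arrays would be replaced by the features extracted from the trained TFT, and the corrector by an XGBoost regressor with the same validation scheme.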

2.1 TFT Model Architecture

The TFT model comprises several key layers, as illustrated in Fig. 1.


Figure 1: The TFT architecture.

2.1.1 Key Layers of TFT

Processing Static Features

Static features (such as days of the week or holidays) are processed as follows:

•   Encoding Static Features: Static features (s) are mapped to a low-dimensional vector (c_s):

c_s = Embedding(s)    (1)

This vector is propagated as input to the subsequent layers.

•   Extraction of Static Information: A fully connected neural network (FCN) is employed to extract the information.

h_s = ReLU(W_s c_s + b_s)    (2)

Here, W_s and b_s are the learnable parameters.

Temporal Features Processing

Time series features (such as hourly power consumption or temperature) are processed as follows:

•   Encoding Temporal Features: The temporal features x_t are mapped to a low-dimensional vector c_t:

c_t = Embedding(x_t)    (3)

•   Processing with LSTM/GRU: An LSTM or GRU layer is employed to extract temporal features:

h_t, c_t = LSTM(c_t, h_{t−1}, c_{t−1})    (4)

Here, h_t represents the hidden state at time t.

Multi-Head Attention Layers

The multi-head attention layers are employed to identify the relationships between variables. These layers operate as follows:

•   Attention Mechanism Calculation:

Q = W_Q h_t,    K = W_K h_t,    V = W_V h_t    (5)

where Q, K, and V represent the query, key, and value matrices, respectively, and d_K denotes the dimension of the keys.

W_Q, W_K, and W_V are the learnable parameters.

•   Calculation of Attention Weights:

The attention weights α_ij are obtained from scaled dot-product attention, α_ij = softmax(Q_i K_jᵀ / √d_K), and the output of head i is the weighted sum of the values:

A_i = ∑_j α_ij V_j    (6)

The outputs of all attention heads are concatenated:

A = Concat(A_1, A_2, …, A_H)    (7)

The variable H here represents the number of attention heads.
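Eqs. (5)–(7) can be reproduced in a few lines of NumPy. The sketch below is illustrative: the number of time steps, heads, hidden size, and the random hidden states and projection matrices are all assumed values, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, H = 6, 8, 2              # time steps, hidden size, number of heads
d_k = d // H                   # per-head key dimension

h = rng.normal(size=(T, d))    # hidden states h_t from the LSTM layer

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

heads = []
for _ in range(H):
    W_Q, W_K, W_V = (rng.normal(size=(d, d_k)) for _ in range(3))
    Q, K, V = h @ W_Q, h @ W_K, h @ W_V       # Eq. (5)
    alpha = softmax(Q @ K.T / np.sqrt(d_k))   # attention weights alpha_ij
    heads.append(alpha @ V)                   # A_i = sum_j alpha_ij V_j, Eq. (6)

A = np.concatenate(heads, axis=-1)            # Eq. (7): Concat(A_1, ..., A_H)
```

Each row of `alpha` sums to one, which is what makes the weights interpretable as the relative contribution of each time step.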

Interpretable Layers

These layers utilize attention mechanisms to identify the significance of variables.

•   Calculation of Variable Importance:

The importance weights β_t are calculated as follows:

β_t = softmax(W_β h_t + b_β)    (8)

where h_t is the hidden state at time t, and W_β and b_β are learnable parameters.

•   Calculation of Interpretable Output:

The interpretable output o_t is calculated as follows:

o_t = ∑_i β_{t,i} h_i    (9)

Forecasting Layers

The interpretable output o_t is fed into a fully connected layer:

ŷ_t = W_y o_t + b_y    (10)

Here, W_y and b_y are the learnable parameters.

To train the model, the Mean Squared Error (MSE) loss function is employed:

ℒ = (1/T) ∑_{t=1}^{T} (y_t − ŷ_t)²    (11)

where y_t represents the actual values and ŷ_t the predicted values.

2.2 The XGBoost (Extreme Gradient Boosting) Method

2.2.1 Objective Function

The objective function in XGBoost consists of two components: a loss function, which measures the prediction error, and a regularization term, which prevents excessive model complexity. The objective function is defined as follows:

ℒ(ϕ) = ∑_{i=1}^{n} l(y_i, ŷ_i) + ∑_{k=1}^{K} Ω(f_k)    (12)

where l(y_i, ŷ_i) is the loss function for the i-th sample, Ω(f_k) is the regularization term for the k-th tree, ŷ_i is the model’s prediction for the i-th sample, and K is the number of trees.

2.2.2 Loss Function

The loss function varies depending on the type of problem (regression, classification, etc.). For example:

•   Regression: Mean Squared Error (MSE) is used:

l(y_i, ŷ_i) = (y_i − ŷ_i)²    (13)

•   Classification: The Logarithmic Loss (Log Loss) metric is employed:

l(y_i, ŷ_i) = −[y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]    (14)

2.2.3 Regularization Term

The regularization term is employed to prevent overfitting and is defined as follows:

Ω(f_k) = γT + ½ λ ∑_{j=1}^{T} w_j²    (15)

where T is the number of leaves in the tree, w_j is the weight of the j-th leaf, and γ and λ are regularization parameters.

2.2.4 Model Training

The XGBoost model employs an iterative training process. In each step, a new tree is added to the model to reduce the residual errors from the previous model.

•   Initial Prediction: The initial prediction ŷ_i^(0) is typically determined as the mean of the target values (for regression) or the base probability (for classification).

•   Calculation of Gradients: The gradient g_i and the Hessian h_i are computed as follows:

g_i = ∂l(y_i, ŷ_i^(t−1)) / ∂ŷ_i^(t−1),    h_i = ∂²l(y_i, ŷ_i^(t−1)) / ∂(ŷ_i^(t−1))²    (16)

•   Optimization of the New Tree: The new tree f_t is trained by optimizing the following objective function:

ℒ^(t) = ∑_{i=1}^{n} [g_i f_t(x_i) + ½ h_i f_t²(x_i)] + Ω(f_t)    (17)

•   Forecast Updates: Forecasts are updated by adding a new tree.

ŷ_i^(t) = ŷ_i^(t−1) + η f_t(x_i)    (18)

Here, η represents the learning rate.
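The update loop of Eqs. (16)–(18) can be demonstrated numerically. The sketch below is a deliberately minimal toy: it assumes a fixed two-leaf "tree" structure and arbitrary toy targets, computes the Newton-style optimal leaf weight w*_j = −G_j / (H_j + λ) for the squared loss, and applies the shrunken update; real XGBoost additionally searches for the best split structure at each iteration.

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])             # toy targets
y_hat = np.full_like(y, y.mean())              # initial prediction: target mean
lam, eta = 1.0, 0.3                            # regularization lambda, learning rate eta
leaves = [np.array([0, 1]), np.array([2, 3])]  # fixed two-leaf "tree" structure

mse_start = np.mean((y - y_hat) ** 2)
for _ in range(20):
    g = -2.0 * (y - y_hat)                     # gradient of (y - y_hat)^2, Eq. (16)
    h = np.full_like(y, 2.0)                   # Hessian of the squared loss
    f = np.zeros_like(y)
    for idx in leaves:
        # Optimal leaf weight w*_j = -G_j / (H_j + lambda), minimizing Eq. (17)
        f[idx] = -g[idx].sum() / (h[idx].sum() + lam)
    y_hat = y_hat + eta * f                    # forecast update, Eq. (18)
mse_end = np.mean((y - y_hat) ** 2)
```

With each update the predictions move toward the leaf means (4 and 8 here), and the training error shrinks toward the within-leaf variance, which is exactly the error-reduction behavior Eq. (18) describes.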

2.3 Combining TFT and XGBoost for Electricity Consumption Prediction

In this section, we explain how the output of the Temporal Fusion Transformer (TFT) model can be used as input for the XGBoost model to enhance prediction accuracy. This hybrid approach leverages TFT’s strength in processing time-series data and XGBoost’s capability in refining predictive performance.

Before describing the combination in detail, the overall process is first outlined in a simple, step-by-step manner.

Step 1: Train the TFT model on the training data and generate initial predictions.

Step 2: Compute the residual errors (Residual errors = Actual values − TFT predictions).

Step 3: Train the XGBoost model using the same features to predict the residual errors of the TFT model.

Step 4: Obtain the final prediction as the sum of the TFT prediction and the XGBoost correction (Final prediction = TFT prediction + XGBoost prediction).
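The four steps above can be sketched end to end as follows. This is an illustrative sketch on synthetic data: a linear model stands in for the TFT base forecaster and scikit-learn's GradientBoostingRegressor stands in for XGBoost; all names, shapes, and the train/test split are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 400
X = rng.uniform(-1, 1, size=(n, 3))                    # synthetic features
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.05 * rng.normal(size=n)

split = 300                                            # chronological-style split
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

# Step 1: train a base forecaster (a linear model standing in for the TFT).
base = LinearRegression().fit(X_tr, y_tr)

# Step 2: residual errors = actual values - base predictions.
resid = y_tr - base.predict(X_tr)

# Step 3: train a boosted-tree corrector (standing in for XGBoost) on residuals.
corrector = GradientBoostingRegressor(random_state=0).fit(X_tr, resid)

# Step 4: final prediction = base prediction + residual correction.
final = base.predict(X_te) + corrector.predict(X_te)

rmse_base = float(np.sqrt(np.mean((y_te - base.predict(X_te)) ** 2)))
rmse_final = float(np.sqrt(np.mean((y_te - final) ** 2)))
```

Because the linear base model cannot capture the sin(3x) term, the residual learner picks it up, and the corrected forecast beats the base model on the held-out portion.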

2.3.1 General Steps for Combining TFT and XGBoost

•   Training the TFT Model: The TFT model is trained on time-series data to generate initial predictions.

•   Extracting TFT Outputs: The outputs of TFT (predictions and extracted features) are fed as inputs to XGBoost.

•   Training XGBoost: The XGBoost model is trained on the TFT outputs to reduce residual errors.

•   Final Prediction: The final predictions are produced by combining the outputs of TFT and XGBoost.

2.3.2 The Output of TFT Serves as the Input to XGBoost

The output of the TFT model consists of two main components:

•   TFT Forecasts (ŷ_t): These forecasts are calculated as follows:

ŷ_t = W_y o_t + b_y    (19)

where o_t is the interpretable output of TFT.

•   Features Extracted by TFT: These comprise the information extracted by the TFT layers, including the hidden states of the LSTM/GRU (h_t), the attention weights (α_ij), and the variable importance weights (β_t).

2.3.3 XGBoost Input Structure

The input format for XGBoost is defined as follows:

X_XGBoost = [ŷ_t, h_t, α_ij, β_t]    (20)

where ŷ_t denotes the TFT forecasts, h_t the hidden states of the LSTM/GRU, α_ij the attention weights, and β_t the variable importance weights.

1.    Layer Selection and Heads: To maintain interpretability and enforce dimensionality constraints suitable for XGBoost, we exclusively utilize the features derived from the final multi-head attention layer. The attention weights α_ij are extracted from this specific layer.

2.    Aggregation Method: To transform the tensor-based attention features into a fixed-dimension vector, we employ mean pooling across all heads at each time step t. Specifically, α_avg is calculated as the average of the attention weights α_{h,t} across all H attention heads:

α_avg = (1/H) ∑_{h=1}^{H} α_{h,t}    (21)

where H is the total number of attention heads. This operation effectively reduces the dimensionality of the attention features to a single scalar value for each time step t.

3.   Final Feature Dimensionality (D_final): The final dimensionality of the input vector X_XGBoost is determined by concatenating these components:

D_final = 1 + d_h + 1 + d_v    (22)

where:

•   1: corresponds to ŷ_t (the final forecast output from TFT).

•   d_h: the dimension of the LSTM/GRU hidden state h_t.

•   1: corresponds to α_avg (the mean-pooled attention weight).

•   d_v: the dimension corresponding to the static feature vector, implicitly represented by the variable importance vector β_t (the extracted importance weights provide a fixed-size, context-aware representation of static/dynamic feature influence).
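Assembling the XGBoost input per Eqs. (20)–(22) then amounts to mean-pooling the attention weights over heads and concatenating the pieces. The dimensions below (time steps, heads, d_h, d_v) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
T, H, d_h, d_v = 10, 4, 16, 5   # time steps, heads, hidden size, importance size

y_hat = rng.normal(size=(T, 1))           # TFT forecasts
h_t = rng.normal(size=(T, d_h))           # LSTM/GRU hidden states
alpha = rng.uniform(size=(T, H))          # one attention weight per head and step
beta = rng.uniform(size=(T, d_v))         # variable importance weights

alpha_avg = alpha.mean(axis=1, keepdims=True)     # Eq. (21): mean pool over heads
X_xgb = np.hstack([y_hat, h_t, alpha_avg, beta])  # Eq. (20), with pooled attention
D_final = 1 + d_h + 1 + d_v                       # Eq. (22)
```

The resulting matrix has exactly D_final columns per time step, matching the dimensionality accounting above.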

2.3.4 Training XGBoost on TFT Output

•   XGBoost Objective Function:

The XGBoost objective function is defined as follows:

ℒ(ϕ) = ∑_{i=1}^{n} l(y_i, ŷ_i^XGBoost) + ∑_{k=1}^{K} Ω(f_k)    (23)

In this context, ŷ_i^XGBoost represents the predictions generated by the XGBoost algorithm.

•   Optimization of Trees:

XGBoost utilizes gradients and Hessians for tree optimization as follows:

g_i = ∂l(y_i, ŷ_i^TFT) / ∂ŷ_i^TFT,    h_i = ∂²l(y_i, ŷ_i^TFT) / ∂(ŷ_i^TFT)²    (24)

Here, ŷ_i^TFT represents the TFT predictions.

•   Forecast Updates:

Final forecasts are updated as follows:

ŷ_i^final = ŷ_i^TFT + η ∑_{k=1}^{K} f_k(X_XGBoost)    (25)

Here, η represents the learning rate.

The advantage of this approach lies in TFT’s capacity to learn complex temporal patterns and generate interpretable features, while XGBoost optimizes these features to correct residual errors. This synergistic integration proves particularly suitable for developing cities like Tabriz, characterized by heterogeneous industrial consumption patterns.

3  Discussion and Results

The case study of this research is Tabriz city, located at 38.0792° N, 46.2887° E in northwestern Iran, with a statistical population of 1000 residential electricity consumers. The dataset comprises real-world hourly electricity consumption data from the Tabriz distribution network covering the period from 1 January 2021 to 31 July 2023, for a total of 22,608 hourly samples. The data, recorded at 60-min intervals (yielding 24 consumption records per day), were categorized into historical load data (past electricity consumption), environmental data (weather conditions such as temperature, humidity, and pressure), and temporal data (time-related variables including hour, day, and season). This level of granularity provides sufficient accuracy for analysis. During preprocessing, records with missing, incomplete, corrupted, or noisy values (outliers) were removed, leaving 16,950 records for analysis. Notably, missing data could alternatively be estimated using a multilayer perceptron (MLP) neural network, as referenced in [40]. The proposed TFT-XGBoost method was implemented in PyCharm Community Edition 2024.3.1.1 and evaluated on this processed dataset. Because of the time-series nature of the data, the samples were split chronologically into training (70%), validation (15%), and test (15%) sets: training data from 1 January 2021 to approximately 23 September 2022 (first 11,865 samples after preprocessing), validation data from approximately 24 September 2022 to 4 January 2023 (next 2543 samples), and test data from 5 January 2023 to 31 July 2023 (remaining 2542 samples).
As illustrated in Fig. 2, the proposed framework reduces the training error through an iterative training process in which a new tree is sequentially added at each step to minimize the residual errors of the preceding model; the figure depicts this progressive reduction in training error as trees are incrementally incorporated into the ensemble. To evaluate computational efficiency, the inference time of the models was measured on a standard CPU (Intel Core i7, without GPU acceleration). The average training time was approximately 250 min for the standalone TFT model and 280 min for the hybrid model (the minor increase is attributable to the additional training of the XGBoost component). The standalone TFT model requires an average of 42 ms per hourly prediction, whereas the proposed hybrid model (TFT + XGBoost) achieves this in only 23 ms. This approximately 1.8-fold improvement is primarily attributable to the lower computational overhead of XGBoost when correcting residual errors based on features extracted by the TFT.
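The chronological 70/15/15 split of the 16,950 preprocessed records can be sketched as follows; integer arithmetic reproduces the 11,865 / 2543 / 2542 sample counts reported above, and the split is strictly ordered in time (no shuffling).

```python
import numpy as np

n = 16_950                      # records remaining after preprocessing
idx = np.arange(n)              # chronologically ordered sample indices

n_train = n * 70 // 100         # 11,865 samples (first 70%)
n_test = n * 15 // 100          # 2,542 samples (last 15%)
n_val = n - n_train - n_test    # 2,543 samples (middle 15%)

train = idx[:n_train]
val = idx[n_train:n_train + n_val]
test = idx[n_train + n_val:]
# Every validation timestamp follows every training timestamp, and every
# test timestamp follows every validation timestamp.
```

Keeping the split chronological (rather than random) prevents information from future consumption leaking into model training, which matters for time-series evaluation.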


Figure 2: The error reduction in the network training algorithm achieved by the proposed TFT-XGBoost approach.

To assess the stability of the interpretability patterns, attention weights and variable importance scores were analyzed separately across the four seasons as well as in different time windows (weekly and monthly). The results indicated that the patterns are relatively stable: temperature consistently exhibited the highest importance across all seasons (ranging from 0.30 to 0.35), while the importance of industrial activity slightly increased during winter (up to 0.25). Weekly patterns (e.g., weekend peaks) also remained consistent throughout the periods. This stability strengthens the model’s interpretability claim and demonstrates its reliability for practical applications in grid management.

The scatter plot presented in Fig. 3 displays the actual electricity consumption values (True Values) against the predicted values generated by TFT and the hybrid TFT-XGBoost method. The ideal prediction line (Ideal Prediction), depicted as a diagonal line with a slope of 1, represents the scenario where predicted values perfectly match the actual values. The data points corresponding to TFT-XGBoost predictions are positioned closer to the ideal line, indicating the model’s superior accuracy in forecasting electricity consumption. In contrast, the points associated with TFT predictions exhibit a greater deviation from the ideal line, suggesting lower predictive accuracy compared to TFT-XGBoost. The narrower dispersion of TFT-XGBoost predictions around the ideal line further confirms that this model yields smaller errors and demonstrates better overall performance. In general, the combination of TFT and XGBoost can enhance prediction accuracy, as XGBoost effectively compensates for residual errors in TFT’s forecasts.


Figure 3: The scatter plot of actual data values against their predicted values for both the TFT and TFT-XGBoost methods.

Prediction Accuracy

The evaluation results of the proposed method (TFT-XGBoost) for electricity consumption forecasting in Tabriz demonstrate that this method performs exceptionally well in terms of prediction accuracy. Table 2 presents the evaluation metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), for the proposed method and the baseline TFT model.

[Table 2: Evaluation metrics (MSE, RMSE, MAE) for the baseline TFT model and the proposed TFT-XGBoost method, including the ablation without XGBoost]

To investigate the individual contribution of the XGBoost component to the accuracy improvements, an ablation study was conducted. The results demonstrate that removing XGBoost (i.e., relying solely on the TFT model) causes performance to revert to the baseline level, indicating that the reported enhancements are directly attributable to the residual error correction provided by XGBoost. These findings are also presented in Table 2.

As illustrated in Table 2, the proposed method demonstrates superior performance across all evaluation metrics compared to the baseline TFT model. For instance, the RMSE decreased from 0.1249 to 0.1109, an improvement of approximately 11.2% in prediction accuracy; removing the XGBoost component reverts performance to the baseline TFT level, confirming the contribution of the residual correction stage. Fig. 4 presents a comparative plot of the actual values and the predictions generated by the TFT-XGBoost model for the first 100 samples, clearly demonstrating the model’s performance. The line representing the actual values (True Values) serves as the reference, while the TFT-XGBoost prediction line illustrates the model’s output. The predictions made by TFT-XGBoost closely align with the true values, indicating the model’s high accuracy in forecasting electricity consumption. Overall, TFT-XGBoost exhibits strong performance, with its predictions nearly overlapping the actual values, as XGBoost effectively mitigates residual errors from the TFT predictions.
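The percentage improvements quoted above follow directly from the reported metric values and can be checked in two lines:

```python
def improvement(baseline: float, hybrid: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - hybrid) / baseline

rmse_gain = improvement(0.1249, 0.1109)  # RMSE values reported in Table 2
mae_gain = improvement(0.0998, 0.0891)   # MAE values reported in Table 2
```

Rounded to one decimal place, these give the 11.2% (RMSE) and 10.7% (MAE) reductions stated in the abstract.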


Figure 4: The comparison between actual values and TFT-XGBoost model predictions for the first 100 samples.

One of the key advantages of the proposed method is its interpretability. The TFT model leverages multi-head attention mechanisms and interpretable layers to identify the significance of variables and their interrelationships, enabling users to recognize the key variables influencing electricity consumption in the city of Tabriz and to make better-informed decisions. Multi-head attention mechanism: the multi-head attention mechanism in the TFT enables the model to identify both long-term and short-term relationships between variables; for instance, in predicting electricity consumption in Tabriz, it can show how temperature variations over time influence power usage. The attention weights computed by the multi-head attention mechanism are illustrated in Fig. 5.


Figure 5: The attention weights computed by the multi-head attention mechanism (case study: Tabriz city).

Interpretable Layers: The interpretable layers of the TFT quantify the contribution of each variable to the final prediction. In this study, variables such as temperature, humidity, day of the week, and holidays were identified as key factors influencing electricity consumption in Tabriz. Fig. 6 illustrates the relative importance of these variables. The high importance assigned to temperature (coefficient ≈ 0.32) aligns closely with the continental climate of Tabriz, which features extremely hot summers (often exceeding 40°C) and cold winters; this finding could encourage grid managers to implement targeted demand management programs during peak cooling periods (e.g., incentives for adopting energy-efficient appliances in summer). The prominent weekly patterns (coefficient ≈ 0.18) are driven primarily by reduced consumption on weekend holidays and heightened administrative and commercial activity on weekdays, creating opportunities for load-shifting strategies. Furthermore, the significance of industrial activity (coefficient ≈ 0.21) corresponds to the large industrial zones on the outskirts of Tabriz, suggesting that policymakers could use incentive-based contracts to shift industrial peak demand to off-peak hours. These practical insights underscore the added value of the proposed model in supporting targeted demand-side decision-making and local energy policy formulation.

To assess the stability of the feature importance findings under varying conditions, we analyzed attention weights independently for each of the four seasons and across different time granularities (weekly and monthly). The results demonstrate a high degree of stability in the identified patterns: temperature consistently exhibits the highest importance across all seasons (ranging from 0.30 to 0.35), while the importance of industrial activity increases marginally during the winter months (reaching up to 0.25). This stability reinforces the interpretability of the model and its reliability for practical applications in grid management.
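The seasonal stability check described above can be sketched as follows. The per-day attention weights here are synthetic stand-ins for the TFT variable-selection outputs (the column names, means, and noise levels are assumptions used only for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical per-day attention weights for three drivers, centred on the
# coefficients reported in the text (synthetic; real TFT outputs not reproduced).
idx = pd.date_range("2021-01-01", "2023-12-31", freq="D")
weights = pd.DataFrame({
    "temperature": rng.normal(0.32, 0.02, len(idx)),
    "industrial_activity": rng.normal(0.21, 0.02, len(idx)),
    "weekly_pattern": rng.normal(0.18, 0.02, len(idx)),
}, index=idx)

# Map each month to a meteorological season, then average the weights within
# each season to check whether the importance ranking stays stable.
season = weights.index.month.map(
    {12: "winter", 1: "winter", 2: "winter",
     3: "spring", 4: "spring", 5: "spring",
     6: "summer", 7: "summer", 8: "summer",
     9: "autumn", 10: "autumn", 11: "autumn"})
seasonal_mean = weights.groupby(season).mean()
print(seasonal_mean.round(3))

# A stable pattern means temperature ranks first in every season.
assert (seasonal_mean.idxmax(axis=1) == "temperature").all()
```

On real attention weights, the same groupby-and-average pass (optionally repeated at weekly and monthly granularity) reproduces the per-season importance ranges reported above.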


Figure 6: The relative importance of variables in predicting electricity consumption.

To validate the statistical significance of the identified key factors, permutation feature importance was employed with 100 repetitions. The results indicated that temperature (importance ≈ 0.32, p-value < 0.001), weekly patterns (importance ≈ 0.18, p-value < 0.001), and industrial activity (importance ≈ 0.21, p-value < 0.001) all exhibit statistically significant importance. This test confirms that the observed interpretability patterns are not random.
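Permutation feature importance of the kind used here can be illustrated with scikit-learn. The data below are synthetic (feature names and coefficients are assumptions), and a Random Forest stands in for the full forecasting pipeline; the point is the repeated-shuffle procedure itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)

# Synthetic stand-in for the Tabriz features: load depends strongly on
# temperature, weakly on a weekday signal, and not at all on a noise feature.
n = 1000
temperature = rng.normal(25, 8, n)
weekday = rng.integers(0, 7, n)
noise_feat = rng.normal(0, 1, n)
load = 2.0 * temperature + 0.5 * weekday + rng.normal(0, 1, n)

X = np.column_stack([temperature, weekday, noise_feat])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, load)

# Permutation importance with repeated shuffles, as in the paper's test
# (100 repetitions there; fewer here to keep the sketch fast).
result = permutation_importance(model, X, load, n_repeats=20, random_state=0)
for name, mean, std in zip(["temperature", "weekday", "noise"],
                           result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} ± {std:.3f}")

# A genuinely predictive feature scores well above the irrelevant one.
assert result.importances_mean[0] > result.importances_mean[2]
```

Comparing each feature's shuffled-score drop against its standard deviation over repetitions is what supports the reported p-values.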

Model interpretability enables users to make better-informed decisions. For instance, if temperature is identified as a key variable, electricity distribution companies in Tabriz can plan supply more effectively for the hottest and coldest days of the year. To evaluate the proposed method, its results were compared with five established baselines: LSTM, GRU, Random Forest, SVR, and ARIMA. Table 3 compares the evaluation metrics of the proposed method with those of these baselines. To enhance the credibility of the results, all models were executed 20 times with different random seeds, and the performance metrics are reported along with their standard deviations (±SD). Additionally, a paired t-test between the proposed model and the standalone TFT model confirmed that the improvements are statistically significant at the p < 0.01 level.
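The paired t-test over seeded runs can be sketched with SciPy. The per-seed RMSE values below are synthetic, centred on the figures reported in the paper (the true per-run values are not public); pairing is appropriate because the two models share random seeds run by run:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(7)

# Hypothetical per-seed RMSE values for 20 runs of each model (synthetic,
# centred on the paper's reported means; spread is an assumption).
rmse_tft = rng.normal(0.1249, 0.003, 20)
rmse_hybrid = rng.normal(0.1109, 0.003, 20)

# Paired t-test: each pair of runs shares a random seed.
t_stat, p_value = ttest_rel(rmse_tft, rmse_hybrid)
print(f"mean TFT RMSE    = {rmse_tft.mean():.4f} ± {rmse_tft.std(ddof=1):.4f}")
print(f"mean hybrid RMSE = {rmse_hybrid.mean():.4f} ± {rmse_hybrid.std(ddof=1):.4f}")
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")

assert p_value < 0.01  # significant at the level claimed in the text
```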

Table 3: Comparison of evaluation metrics of the proposed method with other methods.

As shown in Table 3, the proposed method outperforms all baselines across every evaluation metric. For instance, the RMSE of the proposed method is 0.1109, compared with 0.1225 for LSTM and 0.1217 for GRU. This improvement stems from combining TFT's strength in time-series processing with XGBoost's ability to reduce residual errors. For clarity, the predictions of all methods, together with those of the proposed method and the actual electrical load, are plotted in Fig. 7. Across all subplots, when compared against the actual power consumption, the proposed TFT-XGBoost model outperforms the other methods and tracks the actual electrical load most closely.


Figure 7: Comparison of actual values and predicted values of the proposed method with other methods.
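The residual-correction mechanism behind this improvement can be sketched in two stages. Here a simple daily-cycle model stands in for the trained TFT, and scikit-learn's GradientBoostingRegressor stands in for XGBoost so the sketch needs no extra dependencies; the data and feature names are synthetic assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Synthetic hourly load with a daily cycle plus a temperature-driven component.
n = 2000
hour = np.arange(n) % 24
temperature = rng.normal(25, 8, n)
load = (10 + 3 * np.sin(2 * np.pi * hour / 24)
        + 0.1 * temperature + rng.normal(0, 0.2, n))

# Stage 1 stand-in: a base forecaster that captures only the daily cycle
# (in the paper this role is played by the trained TFT).
base_pred = 10 + 3 * np.sin(2 * np.pi * hour / 24) + 0.1 * temperature.mean()

# Stage 2: fit a gradient-boosted model on the stage-1 residuals, then add
# its correction to the base forecast.
residuals = load - base_pred
X = np.column_stack([hour, temperature])
corrector = GradientBoostingRegressor(random_state=0).fit(X[:1500], residuals[:1500])
final_pred = base_pred[1500:] + corrector.predict(X[1500:])

def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

print(f"base RMSE   = {rmse(load[1500:], base_pred[1500:]):.3f}")
print(f"hybrid RMSE = {rmse(load[1500:], final_pred):.3f}")
assert rmse(load[1500:], final_pred) < rmse(load[1500:], base_pred[1500:])
```

The corrector only has to learn the structure the first stage missed, which is why the hybrid's error falls below the base model's on the held-out tail.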

To evaluate the robustness of the model under extreme conditions, its performance was examined during two identified events: (a) the severe heatwave in the summer of 2022 (temperatures exceeding 40°C for more than five consecutive days in July), and (b) the substantial reduction in industrial consumption during the Nowruz holidays in 2023 (an average 25% decrease in consumption compared to normal days). Table 4 shows the performance results of the proposed method and TFT in the considered critical conditions (heatwave 2022 and Nowruz holidays 2023). As can be observed, the proposed hybrid model exhibits lower MAPE and RMSE values in both events compared to the standalone TFT model, indicating its superior capability in managing severe fluctuations.

Table 4: Performance of the proposed method and TFT under the considered critical conditions (2022 heatwave and 2023 Nowruz holidays).

4  Conclusion

In this paper, a hybrid approach for predicting electricity consumption in the city of Tabriz was presented, leveraging the capabilities of the Temporal Fusion Transformer (TFT) model and XGBoost. The results demonstrated that this method not only significantly enhances prediction accuracy but also provides interpretability and enables the identification of key factors influencing electricity consumption. The integration of TFT and XGBoost reduced the prediction error (RMSE) from 0.1249 to 0.1109, indicating an approximately 11.2% improvement in predictive accuracy. Furthermore, a comparative analysis between the proposed method and five other established approaches (including LSTM, GRU, Random Forest, SVR, and ARIMA) revealed that the hybrid model exhibits superior performance in terms of both prediction accuracy and interpretability.

A key advantage of this method lies in its utilization of multi-head attention mechanisms and interpretable layers in TFT, which enables the identification of variable importance and their interrelationships. For instance, in this study, variables such as temperature, humidity, weekdays, and holidays were identified as key factors influencing electricity consumption in the city of Tabriz. This capability empowers users to make more informed decisions and develop more precise energy production and distribution plans.

This study is based exclusively on hourly electricity consumption data from Tabriz (2021–2023), which limits its immediate generalizability to other climatic zones, city sizes, consumption structures, or developing urban contexts with different socio-economic and industrial profiles. The model was trained and evaluated on a single-region dataset without external validation on geographically or structurally diverse networks. Additionally, the current implementation does not explicitly model rare extreme events (beyond the analyzed heatwave and holiday periods) or incorporate real-time data streams, which may affect performance under highly volatile or unprecedented conditions. Finally, computational requirements of the TFT component remain relatively high compared to simpler statistical methods, potentially constraining deployment on resource-limited edge devices.

According to the obtained results, the proposed method can be widely applied in the energy industry and in energy management systems. The approach not only enhances prediction accuracy but also supports the optimization of distribution network management by providing valuable insights into the factors influencing electricity consumption. Since the model has so far been evaluated only on data from Tabriz, future work should validate its generalizability to other developing urban contexts and explore techniques to improve robustness against noisy data and to reduce computational resource requirements. Overall, this study represents a significant step toward improving electricity consumption forecasting and optimizing energy management in Tabriz and similar regions.

Acknowledgement: The authors would like to thank Asia Pacific University of Technology & Innovation (APU), Malaysia, for their kind support.

Funding Statement: The authors received no specific funding for this study.

Author Contributions: Data Collection: Sobhan Manjili and Saeid Jafarzadeh Ghoushchi; Research Methodology: Sobhan Manjili, Saeid Jafarzadeh Ghoushchi, Mohammad Reza Maghami and Mazlan Mohamed; Writing and Reviewing: Sobhan Manjili, Saeid Jafarzadeh Ghoushchi, Mohammad Reza Maghami and Mazlan Mohamed; Supervision and Validation: Saeid Jafarzadeh Ghoushchi, Mohammad Reza Maghami and Mazlan Mohamed; Funding and Administration: Mohammad Reza Maghami, Mazlan Mohamed and Sobhan Manjili. All authors reviewed and approved the final version of the manuscript.

Availability of Data and Materials: Data available on request from the authors. The data that support the findings of this study are available from the corresponding authors, upon reasonable request.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest.




Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.