TY  - EJOU
AU  - Wang, Qi
AU  - Nicodemas, Kelvin Amos

TI  - Hierarchical Attention Transformer for Multivariate Time Series Forecasting
T2  - Computers, Materials & Continua

PY  - 2026
VL  - 87
IS  - 2
SN  - 1546-2226

AB  - Multivariate time series forecasting plays a crucial role in decision-making for systems like energy grids and transportation networks, where temporal patterns emerge across diverse scales from short-term fluctuations to long-term trends. However, existing Transformer-based methods often process data at a single resolution or handle multiple scales independently, overlooking critical cross-scale interactions that influence prediction accuracy. To address this gap, we introduce the Hierarchical Attention Transformer (HAT), which enables direct information exchange between temporal hierarchies through a novel cross-scale attention mechanism. HAT extracts multi-scale features using hierarchical convolutional-recurrent blocks, fuses them via temperature-controlled mechanisms, and optimizes gradient flow with residual connections for stable training. Evaluations on eight benchmark datasets show HAT outperforming state-of-the-art baselines, with average reductions of 8.2% in MSE and 7.5% in MAE across horizons, while achieving a 6.1× training speedup over patch-based methods. These advancements highlight HAT’s potential for applications requiring multi-resolution temporal modeling.
KW  - Time series forecasting; multi-scale temporal modeling; cross-scale attention; transformer architecture; hierarchical embeddings; gradient flow optimization

DO  - 10.32604/cmc.2026.074305