TY  - EJOU
AU  - Wang, Qi
AU  - Nicodemas, Kelvin Amos
TI  - Hierarchical Attention Transformer for Multivariate Time Series Forecasting
T2  - Computers, Materials & Continua
PY  - 2026
VL  - 87
IS  - 2
SN  - 1546-2226
AB  - Multivariate time series forecasting plays a crucial role in decision-making for systems like energy grids and transportation networks, where temporal patterns emerge across diverse scales from short-term fluctuations to long-term trends. However, existing Transformer-based methods often process data at a single resolution or handle multiple scales independently, overlooking critical cross-scale interactions that influence prediction accuracy. To address this gap, we introduce the Hierarchical Attention Transformer (HAT), which enables direct information exchange between temporal hierarchies through a novel cross-scale attention mechanism. HAT extracts multi-scale features using hierarchical convolutional-recurrent blocks, fuses them via temperature-controlled mechanisms, and optimizes gradient flow with residual connections for stable training. Evaluations on eight benchmark datasets show HAT outperforming state-of-the-art baselines, with average reductions of 8.2% in MSE and 7.5% in MAE across horizons, while achieving a 6.1× training speedup over patch-based methods. These advancements highlight HAT’s potential for applications requiring multi-resolution temporal modeling.
KW  - Time series forecasting; multi-scale temporal modeling; cross-scale attention; transformer architecture; hierarchical embeddings; gradient flow optimization
DO  - 10.32604/cmc.2026.074305