Open Access
ARTICLE
Hierarchical Attention Transformer for Multivariate Time Series Forecasting
School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China
* Corresponding Author: Kelvin Amos Nicodemas. Email:
(This article belongs to the Special Issue: Advances in Time Series Analysis, Modelling and Forecasting)
Computers, Materials & Continua 2026, 87(2), 78 https://doi.org/10.32604/cmc.2026.074305
Received 08 October 2025; Accepted 13 January 2026; Issue published 12 March 2026
Abstract
Multivariate time series forecasting plays a crucial role in decision-making for systems like energy grids and transportation networks, where temporal patterns emerge across diverse scales, from short-term fluctuations to long-term trends. However, existing Transformer-based methods often process data at a single resolution or handle multiple scales independently, overlooking critical cross-scale interactions that influence prediction accuracy. To address this gap, we introduce the Hierarchical Attention Transformer (HAT), which enables direct information exchange between temporal hierarchies through a novel cross-scale attention mechanism. HAT extracts multi-scale features using hierarchical convolutional-recurrent blocks, fuses them via temperature-controlled mechanisms, and optimizes gradient flow with residual connections for stable training. Evaluations on eight benchmark datasets show HAT outperforming state-of-the-art baselines, with average reductions of 8.2% in MSE and 7.5% in MAE across horizons, while achieving a
Keywords
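Since only the abstract is available on this page, the cross-scale attention with temperature-controlled fusion and residual connections that it describes can be illustrated with a minimal PyTorch sketch. The code below is a hedged reconstruction, not the authors' implementation: the module name CrossScaleAttention, the learnable log_temp parameter, the head count, and all tensor shapes are assumptions made for exposition.

import torch
import torch.nn as nn

class CrossScaleAttention(nn.Module):
    """Hypothetical sketch of cross-scale attention with a learnable
    temperature; names and shapes are illustrative, not the paper's code."""

    def __init__(self, d_model: int, n_heads: int = 4, init_temp: float = 1.0):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learnable log-temperature controls how sharply the softmax
        # concentrates when fusing information across scales.
        self.log_temp = nn.Parameter(torch.tensor(init_temp).log())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (batch, T_fine, d_model)   queries from the finer scale
        # coarse: (batch, T_coarse, d_model) keys/values from the coarser scale
        temp = self.log_temp.exp()
        # Dividing the queries by the temperature rescales the attention
        # logits, making the cross-scale fusion sharper or smoother.
        fused, _ = self.attn(fine / temp, coarse, coarse)
        # Residual connection plus LayerNorm stabilizes gradient flow.
        return self.norm(fine + fused)

# Toy usage: 96 fine-scale tokens attend to 24 coarse-scale tokens.
fine = torch.randn(8, 96, 64)
coarse = torch.randn(8, 24, 64)
out = CrossScaleAttention(d_model=64)(fine, coarse)  # shape (8, 96, 64)

Letting the finer scale query the coarser one (rather than concatenating scales) is one plausible reading of the abstract's "direct information exchange between temporal hierarchies"; the actual fusion direction and block layout may differ in the full paper.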
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

