
Open Access

ARTICLE

Robust Swin Transformer for Vehicle Re-Identification with Dynamic Feature Fusion

Saifullah Tumrani1,2,*, Abdul Jabbar Siddiqui2,3,*
1 BioQuant, Ruprecht-Karls-Universität Heidelberg (Uni Heidelberg), Heidelberg, 69120, Germany
2 SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum & Minerals (KFUPM), Dhahran, 31261, Saudi Arabia
3 Computer Engineering Department, King Fahd University of Petroleum & Minerals (KFUPM), Dhahran, 31261, Saudi Arabia
* Corresponding Author: Saifullah Tumrani. Email: email; Abdul Jabbar Siddiqui. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.075152

Received 26 October 2025; Accepted 17 December 2025; Published online 04 January 2026

Abstract

Vehicle re-identification (ReID) is a challenging task in intelligent transportation and urban surveillance systems due to variations in camera viewpoints, vehicle scales, and environmental conditions. Although recent transformer-based approaches have shown impressive performance by modeling global dependencies, these models struggle with aspect ratio distortions and may overlook the fine-grained local attributes crucial for distinguishing visually similar vehicles. We introduce a Swin Transformer-based framework that addresses these challenges through three components. First, to improve feature robustness and preserve vehicle proportions, our Aspect Ratio-Aware Swin Transformer (AR-Swin) retains the native aspect ratio via letterbox padding, uses a non-square (16 × 8) patch-embedding stem, and keeps fixed 7 × 7 token windows. Second, we introduce a Dynamic Feature Fusion Network (DFFNet) that adaptively integrates global Swin features with local attribute embeddings, such as color and vehicle type, enabling more discriminative representations. Third, our Regional Attention Blocks incorporate regional masks into the transformer's windowed attention mechanism, effectively highlighting critical details such as manufacturer logos or lights. On VeRi-776, we obtain 82.55% mAP, 97.26% Rank-1, and 99.23% Rank-5; on VehicleID, we obtain 91.8% Rank-1 and 97.75% Rank-5. The design is a drop-in replacement for Swin backbones and emphasizes robustness without increasing architectural complexity. Code: https://github.com/sft110/Swinvreid.
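The aspect-ratio-aware input pipeline described above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' released code: the target size of 224 × 112, the `letterbox` helper, and the `NonSquarePatchEmbed` module are hypothetical choices picked so that a 16 × 8 patch stem yields a 14 × 14 token grid, which tiles evenly into the fixed 7 × 7 attention windows mentioned in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def letterbox(img, target_hw=(224, 112)):
    """Resize keeping the native aspect ratio, then zero-pad to target size."""
    _, h, w = img.shape
    th, tw = target_hw
    scale = min(th / h, tw / w)                     # fit inside target box
    nh, nw = int(round(h * scale)), int(round(w * scale))
    img = F.interpolate(img.unsqueeze(0), size=(nh, nw),
                        mode="bilinear", align_corners=False).squeeze(0)
    pad_h, pad_w = th - nh, tw - nw
    # F.pad order for 2D spatial dims: (left, right, top, bottom)
    return F.pad(img, (pad_w // 2, pad_w - pad_w // 2,
                       pad_h // 2, pad_h - pad_h // 2))

class NonSquarePatchEmbed(nn.Module):
    """16x8 patch stem: a 224x112 input becomes a 14x14 token grid,
    which is divisible by the 7x7 Swin window size."""
    def __init__(self, embed_dim=96):
        super().__init__()
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=(16, 8), stride=(16, 8))

    def forward(self, x):
        return self.proj(x)  # (B, embed_dim, H/16, W/8)

img = torch.rand(3, 300, 180)          # a taller-than-wide vehicle crop
x = letterbox(img)                     # (3, 224, 112), proportions preserved
tokens = NonSquarePatchEmbed()(x.unsqueeze(0))
print(tokens.shape)                    # torch.Size([1, 96, 14, 14])
```

The key point is that the non-square stride compensates for the non-square input, so downstream windowed attention sees the usual square token grid while the image itself is never distorted.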

Keywords

Vehicle ReID; Swin Transformer; aspect ratio robustness; multi-attribute learning