Open Access

ARTICLE

Multi-Scale Feature Fusion and Advanced Representation Learning for Multi Label Image Classification

Naikang Zhong1, Xiao Lin1,2,3,4,*, Wen Du5, Jin Shi6

1 Institute of Artificial Intelligence on Education Research, College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, 200234, China
2 Lab for Educational Big Data and Policymaking, Ministry of Education, Shanghai Normal University, Shanghai, 200234, China
3 Shanghai Intelligent Education Big Data Engineering Technology Research Center, Shanghai Normal University, Shanghai, 200234, China
4 Shanghai Online Education Research Base for Primary and Secondary Schools, Shanghai, 200234, China
5 DS Information Technology Co., Ltd., Shanghai, 200032, China
6 Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, 999078, China

* Corresponding Author: Xiao Lin.

(This article belongs to the Special Issue: The Latest Deep Learning Architectures for Artificial Intelligence Applications)

Computers, Materials & Continua 2025, 82(3), 5285-5306. https://doi.org/10.32604/cmc.2025.059102

Abstract

Multi-label image classification is a challenging task because objects in an image vary widely in size and often appear against complex backgrounds. Obtaining precise class-specific representations at different scales is a key aspect of feature representation. However, existing methods often rely on a single-scale deep feature, neglecting features from shallow and deeper layers, which makes it difficult to predict objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales, and they rarely obtain precise class-specific representations efficiently for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales, improving the model’s generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of precise class-specific feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of our proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
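
As a rough illustration of what "class-specific representations" from image features and labels can look like, the following minimal NumPy sketch lets each label embedding attend over image region features at several scales. It is not the authors' implementation: the function names, the three random feature maps, the dot-product classifier, and the simple averaging of logits across scales are all assumptions made for illustration, and the averaging in particular stands in for the hierarchical scale attention described above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def class_specific_features(label_emb, region_feats):
    """Cross-attention with labels as queries and image regions as keys/values.

    label_emb:    (C, d) one embedding per class (hypothetical input)
    region_feats: (N, d) one feature vector per image region
    Returns (C, d) class-specific features and (C, N) attention weights.
    """
    d = label_emb.shape[-1]
    scores = label_emb @ region_feats.T / np.sqrt(d)  # (C, N)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ region_feats, weights

def multi_scale_logits(label_emb, feats_by_scale, classifier_w):
    """Score each class at every scale, then average (an assumed fusion rule)."""
    logits = []
    for feats in feats_by_scale:
        cls_feats, _ = class_specific_features(label_emb, feats)
        # Per-class dot product between its feature and its classifier vector.
        logits.append((cls_feats * classifier_w).sum(axis=-1))  # (C,)
    return np.mean(logits, axis=0)

# Toy inputs: 20 classes, 64-dim features, three scales with 196/49/16 regions.
rng = np.random.default_rng(0)
num_classes, dim = 20, 64
label_emb = rng.normal(size=(num_classes, dim))
feats_by_scale = [rng.normal(size=(n, dim)) for n in (196, 49, 16)]
classifier_w = rng.normal(size=(num_classes, dim))

logits = multi_scale_logits(label_emb, feats_by_scale, classifier_w)
probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label
```

Each label receives its own sigmoid probability, which is the usual multi-label setup: several classes can be predicted present in the same image.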

Keywords

Image classification; multi-label; multi scale; attention mechanisms; feature fusion

Cite This Article

APA Style
Zhong, N., Lin, X., Du, W., Shi, J. (2025). Multi-scale feature fusion and advanced representation learning for multi label image classification. Computers, Materials & Continua, 82(3), 5285–5306. https://doi.org/10.32604/cmc.2025.059102
Vancouver Style
Zhong N, Lin X, Du W, Shi J. Multi-scale feature fusion and advanced representation learning for multi label image classification. Comput Mater Contin. 2025;82(3):5285–5306. https://doi.org/10.32604/cmc.2025.059102
IEEE Style
N. Zhong, X. Lin, W. Du, and J. Shi, “Multi-Scale Feature Fusion and Advanced Representation Learning for Multi Label Image Classification,” Comput. Mater. Contin., vol. 82, no. 3, pp. 5285–5306, 2025. https://doi.org/10.32604/cmc.2025.059102



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.