Multi-Scale Feature Fusion and Advanced Representation Learning for Multi Label Image Classification

Naikang Zhong; Xiao Lin; Wen Du; Jin Shi

doi:10.32604/cmc.2025.059102

Open Access

ARTICLE

Naikang Zhong¹, Xiao Lin¹,²,³,⁴,*, Wen Du⁵, Jin Shi⁶

1 Institute of Artificial Intelligence on Education Research, College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, 200234, China
2 Lab for Educational Big Data and Policymaking, Ministry of Education, Shanghai Normal University, Shanghai, 200234, China
3 Shanghai Intelligent Education Big Data Engineering Technology Research Center, Shanghai Normal University, Shanghai, 200234, China
4 Shanghai Online Education Research Base for Primary and Secondary Schools, Shanghai, 200234, China
5 DS Information Technology Co., Ltd., Shanghai, 200032, China
6 Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, 999078, China

* Corresponding Author: Xiao Lin.

(This article belongs to the Special Issue: The Latest Deep Learning Architectures for Artificial Intelligence Applications)

Computers, Materials & Continua 2025, 82(3), 5285-5306. https://doi.org/10.32604/cmc.2025.059102

Received 28 September 2024; Accepted 10 January 2025; Issue published 06 March 2025

Abstract

Multi-label image classification is challenging because objects in an image vary widely in size and often appear against complex backgrounds. Obtaining precise class-specific representations at different scales is therefore a key aspect of feature representation. However, existing methods often rely on a single-scale deep feature and neglect features from shallow and deeper layers, which makes it difficult to predict objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain precise class-specific representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage combines multi-scale image feature extraction with hierarchical scale attention. This design lets the model consider objects at various scales while enhancing the flow of information across feature scales, improving the model’s generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens the connections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of precise class-specific feature representations. Extensive experiments on the public datasets COCO2014, VOC2007, and VOC2012 demonstrate the effectiveness of the proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
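The abstract does not specify how the region selection module relates image features to labels, so the following is only a minimal sketch of one common way to obtain class-specific features: each label embedding acts as a query that attends over image tokens gathered from several feature scales. It is not the authors' implementation. The function names, the token counts per scale, the feature dimension, and the random inputs are all assumptions made for illustration.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_cross_attention(label_queries, tokens):
    """Hypothetical helper: each label query attends over all image tokens
    with scaled dot-product attention and returns one pooled feature per label."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    class_features, attention = [], []
    for query in label_queries:
        weights = softmax([dot(query, tok) * scale for tok in tokens])
        pooled = [sum(w * tok[d] for w, tok in zip(weights, tokens))
                  for d in range(dim)]
        class_features.append(pooled)
        attention.append(weights)
    return class_features, attention


def classify(class_features, classifier_weights, biases):
    """One independent sigmoid score per label (multi-label, so no softmax
    across labels)."""
    return [1.0 / (1.0 + math.exp(-(dot(f, w) + b)))
            for f, w, b in zip(class_features, classifier_weights, biases)]


random.seed(0)
DIM, NUM_LABELS = 8, 3  # assumed sizes, chosen only to keep the demo small

# Assumed token counts for three feature scales (e.g. 4x4, 2x2, and 1x1 grids).
tokens_per_scale = {"shallow": 16, "middle": 4, "deep": 1}
tokens = [[random.gauss(0, 1) for _ in range(DIM)]
          for count in tokens_per_scale.values() for _ in range(count)]

# Random stand-ins for learned label embeddings and classifier parameters.
label_queries = [[random.gauss(0, 1) for _ in range(DIM)]
                 for _ in range(NUM_LABELS)]
classifier_weights = [[random.gauss(0, 1) for _ in range(DIM)]
                      for _ in range(NUM_LABELS)]
biases = [0.0] * NUM_LABELS

class_features, attention = label_cross_attention(label_queries, tokens)
scores = classify(class_features, classifier_weights, biases)
```

In this sketch, concatenating tokens from all scales lets a single label query weight small-object evidence from fine grids and large-object evidence from coarse grids in the same attention step. A trained model would replace the random embeddings and weights with learned parameters.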

Keywords

Image classification; multi-label; multi-scale; attention mechanisms; feature fusion

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
