LREGT: Local Relationship Enhanced Gated Transformer for Image Captioning

Yuting He; Zetao Jiang

doi:10.32604/cmc.2025.065169

Open Access

ARTICLE

Yuting He, Zetao Jiang^*

Guangxi Key Lab of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin, 541004, China

* Corresponding Author: Zetao Jiang.

Computers, Materials & Continua 2025, 84(3), 5487-5508. https://doi.org/10.32604/cmc.2025.065169

Received 05 March 2025; Accepted 11 June 2025; Issue published 30 July 2025

Abstract

Existing Transformer-based image captioning models typically rely on self-attention to capture long-range dependencies, which lets them extract and exploit global correlations among image features. However, these models still struggle to capture local associations. Moreover, because the global and local association features extracted by the encoder focus on different semantic information, semantic noise may arise during decoding. To address these issues, we propose the Local Relationship Enhanced Gated Transformer (LREGT). In the encoder, we introduce the Local Relationship Enhanced Encoder (LREE), whose core component is the Local Relationship Enhanced Module (LREM). LREM comprises two novel designs, the Local Correlation Perception Module (LCPM) and the Local-Global Fusion Module (LGFM), which together produce a comprehensive feature representation that integrates global and local information. In the decoder, we propose the Dual-level Multi-branch Gated Decoder (DMGD). It first creates multiple decoding branches to generate multi-perspective contextual feature representations. It then applies the Dual-Level Gating Mechanism (DLGM) to model the multi-level relationships among these multi-perspective contextual features, enhancing their fine-grained semantics and the representation of their intrinsic relationships. The result is high-quality, semantically rich image captions. Experiments on the standard MSCOCO dataset show that LREGT achieves state-of-the-art performance, with a CIDEr score of 140.8 and a BLEU-4 score of 41.3, significantly outperforming existing mainstream methods. These results highlight LREGT’s strength in capturing complex visual relationships and in reducing semantic noise during decoding.
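
The abstract does not give the equations of the DLGM, so the following is only a generic illustration of how gating can fuse features from several decoding branches; it is not the authors' method. It assumes, hypothetically, two gate levels: an element-wise sigmoid gate inside each branch, and a softmax-normalised scalar weight per branch. The names `gated_fusion`, `elem_weights`, and `branch_weights` are invented for this sketch.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(
    branches: Sequence[Vector],
    elem_weights: Sequence[Vector],
    branch_weights: Sequence[Vector],
) -> Vector:
    """Fuse per-branch feature vectors with two hypothetical gate levels."""
    dim = len(branches[0])
    # Level 1 (assumption): element-wise sigmoid gate within each branch.
    gated = [
        [sigmoid(w * x) * x for w, x in zip(ws, feats)]
        for ws, feats in zip(elem_weights, branches)
    ]
    # Level 2 (assumption): one scalar score per branch, softmax across branches.
    scores = [
        sum(w * x for w, x in zip(ws, feats))
        for ws, feats in zip(branch_weights, gated)
    ]
    alphas = softmax(scores)
    # Weighted sum of the gated branch features.
    return [sum(a * g[i] for a, g in zip(alphas, gated)) for i in range(dim)]


if __name__ == "__main__":
    branches = [[2.0, 4.0], [6.0, 8.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights every element gate is 0.5 and branches are averaged.
    print(gated_fusion(branches, zeros, zeros))
```

In a trained model the gate weights would be learned parameters, so branches carrying noisy or conflicting semantics could be down-weighted rather than averaged uniformly.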

Keywords

Image captioning; local relation enhancement; local correlation perception; dual-level gating mechanism

Cite This Article

APA Style

He, Y., & Jiang, Z. (2025). LREGT: Local Relationship Enhanced Gated Transformer for Image Captioning. Computers, Materials & Continua, 84(3), 5487–5508. https://doi.org/10.32604/cmc.2025.065169

Vancouver Style

He Y, Jiang Z. LREGT: Local Relationship Enhanced Gated Transformer for Image Captioning. Comput Mater Contin. 2025;84(3):5487–5508. https://doi.org/10.32604/cmc.2025.065169

IEEE Style

Y. He and Z. Jiang, “LREGT: Local Relationship Enhanced Gated Transformer for Image Captioning,” Comput. Mater. Contin., vol. 84, no. 3, pp. 5487–5508, 2025. https://doi.org/10.32604/cmc.2025.065169

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
