Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment

Emran Al-Buraihy; Dan Wang

doi:10.32604/cmc.2024.048104

Open Access

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China

* Corresponding Author: Dan Wang.

Computers, Materials & Continua 2024, 79(3), 3913-3938. https://doi.org/10.32604/cmc.2024.048104

Received 28 November 2023; Accepted 28 February 2024; Issue published 20 June 2024

Abstract

Cross-lingual image description is the task of generating image captions in a target language from images and descriptions in a source language. This study addresses the task with a novel approach that combines neural network models with semantic matching techniques. Experiments on the Flickr8k and AraImg2k benchmark datasets, which contain images with descriptions in English and Arabic, show performance improvements over state-of-the-art methods. Our model is equipped with an Image & Cross-Language Semantic Matching module and a Target Language Domain Evaluation module, which significantly enhance the semantic relevance of the generated image descriptions. On the English-to-Arabic and Arabic-to-English cross-language image description tasks, our approach achieves CIDEr scores of 87.9% and 81.7% for English and Arabic, respectively. Comparative analyses with previous works further support the superior performance of our approach, and visual results indicate that the model generates image captions that are both semantically accurate and stylistically consistent with the target language. Overall, this study offers an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include extending the approach to more languages and incorporating more diverse visual and textual data sources.

Keywords

Cross-language image description; multimodal deep learning; semantic matching; reward mechanisms

Cite This Article

APA Style

Al-Buraihy, E., & Wang, D. (2024). Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment. Computers, Materials & Continua, 79(3), 3913–3938. https://doi.org/10.32604/cmc.2024.048104

Vancouver Style

Al-Buraihy E, Wang D. Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment. Comput Mater Contin. 2024;79(3):3913–3938. https://doi.org/10.32604/cmc.2024.048104

IEEE Style

E. Al-Buraihy and D. Wang, “Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment,” Comput. Mater. Contin., vol. 79, no. 3, pp. 3913–3938, 2024. https://doi.org/10.32604/cmc.2024.048104

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
