Encoder-Decoder Based Multi-Feature Fusion Model for Image Caption Generation

Mingyang Duan; Jin Liu; Shiqi Lv

doi:10.32604/jbd.2021.016674

Open Access

ARTICLE

Mingyang Duan, Jin Liu^*, Shiqi Lv

Shanghai Maritime University, Shanghai, 201306, China

* Corresponding Author: Jin Liu. Email: email

Journal on Big Data 2021, 3(2), 77-83. https://doi.org/10.32604/jbd.2021.016674

Received 08 January 2021; Accepted 07 April 2021; Issue published 13 April 2021

Abstract

Image caption generation is an essential task in computer vision and image understanding. Contemporary image caption generation models usually use an encoder-decoder model as the underlying network structure. However, traditional encoder-decoder architectures extract only the global features of an image, so the local information in the image is not well utilized. This paper proposes an encoder-decoder model based on fused features, together with a novel mechanism for correcting the generated caption text. In the encoder, we first use VGG16 and Faster R-CNN to extract global and local features. In the decoder, we then train a bidirectional LSTM network with the fused features. Finally, the extracted local features are used to correct the caption text. The experimental results demonstrate the effectiveness of the proposed method.
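
The abstract does not say how the global and local features are fused, so the following is only a minimal sketch of one common option: mean-pooling the per-region (local) vectors and concatenating the result with the global vector. The function names `mean_pool` and `fuse_features`, the pooling choice, the concatenation, and the toy feature values are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def mean_pool(region_features: Sequence[Sequence[float]]) -> List[float]:
    """Average a set of per-region feature vectors into a single vector.

    Assumption: mean pooling stands in for whatever aggregation the
    paper actually applies to the local (region-level) features.
    """
    if not region_features:
        raise ValueError("need at least one region feature vector")
    dim = len(region_features[0])
    if any(len(r) != dim for r in region_features):
        raise ValueError("all region feature vectors must share one dimension")
    n = len(region_features)
    return [sum(r[i] for r in region_features) / n for i in range(dim)]


def fuse_features(
    global_feature: Sequence[float],
    region_features: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate the global vector with the pooled local vector.

    Assumption: simple concatenation is used as the fusion step.
    """
    return list(global_feature) + mean_pool(region_features)


# Toy stand-ins for encoder outputs (hypothetical values, not real features):
# one global vector and two region-level vectors.
global_feature = [0.5, 1.0, 0.0]
region_features = [[1.0, 3.0], [3.0, 5.0]]

fused = fuse_features(global_feature, region_features)
print(fused)
```

In a full model, a fused vector of this kind would be the image representation passed to the decoder; real feature vectors would have hundreds or thousands of dimensions rather than the handful used here.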

Keywords

Image understanding; image captioning; deep learning; fused features

Cite This Article

APA Style

Duan, M., Liu, J., Lv, S. (2021). Encoder-Decoder Based Multi-Feature Fusion Model for Image Caption Generation. Journal on Big Data, 3(2), 77–83. https://doi.org/10.32604/jbd.2021.016674

Vancouver Style

Duan M, Liu J, Lv S. Encoder-Decoder Based Multi-Feature Fusion Model for Image Caption Generation. J Big Data. 2021;3(2):77–83. https://doi.org/10.32604/jbd.2021.016674

IEEE Style

M. Duan, J. Liu, and S. Lv, “Encoder-Decoder Based Multi-Feature Fusion Model for Image Caption Generation,” J. Big Data, vol. 3, no. 2, pp. 77–83, 2021. https://doi.org/10.32604/jbd.2021.016674

Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
