Stacked Attention Networks for Referring Expressions Comprehension

Yugang Li; Haibo Sun; Zhe Chen; Yudan Ding; Siqi Zhou

doi:10.32604/cmc.2020.011886

Open Access

ARTICLE

Yugang Li¹,*, Haibo Sun¹, Zhe Chen¹, Yudan Ding¹, Siqi Zhou²

1 Academy of Broadcasting Science, Beijing, 100866, China.
2 School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore.

* Corresponding Author: Yugang Li. Email: email .

Computers, Materials & Continua 2020, 65(3), 2529-2541. https://doi.org/10.32604/cmc.2020.011886

Received 03 June 2020; Accepted 03 July 2020; Issue published 16 September 2020

Abstract

Referring expressions comprehension is the task of locating the image region described by a natural language expression, which may refer to properties of the region or to its relationships with other regions. Most previous work handles this problem by selecting the most relevant region from a set of candidate regions; when the set contains many candidates, these methods become inefficient. Inspired by the recent success of deep learning methods in image captioning, in this paper we propose a framework that understands referring expressions through multiple steps of reasoning. We present a model for referring expressions comprehension that selects the most relevant region directly from the image. The core of our model is a recurrent attention network, which can be seen as an extension of the Memory Network. The proposed model is capable of improving its results through multiple computational hops. We evaluate the proposed model on two referring expression datasets: Visual Genome and Flickr30k Entities. The experimental results demonstrate that the proposed model outperforms previous state-of-the-art methods in both accuracy and efficiency. We also conduct an ablation experiment showing that the performance of the model does not keep improving as the number of attention layers increases.
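The abstract describes the model only at a high level. As a rough illustration of what "multiple computational hops" of attention over an image can look like, the sketch below follows the generic stacked-attention scheme used for image question answering (Yang et al., 2016), in which each hop attends over a grid of region features and adds the attended visual vector to the query. This is an assumption for illustration, not the authors' exact architecture: the grid of region features, the weight names `w_v`, `w_u`, `w_p`, the dimensions, and the random (untrained) weights are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a cols-length vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hop(regions: List[Vector], query: Vector,
                  w_v: Matrix, w_u: Matrix, w_p: Vector) -> Tuple[Vector, Vector]:
    """One hypothetical hop: score every region against the query, then refine the query."""
    proj_u = matvec(w_u, query)
    scores = []
    for v in regions:
        proj_v = matvec(w_v, v)
        h = [math.tanh(a + b) for a, b in zip(proj_v, proj_u)]
        scores.append(sum(w * x for w, x in zip(w_p, h)))
    probs = softmax(scores)  # attention distribution over image locations
    d = len(regions[0])
    attended = [sum(p * v[j] for p, v in zip(probs, regions)) for j in range(d)]
    new_query = [q + a for q, a in zip(query, attended)]  # u_{k+1} = u_k + attended
    return probs, new_query


def stacked_attention(regions: List[Vector], query: Vector, layers) -> Tuple[int, Vector]:
    """Apply each attention layer in turn; pick the location with the highest final weight."""
    probs: Vector = []
    for w_v, w_u, w_p in layers:
        probs, query = attention_hop(regions, query, w_v, w_u, w_p)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def random_layer(d: int, k: int, rng: random.Random):
    """Hypothetical untrained weights for one attention layer."""
    w_v = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(k)]
    w_u = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(k)]
    w_p = [rng.gauss(0, 0.1) for _ in range(k)]
    return w_v, w_u, w_p


# Usage: a 7x7 grid of 8-dim region features, an 8-dim expression embedding, two hops.
rng = random.Random(0)
d, k, n_regions = 8, 16, 7 * 7
regions = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_regions)]
query = [rng.gauss(0, 1) for _ in range(d)]
layers = [random_layer(d, k, rng) for _ in range(2)]
best, probs = stacked_attention(regions, query, layers)
print("selected grid cell:", divmod(best, 7))
```

Because each hop adds the attended visual vector to the query, a later hop can re-weight locations given what the earlier hop attended to, and the final attention map picks a location directly on the feature grid rather than ranking a separate list of candidate boxes. With trained weights, stacking more layers changes this map; it need not improve it, which is consistent with the ablation result reported above.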

Keywords

Stacked attention networks, referring expressions, visual relationship, deep learning.

Cite This Article

APA Style

Li, Y., Sun, H., Chen, Z., Ding, Y., Zhou, S. (2020). Stacked attention networks for referring expressions comprehension. Computers, Materials & Continua, 65(3), 2529-2541. https://doi.org/10.32604/cmc.2020.011886

Vancouver Style

Li Y, Sun H, Chen Z, Ding Y, Zhou S. Stacked attention networks for referring expressions comprehension. Comput Mater Contin. 2020;65(3):2529-2541. https://doi.org/10.32604/cmc.2020.011886

IEEE Style

Y. Li, H. Sun, Z. Chen, Y. Ding, and S. Zhou, "Stacked Attention Networks for Referring Expressions Comprehension," Comput. Mater. Contin., vol. 65, no. 3, pp. 2529-2541, 2020. https://doi.org/10.32604/cmc.2020.011886

This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.