Improved Blending Attention Mechanism in Visual Question Answering

Siyu Lu; Yueming Ding; Zhengtong Yin; Mingzhe Liu; Xuan Liu; Wenfeng Zheng; Lirong Yin

doi:10.32604/csse.2023.038598

Open Access

ARTICLE

Siyu Lu¹, Yueming Ding¹, Zhengtong Yin², Mingzhe Liu³,*, Xuan Liu⁴, Wenfeng Zheng¹,*, Lirong Yin⁵

1 School of Automation, University of Electronic Science and Technology of China, Chengdu, 610054, China
2 College of Resource and Environment Engineering, Guizhou University, Guiyang, 550025, China
3 School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325000, China
4 School of Public Affairs and Administration, University of Electronic Science and Technology of China, Chengdu, 611731, China
5 Department of Geography and Anthropology, Louisiana State University, Baton Rouge, 70803, LA, USA

* Corresponding Authors: Mingzhe Liu; Wenfeng Zheng

Computer Systems Science and Engineering 2023, 47(1), 1149-1161. https://doi.org/10.32604/csse.2023.038598

Received 20 December 2022; Accepted 10 April 2023; Issue published 26 May 2023

Abstract

Visual question answering (VQA) has attracted growing attention in computer vision and natural language processing. Much of the research focuses on how to better integrate image features and text features to improve results on VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden. An attention mechanism is an effective way to address this problem. However, using a single attention mechanism may leave some features insufficiently attended. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because attention can cause loss of the original features, a small portion of the image features is added back as compensation. For text features, a self-attention mechanism is introduced to strengthen the internal structural features of sentences and improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
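
The idea of blending spatial and channel attention with a small feature compensation term can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter-free scoring rules, the function names, and the compensation factor `alpha` are all assumptions chosen for illustration, whereas a real model would use learned layers. Image features are assumed to be a channels-by-regions grid, and the question is assumed to be a vector with one entry per channel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_weights(features, question):
    """One weight per channel (assumed rule: question-gated mean activation)."""
    scores = [q * sum(row) / len(row) for q, row in zip(question, features)]
    return softmax(scores)


def spatial_weights(features, question):
    """One weight per region (assumed rule: question dotted with the region's channel vector)."""
    n_regions = len(features[0])
    scores = [
        sum(q * row[n] for q, row in zip(question, features))
        for n in range(n_regions)
    ]
    return softmax(scores)


def blended_attention(features, question, alpha=0.1):
    """Apply both attentions, then add back alpha * original features.

    alpha is a hypothetical compensation factor, not a value from the paper.
    """
    ch = channel_weights(features, question)
    sp = spatial_weights(features, question)
    return [
        [row[n] * ch[c] * sp[n] + alpha * row[n] for n in range(len(row))]
        for c, row in enumerate(features)
    ]


# Toy input: 2 channels x 2 image regions, and a 2-dimensional question vector.
features = [[1.0, 2.0], [3.0, 4.0]]
question = [1.0, 0.0]
attended = blended_attention(features, question)
```

With `alpha=0.0` the output is the purely attended feature map. A small positive `alpha` keeps a fraction of every original value, which is the role the compensation term plays in the description above.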

Keywords

Visual question answering; spatial attention mechanism; channel attention mechanism; image feature processing; text feature extraction

Cite This Article

APA Style

Lu, S., Ding, Y., Yin, Z., Liu, M., Liu, X. et al. (2023). Improved blending attention mechanism in visual question answering. Computer Systems Science and Engineering, 47(1), 1149-1161. https://doi.org/10.32604/csse.2023.038598

Vancouver Style

Lu S, Ding Y, Yin Z, Liu M, Liu X, Zheng W, et al. Improved blending attention mechanism in visual question answering. Comput Syst Sci Eng. 2023;47(1):1149-1161. https://doi.org/10.32604/csse.2023.038598

IEEE Style

S. Lu et al., "Improved Blending Attention Mechanism in Visual Question Answering," Comput. Syst. Sci. Eng., vol. 47, no. 1, pp. 1149-1161, 2023. https://doi.org/10.32604/csse.2023.038598

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
