Tech Science Press - Publisher of Open Access Journals

Search Results (4)

Open Access

ARTICLE

Performance vs. Complexity Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems

Sarah M. Kamel^1,*, Mai A. Fadel^2, Lamiaa Elrefaei^1,3, Shimaa I. Hassan^1,4

CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.1, pp. 373-411, 2025, DOI:10.32604/cmes.2025.062837 - 11 April 2025

Abstract: Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) Using deep neural networks to semantically represent the given image and question in a fine-grained manner, namely ResNet-152 and Gated Recurrent Units (GRU). (2) Studying the role of the utilized multimodal bilinear…

Views: 1207 | Downloads: 524

Open Access

ARTICLE

Adjusted Reasoning Module for Deep Visual Question Answering Using Vision Transformer

Christine Dewi^1,3, Hanna Prillysca Chernovita^2, Stephen Abednego Philemon^1, Christian Adi Ananta^1, Abbott Po Shun Chen^4,*

CMC-Computers, Materials & Continua, Vol.81, No.3, pp. 4195-4216, 2024, DOI:10.32604/cmc.2024.057453 - 19 December 2024

Abstract: Visual Question Answering (VQA) is an interdisciplinary artificial intelligence (AI) task that integrates computer vision and natural language processing. Its purpose is to empower machines to respond to questions by utilizing visual information. A VQA system typically takes an image and a natural language query as input and produces a textual answer as output. One major obstacle in VQA is identifying a successful method to extract and merge textual and visual data. We examine fusion models that use information from both the text encoder and the image encoder to perform visual question answering efficiently. For…

Views: 1721 | Downloads: 798

Open Access

ARTICLE

Improving VQA via Dual-Level Feature Embedding Network

Yaru Song^*, Huahu Xu, Dikai Fang

Intelligent Automation & Soft Computing, Vol.39, No.3, pp. 397-416, 2024, DOI:10.32604/iasc.2023.040521 - 11 July 2024

Abstract: Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA primarily uses attention mechanisms to associate relevant visual regions with input questions in order to answer them effectively. The detection-based features extracted by the object detection network aim to acquire the visual attention distribution on a predetermined detection frame and provide object-level insights to answer questions about foreground objects more effectively. However, such features cannot answer questions about background regions that have no detection boxes, due to the lack of fine-grained details, which is the advantage of grid-based features. In…

Views: 1419 | Downloads: 1212

Open Access

ARTICLE

WMA: A Multi-Scale Self-Attention Feature Extraction Network Based on Weight Sharing for VQA

Yue Li, Jin Liu^*, Shengjie Shang

Journal on Big Data, Vol.3, No.3, pp. 111-118, 2021, DOI:10.32604/jbd.2021.017169 - 22 November 2021

Abstract: Visual Question Answering (VQA) has attracted extensive research attention and has recently become a hot topic in deep learning. Advances in computer vision and natural language processing have contributed to progress in this research area. The key ways to improve the performance of a VQA system lie in its feature extraction, multimodal fusion, and answer prediction modules. An unsolved issue in popular VQA image feature extraction modules is that they have difficulty extracting fine-grained features from objects of different scales. In this paper, a novel feature extraction network that combines multi-scale convolution and self-attention…

Views: 2131 | Downloads: 2811
