Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

VIF-YOLO: A Visible-Infrared Fusion YOLO Model for Real-Time Human Detection in Dense Smoke Environments

Wenhe Chen¹, Yue Wang¹, Shuonan Shen¹, Leer Hua¹, Caixia Zheng², Qi Pu¹,*, Xundiao Ma³,*

CMC-Computers, Materials & Continua, Vol.87, No.1, 2026, DOI:10.32604/cmc.2025.074682 - 10 February 2026

Abstract In fire rescue scenarios, traditional manual operations are highly dangerous, as dense smoke, low visibility, extreme heat, and toxic gases not only hinder rescue efficiency but also endanger firefighters’ safety. Although intelligent rescue robots can enter hazardous environments in place of humans, smoke poses major challenges for human detection algorithms. These challenges include the attenuation of visible and infrared signals, complex thermal fields, and interference from background objects, all of which make it difficult to accurately identify trapped individuals. To address this problem, we propose VIF-YOLO, a visible–infrared fusion model for real-time human detection in…

Open Access

ARTICLE

Transformer-Driven Multimodal for Human-Object Detection and Recognition for Intelligent Robotic Surveillance

Aman Aman Ullah¹,²,#, Yanfeng Wu¹,#, Shaheryar Najam³, Nouf Abdullah Almujally⁴, Ahmad Jalal⁵,⁶,*, Hui Liu¹,⁷,⁸,*

CMC-Computers, Materials & Continua, Vol.87, No.1, 2026, DOI:10.32604/cmc.2025.072508 - 10 February 2026

Abstract Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects…

Open Access

ARTICLE

LLM-Powered Multimodal Reasoning for Fake News Detection

Md. Ahsan Habib¹, Md. Anwar Hussen Wadud², M. F. Mridha³,*, Md. Jakir Hossen⁴,*

CMC-Computers, Materials & Continua, Vol.87, No.1, 2026, DOI:10.32604/cmc.2025.070235 - 10 February 2026

Abstract The problem of fake news detection (FND) is becoming increasingly important in the field of natural language processing (NLP) because of the rapid dissemination of misleading information on the web. Large language models (LLMs) such as GPT-4.0 excel in natural language understanding tasks but can still struggle to distinguish between fact and fiction, particularly when applied in the wild. However, a key challenge of existing FND methods is that they only consider unimodal data (e.g., images), while more detailed multimodal data (e.g., user behaviour, temporal dynamics) is neglected, and the latter is crucial for…

Open Access

ARTICLE

A Novel Unified Framework for Automated Generation and Multimodal Validation of UML Diagrams

Van-Viet Nguyen¹, Huu-Khanh Nguyen², Kim-Son Nguyen¹, Thi Minh-Hue Luong¹, Duc-Quang Vu¹, Trung-Nghia Phung³, The-Vinh Nguyen¹,*

CMES-Computer Modeling in Engineering & Sciences, Vol.146, No.1, 2026, DOI:10.32604/cmes.2025.075442 - 29 January 2026

Abstract It remains difficult to automate the creation and validation of Unified Modeling Language (UML) diagrams due to unstructured requirements, limited automated pipelines, and the lack of reliable evaluation methods. This study introduces a cohesive architecture that amalgamates requirement development, UML synthesis, and multimodal validation. First, LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements. Then, DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code. Using this dual-LLM pipeline, we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families. Rendering analysis showed that 89.5% of the generated diagrams compile correctly, while…

Open Access

ARTICLE

A Dual-Stream Framework for Landslide Segmentation with Cross-Attention Enhancement and Gated Multimodal Fusion

Md Minhazul Islam¹,², Yunfei Yin¹,²,*, Md Tanvir Islam¹,², Zheng Yuan¹,², Argho Dey¹,²

CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.072550 - 12 January 2026

Abstract Automatic segmentation of landslides from remote sensing imagery is challenging because traditional machine learning and early CNN-based models often fail to generalize across heterogeneous landscapes, where segmentation maps contain sparse and fragmented landslide regions under diverse geographical conditions. To address these issues, we propose a lightweight dual-stream siamese deep learning framework that integrates optical and topographical data fusion with an adaptive decoder, guided multimodal fusion, and deep supervision. The framework is built upon the synergistic combination of cross-attention, gated fusion, and sub-pixel upsampling within a unified dual-stream architecture specifically optimized for landslide segmentation, enabling efficient…

Open Access

ARTICLE

A Multimodal Sentiment Analysis Method Based on Multi-Granularity Guided Fusion

Zilin Zhang¹, Yan Liu¹,*, Jia Liu², Senbao Hou³, Yuping Zhang¹, Chenyuan Wang¹

CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-14, 2026, DOI:10.32604/cmc.2025.072286 - 09 December 2025

Abstract With the growing demand for more comprehensive and nuanced sentiment understanding, Multimodal Sentiment Analysis (MSA) has gained significant traction in recent years and continues to attract widespread attention in the academic community. Despite notable advances, existing approaches still face critical challenges in both information modeling and modality fusion. On one hand, many current methods rely heavily on encoders to extract global features from each modality, which limits their ability to capture latent fine-grained emotional cues within modalities. On the other hand, prevailing fusion strategies often lack mechanisms to model semantic discrepancies across modalities and to…

Open Access

ARTICLE

MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding

Ans D. Alghamdi*

CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-35, 2026, DOI:10.32604/cmc.2025.071210 - 09 December 2025

Abstract Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose MultiAgent-Chain of Thought (CoT), a novel multi-agent chain-of-thought reasoning framework where specialized agents for text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting mechanisms. Our architecture incorporates self-reflection modules, conflict resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement.

Open Access

ARTICLE

Bearing Fault Diagnosis Based on Multimodal Fusion GRU and Swin-Transformer

Yingyong Zou*, Yu Zhang, Long Li, Tao Liu, Xingkui Zhang

CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-24, 2026, DOI:10.32604/cmc.2025.068246 - 10 November 2025

Abstract Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments. However, due to the nonlinearity and non-stationarity of collected vibration signals, single-modal methods struggle to capture fault features fully. This paper proposes a rolling bearing fault diagnosis method based on multi-modal information fusion. The method first employs the Hippopotamus Optimization Algorithm (HO) to optimize the number of modes in Variational Mode Decomposition (VMD) to achieve optimal modal decomposition performance. It combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU) to extract temporal features…

Open Access

ARTICLE

CAPGen: An MLLM-Based Framework Integrated with Iterative Optimization Mechanism for Cultural Artifacts Poster Generation

Qianqian Hu, Chuhan Li, Mohan Zhang, Fang Liu*

CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-17, 2026, DOI:10.32604/cmc.2025.068225 - 10 November 2025

Abstract Due to the digital transformation tendency among cultural institutions and the substantial influence of the social media platform, the demands of visual communication keep increasing for promoting traditional cultural artifacts online. As an effective medium, posters serve to attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods mainly rely on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. Therefore, we propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language…

Human Behaviour Classification in Emergency Situations Using Machine Learning with Multimodal Data: A Systematic Review (2020–2025)

Mirza Murad Baig¹, Muhammad Rehan Faheem²,*, Lal Khan³,*, Hannan Adeel², Syed Asim Ali Shah⁴

CMES-Computer Modeling in Engineering & Sciences, Vol.145, No.3, pp. 2895-2935, 2025, DOI:10.32604/cmes.2025.073172 - 23 December 2025

Abstract With growing urban areas, the climate continues to change as a result of growing populations, and hence, the demand for better emergency response systems has become more important than ever. Human Behaviour Classification (HBC) systems have started to play a vital role by analysing data from different sources to detect signs of emergencies. These systems are being used in many critical areas like healthcare, public safety, and disaster management to improve response time and to prepare ahead of time. But detecting human behaviour in such stressful conditions is not simple; it often comes with noisy…
