Search Results (135)
  • Open Access

    ARTICLE

    VIF-YOLO: A Visible-Infrared Fusion YOLO Model for Real-Time Human Detection in Dense Smoke Environments

    Wenhe Chen1, Yue Wang1, Shuonan Shen1, Leer Hua1, Caixia Zheng2, Qi Pu1,*, Xundiao Ma3,*

    CMC-Computers, Materials & Continua, Vol.87, No.1, 2026, DOI:10.32604/cmc.2025.074682 - 10 February 2026

    Abstract In fire rescue scenarios, traditional manual operations are highly dangerous, as dense smoke, low visibility, extreme heat, and toxic gases not only hinder rescue efficiency but also endanger firefighters’ safety. Although intelligent rescue robots can enter hazardous environments in place of humans, smoke poses major challenges for human detection algorithms. These challenges include the attenuation of visible and infrared signals, complex thermal fields, and interference from background objects, all of which make it difficult to accurately identify trapped individuals. To address this problem, we propose VIF-YOLO, a visible–infrared fusion model for real-time human detection in…

  • Open Access

    ARTICLE

    Transformer-Driven Multimodal for Human-Object Detection and Recognition for Intelligent Robotic Surveillance

    Aman Ullah1,2,#, Yanfeng Wu1,#, Shaheryar Najam3, Nouf Abdullah Almujally4, Ahmad Jalal5,6,*, Hui Liu1,7,8,*

    CMC-Computers, Materials & Continua, Vol.87, No.1, 2026, DOI:10.32604/cmc.2025.072508 - 10 February 2026

    Abstract Human-object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects…

  • Open Access

    ARTICLE

    LLM-Powered Multimodal Reasoning for Fake News Detection

    Md. Ahsan Habib1, Md. Anwar Hussen Wadud2, M. F. Mridha3,*, Md. Jakir Hossen4,*

    CMC-Computers, Materials & Continua, Vol.87, No.1, 2026, DOI:10.32604/cmc.2025.070235 - 10 February 2026

    Abstract The problem of fake news detection (FND) is becoming increasingly important in the field of natural language processing (NLP) because of the rapid dissemination of misleading information on the web. Large language models (LLMs) such as GPT-4.0 excel in natural language understanding tasks but can still struggle to distinguish between fact and fiction, particularly when applied in the wild. However, a key challenge of existing FND methods is that they only consider unimodal data (e.g., images), while more detailed multimodal data (e.g., user behaviour, temporal dynamics) is neglected, and the latter is crucial for…

  • Open Access

    ARTICLE

    A Novel Unified Framework for Automated Generation and Multimodal Validation of UML Diagrams

    Van-Viet Nguyen1, Huu-Khanh Nguyen2, Kim-Son Nguyen1, Thi Minh-Hue Luong1, Duc-Quang Vu1, Trung-Nghia Phung3, The-Vinh Nguyen1,*

    CMES-Computer Modeling in Engineering & Sciences, Vol.146, No.1, 2026, DOI:10.32604/cmes.2025.075442 - 29 January 2026

    Abstract It remains difficult to automate the creation and validation of Unified Modeling Language (UML) diagrams due to unstructured requirements, limited automated pipelines, and the lack of reliable evaluation methods. This study introduces a cohesive architecture that amalgamates requirement development, UML synthesis, and multimodal validation. First, LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements. Then, DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code. Using this dual-LLM pipeline, we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families. Rendering analysis showed that 89.5% of the generated diagrams compile correctly, while…

  • Open Access

    ARTICLE

    A Dual-Stream Framework for Landslide Segmentation with Cross-Attention Enhancement and Gated Multimodal Fusion

    Md Minhazul Islam1,2, Yunfei Yin1,2,*, Md Tanvir Islam1,2, Zheng Yuan1,2, Argho Dey1,2

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.072550 - 12 January 2026

    Abstract Automatic segmentation of landslides from remote sensing imagery is challenging because traditional machine learning and early CNN-based models often fail to generalize across heterogeneous landscapes, where segmentation maps contain sparse and fragmented landslide regions under diverse geographical conditions. To address these issues, we propose a lightweight dual-stream Siamese deep learning framework that integrates optical and topographical data fusion with an adaptive decoder, guided multimodal fusion, and deep supervision. The framework is built upon the synergistic combination of cross-attention, gated fusion, and sub-pixel upsampling within a unified dual-stream architecture specifically optimized for landslide segmentation, enabling efficient…

  • Open Access

    ARTICLE

    A Multimodal Sentiment Analysis Method Based on Multi-Granularity Guided Fusion

    Zilin Zhang1, Yan Liu1,*, Jia Liu2, Senbao Hou3, Yuping Zhang1, Chenyuan Wang1

    CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-14, 2026, DOI:10.32604/cmc.2025.072286 - 09 December 2025

    Abstract With the growing demand for more comprehensive and nuanced sentiment understanding, Multimodal Sentiment Analysis (MSA) has gained significant traction in recent years and continues to attract widespread attention in the academic community. Despite notable advances, existing approaches still face critical challenges in both information modeling and modality fusion. On one hand, many current methods rely heavily on encoders to extract global features from each modality, which limits their ability to capture latent fine-grained emotional cues within modalities. On the other hand, prevailing fusion strategies often lack mechanisms to model semantic discrepancies across modalities and to…

  • Open Access

    ARTICLE

    MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding

    Ans D. Alghamdi*

    CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-35, 2026, DOI:10.32604/cmc.2025.071210 - 09 December 2025

    Abstract Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose MultiAgent-Chain of Thought (CoT), a novel multi-agent chain-of-thought reasoning framework where specialized agents for text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting mechanisms. Our architecture incorporates self-reflection modules, conflict resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement.

  • Open Access

    ARTICLE

    Bearing Fault Diagnosis Based on Multimodal Fusion GRU and Swin-Transformer

    Yingyong Zou*, Yu Zhang, Long Li, Tao Liu, Xingkui Zhang

    CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-24, 2026, DOI:10.32604/cmc.2025.068246 - 10 November 2025

    Abstract Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments. However, due to the nonlinearity and non-stationarity of collected vibration signals, single-modal methods struggle to capture fault features fully. This paper proposes a rolling bearing fault diagnosis method based on multi-modal information fusion. The method first employs the Hippopotamus Optimization Algorithm (HO) to optimize the number of modes in Variational Mode Decomposition (VMD) to achieve optimal modal decomposition performance. It combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU) to extract temporal features…

  • Open Access

    ARTICLE

    CAPGen: An MLLM-Based Framework Integrated with Iterative Optimization Mechanism for Cultural Artifacts Poster Generation

    Qianqian Hu, Chuhan Li, Mohan Zhang, Fang Liu*

    CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-17, 2026, DOI:10.32604/cmc.2025.068225 - 10 November 2025

    Abstract Due to the digital transformation underway among cultural institutions and the substantial influence of social media platforms, demand for visual communication keeps increasing as a way to promote traditional cultural artifacts online. As an effective medium, posters serve to attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods mainly rely on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. Therefore, we propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language…

  • Open Access

    REVIEW

    Human Behaviour Classification in Emergency Situations Using Machine Learning with Multimodal Data: A Systematic Review (2020–2025)

    Mirza Murad Baig1, Muhammad Rehan Faheem2,*, Lal Khan3,*, Hannan Adeel2, Syed Asim Ali Shah4

    CMES-Computer Modeling in Engineering & Sciences, Vol.145, No.3, pp. 2895-2935, 2025, DOI:10.32604/cmes.2025.073172 - 23 December 2025

    Abstract With growing urban areas and populations, the climate continues to change, and hence the demand for better emergency response systems has become more important than ever. Human Behaviour Classification (HBC) systems have started to play a vital role by analysing data from different sources to detect signs of emergencies. These systems are being used in many critical areas like healthcare, public safety, and disaster management to improve response time and to prepare ahead of time. But detecting human behaviour in such stressful conditions is not simple; it often comes with noisy…

Displaying results 1–10 of 135 (page 1).