Search Results (677)
  • Open Access

    ARTICLE

    RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition

    Yewei Xiao, Xin Du*, Wei Zeng

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.072145 - 12 January 2026

    Abstract Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive due to the quadratic increase in the spatial and temporal complexity of their softmax-based attention mechanisms with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse…

  • Open Access

    ARTICLE

    Action Recognition via Shallow CNNs on Intelligently Selected Motion Data

    Jalees Ur Rahman1, Muhammad Hanif1, Usman Haider2,*, Saeed Mian Qaisar3,*, Sarra Ayouni4

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.071251 - 12 January 2026

    Abstract Deep neural networks have achieved excellent classification results on several computer vision benchmarks. This has led to the popularity of machine learning as a service, where trained algorithms are hosted on the cloud and inference can be obtained on real-world data. In most applications, it is important to compress the vision data due to the enormous bandwidth and memory requirements. Video codecs exploit spatial and temporal correlations to achieve high compression ratios, but they are computationally expensive. This work computes the motion fields between consecutive frames to facilitate the efficient classification of videos. However, contrary…

  • Open Access

    ARTICLE

    MDGET-MER: Multi-Level Dynamic Gating and Emotion Transfer for Multi-Modal Emotion Recognition

    Musheng Chen1,2, Qiang Wen1, Xiaohong Qiu1,2, Junhua Wu1,*, Wenqing Fu1

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.071207 - 12 January 2026

    Abstract In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder…

  • Open Access

    ARTICLE

    Speech Emotion Recognition Based on the Adaptive Acoustic Enhancement and Refined Attention Mechanism

    Jun Li1, Chunyan Liang1,*, Zhiguo Liu1, Fengpei Ge2

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.071011 - 12 January 2026

    Abstract To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and…

  • Open Access

    ARTICLE

    A Hybrid Deep Learning Approach for Real-Time Cheating Behaviour Detection in Online Exams Using Video Captured Analysis

    Dao Phuc Minh Huy1, Gia Nhu Nguyen1, Dac-Nhuong Le2,*

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.070948 - 12 January 2026

    Abstract Online examinations have become a dominant assessment mode, increasing concerns over academic integrity. To address the critical challenge of detecting cheating behaviours, this study proposes a hybrid deep learning approach that combines visual detection and temporal behaviour classification. The methodology utilises object detection models—You Only Look Once (YOLOv12), Faster Region-based Convolutional Neural Network (RCNN), and Single Shot Detector (SSD) MobileNet—integrated with classification models such as Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Unit (Bi-GRU), and CNN-LSTM (Long Short-Term Memory). Two distinct datasets were used: the Online Exam Proctoring (EOP) dataset from Michigan State University and…

  • Open Access

    ARTICLE

    Deep Retraining Approach for Category-Specific 3D Reconstruction Models from a Single 2D Image

    Nour El Houda Kaiber1, Tahar Mekhaznia1, Akram Bennour1,*, Mohammed Al-Sarem2,3,*, Zakaria Lakhdara4, Fahad Ghaban2, Mohammad Nassef5,6

    CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.070337 - 12 January 2026

    Abstract The generation of high-quality 3D models from single 2D images remains challenging in terms of accuracy and completeness. Deep learning has emerged as a promising solution, offering new avenues for improvements. However, building models from scratch is computationally expensive and requires large datasets. This paper presents a transfer-learning-based approach for category-specific 3D reconstruction from a single 2D image. The core idea is to fine-tune a pre-trained model on specific object categories using new, unseen data, resulting in specialized versions of the model that are better adapted to reconstruct particular objects. The proposed approach utilizes a…

  • Open Access

    ARTICLE

    Automatic Recognition Algorithm of Pavement Defects Based on S3M and SDI Modules Using UAV-Collected Road Images

    Hongcheng Zhao1, Tong Yang2, Yihui Hu2, Fengxiang Guo2,*

    Structural Durability & Health Monitoring, Vol.20, No.1, 2026, DOI:10.32604/sdhm.2025.068987 - 08 January 2026

    Abstract With the rapid development of transportation infrastructure, ensuring road safety through timely and accurate highway inspection has become increasingly critical. Traditional manual inspection methods are not only time-consuming and labor-intensive, but they also struggle to provide consistent, high-precision detection and real-time monitoring of pavement surface defects. To overcome these limitations, we propose an Automatic Recognition of Pavement Defect (ARPD) algorithm, which leverages unmanned aerial vehicle (UAV)-based aerial imagery to automate the inspection process. The ARPD framework incorporates a backbone network based on the Selective State Space Model (S3M), which is designed to capture long-range temporal dependencies.…

  • Open Access

    ARTICLE

    Research on Integrating Deep Learning-Based Vehicle Brand and Model Recognition into a Police Intelligence Analysis Platform

    Shih-Lin Lin*, Cheng-Wei Li

    CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-20, 2026, DOI:10.32604/cmc.2025.071915 - 09 December 2025

    Abstract This study focuses on developing a deep learning model capable of recognizing vehicle brands and models, integrated with a law enforcement intelligence platform to overcome the limitations of existing license plate recognition techniques—particularly in handling counterfeit, obscured, or absent plates. The research first entailed collecting, annotating, and classifying images of various vehicle models, leveraging image processing and feature extraction methodologies to train the model on Microsoft Custom Vision. Experimental results indicate that, for most brands and models, the system achieves stable and relatively high performance in Precision, Recall, and Average Precision (AP). Furthermore, simulated tests…

  • Open Access

    ARTICLE

    MFCCT: A Robust Spectral-Temporal Fusion Method with DeepConvLSTM for Human Activity Recognition

    Rashid Jahangir1,*, Nazik Alturki2, Muhammad Asif Nauman3, Faiqa Hanif1

    CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-20, 2026, DOI:10.32604/cmc.2025.071574 - 09 December 2025

    Abstract Human activity recognition (HAR) is a method to predict human activities from sensor signals using machine learning (ML) techniques. HAR systems have several applications in various domains, including medicine, surveillance, behavioral monitoring, and posture analysis. Extraction of suitable information from sensor data is an important part of the HAR process to recognize activities accurately. Several research studies on HAR have utilized Mel frequency cepstral coefficients (MFCCs) because of their effectiveness in capturing the periodic pattern of sensor signals. However, existing MFCC-based approaches often fail to capture sufficient temporal variability, which limits their ability to distinguish…

  • Open Access

    ARTICLE

    Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories

    Meixi Chu1, Xinyu Jiang1,*, Yushu Tao2

    CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-23, 2026, DOI:10.32604/cmc.2025.071533 - 09 December 2025

    Abstract Industrial operators need reliable communication in high-noise, safety-critical environments where speech or touch input is often impractical. Existing gesture systems either miss real-time deadlines on resource-constrained hardware or lose accuracy under occlusion, vibration, and lighting changes. We introduce Industrial EdgeSign, a dual-path framework that combines hardware-aware neural architecture search (NAS) with large multimodal model (LMM) guided semantics to deliver robust, low-latency gesture recognition on edge devices. The searched model uses a truncated ResNet50 front end, a dimensional-reduction network that preserves spatiotemporal structure for tubelet-based attention, and localized Transformer layers tuned for on-device inference. To reduce…

Displaying results 1-10 of 677.