
Search Results (29)
  • Open Access

    ARTICLE

    HLR-Net: A Hybrid Lip-Reading Model Based on Deep Convolutional Neural Networks

    Amany M. Sarhan1, Nada M. Elshennawy1, Dina M. Ibrahim1,2,*

    CMC-Computers, Materials & Continua, Vol.68, No.2, pp. 1531-1549, 2021, DOI:10.32604/cmc.2021.016509

    Abstract

    Lip reading is typically regarded as the visual interpretation of a speaker’s lip movements while speaking, i.e., the task of decoding text from the speaker’s mouth movements. This paper proposes a lip-reading model that helps deaf people and persons with hearing problems understand a speaker by capturing a video of the speaker and feeding it into the model to obtain the corresponding subtitles. Deep learning technologies make it easy to extract a large number of different features, which can then be converted into letter probabilities to obtain accurate results. Recently proposed methods for…

  • Open Access

    ARTICLE

    Oral English Speech Recognition Based on Enhanced Temporal Convolutional Network

    Hao Wu1,*, Arun Kumar Sangaiah2

    Intelligent Automation & Soft Computing, Vol.28, No.1, pp. 121-132, 2021, DOI:10.32604/iasc.2021.016457

    Abstract In oral English teaching in China, teachers usually correct students’ pronunciation based on subjective judgment, and even the same student may receive different suggestions at different times. Students’ oral pronunciation features can be obtained from the reconstructed acoustic and natural-language features of speech audio, but the task is complicated by the embedding of multimodal sentences. To solve this problem, this paper proposes an English speech recognition method based on an enhanced temporal convolutional network. First, a suitable UNet model is designed to remove noise from the speech signal and thereby achieve speech enhancement. Second, a network…
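The enhanced temporal convolutional network mentioned above builds on causal dilated convolutions. As an illustrative sketch only (the kernel, dilation, and zero-padding choices are assumptions, not the authors' configuration), a causal dilated 1-D convolution can be written as:

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: the output at time t depends only on
    inputs at times t, t-d, t-2d, ... (d = dilation), never on the future."""
    k = len(kernel)
    # Left-pad with zeros so the output has the same length as the input.
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for t in range(len(x)):
        # Each tap looks backwards in time at spacing `dilation`.
        y[t] = sum(kernel[i] * xp[pad + t - i * dilation] for i in range(k))
    return y
```

With dilations doubling per layer (1, 2, 4, ...), the receptive field grows exponentially in depth while each output still depends only on past samples.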

  • Open Access

    ARTICLE

    A Phoneme-Based Approach for Eliminating the Out-of-Vocabulary Problem in Turkish Speech Recognition Using a Hidden Markov Model

    Erdem Yavuz1,*, Vedat Topuz2

    Computer Systems Science and Engineering, Vol.33, No.6, pp. 429-445, 2018, DOI:10.32604/csse.2018.33.429

    Abstract Since Turkish is a morphologically productive language, it is almost impossible for a word-based recognition system to model the language completely. Because a word-based system has difficulty recognizing words it has not been introduced to, the recognition success rate drops considerably owing to out-of-vocabulary words. In this study, a speaker-dependent, phoneme-based word recognition system has been designed and implemented for Turkish to overcome this problem. An algorithm for finding phoneme boundaries has been devised in order to segment a word into its phonemes. After the segmentation of words into…
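A phoneme-based recognizer of the kind described above must map a (possibly misrecognized) phoneme sequence to a vocabulary word. A minimal sketch of such a lexicon lookup via Levenshtein distance follows; the toy lexicon and phoneme symbols are made up for illustration, not taken from the paper:

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def nearest_word(phonemes, lexicon):
    """Return the lexicon word whose pronunciation is closest to the
    recognized phoneme sequence."""
    return min(lexicon, key=lambda w: edit_distance(phonemes, lexicon[w]))
```

Even when one phoneme is misrecognized, the closest pronunciation in the lexicon usually still wins, which is what makes the approach robust to segmentation noise.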

  • Open Access

    ARTICLE

    Intelligent Speech Communication Using Double Humanoid Robots

    Li-Hong Juang1,*, Yi-Hua Zhao2

    Intelligent Automation & Soft Computing, Vol.26, No.2, pp. 291-301, 2020, DOI:10.31209/2020.100000164

    Abstract Speech recognition is one of the most convenient ways for human beings to exchange information. In this research, we want to make robots understand human language and communicate through it, realizing man-machine and humanoid-robot interaction. Therefore, this research mainly studies NAO robots’ speech recognition and humanoid communication between double humanoid robots. This paper introduces the future directions and application prospects of speech recognition, as well as the basic methods and knowledge of the speech recognition field. This research also proposes the application of the most advanced method: establishment of the…

  • Open Access

    ARTICLE

    Noise Cancellation Based on Voice Activity Detection Using Spectral Variation for Speech Recognition in Smart Home Devices

    Jeong-Sik Park1, Seok-Hoon Kim2,*

    Intelligent Automation & Soft Computing, Vol.26, No.1, pp. 149-159, 2020, DOI:10.31209/2019.100000136

    Abstract Various types of smart home devices rely on speech recognition as a primary means of human-machine interaction. Speech recognition systems, however, can be vulnerable to the rapidly changing noises of home environments. This study proposes an efficient noise cancellation approach that eliminates noise directly on the device in real time. First, we propose an advanced voice activity detection (VAD) technique that efficiently detects speech and non-speech regions on the basis of the spectral properties of speech signals. The VAD is then employed to enhance the conventional spectral subtraction method by steadily estimating noise signals in non-speech regions. In several experiments, our approach…
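A minimal sketch of VAD-guided spectral subtraction, assuming a simple frame-energy VAD and a running-mean noise estimate (the paper's VAD is based on spectral variation and is more elaborate):

```python
import numpy as np

def spectral_subtract(frames, vad_threshold):
    """Spectral subtraction with a toy energy-based VAD.
    `frames`: 2-D array (n_frames, frame_len) of windowed time-domain frames.
    Frames whose mean energy falls below `vad_threshold` are treated as
    noise-only and used to update a running noise-magnitude estimate."""
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise = np.zeros(mag.shape[1])
    n_noise = 0
    out = np.empty_like(mag)
    for i, frame in enumerate(frames):
        if np.mean(frame ** 2) < vad_threshold:   # non-speech region
            n_noise += 1
            noise += (mag[i] - noise) / n_noise   # running-mean noise update
        # Subtract the noise estimate, flooring the magnitude at zero.
        out[i] = np.maximum(mag[i] - noise, 0.0)
    return np.fft.irfft(out * np.exp(1j * phase), n=frames.shape[1], axis=1)
```

Updating the noise estimate only in non-speech regions is what lets the method track slowly changing background noise without eating into the speech itself.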

  • Open Access

    ARTICLE

    Modified Viterbi Scoring for HMM-Based Speech Recognition

    Jihyuck Jo1, Han-Gyu Kim2, In-Cheol Park1, Bang Chul Jung3, Hoyoung Yoo3

    Intelligent Automation & Soft Computing, Vol.25, No.2, pp. 351-358, 2019, DOI:10.31209/2019.100000096

    Abstract A modified Viterbi scoring procedure based on Dijkstra’s shortest-path algorithm is presented in this paper. In HMM-based speech recognition systems, Viterbi scoring plays a significant role in finding the best-matching model, but its computational complexity is linearly proportional to the number of reference models and their states. This complexity is therefore a serious obstacle to implementing a high-speed speech recognition system. In the proposed method, Viterbi scoring is recast as the search for a minimum-cost path, and the shortest-path algorithm is exploited to decrease the computational complexity while preventing the recognition accuracy from deteriorating. In addition, a two-phase comparison…
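The recasting of Viterbi scoring as a shortest-path search can be sketched as follows: negating log-probabilities yields non-negative edge weights over the trellis, so Dijkstra's algorithm applies. The trellis construction below is illustrative, not the authors' exact procedure:

```python
import heapq
import math

def viterbi_shortest_path(obs_logprobs, trans_probs):
    """Viterbi scoring as a shortest-path search over the trellis.
    Nodes are (time, state); edge weight = -log(transition) - log(emission),
    which is non-negative, so Dijkstra's algorithm is valid.
    `obs_logprobs[t][s]`: log P(observation at t | state s).
    `trans_probs[s][s2]`: P(state s2 | state s).
    Returns the best total log-probability over all state paths."""
    T, S = len(obs_logprobs), len(trans_probs)
    # Start from any state at t=0; initial cost = -emission log-prob.
    heap = [(-obs_logprobs[0][s], 0, s) for s in range(S)]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, t, s = heapq.heappop(heap)
        if (t, s) in settled:
            continue
        settled.add((t, s))
        if t == T - 1:
            return -cost  # first final-time node popped is optimal
        for s2 in range(S):
            if trans_probs[s][s2] > 0:
                w = -math.log(trans_probs[s][s2]) - obs_logprobs[t + 1][s2]
                heapq.heappush(heap, (cost + w, t + 1, s2))
    return -math.inf
```

Because all edge weights are non-negative, the first final-time node popped from the priority queue already carries the optimal score, so the search can stop without expanding the full trellis.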

  • Open Access

    ARTICLE

    Implementation of a Biometric Interface in Voice Controlled Wheelchairs

    Lamia Bouafif1, Noureddine Ellouze2,*

    Sound & Vibration, Vol.54, No.1, pp. 1-15, 2020, DOI:10.32604/sv.2020.08665

    Abstract To assist physically handicapped persons in their movements, we developed an embedded isolated-word speech recognition (ASR) system for the voice control of smart wheelchairs. Although several kinds of electric wheelchairs exist on the industrial market, they must still be controlled manually by hand via a joystick, which limits their use, especially by people with severe disabilities. As a result, a significant number of disabled people cannot use a standard electric wheelchair or can drive one only with difficulty. The proposed solution is to use the voice to control and drive the wheelchair instead…
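Embedded isolated-word recognizers of this kind are often built on template matching. As a sketch under stated assumptions (scalar per-frame features and made-up command templates; the paper's system may work differently), dynamic time warping (DTW) against stored templates looks like:

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences,
    using absolute per-frame difference as the local cost."""
    m, n = len(a), len(b)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow match, insertion, or deletion steps along the warp path.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]

def recognize(utterance, templates):
    """Pick the command word whose stored template is closest under DTW."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
```

DTW absorbs differences in speaking rate, which is why a small set of per-word templates suffices for a fixed command vocabulary such as wheelchair directions.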

  • Open Access

    ARTICLE

    Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode

    Yue Zhao1, Jianjian Yue1, Wei Song1,*, Xiaona Xu1, Xiali Li1, Licheng Wu1, Qiang Ji2

    Journal on Internet of Things, Vol.1, No.1, pp. 17-23, 2019, DOI:10.32604/jiot.2019.05866

    Abstract We propose a method that uses a latent regression Bayesian network (LRBN) to extract shared speech features as the input of an end-to-end speech recognition model. The structure of the LRBN is compact, and its parameter learning is fast. Compared with a convolutional neural network, it has a simpler, more interpretable structure and fewer parameters to learn. Experimental results show the advantage of the hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition, and demonstrate that the LRBN helps differentiate among multiple language speech sets.
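The Connectionist Temporal Classification output in the hybrid architecture above is commonly decoded greedily: take the per-frame argmax, collapse consecutive repeats, then drop blanks. A minimal sketch, in which the label indices and blank symbol are illustrative assumptions rather than the paper's inventory:

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Greedy CTC decoding: argmax label per frame, collapse consecutive
    repeats, then remove the blank symbol."""
    path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for lab in path:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

The blank symbol is what lets CTC emit the same label twice in a row: a blank between two identical argmax runs keeps them from being collapsed into one.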

  • Open Access

    ARTICLE

    Tibetan Multi-Dialect Speech and Dialect Identity Recognition

    Yue Zhao1, Jianjian Yue1, Wei Song1,*, Xiaona Xu1, Xiali Li1, Licheng Wu1, Qiang Ji2

    CMC-Computers, Materials & Continua, Vol.60, No.3, pp. 1223-1235, 2019, DOI:10.32604/cmc.2019.05636

    Abstract The Tibetan language has very limited resources for conventional automatic speech recognition: sufficient data, sub-word units, lexicons, and word inventories are lacking for some dialects. Moreover, speech content recognition and dialect classification have been treated as two independent tasks and modeled separately in most prior work, although the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model that performs Tibetan multi-dialect speech recognition and dialect identification simultaneously. It avoids processing the pronunciation dictionary and word segmentation for new dialects while allowing speech recognition and dialect identification to be trained in a single…

Displaying results 21-29 of 29 (page 3 of 3).