Search Results (94)
  • Open Access

    ARTICLE

    Speech-Music-Noise Discrimination in Sound Indexing of Multimedia Documents

    Lamia Bouafif1, Noureddine Ellouze2

    Sound & Vibration, Vol.52, No.6, pp. 2-10, 2018, DOI:10.32604/sv.2018.02410

    Abstract Sound indexing and segmentation of digital documents, especially on the internet and in digital libraries, are very useful for simplifying and accelerating multimedia document retrieval. We can imagine extracting multimedia files not only by keywords but also by the semantic content of speech. The main difficulty of this operation is the parameterization and modelling of the sound track and the discrimination of speech, music and noise segments. In this paper, we present a Speech/Music/Noise indexing interface designed for audio discrimination in multimedia documents. The program uses a statistical method based on ANN and HMM classifiers. After…

  • Open Access

    ARTICLE

    Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode

    Yue Zhao1, Jianjian Yue1, Wei Song1,*, Xiaona Xu1, Xiali Li1, Licheng Wu1, Qiang Ji2

    Journal on Internet of Things, Vol.1, No.1, pp. 17-23, 2019, DOI:10.32604/jiot.2019.05866

    Abstract We propose a method that uses a latent regression Bayesian network (LRBN) to extract shared speech features as input to an end-to-end speech recognition model. The structure of the LRBN is compact and its parameter learning is fast. Compared with a Convolutional Neural Network, it has a simpler, more understandable structure and fewer parameters to learn. Experimental results show the advantage of the hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition and demonstrate that the LRBN helps differentiate among multiple language speech sets.

  • Open Access

    ARTICLE

    Tibetan Multi-Dialect Speech and Dialect Identity Recognition

    Yue Zhao1, Jianjian Yue1, Wei Song1,*, Xiaona Xu1, Xiali Li1, Licheng Wu1, Qiang Ji2

    CMC-Computers, Materials & Continua, Vol.60, No.3, pp. 1223-1235, 2019, DOI:10.32604/cmc.2019.05636

    Abstract The Tibetan language has so far had very limited resources for conventional automatic speech recognition. It lacks sufficient data, sub-word units, lexicons and word inventories for some dialects. Moreover, most prior work has treated speech content recognition and dialect classification as two independent tasks and modeled them separately, although the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model to perform simultaneous Tibetan multi-dialect speech recognition and dialect identification. It avoids processing the pronunciation dictionary and word segmentation for new dialects while allowing training of speech recognition and dialect identification in a single…

  • Open Access

    ARTICLE

    Speech Resampling Detection Based on Inconsistency of Band Energy

    Zhifeng Wang1, Diqun Yan1,*, Rangding Wang1, Li Xiang1, Tingting Wu1

    CMC-Computers, Materials & Continua, Vol.56, No.2, pp. 247-259, 2018, DOI:10.3970/cmc.2018.02902

    Abstract Speech resampling is a typical tampering behavior, which is often integrated into various speech forgeries, such as splicing, electronic disguising, quality faking and so on. By analyzing the principle of resampling, we found that, compared with natural speech, resampled speech exhibits an inconsistency between its bandwidth and its sampling ratio, because the interpolation process in resampling is imperfect. Based on this observation, a new resampling detection algorithm based on the inconsistency of band energy is proposed. First, according to the sampling ratio of the suspected speech, a band-pass Butterworth filter is designed to filter out the…
