Open Access

ARTICLE

Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode

Yue Zhao1, Jianjian Yue1, Wei Song1,*, Xiaona Xu1, Xiali Li1, Licheng Wu1, Qiang Ji2

1 School of Information and Engineering, Minzu University of China, Beijing, 100081, China.
2 Rensselaer Polytechnic Institute, 110 Eighth Street, Troy NY 12180-3590, USA.

*Corresponding Author: Wei Song. Email: email.

Journal on Internet of Things 2019, 1(1), 17-23. https://doi.org/10.32604/jiot.2019.05866

Abstract

We propose a method that uses a latent regression Bayesian network (LRBN) to extract shared speech features as the input to an end-to-end speech recognition model. The LRBN has a compact structure, and its parameters are fast to learn. Compared with a Convolutional Neural Network, it has a simpler, easier-to-understand structure and fewer parameters to learn. Experimental results show the advantage of the hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition, and demonstrate that the LRBN helps to differentiate among speech sets from multiple languages.

Cite This Article

Y. Zhao, J. Yue, W. Song, X. Xu, X. Li et al., "Tibetan multi-dialect speech recognition using latent regression Bayesian network and end-to-end mode," Journal on Internet of Things, vol. 1, no. 1, pp. 17–23, 2019.



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.