
Open Access


Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode

Yue Zhao¹, Jianjian Yue¹, Wei Song¹,*, Xiaona Xu¹, Xiali Li¹, Licheng Wu¹, Qiang Ji²
¹ School of Information and Engineering, Minzu University of China, Beijing, 100081, China.
² Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, NY 12180-3590, USA.
*Corresponding Author: Wei Song. Email: .

Journal on Internet of Things 2019, 1(1), 17-23.


We propose a method that uses a latent regression Bayesian network (LRBN) to extract shared speech features as the input to an end-to-end speech recognition model. The LRBN has a compact structure, and its parameters can be learned quickly. Compared with a convolutional neural network, it has a simpler, more easily understood structure and fewer parameters to learn. Experimental results show the advantage of the hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition and demonstrate that the LRBN helps to differentiate among speech sets from multiple languages.


Multi-dialect speech recognition, Tibetan language, latent regression Bayesian network, end-to-end model

Cite This Article

Y. Zhao, J. Yue, W. Song, X. Xu, X. Li et al., "Tibetan multi-dialect speech recognition using latent regression Bayesian network and end-to-end mode," Journal on Internet of Things, vol. 1, no. 1, pp. 17–23, 2019.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.