Tech Science Press - Publisher of Open Access Journals

News & Announcements

07 May 2024
Tech Science Press Partners with Morressier to Provide Editorial Teams with Integrity Intelligence at Scale
23 April 2024
Revue Internationale de Géomatique (RIG) welcomes its new Editor-in-Chief Prof. Manchun Li
22 March 2024
Henderson Office Address Change Notification
19 March 2024
Frontiers in Heat and Mass Transfer Welcomes Prof. Chun Yang as Editor-in-Chief
24 January 2024
In Memoriam: Professor Kazuo Umezawa
15 January 2024
Tech Science Press Collaborates with STM to Promote Open Access Publishing


Articles

Search Results (94)

Open Access

ARTICLE

Speech-Music-Noise Discrimination in Sound Indexing of Multimedia Documents

Lamia Bouafif¹, Noureddine Ellouze²

Sound & Vibration, Vol.52, No.6, pp. 2-10, 2018, DOI:10.32604/sv.2018.02410

Abstract Sound indexing and segmentation of digital documents, especially on the internet and in digital libraries, are very useful for simplifying and accelerating multimedia document retrieval. One can imagine retrieving multimedia files not only by keywords but also by the semantic content of their speech. The main difficulty of this operation lies in parameterizing and modelling the sound track and in discriminating between speech, music and noise segments. In this paper, we present a Speech/Music/Noise indexing interface designed for audio discrimination in multimedia documents. The program uses a statistical method based on artificial neural network (ANN) and hidden Markov model (HMM) classifiers. After…

Views: 1676 | Downloads: 1391
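The abstract names ANN and HMM classifiers, but the excerpt ends before describing how they are combined. As a hedged illustration of one common pattern (not necessarily the authors' method), a frame-level classifier can emit speech/music/noise probabilities that an HMM then smooths into contiguous segments with Viterbi decoding. The transition model (`stay_prob`) and the function name `viterbi_smooth` are assumptions made for this sketch.

```python
import math

# Class labels named in the abstract; the ordering is arbitrary.
STATES = ("speech", "music", "noise")


def viterbi_smooth(frame_probs, stay_prob=0.9):
    """Return the most likely class sequence for per-frame class probabilities.

    frame_probs: list of dicts mapping each state to P(state | frame), e.g. the
    output of a frame-level classifier such as an ANN. Posteriors are used
    directly as emission scores, which matches scaled likelihoods under a
    uniform class prior. Assumed transition model: keep the same label with
    probability stay_prob, otherwise switch to each other label equally.
    """
    switch_prob = (1.0 - stay_prob) / (len(STATES) - 1)

    def log_trans(a, b):
        return math.log(stay_prob if a == b else switch_prob)

    # Uniform initial distribution; the log domain avoids underflow on long inputs.
    score = {s: math.log(1.0 / len(STATES)) + math.log(frame_probs[0][s]) for s in STATES}
    backptrs = []
    for probs in frame_probs[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for ending this frame in state s.
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + math.log(probs[s])
            ptr[s] = best_prev
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final state to recover the full label path.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With a high `stay_prob`, an isolated frame that weakly favours music inside a speech passage is relabelled as speech, while a sustained change of class still produces a segment boundary.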
Open Access

ARTICLE

Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode

Yue Zhao¹, Jianjian Yue¹, Wei Song¹,*, Xiaona Xu¹, Xiali Li¹, Licheng Wu¹, Qiang Ji²

Journal on Internet of Things, Vol.1, No.1, pp. 17-23, 2019, DOI:10.32604/jiot.2019.05866

Abstract We propose a method that uses a latent regression Bayesian network (LRBN) to extract shared speech features as the input to an end-to-end speech recognition model. The LRBN has a compact structure and fast parameter learning. Compared with a convolutional neural network, it has a simpler, easier-to-understand structure and fewer parameters to learn. Experimental results show the advantage of a hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition and demonstrate that the LRBN helps differentiate among speech sets from multiple languages.

Views: 3419 | Downloads: 1708
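The abstract feeds LRBN features into a Bidirectional LSTM-CTC recognizer. The LRBN itself is beyond a short sketch, but the CTC output stage can be illustrated: best-path (greedy) decoding takes the highest-scoring label per frame, merges consecutive repeats and then removes blanks. This is a generic CTC sketch rather than the authors' decoder; the blank index 0 and the name `ctc_greedy_decode` are assumptions.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: one list of per-label scores (e.g. posteriors) per frame.
    Takes the arg-max label of each frame, merges consecutive repeats,
    then drops blank symbols.
    """
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded = []
    prev = None
    for label in best:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

Because repeats are merged before blanks are removed, a blank between two identical labels (frame labels `1, 1, 0, 1`) keeps them as two separate output symbols.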
Open Access

ARTICLE

Tibetan Multi-Dialect Speech and Dialect Identity Recognition

Yue Zhao¹, Jianjian Yue¹, Wei Song¹,*, Xiaona Xu¹, Xiali Li¹, Licheng Wu¹, Qiang Ji²

CMC-Computers, Materials & Continua, Vol.60, No.3, pp. 1223-1235, 2019, DOI:10.32604/cmc.2019.05636

Abstract The Tibetan language has so far had very limited resources for conventional automatic speech recognition: for some dialects, it lacks sufficient data, sub-word units, lexicons and word inventories. Moreover, most prior works have treated speech content recognition and dialect classification as two independent tasks and modeled them separately, although the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model that performs Tibetan multi-dialect speech recognition and dialect identification simultaneously. It avoids processing a pronunciation dictionary and performing word segmentation for new dialects, while allowing speech recognition and dialect identification to be trained in a single…

Views: 2603 | Downloads: 1556 | Cited by: 2
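WaveNet is built from stacks of dilated causal convolutions, in which each output sample depends only on current and past inputs and the dilation doubles from layer to layer. A minimal pure-Python sketch of that building block, assuming zero padding before the first sample, follows; it does not reproduce the multi-task model in the paper, and `causal_dilated_conv` and `receptive_field` are hypothetical helper names.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of x are treated as zero, so every output
    depends only on the current and past inputs (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of causal
    dilated convolutions with the given kernel size and per-layer dilations."""
    return (kernel_size - 1) * sum(dilations) + 1
```

For a kernel size of 2 and dilations 1, 2, 4, ..., 512, as in the original WaveNet design, `receptive_field` gives 1024 samples, which is why doubling dilations grow the context exponentially with depth.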
Open Access

ARTICLE

Speech Resampling Detection Based on Inconsistency of Band Energy

Zhifeng Wang¹, Diqun Yan¹,*, Rangding Wang¹, Li Xiang¹, Tingting Wu¹

CMC-Computers, Materials & Continua, Vol.56, No.2, pp. 247-259, 2018, DOI:10.3970/cmc.2018.02902

Abstract Speech resampling is a typical tampering operation that is often integrated into various speech forgeries, such as splicing, electronic disguising and quality faking. By analyzing the principle of resampling, we found that, because the interpolation process in resampling is imperfect, the bandwidth of resampled speech is inconsistent with its sampling ratio, unlike that of natural speech. Based on this observation, we propose a new resampling detection algorithm based on the inconsistency of band energy. First, according to the sampling ratio of the suspected speech, a band-pass Butterworth filter is designed to filter out the…

Views: 1989 | Downloads: 1383
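The detector in this abstract measures band energy after Butterworth band-pass filtering. As a rough sketch of the underlying measurement only, the code below replaces the Butterworth filter with a naive discrete Fourier transform and reports the fraction of a signal's energy that falls in a given frequency band. The assumption behind using such a ratio for resampling detection (illustrative, not the paper's exact statistic) is that speech upsampled from a lower rate carries little energy between the original and the new Nyquist frequencies. `band_energy_ratio` is a hypothetical name, and the O(n²) DFT is only suitable for short frames.

```python
import cmath
import math


def band_energy_ratio(x, fs, f_lo, f_hi):
    """Fraction of the spectral energy of x lying in [f_lo, f_hi] Hz.

    Uses a naive DFT over the non-negative frequency bins 0..n/2, where bin k
    corresponds to k * fs / n Hz. Returns 0.0 for an all-zero signal.
    """
    n = len(x)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    if total == 0:
        return 0.0
    band = sum(e for k, e in enumerate(energies) if f_lo <= k * fs / n <= f_hi)
    return band / total
```

For example, a 1 kHz tone sampled at 8 kHz puts essentially all of its energy in a 500 to 1500 Hz band and almost none in the 2 to 4 kHz band.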


