Deep learning based large vocabulary continuous speech recognition of an under-resourced language Bangladeshi Bangla
- Samin Ahnaf Mozib, Department of Computer Science and Engineering, Shahjalal University of Science and Technology
- Kobir M. Humayon, Department of Computer Science and Engineering, Shahjalal University of Science and Technology
- Kibria Shafkat, Department of Computer Science and Engineering, Shahjalal University of Science and Technology
- Rahman M. Shahidur, Department of Computer Science and Engineering, Shahjalal University of Science and Technology
Description
<p>Research in corpus-driven Automatic Speech Recognition (ASR) is advancing rapidly towards robust Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Under-resourced languages such as Bangla need large benchmark corpora so that further LVCSR research can address their limitations and avoid biased results. In this paper, a publicly available large-scale Bangladeshi Bangla speech corpus is used to implement a deep Convolutional Neural Network (CNN) based model and a Recurrent Neural Network (RNN) based model with the Connectionist Temporal Classification (CTC) loss function for Bangla LVCSR. In experimental evaluations, we find that the CNN-based architecture yields better results than the RNN-based approach. This study also assesses the quality of an open-source large-scale Bangladeshi Bangla speech corpus and investigates the effect of various high-order N-gram Language Models (LMs) on Bangla, a morphologically rich language. We achieve a 36.12% word error rate (WER) using the CNN-based acoustic model and a 13.93% WER using beam search decoding with a 5-gram LM. The findings demonstrate state-of-the-art performance among Bangla LVCSR systems on a specific benchmarked large corpus.</p>
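The abstract reports its results as word error rate (WER). The record does not say how the authors computed it, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and may differ from the paper's scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors while the denominator is the reference length, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.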
Published in
- Acoustical Science and Technology
Acoustical Science and Technology 42 (5), 252-260, 2021-09-01
Acoustical Society of Japan (一般社団法人 日本音響学会)
Details
- CRID: 1390852182143251328
- NII Article ID: 130008082504
- NII Bibliographic ID: AA11501808
- ISSN: 1347-5177, 0369-4232, 1346-3969
- NDL Bibliographic ID: 031887206
- Text language code: en
- Data source type: JaLC, NDL Search, Crossref, CiNii Articles
- Abstract license flag: Not allowed