Towards better language modeling for Thai LVCSR

Sadaoki Furui, Issara Thienlikit, Markpong Jongtaveesataporn, Chai Wutiwiwatchai

doi:10.21437/interspeech.2007-447

Description

One of the difficulties of Thai language modeling is the process of text corpus preparation. Because there is no explicit word boundary marker in written Thai text, word segmentation must be performed prior to training a language model. This paper presents two approaches to language model construction for Thai LVCSR based on pseudo-morpheme merging. The first approach merges pseudo-morphemes using forward and reverse bi-grams. The second approach utilizes the C4.5 decision tree to merge pseudo-morphemes based on multiple features. ASR systems with language models built using these methods perform better than systems that use only pseudo-morpheme or lexicon-based word segmentation. These approaches also produce results comparable to those of the system that utilizes manual segmentation.
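The description does not give the merging criterion, so the following is only a minimal sketch of what merging with forward and reverse bi-grams could look like, not the authors' method. It assumes that two adjacent pseudo-morphemes are merged when both the forward relative frequency count(left, right) / count(left) and the reverse relative frequency count(left, right) / count(right) reach a threshold. The function names, the threshold value, the single greedy left-to-right pass, and the Latin placeholder tokens are all hypothetical.

```python
from collections import Counter


def bigram_stats(sentences):
    """Count units and adjacent unit pairs in pre-segmented sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for units in sentences:
        unigrams.update(units)
        bigrams.update(zip(units, units[1:]))
    return unigrams, bigrams


def merge_units(units, unigrams, bigrams, threshold=0.8):
    """Greedily merge adjacent pairs whose forward and reverse
    relative frequencies both reach the (assumed) threshold."""
    merged = []
    i = 0
    while i < len(units):
        if i + 1 < len(units):
            left, right = units[i], units[i + 1]
            pair = bigrams[(left, right)]
            # Forward: how often `left` is followed by `right`.
            forward = pair / unigrams[left] if unigrams[left] else 0.0
            # Reverse: how often `right` is preceded by `left`.
            reverse = pair / unigrams[right] if unigrams[right] else 0.0
            if forward >= threshold and reverse >= threshold:
                merged.append(left + right)
                i += 2
                continue
        merged.append(units[i])
        i += 1
    return merged


# Placeholder corpus: each sentence is a list of pseudo-morpheme stand-ins.
corpus = [["a", "b", "c"], ["a", "b", "d"], ["c", "d"]]
unigrams, bigrams = bigram_stats(corpus)
print(merge_units(["a", "b", "c"], unigrams, bigrams))  # ['ab', 'c']
```

In this toy corpus "a" is always followed by "b" and "b" is always preceded by "a", so the pair is merged, while "c" and "d" co-occur too rarely to pass the threshold. A real system would likely repeat the pass to build longer units and retrain the language model on the merged text.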
