JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research
-
- Takamichi Shinnosuke
- Graduate School of Information Science and Technology, The University of Tokyo
-
- Sonobe Ryosuke
- Graduate School of Information Science and Technology, The University of Tokyo
-
- Mitsui Kentaro
- Graduate School of Information Science and Technology, The University of Tokyo
-
- Saito Yuki
- Graduate School of Information Science and Technology, The University of Tokyo
-
- Koriyama Tomoki
- Graduate School of Information Science and Technology, The University of Tokyo
-
- Tanji Naoko
- Graduate School of Information Science and Technology, The University of Tokyo
-
- Saruwatari Hiroshi
- Graduate School of Information Science and Technology, The University of Tokyo
Description
In this paper, we develop two corpora for speech synthesis research. Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we aim at developing Japanese voice corpora reasonably accessible from not only academic institutions but also commercial companies. In this paper, we construct the JSUT and JVS corpora. They are designed mainly for text-to-speech synthesis and voice conversion, respectively. The JSUT corpus contains 10 hours of reading-style speech uttered by a single speaker, and the JVS corpus contains 30 hours containing three styles of speech uttered by 100 speakers. This paper describes how we designed the corpora and summarizes the specifications. The corpora are available at our project pages.
Journal
-
- Acoustical Science and Technology
-
Acoustical Science and Technology 41 (5), 761-768, 2020-09-01
The Acoustical Society of Japan (一般社団法人 日本音響学会)
Details
-
- CRID
- 1390566775163782016
-
- NII Article ID
- 130007895044
-
- ISSN
- 13475177
- 03694232
- 13463969
-
- Text language code
- en
-
- Data source type
-
- JaLC
- Crossref
- CiNii Articles
- OpenAIRE
-
- Abstract license flag
- Disallowed