Using WFSTs for Efficient EM Learning of Probabilistic CFGs and Their Extensions
- Kameya Yoshitaka (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology)
- Mori Takashi (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology)
- Sato Taisuke (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology)
Abstract
Probabilistic context-free grammars (PCFGs) are a widely known class of probabilistic language models. The Inside-Outside (I-O) algorithm is well known as an efficient EM algorithm tailored for PCFGs. Although the algorithm requires only inexpensive linguistic resources, its efficiency remains a problem. This paper presents an efficient method for training PCFG parameters in which the parser is separated from the EM algorithm, assuming that the underlying CFG is given. The new EM algorithm exploits the compactness of the well-formed substring tables (WFSTs) generated by the parser. The proposal is general in that the input grammar need not be in Chomsky normal form (CNF), and it is equivalent to the I-O algorithm in the CNF case. In addition, we propose a polynomial-time EM algorithm for CFGs with context-sensitive probabilities and report experimental results with the ATR dialogue corpus and a hand-crafted Japanese grammar.
Journal
- 自然言語処理 (Journal of Natural Language Processing) 21 (4), 619-658, 2014
- Publisher: 一般社団法人 言語処理学会 (The Association for Natural Language Processing)
Details
- CRID: 1390282679452565760
- NII Article ID: 130004714335, 40020205501
- NII Bibliographic ID: AN10472659
- ISSN: 2185-8314, 1340-7619
- NDL Bibliographic ID: 025800156
- Text language code: en
- Data source types: JaLC, NDL, Crossref, CiNii Articles
- Abstract license flag: not permitted for use