Efficient EM learning of probabilistic CFGs and their extensions by using WFSTs
- KAMEYA YOSHITAKA, Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
- MORI TAKASHI, Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
- SATO TAISUKE, Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
Bibliographic Information
- Other Title: WFSTに基づく確率文脈自由文法およびその拡張文法の高速EM学習法 (Fast EM learning of probabilistic context-free grammars and their extended grammars based on WFSTs)
Abstract
Probabilistic context-free grammars (PCFGs) are a widely known class of statistical language models, and the Inside-Outside (I-O) algorithm is a well-known, efficient EM algorithm tailored to PCFGs. Although the algorithm requires only inexpensive linguistic resources, its efficiency remains a problem. In this paper, we present a new framework for efficient EM learning of PCFGs in which the parser is separated from the EM algorithm, assuming that the underlying CFG is given. The new EM procedure exploits the compactness of the WFSTs (well-formed substring tables) generated by the parser. Our framework is general in that the input grammar need not be in Chomsky normal form (CNF), while in the CNF case the new EM algorithm is equivalent to the I-O algorithm. In addition, we propose a polynomial-time EM procedure for CFGs with context-sensitive probabilities, and report experimental results with the ATR corpus and a hand-crafted Japanese grammar.
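The central idea of the abstract, parsing once and then running EM over the parser's WFST rather than over a full chart, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the forest format, the function names `topo_order`, `expected_counts` and `em_step`, and the toy grammar are hypothetical. It assumes an acyclic forest (no epsilon rules or unary cycles) whose rules may have any number of right-hand-side symbols, so the grammar need not be in CNF. Assuming the forest records every derivation of the sentence, the expected rule counts it produces match those of the I-O algorithm for a CNF grammar.

```python
import math
from collections import defaultdict

# Hypothetical forest format (an assumption, not the paper's data structure):
#   node = (nonterminal, start, end)          one WFST entry
#   rule = (lhs, rhs_tuple)                   e.g. ("S", ("S", "S"))
#   forest[node] = [(rule, children), ...]    every way the parser built node;
#                                             lexical rules have children == ()


def topo_order(forest, root):
    """List the nodes reachable from root, each one after all of its children."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for _, children in forest[node]:
            for child in children:
                visit(child)
        order.append(node)

    visit(root)
    return order


def expected_counts(forest, root, probs):
    """E-step: inside and outside passes restricted to the nodes of the forest."""
    order = topo_order(forest, root)
    inside = {}
    for node in order:  # bottom-up: all children are already computed
        inside[node] = sum(probs[rule] * math.prod(inside[c] for c in children)
                           for rule, children in forest[node])
    outside = defaultdict(float)
    outside[root] = 1.0
    counts = defaultdict(float)
    for node in reversed(order):  # top-down: outside[node] is already final
        for rule, children in forest[node]:
            ins = [inside[c] for c in children]
            # Expected number of uses of this rule application, given the sentence.
            counts[rule] += outside[node] * probs[rule] * math.prod(ins) / inside[root]
            for k, child in enumerate(children):
                siblings = math.prod(ins[:k] + ins[k + 1:])
                outside[child] += outside[node] * probs[rule] * siblings
    return counts, inside[root]


def em_step(corpus, probs):
    """One EM iteration over a list of (forest, root) pairs."""
    total = defaultdict(float)
    loglik = 0.0
    for forest, root in corpus:
        counts, p_sentence = expected_counts(forest, root, probs)
        loglik += math.log(p_sentence)
        for rule, c in counts.items():
            total[rule] += c
    # M-step: renormalise expected counts per left-hand side;
    # rules whose left-hand side never occurs keep their old probability.
    lhs_total = defaultdict(float)
    for (lhs, _), c in total.items():
        lhs_total[lhs] += c
    new_probs = {
        rule: (total[rule] / lhs_total[rule[0]] if lhs_total[rule[0]] > 0 else p)
        for rule, p in probs.items()
    }
    return new_probs, loglik


SS = ("S", ("S", "S"))
Sa = ("S", ("a",))
# WFST for "a a a" under S -> S S | a: two parses share the one-word entries.
forest = {
    ("S", 0, 1): [(Sa, ())],
    ("S", 1, 2): [(Sa, ())],
    ("S", 2, 3): [(Sa, ())],
    ("S", 0, 2): [(SS, (("S", 0, 1), ("S", 1, 2)))],
    ("S", 1, 3): [(SS, (("S", 1, 2), ("S", 2, 3)))],
    ("S", 0, 3): [(SS, (("S", 0, 1), ("S", 1, 3))),
                  (SS, (("S", 0, 2), ("S", 2, 3)))],
}
probs, loglik = em_step([(forest, ("S", 0, 3))], {SS: 0.5, Sa: 0.5})
print(probs[SS], probs[Sa])  # 0.4 0.6
```

Both parses use S -> S S twice and S -> a three times, so the expected counts are 2 and 3 whatever the starting probabilities, and one iteration yields 2/5 and 3/5. The sketch visits only the entries the parser actually stored, rather than every (nonterminal, start, end) triple as a chart-based I-O implementation does; this is the kind of saving the abstract attributes to WFST compactness, although the paper's own procedure may be organised differently.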
Journal
- Journal of Natural Language Processing, Vol. 8, No. 1, pp. 49-84, 2001
- Published by The Association for Natural Language Processing
Details
- CRID: 1390282679452827904
- NII Article ID: 10008830219
- NII Book ID: AN10472659
- ISSN: 21858314, 13407619 (http://id.crossref.org/issn/13407619)
- NDL BIB ID: 5634337
- Text Lang: ja
- Data Source: JaLC, NDL, Crossref, CiNii Articles
- Abstract License Flag: Disallowed