Efficient EM learning of probabilistic CFGs and their extensions by using WFSTs

  • KAMEYA YOSHITAKA
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • MORI TAKASHI
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • SATO TAISUKE
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Bibliographic Information

Other Title
  • WFSTに基づく確率文脈自由文法およびその拡張文法の高速EM学習法 (A fast EM learning method for probabilistic context-free grammars and their extended grammars based on WFSTs)

Abstract

Probabilistic context-free grammars (PCFGs) are a widely known class of statistical language models. The Inside-Outside (I-O) algorithm is also well known as an efficient EM algorithm tailored for PCFGs. Although the algorithm requires only inexpensive linguistic resources, its efficiency remains a problem. In this paper, we present a new framework for efficient EM learning of PCFGs in which the parser is separated from the EM algorithm, assuming the underlying CFG is given. The new EM procedure exploits the compactness of the WFSTs (well-formed substring tables) generated by the parser. Our framework is quite general in the sense that the input grammar need not be in Chomsky normal form (CNF), while the new EM algorithm is equivalent to the I-O algorithm in the CNF case. In addition, we propose a polynomial-time EM procedure for CFGs with context-sensitive probabilities, and report experimental results with the ATR corpus and a hand-crafted Japanese grammar.
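
For readers unfamiliar with the Inside-Outside algorithm mentioned in the abstract, the following is a minimal, self-contained sketch of one E-step (inside pass, outside pass, and expected rule counts) for a toy PCFG in Chomsky normal form. It illustrates only the classical I-O algorithm, not the WFST-based procedure proposed in the paper; the grammar, rule probabilities, and sentence are invented for the example.

# Minimal sketch of one inside-outside E-step for a toy PCFG in CNF.
# This is an illustration of the classical I-O algorithm only; the grammar,
# probabilities, and sentence are made up and this is NOT the paper's
# WFST-based procedure.
from collections import defaultdict

# Binary rules (A, (B, C)) -> prob and lexical rules (A, word) -> prob.
binary = {('S', ('NP', 'VP')): 1.0,
          ('VP', ('V', 'NP')): 0.6,
          ('NP', ('Det', 'N')): 0.7}
lexical = {('NP', 'she'): 0.3, ('V', 'saw'): 1.0,
           ('Det', 'the'): 1.0, ('N', 'dog'): 1.0, ('VP', 'ran'): 0.4}
start = 'S'

def e_step(sentence):
    n = len(sentence)
    inside = defaultdict(float)    # (i, j, A) -> P(A =>* w_i..w_j), 0-based, inclusive
    outside = defaultdict(float)

    # Inside pass: shortest spans first (CKY order).
    for i, w in enumerate(sentence):
        for (A, word), p in lexical.items():
            if word == w:
                inside[i, i, A] += p
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for (A, (B, C)), p in binary.items():
                for k in range(i, j):
                    inside[i, j, A] += p * inside[i, k, B] * inside[k + 1, j, C]

    Z = inside[0, n - 1, start]    # sentence probability
    if Z == 0.0:
        return {}, 0.0

    # Outside pass: widest spans first, pushing mass down to child spans.
    outside[0, n - 1, start] = 1.0
    for length in range(n, 1, -1):
        for i in range(n - length + 1):
            j = i + length - 1
            for (A, (B, C)), p in binary.items():
                out = outside[i, j, A]
                if out == 0.0:
                    continue
                for k in range(i, j):
                    outside[i, k, B] += p * out * inside[k + 1, j, C]
                    outside[k + 1, j, C] += p * out * inside[i, k, B]

    # Expected rule counts (E-step), normalized by the sentence probability.
    counts = defaultdict(float)
    for (A, (B, C)), p in binary.items():
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(i, j):
                    counts[A, (B, C)] += (p * outside[i, j, A] *
                                          inside[i, k, B] * inside[k + 1, j, C]) / Z
    for (A, word), p in lexical.items():
        for i, w in enumerate(sentence):
            if w == word:
                counts[A, word] += p * outside[i, i, A] / Z
    return counts, Z

counts, Z = e_step(['she', 'saw', 'the', 'dog'])
print('P(sentence) =', Z)          # 1.0 * 0.3 * 0.6 * 1.0 * 0.7 * 1.0 * 1.0 = 0.126
for rule, c in sorted(counts.items(), key=str):
    print(rule, round(c, 3))

An M-step would then renormalize the expected counts for each left-hand-side nonterminal to obtain new rule probabilities. As the abstract notes, the proposed framework instead computes such expectations from the well-formed substring table produced by a separate parser, so the input grammar need not be in CNF.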
