Maximum entropy models with inequality constraints: A case study on text categorization

Abstract

Data sparseness or overfitting is a serious problem in natural language processing employing machine learning methods. This is still true even for the maximum entropy (ME) method, whose flexible modeling capability has alleviated data sparseness more successfully than other probabilistic models in many NLP tasks. Although with the ME method we usually estimate the model so that it completely satisfies the equality constraints on feature expectations, complete satisfaction leads to undesirable overfitting, especially for sparse features, since constraints derived from a limited amount of training data are always uncertain. To control overfitting in ME estimation, we propose the use of box-type inequality constraints, where equality can be violated up to certain predefined levels that reflect this uncertainty. The derived models, inequality ME models, in effect have regularized estimation with L_1 norm penalties on bounded parameters. Most importantly, this regularized estimation enables the model parameters to become sparse. This can be thought of as automatic feature selection, which is expected to further improve generalization performance. We evaluate the inequality ME models on text categorization datasets and demonstrate their advantages over standard ME estimation, similarly motivated Gaussian MAP estimation of ME models, and support vector machines (SVMs), which are among the state-of-the-art methods for text categorization.
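The sparsity effect the abstract describes, where an L_1-type penalty drives many feature weights exactly to zero, can be illustrated with a minimal sketch. This is not the paper's inequality-constrained algorithm; it is a plain binary logistic regression (a conditional maximum entropy model) fit with an assumed L_1 penalty via proximal gradient descent (ISTA), on synthetic data where only two of ten features are informative:

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 norm: shrinks each weight toward zero,
    # zeroing out weights whose magnitude is below the threshold t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def train_l1_maxent(X, y, penalty=0.1, lr=0.1, steps=2000):
    """Binary logistic regression (a conditional ME model) with an L1
    penalty, fit by proximal gradient descent (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # model P(y = 1 | x)
        grad = X.T @ (p - y) / n           # gradient of the mean negative log-likelihood
        w = soft_threshold(w - lr * grad, lr * penalty)
    return w

# Toy data: only features 0 and 1 determine the label; the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = train_l1_maxent(X, y, penalty=0.1)
print("nonzero weight indices:", np.flatnonzero(w))
```

The informative features receive nonzero weights while most noise features are driven exactly to zero, mirroring the automatic feature selection the abstract attributes to the inequality ME models.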

identifier: 08856125

identifier: https://dspace.jaist.ac.jp/dspace/handle/10119/3305

Published in

  • Machine Learning

    Machine Learning 60 (1-3), 159-194, 2005-09

    Springer Science + Business Media
