On the Effect of Corpus Size in Words Similarity Calculation

  • Aizawa Akiko
    National Institute of Informatics; Graduate School of Advanced Studies

Bibliographic Information

Other Title
  • 類語関係抽出タスクにおけるコーパス規模拡大の影響

Description

This paper examines the effect of corpus size on word similarity calculation. Large-scale text corpora have recently become available for automatic synonym extraction, and it has been reported that simple methods applied to large-scale corpora sometimes perform comparably to more elaborate methods, such as LDA, applied to traditional linguistic resources. We report experimental results on how corpus quantity complements the quality of similarity calculation. Our results show that similarity calculation is sometimes influenced by absolute word frequencies, and that a simple filtering method can correct this bias.

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 94 91-98, 2006

    Information Processing Society of Japan (IPSJ)

Citations (2)

Details

  • CRID
    1573105976776148864
  • NII Article ID
    110004824264
  • NII Book ID
    AN10115061
  • Text Lang
    ja
  • Data Source
    • CiNii Articles
