Extraction of Similar Words Based on Time-correlation and Co-occurrence Probability from Tweets of the Same Topic

Bibliographic Information

Other Title
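  • Measuring Word Similarity Based on Spatio-temporal Correlation Information in Tweets Sharing the Same Hashtag (同一ハッシュタグツイート群における時空間相関情報に基づく単語類似度の計量)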


Abstract

To consolidate the many different expressions used to name the same thing, and thereby make topic retrieval and clustering of tweets more efficient, we propose a method for constructing Twitter dictionaries based on tweet extraction and the time correlation of words. In the proposed method, the similarity between two keywords is computed from the time correlation of their occurrences and from their co-occurrence probability. To reduce the computational cost of dictionary construction, the method further divides the target timeline into segments. Experiments on 101,714 tweets carrying hashtags related to "NHK kohaku-utagassen" confirm the effectiveness of the proposed division method compared with computing similarities over the entire timeline.
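The abstract does not give the exact formulas, so the following is only a minimal Python sketch of one plausible reading, not the authors' method. It assumes (hypothetically) that time correlation is the Pearson correlation of per-bin tweet counts, that co-occurrence probability is P(both | either) over the tweets in the window, that the two are combined by a simple product (negative correlations clipped to 0), and that timeline division means splitting into equal-length segments and scoring only word pairs that appear together in a segment. All function names and parameters (`bin_size`, `n_segments`, `threshold`) are illustrative.

```python
import math
from itertools import combinations

# Hypothetical tweet representation: (integer timestamp, set of words).

def time_series(tweets, word, start, end, bin_size):
    """Count tweets containing `word` in each fixed-width bin of [start, end)."""
    n_bins = max(1, math.ceil((end - start) / bin_size))
    counts = [0] * n_bins
    for t, words in tweets:
        if start <= t < end and word in words:
            counts[(t - start) // bin_size] += 1
    return counts

def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cooccurrence_prob(tweets, w1, w2):
    """Assumed definition: fraction of tweets containing w1 or w2 that contain both."""
    either = sum(1 for _, ws in tweets if w1 in ws or w2 in ws)
    both = sum(1 for _, ws in tweets if w1 in ws and w2 in ws)
    return both / either if either else 0.0

def similarity(tweets, w1, w2, start, end, bin_size):
    """Assumed combination: clipped time correlation times co-occurrence probability."""
    window = [(t, ws) for t, ws in tweets if start <= t < end]
    r = pearson(time_series(window, w1, start, end, bin_size),
                time_series(window, w2, start, end, bin_size))
    return max(r, 0.0) * cooccurrence_prob(window, w1, w2)

def similar_pairs(tweets, start, end, bin_size, n_segments, threshold):
    """Split the timeline into segments and score only pairs seen within a segment."""
    seg_len = math.ceil((end - start) / n_segments)
    scores = {}
    for k in range(n_segments):
        s, e = start + k * seg_len, min(start + (k + 1) * seg_len, end)
        seg = [(t, ws) for t, ws in tweets if s <= t < e]
        vocab = sorted(set().union(*(ws for _, ws in seg))) if seg else []
        for w1, w2 in combinations(vocab, 2):
            score = similarity(seg, w1, w2, s, e, bin_size)
            # Keep the best score a pair reaches in any segment.
            scores[(w1, w2)] = max(scores.get((w1, w2), 0.0), score)
    return {pair: v for pair, v in scores.items() if v >= threshold}
```

The segmented version only forms pairs from the vocabulary of each segment, which is one way the number of pairs to score could shrink relative to pairing every word over the whole timeline.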

Journal
