A Method of Refining Topic Models Based on Term and Document Frequencies.

Bibliographic Information

Other Title
  • 単語の出現頻度と類似性に基づいたトピックモデル洗練化手法 (A Method of Refining Topic Models Based on Word Occurrence Frequency and Similarity)

Abstract

Software developers increasingly rely on natural language documents. Such documents may contain information useful to developers; however, extracting that information is difficult when the number of documents is very large. Topic modeling based on Latent Dirichlet Allocation (LDA) is a promising way to help developers comprehend such documents. In LDA, a stop word list is used to filter out general words so that topics are classified accurately. However, an existing stop word list cannot easily filter out words that are not general but appear frequently in the target documents. In this paper, we propose a method consisting of two steps: extracting stop words from the target documents and merging similar topics. We evaluate the method experimentally by applying it to a mailing list. The experimental results demonstrate that our method constructs a more accurate topic model than the existing method.
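The abstract names the two steps but not their exact criteria, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' method. It assumes that a corpus-specific stop word is one whose document frequency ratio and total term frequency both exceed thresholds, and that "similar" topics are those whose topic-word distributions have high cosine similarity. The function names, the parameters `df_ratio`, `min_tf`, and `threshold`, and the greedy merge strategy are all hypothetical.

```python
from collections import Counter
from math import sqrt


def extract_corpus_stop_words(docs, df_ratio=0.5, min_tf=10):
    """Return words that are frequent across the target corpus.

    A word is treated as a corpus-specific stop word when it occurs in at
    least `df_ratio` of the documents (document frequency) AND at least
    `min_tf` times overall (term frequency). Both thresholds are
    hypothetical defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each word in the corpus
    df = Counter()  # number of documents containing each word
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    return {w for w in df if df[w] / n_docs >= df_ratio and tf[w] >= min_tf}


def cosine(p, q):
    """Cosine similarity between two topic-word probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def merge_similar_topics(topics, threshold=0.8):
    """Greedily group topics whose similarity to a group's first member
    reaches `threshold`, then average each group into one distribution."""
    groups = []
    for topic in topics:
        for group in groups:
            if cosine(group[0], topic) >= threshold:
                group.append(topic)
                break
        else:  # no sufficiently similar group found: start a new one
            groups.append([topic])
    merged = []
    for group in groups:
        avg = [sum(col) / len(group) for col in zip(*group)]
        total = sum(avg)
        merged.append([x / total for x in avg])  # renormalize to sum to 1
    return merged


if __name__ == "__main__":
    docs = [
        ["mail", "build", "error"],
        ["mail", "patch", "build"],
        ["mail", "release"],
        ["mail", "error", "patch"],
    ]
    stop_words = extract_corpus_stop_words(docs, df_ratio=0.75, min_tf=3)
    print(stop_words)  # {'mail'}
    filtered = [[w for w in doc if w not in stop_words] for doc in docs]
    print(filtered[2])  # ['release']

    # Toy topic-word distributions over a 3-word vocabulary.
    topics = [[0.5, 0.5, 0.0], [0.45, 0.55, 0.0], [0.0, 0.1, 0.9]]
    print(len(merge_similar_topics(topics, threshold=0.9)))  # 2
```

In practice the filtered documents would be passed to an LDA implementation and the merge step applied to the resulting topic-word matrix. Comparing each topic against a group's first member rather than a running centroid is a simplification chosen here for brevity.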

Journal

  • Computer Software

    Computer Software 36 (4), 4_25-4_31, 2019-10-25

    Japan Society for Software Science and Technology
