Automatically Collecting and Monitoring Japanese Weblogs

  • NANNO Tomoyuki
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
  • SUZUKI Yasuhiro
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
  • FUJIKI Toshiaki
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
  • OKUMURA Manabu
    Precision and Intelligence Laboratory, Tokyo Institute of Technology

Bibliographic Information

Other Title
  • blogの自動収集と監視
  • blog ノ ジドウ シュウシュウ ト カンシ

Abstract

Weblogs (blogs) are now regarded as a potentially useful information source. Although there is no settled definition of a blog, blogs are generally understood to be personal web pages, each authored by a single individual and made up of a chronologically arranged sequence of dated entries recording the author's thoughts. In Japan, people were writing 'diaries' on the web long before blog software became available. These web diaries are quite similar to blogs in content, and people still write them without any blog software. As we will show, hand-edited blogs are quite numerous in Japan, even though most people now think of blogs as pages published with one of the variants of public-domain blog software. It is therefore quite difficult to collect Japanese blogs exhaustively, that is, to collect both blogs made with blog software and web diaries written as normal web pages. Motivated by this, we present a system that tries to automatically collect and monitor Japanese blogs, including not only those made with blog software but also those written as normal web pages. Our approach is based on the extraction of date expressions and the analysis of HTML documents, so that it does not depend on specific blog software, RSS, or a ping server.
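
To make the date-expression idea in the abstract concrete, the following is a minimal Python sketch, not the authors' system. It assumes that a page is a diary candidate when it contains several date expressions appearing in chronological (or reverse chronological) order. The names `TextExtractor`, `DATE_RE`, `extract_dates`, `looks_like_diary`, the date patterns, and the `min_entries` threshold are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import re
from datetime import date
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Assumed patterns (illustrative only): 2004/03/15, 2004-3-15, 2004.3.15,
# and the Japanese form 2004年3月15日.
DATE_RE = re.compile(
    r"(\d{4})\s*(?:[/.\-]|年)\s*(\d{1,2})\s*(?:[/.\-]|月)\s*(\d{1,2})日?"
)


def extract_dates(html):
    """Return the valid calendar dates found in the page text, in page order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    dates = []
    for chunk in parser.chunks:
        for y, m, d in DATE_RE.findall(chunk):
            try:
                dates.append(date(int(y), int(m), int(d)))
            except ValueError:
                # Skip matches such as 2004/13/45 that are not real dates.
                continue
    return dates


def looks_like_diary(html, min_entries=3):
    """Heuristic: enough dates, and they are monotonic in page order."""
    dates = extract_dates(html)
    if len(dates) < min_entries:
        return False
    pairs = list(zip(dates, dates[1:]))
    ascending = all(a <= b for a, b in pairs)
    descending = all(a >= b for a, b in pairs)
    return ascending or descending
```

For example, a hand-edited page whose headings read `2004年3月15日`, `2004/03/12`, and `2004-03-10` would be flagged as a candidate, while a page with a single "last updated" date would not. A real collector would need a richer set of date formats and further HTML analysis than this sketch attempts.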
