Automatically Collecting and Monitoring Japanese Weblogs
-
- NANNO Tomoyuki
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
-
- SUZUKI Yasuhiro
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
-
- FUJIKI Toshiaki
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
-
- OKUMURA Manabu
- Precision and Intelligence Laboratory, Tokyo Institute of Technology
Bibliographic Information
- Other Title
-
- blogの自動収集と監視
- blog ノ ジドウ シュウシュウ ト カンシ
Abstract
Weblogs (blogs) are now regarded as a potentially useful information source. Although blogs have no precise definition, they are generally understood to be personal web pages written by a single author and made up of dated entries recording the author's thoughts, arranged chronologically. In Japan, people were writing 'diaries' on the web long before blog software became available. These web diaries closely resemble blogs in content, and people still write them without any blog software. As we will show, such hand-edited blogs are numerous in Japan, even though most people now think of blogs as pages published with one of the variants of public-domain blog software. It is therefore difficult to collect Japanese blogs exhaustively, that is, to collect both blogs made with blog software and web diaries written as ordinary web pages. Motivated by this, we present a system that attempts to automatically collect and monitor Japanese blogs, including not only those made with blog software but also those written as ordinary web pages. Our approach is based on extracting date expressions and analyzing HTML documents, so that it does not depend on specific blog software, RSS, or a ping server.
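To make the idea of date-expression extraction concrete, the following is a minimal illustrative sketch and not the authors' implementation, which the abstract does not describe in detail. It assumes that a diary-like page can be guessed from visible text containing several date expressions in strictly ascending or descending order. The pattern `DATE_RE`, the helper names `extract_dates` and `looks_like_diary`, and the threshold `min_entries` are all hypothetical choices made for this example.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Hypothetical pattern (an assumption, not taken from the paper): matches
# forms such as 2004年6月1日, 2004/06/01, 2004-6-1 and 2004.06.01.
DATE_RE = re.compile(
    r"(?<!\d)(\d{4})\s*[年/.\-]\s*(\d{1,2})\s*[月/.\-]\s*(\d{1,2})(?!\d)\s*日?"
)


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_dates(html):
    """Return valid calendar dates found in the page text, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    dates = []
    for match in DATE_RE.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        try:
            dates.append(date(year, month, day))
        except ValueError:
            continue  # e.g. month 13 or day 32: not a real date
    return dates


def looks_like_diary(html, min_entries=3):
    """Heuristic (assumed): enough distinct dates in strictly monotonic order."""
    dates = extract_dates(html)
    # Collapse immediately repeated dates (e.g. a heading and a permalink).
    distinct = [d for i, d in enumerate(dates) if i == 0 or d != dates[i - 1]]
    if len(distinct) < min_entries:
        return False
    pairs = list(zip(distinct, distinct[1:]))
    ascending = all(a < b for a, b in pairs)
    descending = all(a > b for a, b in pairs)
    return ascending or descending
```

A caller would pass the raw HTML of a candidate page to `looks_like_diary`. A real collector would presumably also need to use the HTML structure around each date, which this sketch ignores.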
Journal
-
- Transactions of the Japanese Society for Artificial Intelligence
-
Transactions of the Japanese Society for Artificial Intelligence 19 (6), 511-520, 2004
The Japanese Society for Artificial Intelligence
Details
-
- CRID
- 1390282680083027968
-
- NII Article ID
- 10014164934
- 30009884346
-
- NII Book ID
- AA11579226
-
- ISSN
- 13468030
- 13460714
-
- NDL BIB ID
- 7264454
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- NDL
- Crossref
- CiNii Articles
-
- Abstract License Flag
- Disallowed