Bibliographic Information
- Other Title
-
- 文要素解析と固有表現抽出によるメタデータ抽出 (Metadata Extraction by Sentence Element Analysis and Named Entity Extraction)
Abstract
Many documents on the web lack metadata describing their semantic content, and adding it manually is impractical because of the cost; a method for extracting metadata automatically is therefore necessary. This paper proposes an approach that extracts 4W+HM metadata, namely <when>, <where>, <who>, <what>, and <hm> (how much and how many), from plain text, and evaluates its retrieval effectiveness. The extraction process does not depend on dictionaries customized for specific fields, can be applied to general documents, and consists mainly of three parts. First, sentence element analysis identifies the role that each chunk plays in a sentence. Second, named entity extraction finds notable expressions in the sentence. Third, a criterion-based method outputs metadata based on the information from the previous two parts. The extraction is evaluated by calculating recall and precision on a test set that has been manually annotated with correct metadata. Mextractr, a related system that extracts 5W1H metadata, is evaluated on the same test set, and its results are compared with those of the proposed method. The experimental results show that the proposed method outperforms Mextractr in most respects. In particular, the proposed method excels at distinguishing <who> from <where>, which indicates that sentence element analysis is effective at keeping actors and places from being confused.
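As an illustration of the evaluation described in the abstract, the following minimal sketch computes precision and recall per metadata tag. It is not the authors' evaluation code: the function name `precision_recall`, the representation of metadata as `(tag, text)` pairs, the exact-match criterion, and the sample data are all assumptions made for this example.

```python
def precision_recall(predicted, gold):
    """Return {tag: (precision, recall)} for sets of (tag, text) pairs.

    A prediction counts as correct only if both the tag and the text
    match a gold entry exactly (an assumption of this sketch).
    """
    tags = {tag for tag, _ in predicted | gold}
    scores = {}
    for tag in sorted(tags):
        pred = {item for item in predicted if item[0] == tag}
        ref = {item for item in gold if item[0] == tag}
        true_positives = len(pred & ref)
        # Define the score as 0.0 when the denominator is empty.
        precision = true_positives / len(pred) if pred else 0.0
        recall = true_positives / len(ref) if ref else 0.0
        scores[tag] = (precision, recall)
    return scores


# Hypothetical gold annotations and extractor output for one sentence.
gold = {("who", "Alice"), ("where", "Tokyo"), ("when", "2013-03")}
predicted = {("who", "Alice"), ("who", "Tokyo"), ("when", "2013-03")}

scores = precision_recall(predicted, gold)
for tag, (p, r) in scores.items():
    print(f"<{tag}> precision={p:.2f} recall={r:.2f}")
```

In this made-up example the place name is mislabeled as <who>, which lowers precision for <who> and recall for <where>. This is the kind of actor/place confusion that the abstract says sentence element analysis helps avoid.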
Journal
-
- 法政大学大学院紀要. 情報科学研究科編
-
法政大学大学院紀要. 情報科学研究科編 8 65-68, 2013-03
法政大学大学院情報科学研究科
Details
-
- CRID
- 1390853649760738176
-
- NII Article ID
- 120005400079
-
- NII Book ID
- AA12222297
-
- HANDLE
- 10114/8797
-
- ISSN
- 18810667
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- IRDB
- CiNii Articles
-
- Abstract License Flag
- Allowed