Metadata Extraction by Sentence Element Analysis and Named Entity Extraction

Bibliographic Information

Other Title
  • 文要素解析と固有表現抽出によるメタデータ抽出

Abstract

Many documents on the web lack metadata describing their semantic content, and adding it manually is impractical because of the cost; a method for extracting metadata automatically is therefore necessary. This paper proposes an approach that extracts 4W+HM metadata, namely <when>, <where>, <who>, <what>, and <hm> (how much and how many), from plain text, and evaluates its retrieval effectiveness. The extraction process does not depend on dictionaries customized for specific fields, so it can be applied to general documents. It consists mainly of three parts. First, the sentence element analysis part identifies the role that each chunk plays in a sentence. Second, the named entity extraction part finds notable expressions in the sentences. Third, a criterion-based method outputs metadata based on the information supplied by the first two parts. The extraction is evaluated by calculating recall and precision on a test set manually annotated with the correct metadata. Mextractr, a related work that extracts 5W1H metadata, is evaluated on the same test set, and its results are compared with those of the proposed method. The experimental results show that the proposed method is superior to Mextractr in most respects. In particular, it excels at distinguishing <who> from <where>, which indicates that sentence element analysis is effective at keeping actors and places from being confused.
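The recall and precision evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. The abstract does not specify the matching criterion, so exact matching of (document, tag, value) triples is an assumption made here, and the function name `precision_recall`, the document id `d1`, and the sample values are hypothetical.

```python
# Tag names taken from the abstract's 4W+HM scheme.
TAGS = ("when", "where", "who", "what", "hm")


def precision_recall(predicted, gold):
    """Return {tag: (precision, recall)} for sets of (doc_id, tag, value) triples.

    Assumption: an extracted item counts as correct only if it matches a
    manually annotated item exactly. A score is reported as 0.0 when its
    denominator is empty.
    """
    scores = {}
    for tag in TAGS:
        pred = {item for item in predicted if item[1] == tag}
        ref = {item for item in gold if item[1] == tag}
        correct = len(pred & ref)
        precision = correct / len(pred) if pred else 0.0
        recall = correct / len(ref) if ref else 0.0
        scores[tag] = (precision, recall)
    return scores


# Hypothetical annotations for a single document.
gold = {
    ("d1", "who", "Alice"),
    ("d1", "where", "Tokyo"),
    ("d1", "when", "2010"),
}
# Hypothetical extractor output in which a place is mistaken for an actor.
predicted = {
    ("d1", "who", "Alice"),
    ("d1", "who", "Tokyo"),
    ("d1", "when", "2010"),
}

scores = precision_recall(predicted, gold)
for tag in TAGS:
    print(tag, scores[tag])
```

In this toy data, labeling a place as <who> lowers precision for <who> and recall for <where> at the same time, which is the kind of confusion the abstract says the proposed method avoids.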

Journal

Details

  • CRID
    1390853649760738176
  • NII Article ID
    120005400079
  • NII Book ID
    AA12222297
  • DOI
    10.15002/00009570
  • HANDLE
    10114/8797
  • ISSN
    18810667
  • Text Lang
    ja
  • Data Source
    • JaLC
    • IRDB
    • CiNii Articles
  • Abstract License Flag
    Allowed
