Metadata Extraction by Sentence Element Analysis and Named Entity Extraction (文要素解析と固有表現抽出によるメタデータ抽出)

Bibliographic information

Alternative title
  • Metadata Extraction by Sentence Element Analysis and Named Entity Extraction

Abstract

Many documents on the web lack metadata describing their semantic content, and adding such metadata manually is impractical because of the cost; a method for extracting metadata automatically is therefore necessary. This paper proposes an approach that extracts 4W+HM metadata, namely <when>, <where>, <who>, <what>, and <hm> (how much and how many), from plain text, and evaluates its retrieval effectiveness. The extraction process does not depend on dictionaries customized for specific fields, so it can be applied to general documents. It consists mainly of three parts: first, the sentence element analysis part identifies the role that each chunk plays in a sentence; second, the named entity extraction part finds notable expressions in the sentence; third, a criterion-based method outputs metadata based on the information from the previous two parts. The extraction is evaluated by calculating recall and precision on a test set that was manually annotated with the correct metadata. Mextractr, a related work that extracts 5W1H metadata, is evaluated on the same test set, and its results are compared with those of the proposed method. The experimental results show that the proposed method outperforms Mextractr in most respects. In particular, it excels at distinguishing <who> from <where>, which suggests that sentence element analysis helps avoid confusing actors with places.
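
The recall and precision evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how extracted values are matched against the annotations, so exact matching of (document, tag, value) triples is an assumption here, and the function name `precision_recall` and the sample data are hypothetical.

```python
# Illustrative sketch only: per-tag precision and recall for 4W+HM metadata.
# Assumption: an extracted item is correct only if its (doc_id, tag, value)
# triple exactly matches a manually annotated triple.

TAGS = ("when", "where", "who", "what", "hm")


def precision_recall(gold, predicted):
    """Return {tag: (precision, recall)} for two sets of (doc_id, tag, value)."""
    scores = {}
    for tag in TAGS:
        gold_items = {item for item in gold if item[1] == tag}
        pred_items = {item for item in predicted if item[1] == tag}
        correct = len(gold_items & pred_items)
        # Define both measures as 0.0 when their denominator is empty.
        precision = correct / len(pred_items) if pred_items else 0.0
        recall = correct / len(gold_items) if gold_items else 0.0
        scores[tag] = (precision, recall)
    return scores


# Hypothetical data: the extractor mislabels a place ("Tokyo") as <who>.
gold = {(1, "who", "Alice"), (1, "where", "Tokyo"), (1, "when", "2010")}
predicted = {(1, "who", "Alice"), (1, "who", "Tokyo"), (1, "when", "2010")}

scores = precision_recall(gold, predicted)
```

In this made-up case the mislabeled place lowers the precision of <who> and the recall of <where> at the same time, which is the kind of actor and place confusion the abstract refers to.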

Details

  • CRID
    1390853649760738176
  • NII Article ID
    120005400079
  • NII Bibliographic ID
    AA12222297
  • DOI
    10.15002/00009570
  • HANDLE
    10114/8797
  • ISSN
    18810667
  • Text language code
    ja
  • Data source type
    • JaLC
    • IRDB
    • CiNii Articles
  • Abstract license flag
    Use permitted
