文要素解析と固有表現抽出によるメタデータ抽出

齋藤 悠, SAITO Yu

doi:10.15002/00009570

There are a lot of documents without metadata about their semantic content on the web, but it is impracticable to manually add metadata to them considering costs; therefore a method to automatically extract metadata is necessary. This paper proposes an approach to extract such 4W+HM metadata as <when>, <where>, <who>, <what>, and <hm> (stands for how-much and how-many) from plane texts and evaluates it to measure its retrieval effectiveness. In this approach, metadata extraction process is free from dictionaries customized for specific fields, can apply general documents, and mainly consists of three parts: First, sentence element analysis part identifies what role each chunk in sentences plays; Second, named entity extraction part finds out remarkable expressions in sentences; Third, based on information given from previous parts, a criterion-based method outputs metadata. The evaluation experiment of metadata extraction is performed by calculating recall and precision using a test set which is manually added correct metadata. Mextractr, which is related work to extract 5W1H metadata, is also evaluated with the same test set, and its results are compared with proposed method’s. The experimental results show that proposed method is almost superior to Mextractr. In particular, proposed method excels in distinguishing <who> and <where>, which means that sentence elemental analysis works well not to confuse actors and places.

文要素解析と固有表現抽出によるメタデータ抽出

書誌事項

この論文をさがす

抄録

収録刊行物

詳細情報詳細情報について

書き出し

問題の指摘

文要素解析と固有表現抽出によるメタデータ抽出

書誌事項

この論文をさがす

抄録

収録刊行物

詳細情報 詳細情報について

書き出し

問題の指摘

詳細情報詳細情報について