Performance Evaluation of Natural Language Processing Systems for Automatic Place Name Extraction from Historical Disaster Documents

武内 樹治, 大内 啓樹, 東山 翔平

Extracting geographic information such as place names and locations from historical documents that describe past disasters, and then integrating and visualizing it in databases and maps, is expected to make the investigation and analysis of detailed disaster conditions both more sophisticated and more efficient. In this report, we describe our use of natural language processing (NLP) to extract place names from Japanese historical documents related to disasters in the early modern period. We created a dataset manually annotated with place names, evaluated the extraction accuracy of existing NLP systems usable for place name extraction, namely GiNZA and ChatGPT (GPT-3.5 and GPT-4), and analyzed error cases. GiNZA's model for modern Japanese did not provide sufficient accuracy. GPT-4, on the other hand, gave promising results, but we confirmed that it raises cost issues. In future work, we will experiment with methods such as fine-tuning pre-trained models, aiming for a model that is practical in terms of both cost and accuracy.
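The abstract does not say which metric underlies "extraction accuracy." A common choice for entity extraction tasks like this one is exact-match span precision, recall, and F1. The sketch below assumes that metric and represents each place-name mention as a hypothetical (start, end) character-offset pair into the source text. The function and type names (`span_prf`, `Scores`) and the toy sentence are illustrative only, not taken from the paper or its dataset.

```python
from typing import Iterable, NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def span_prf(gold: Iterable[tuple[int, int]],
             pred: Iterable[tuple[int, int]]) -> Scores:
    """Exact-match span precision/recall/F1 (assumed metric).

    Each span is a (start, end) character offset pair for one place-name
    mention. A prediction counts as correct only if both boundaries
    match a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # spans found in both gold and prediction
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    text = "江戸にて大地震、小田原も被害"  # toy sentence, not from the paper's data
    gold = [(0, 2), (8, 11)]  # 江戸, 小田原
    pred = [(0, 2), (4, 6)]   # 江戸 (correct), 大地 (spurious)
    print([text[s:e] for s, e in pred])  # ['江戸', '大地']
    print(span_prf(gold, pred))  # Scores(precision=0.5, recall=0.5, f1=0.5)
```

Exact matching is strict: a system that outputs 小田 instead of 小田原 gets no credit. Partial-overlap scoring is a common, more lenient alternative when token boundaries in pre-modern text are ambiguous; which variant the authors used is not stated here.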
