
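Introduction to Collaborative Research Projects, Exploratory and Discovery Type: Research on the History of Japanese Using Statistics and Machine Learning; Annotation of Historical Japanese Materials and Automatic Addition of Dakuten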

Bibliographic Information

Other Title
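  • 〈Introduction to Collaborative Research Projects〉 Exploratory and Discovery Type: Research on the History of Japanese Using Statistics and Machine Learning; Annotation of Historical Japanese Materials and Automatic Addition of Dakuten
  • Kyōdō Kenkyū Purojekuto Shōkai Hōga-Hakkutsugata: Tōkei to Kikai Gakushū ni yoru Nihongoshi Kenkyū Rekishiteki Nihongo Shiryō no Anotēshon to Jidō Dakuten Fuyo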

After surveying the annotations of historical Japanese documents that are needed to build a diachronic corpus, I present the results of our research on adding dakuten (the voicing diacritic) automatically. Historical texts frequently omit dakuten, and in their raw form such texts are hard to read and to search, and are unsuitable for morphological analysis. We therefore developed a technique, based on statistical machine learning, that annotates dakuten automatically with a precision of approximately 96% and a recall of approximately 98%. This technique can reduce the manual work involved in building diachronic corpora. Finally, I discuss the higher-level annotation that diachronic corpora can be expected to carry in the future.
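The abstract does not say which learning model or features the project used, so the following is only a minimal sketch of the task under stated assumptions. It treats dakuten restoration as a per-character binary decision (voice this kana or not) and substitutes a simple count-based classifier with character-context back-off for the authors' statistical model. It also shows how character-level precision and recall of dakuten positions can be computed. All function names (`add_dakuten`, `strip_dakuten`, `train`, `predict`, `precision_recall`) and the toy training data are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

# U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (the dakuten).
VOICING_MARK = "\u3099"


def add_dakuten(ch: str) -> str:
    """Return the voiced form of a kana (e.g. 'か' -> 'が'), or ch unchanged."""
    composed = unicodedata.normalize("NFC", ch + VOICING_MARK)
    return composed if len(composed) == 1 else ch


def strip_dakuten(ch: str) -> str:
    """Remove a dakuten from one character; handakuten (e.g. 'ぱ') is kept."""
    decomposed = unicodedata.normalize("NFD", ch)
    if VOICING_MARK not in decomposed:
        return ch
    return unicodedata.normalize("NFC", decomposed.replace(VOICING_MARK, ""))


def contexts(text: str, i: int) -> list:
    """Context keys for position i, from most to least specific (back-off order)."""
    prev = text[i - 1] if i > 0 else "<s>"
    nxt = text[i + 1] if i + 1 < len(text) else "</s>"
    cur = text[i]
    return [(prev, cur, nxt), (prev, cur), (cur, nxt), (cur,)]


def train(gold_texts: list) -> dict:
    """Count voiced/unvoiced outcomes per context on dakuten-stripped text."""
    counts = defaultdict(Counter)
    for gold in gold_texts:
        # Simulate a raw historical text by removing every dakuten.
        plain = "".join(strip_dakuten(c) for c in gold)
        for i, c in enumerate(plain):
            if add_dakuten(c) == c:
                continue  # this character can never take a dakuten
            voiced = gold[i] != c
            for key in contexts(plain, i):
                counts[key][voiced] += 1
    return dict(counts)


def predict(model: dict, text: str) -> str:
    """Add dakuten where the most specific seen context favours voicing.

    Assumes `text` has its dakuten omitted, like the training input.
    """
    out = []
    for i, c in enumerate(text):
        voiced_form = add_dakuten(c)
        decision = False
        if voiced_form != c:
            for key in contexts(text, i):
                if key in model:
                    decision = model[key][True] > model[key][False]
                    break  # use only the most specific context seen in training
        out.append(voiced_form if decision else c)
    return "".join(out)


def precision_recall(gold: str, predicted: str) -> tuple:
    """Character-level precision and recall of dakuten (same-length strings)."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g_voiced = strip_dakuten(g) != g
        p_voiced = strip_dakuten(p) != p
        tp += g_voiced and p_voiced
        fp += p_voiced and not g_voiced
        fn += g_voiced and not p_voiced
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, training on `["わたしが", "あなたが"]` and calling `predict(model, "きみたちか")` returns `"きみたちが"`: the final `か` is voiced because the context (`か`, end of text) was always voiced in training, while `ち`, never seen, is left unchanged. Backing off from a three-character window to the bare character is one simple way to cope with sparse contexts; a practical system would use richer features and far more training data.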