Design, Implementation, and Operation of Annotation Support System for Morphological Information of BCCWJ

Bibliographic Information

Other Title
  • 『現代日本語書き言葉均衡コーパス』形態論情報アノテーション支援システムの設計・実装・運用
  • 『 ゲンダイ ニホンゴ カキコトバ キンコウ コーパス 』 ケイタイロン ジョウホウ アノテーション シエン システム ノ セッケイ ・ ジッソウ ・ ウンヨウ

Abstract

The “Balanced Corpus of Contemporary Written Japanese” (BCCWJ) is a large-scale Japanese corpus of 100 million words. It comprises 170,000 XML files annotated with morphological information at two levels: short-unit words and long-unit words. We built an annotation support system to compile this corpus. Corpus annotations and dictionary entries are linked to each other, and the system lets many users modify both while keeping them consistent. It consists of three components: a relational database server called the “Morphological Information Database,” a client tool called “Dynagon” that maintains the morphological information of the corpus, and a tool called “UniDic Explorer” that manages the dictionary entries used for morphological analysis. This paper describes the design, implementation, and operation of the “Morphological Information Database” for BCCWJ.
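
The abstract does not give the database schema, so the following is only a minimal sketch of one way a relational database can keep corpus annotations and dictionary entries consistent. Everything in it is an assumption made for illustration: the table names `lexeme` and `token`, their columns, the sample rows, and the use of SQLite. In this sketch each corpus token stores a reference to a dictionary entry instead of copying its attributes, so a correction to the entry is visible in every linked token, and a foreign-key constraint rejects tokens that point at an entry that does not exist.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical dictionary table: one row per dictionary entry.
        CREATE TABLE lexeme (
            id    INTEGER PRIMARY KEY,
            lemma TEXT NOT NULL,
            pos   TEXT NOT NULL
        );
        -- Hypothetical corpus table: each token points at its entry.
        CREATE TABLE token (
            id        INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL,
            surface   TEXT NOT NULL,
            lexeme_id INTEGER NOT NULL REFERENCES lexeme(id)
        );
        """
    )
    conn.execute("INSERT INTO lexeme VALUES (1, '書く', 'verb')")
    conn.executemany(
        "INSERT INTO token (file_name, surface, lexeme_id) VALUES (?, ?, ?)",
        [("a.xml", "書い", 1), ("b.xml", "書き", 1)],
    )
    return conn


def tokens_with_pos(conn):
    """Return (file, surface, part of speech) for every token, via a join."""
    return conn.execute(
        "SELECT t.file_name, t.surface, l.pos "
        "FROM token AS t JOIN lexeme AS l ON l.id = t.lexeme_id "
        "ORDER BY t.id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    # Correct the dictionary entry once; both tokens reflect the change.
    conn.execute("UPDATE lexeme SET pos = 'verb-general' WHERE id = 1")
    for row in tokens_with_pos(conn):
        print(row)
```

Storing the part of speech only in the dictionary table is the design choice that makes a single correction sufficient here. The actual system may organize its data quite differently.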
