An Algorithm that Extracts Alphabet - Kana Rules from Name Database

Keiko Masuda (増田 恵子)

Japanese speakers often know the katakana rendering of a foreign word without knowing its original spelling, so rules that map between alphabetic and katakana spellings are useful. Databases (name dictionaries) that pair the alphabetic spelling of foreign personal names with their katakana spelling already exist and are available for use. In this paper, we propose an algorithm that automatically extracts alphabet-kana correspondence rules from changes in the occurrence frequency of pairs consisting of a substring of the alphabetic spelling and a substring of the katakana spelling in a name dictionary. We verified the algorithm on a data set generated by combining hypothetical test rules built from Romaji spellings and their kana, and confirmed that all of the test rules were reproduced among the extracted rules. We then applied the algorithm to an actual name dictionary and evaluated the extracted rules with a data retrieval application in mind, obtaining a rule accuracy of 80%, a spelling recovery rate of 84%, and a reading recovery rate of 48.8%. In a comparison with manually written rules, the method, although it uses no human knowledge, produced a rule set that includes the rules a human would generate.
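
The abstract's starting point, counting how often an alphabet substring and a katakana substring appear together across a name dictionary, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how frequency changes are turned into rules, so the code below only builds the co-occurrence counts. The function names, the length limits `max_alpha` and `max_kana`, and the three sample entries are all assumptions made for illustration.

```python
from collections import Counter


def substrings(s, max_len):
    """Return the set of contiguous substrings of s with length 1..max_len."""
    return {
        s[i:j]
        for i in range(len(s))
        for j in range(i + 1, min(i + max_len, len(s)) + 1)
    }


def count_pairs(entries, max_alpha=3, max_kana=2):
    """Count, per dictionary entry, each (alphabet substring, kana substring) pair.

    A pair is counted at most once per entry, so the count is the number of
    entries in which both substrings occur.
    """
    counts = Counter()
    for alpha, kana in entries:
        for a in substrings(alpha.lower(), max_alpha):
            for k in substrings(kana, max_kana):
                counts[(a, k)] += 1
    return counts


# Hypothetical toy name dictionary (not taken from the paper's data).
entries = [("Tom", "トム"), ("Tomas", "トマス"), ("Toby", "トビー")]
counts = count_pairs(entries)
```

In this toy data the pair `("to", "ト")` occurs in every entry, while `("m", "ム")` occurs in only one; a rule extractor would need some further criterion, such as the frequency changes the abstract mentions, to decide which pairs become rules.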
