Word Segmentations In Medical Document Using Mutual Information and N-gram

Uesugi M

doi:10.14948/jami.27.431

Bibliographic Information

Other Title

N-gramと相互情報量を用いた医療用語抽出のための分割点の探索

Description

We explored word segmentation using only statistical information from N-grams, without dictionaries. This technique should be useful as a preprocessing step for extracting medical terms when analyzing medical documents. Once medical terms can be extracted from medical documents in this way, we believe it will become easier to establish the relationships among the medical words in those documents and to build concept structures such as ontologies.

Mutual Information (MI) was used to decide the segmentation points, based on six kinds of MI values computed from four N-grams. The four N-grams (unigram, bigram, trigram and quadrigram) were calculated from 9,800 summaries (3.2M characters) in Igaku-Chuou Magazine. Each MI value was calculated with the equation log(p(x, y) / (p(x) p(y))), where p(x), p(y) and p(x, y) are each given by an N-gram. For example, the MI value Iuub(x, y) is calculated using p(x) = unigram of x, p(y) = unigram of y and p(x, y) = bigram of x+y. The sum of the six MI values (Iuub + Iubt + Iutq + Ibbq + Ibut + Ituq) and the changes in those values were used to segment the words. When the sum of the MI values was at or below a threshold γ and the sum of their changes was at or below a threshold δ, we judged that a segmentation point existed between x and y. We set both thresholds based on the proportion of correct segmentations among all segmentations, and on the maximum difference between the percentage of correct segmentations and the percentage of all segmentations. As a result, our method achieved 63.4% accuracy with the thresholds γ = 4 and δ = 0.
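The MI calculation described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the author's implementation: probabilities are estimated as relative frequencies of character N-grams, the logarithm is natural, and the six MI names are read as (length of x, length of y) pairs whose joint N-gram is x+y. The function names (`ngram_counts`, `mi`, `summed_mi`, `candidate_boundaries`) are hypothetical. The abstract does not define how the "changing values" are computed, so the δ criterion is left out and only the γ test on the summed MI is shown.

```python
import math
from collections import Counter

# Assumed reading of Iuub, Iubt, Iutq, Ibbq, Ibut, Ituq:
# (length of x, length of y); the joint N-gram x+y has the summed length.
SPANS = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 1), (3, 1)]


def ngram_counts(text, max_n=4):
    """Count character N-grams of order 1..max_n in the text."""
    return {
        n: Counter(text[i:i + n] for i in range(len(text) - n + 1))
        for n in range(1, max_n + 1)
    }


def prob(counts, gram):
    """Relative frequency of gram among N-grams of the same order (an assumption)."""
    table = counts[len(gram)]
    total = sum(table.values())
    return table[gram] / total if total else 0.0


def mi(counts, x, y):
    """log(p(x, y) / (p(x) p(y))); None when any probability is zero."""
    pxy, px, py = prob(counts, x + y), prob(counts, x), prob(counts, y)
    if pxy == 0 or px == 0 or py == 0:
        return None
    return math.log(pxy / (px * py))


def summed_mi(counts, text, pos):
    """Sum the available MI values across the gap just before text[pos]."""
    total = 0.0
    for left, right in SPANS:
        if pos - left < 0 or pos + right > len(text):
            continue  # not enough context on one side
        value = mi(counts, text[pos - left:pos], text[pos:pos + right])
        if value is not None:
            total += value
    return total


def candidate_boundaries(text, counts, gamma):
    """Positions whose summed MI is at or below gamma (the delta test is omitted)."""
    return [
        pos for pos in range(1, len(text))
        if summed_mi(counts, text, pos) <= gamma
    ]
```

In practice the counts would come from a large corpus rather than from the string being segmented, and a gap where no MI value is defined would need its own handling, since this sketch treats its sum as 0.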

Journal

Japan Journal of Medical Informatics

Japan Journal of Medical Informatics 27 (5), 431-438, 2007

Japan Association for Medical Informatics

Details

CRID: 1390282680727653248

NII Article ID: 10022605332

NII Book ID: AN10024228

DOI: 10.14948/jami.27.431

ISSN: 2188-8469; 0289-8055

Web Site: https://search.jamas.or.jp/link/ui/2008181389

Text Lang: ja

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed
