Word Segmentations In Medical Document Using Mutual Information and N-gram
- Uesugi M
- Department of Medical Informatics Graduate School of Medicine Hokkaido University
Bibliographic Information
- Other Title
- N-gramと相互情報量を用いた医療用語抽出のための分割点の探索
Description
We explored word segmentation using statistical information from N-grams, without dictionaries. This technique should be useful for medical term extraction as a preprocessing step for analyzing medical documents. Once medical terms can be extracted from medical documents with this technique, we believe that relationships among the medical words in the documents can be constructed and that concepts such as an ontology can be built easily.

Mutual Information (MI) was used to decide word segmentation points from six kinds of MI values based on four N-grams. The four N-grams (unigram, bigram, trigram, and quadrigram) were calculated from 9,800 summaries (3.2M characters) in Igaku-Chuou Magazine. Each MI value was calculated using the equation log(p(x, y) / (p(x) p(y))), where p(x), p(y), and p(x, y) are given by the N-grams. For example, the MI Iuub(x, y) is calculated using p(x) = unigram of x, p(y) = unigram of y, and p(x, y) = bigram of x+y. The sum of the six MI values (Iuub + Iubt + Iutq + Ibbq + Ibut + Ituq) and the changes in those values were used to segment the words. When the sum of the MI values was at most the threshold γ and the sum of their changes was at most the threshold δ, we determined that a segmentation point existed between x and y. We set both thresholds based on the overall rate of correct segmentations and on the maximum difference between %correct segmentation and %all segmentation. As a result, our method achieved 63.4% accuracy with thresholds γ=4 and δ=0.
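The MI-based boundary test described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngram_probs`, `mi`, `mi_sum`, `segment`) are hypothetical, probabilities are plain relative frequencies, the natural logarithm is assumed because the abstract does not state a base, and the "change" of the MI sum is assumed to be its difference from the previous candidate boundary, which the abstract does not define.

```python
import math
from collections import Counter

def ngram_probs(text, n_max=4):
    """Relative frequencies of character n-grams for n = 1..n_max."""
    probs = {}
    for n in range(1, n_max + 1):
        counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
        total = sum(counts.values())
        probs[n] = {g: c / total for g, c in counts.items()}
    return probs

# (left length, right length) for Iuub, Iubt, Iutq, Ibbq, Ibut, Ituq
SPLITS = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 1), (3, 1)]

def mi(probs, x, y):
    """log(p(x, y) / (p(x) p(y))); None if any n-gram is unseen."""
    pxy = probs[len(x) + len(y)].get(x + y)
    px = probs[len(x)].get(x)
    py = probs[len(y)].get(y)
    if not (pxy and px and py):
        return None
    return math.log(pxy / (px * py))  # natural log: an assumption

def mi_sum(probs, text, i):
    """Sum of the available MI values for the boundary before text[i]."""
    total = 0.0
    for a, b in SPLITS:
        if i - a < 0 or i + b > len(text):
            continue  # not enough context on one side
        value = mi(probs, text[i - a:i], text[i:i + b])
        if value is not None:
            total += value
    return total

def segment(probs, text, gamma, delta):
    """Return boundary positions where sum <= gamma and change <= delta."""
    sums = [mi_sum(probs, text, i) for i in range(1, len(text))]
    cuts = []
    for k, s in enumerate(sums):
        # Assumption: "change" = difference from the previous boundary's sum.
        change = s - sums[k - 1] if k > 0 else 0.0
        if s <= gamma and change <= delta:
            cuts.append(k + 1)
    return cuts
```

Threshold values depend on the corpus and on the logarithm base, so the reported γ=4 and δ=0 should not be expected to transfer to this sketch unchanged.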
Journal
- Japan Journal of Medical Informatics
- Japan Journal of Medical Informatics 27 (5), 431-438, 2007
- Japan Association for Medical Informatics
Details
- CRID: 1390282680727653248
- NII Article ID: 10022605332
- NII Book ID: AN10024228
- ISSN: 21888469, 02898055
- Text Lang: ja
- Data Source: JaLC, CiNii Articles
- Abstract License Flag: Disallowed