Building a high-quality sense inventory for improved abbreviation disambiguation
- Naoaki Okazaki
- Sophia Ananiadou
- Jun'ichi Tsujii

1 Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
2 School of Computer Science, University of Manchester, National Centre for Text Mining (NaCTeM), Manchester Interdisciplinary Biocentre, 131 Princess Street, Manchester M1 7DN, UK
Description
Abstract

Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions of biomedical texts. However, expanded forms extracted by abbreviation recognition are mixtures of concepts/senses and their term variations. Consequently, a list of expanded forms should be structured into a sense inventory, which provides possible concepts or senses for abbreviation disambiguation.

Results: A sense inventory is key to robust management of abbreviations. Therefore, we present a supervised approach for clustering expanded forms. The experimental result reports an F1 score of 0.915 in clustering expanded forms. We then investigate the possibility of conflicts of protein and gene names with abbreviations. Finally, an abbreviation disambiguation experiment on the sense inventory yielded 0.984 accuracy and a 0.986 F1 score using the dataset obtained from MEDLINE abstracts.

Availability: The sense inventory and disambiguator of abbreviations are accessible at http://www.nactem.ac.uk/software/acromine/ and http://www.nactem.ac.uk/software/acromine_disambiguation/

Contact: okazaki@chokkan.org
Journal
- Bioinformatics
Bioinformatics 26 (9), 1246-1253, 2010-03-30
Oxford University Press (OUP)