A hybrid approach to finding phenotype candidates in genetic texts

Ai Kawazoe, Martin Hall-May, Mai-Vu Tran, Hoang-Quynh Le, Dietrich Rebholz-Schuhmann, Anika Oellrich, Nigel Collier

doi:10.5167/uzh-69224

Named entity recognition (NER) has been extensively studied for the names of genes and gene products, but there are few proposed solutions for phenotypes. Phenotype terms are expected to play a key role in inferring gene function in complex heritable diseases but are intrinsically difficult to analyse due to their complex semantics and scale. In contrast to previous approaches, we evaluate state-of-the-art techniques involving the fusion of machine learning on a rich feature set with evidence from extant domain knowledge sources. The techniques are validated on two gold standard collections, including a novel annotated collection of 112 abstracts derived from a systematic search of the Online Mendelian Inheritance in Man database for auto-immune diseases. Encouragingly, the hybrid model outperforms an HMM, a CRF and a pure knowledge-based method to achieve an F1 of 77.07. Disagreement analysis points to further improvements on this emerging NE task. The annotated corpus and guidelines are available on request.
