Detecting Careless Responses in Named Entity Annotation via Crowdsourcing

福光 嘉伸, 松田 裕貴, 諏訪 博彦, 安本 慶一

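Crowdsourcing data annotation makes it possible to collect training data for machine learning at low cost. In exchange for the lower cost, however, the quality of the resulting data varies widely. In particular, when workers are paid a reward, they tend to answer as quickly as possible, which produces careless responses. This study therefore aims to detect careless responses in named entity annotation in real time, and proposes a detection method that uses features obtained from on-screen operations during the task, such as cursor movement distance and operation time. In this paper, we conduct a verification experiment with crowdworkers and add features that can reflect individual differences, with the goal of improving classification accuracy. Classification with a machine learning model achieved an accuracy of 0.747 on a dataset combining data from on-campus students and crowdworkers, and features related to the number of assigned labels were found to be important for classification.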

Annotation tasks can be performed via crowdsourcing to collect training data for machine learning at low cost. However, the quality of the resulting data varies widely, and workers who try to answer as quickly as possible to earn rewards produce careless responses. This study therefore proposes a real-time method for detecting careless responses in named entity annotation tasks. The method uses features such as cursor movement and response time, obtained from screen operations during the annotation task. The authors obtained an accuracy of 0.747 on a dataset combining data from students and crowdworkers, and found that features related to the number of assigned labels are important for this classification.
