Estimating an author's age group by machine learning for offender profiling

Bibliographic Information

Other Title
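  • 機械学習を用いた著者の年齢層推定 : 犯罪者プロファイリング実現に向けて (Estimating authors' age groups using machine learning: toward the realization of offender profiling)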


Description

The purpose of this study was to estimate the age group of text authors by applying random forests and support vector machines (SVMs) to stylometric features of the texts. An analysis of texts from 100 blogs showed statistically significant differences among five age groups in the following stylometric features: the frequency of (1) nouns, (2) the binding particle 「は」 (wa) immediately before commas, (3) the adverb 「ずっと」 (zutto, "all along"), and (4) part-of-speech bigrams (e.g., 「noun + noun」, 「symbol + noun」, 「auxiliary verb + adjective」). In leave-one-out cross-validation (LOOCV) on texts from another 100 blogs, the random forest model with 13 stylometric features achieved an accuracy of 80.0%, with a precision of 81.3% for the "20s to 40s" age group and 79.4% for the "50s and 60s" age group. The SVM achieved an accuracy of 81.0%, with a precision of 78.4% for the "20s to 40s" age group and 82.5% for the "50s and 60s" age group. Although the two classifiers did not differ significantly in accuracy, this study demonstrated the potential for practical application to offender profiling.
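The abstract does not describe how the features were computed or how the evaluation was coded. The following is a minimal sketch under stated assumptions: texts are assumed to be already split into (surface form, part-of-speech) tokens by a Japanese morphological analyzer, and the English tag names such as "noun" are hypothetical (real analyzers such as MeCab emit Japanese tags like 名詞). It computes relative frequencies for the four feature types listed above and runs leave-one-out cross-validation, reporting accuracy and per-class precision. The classifier is a hypothetical nearest-centroid stand-in, not the random forest or SVM used in the study, and all function names and the toy data are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed input: (surface form, POS tag) pairs from a morphological analyzer.
# The English tag names used here are hypothetical.
Token = Tuple[str, str]
Sample = Tuple[Dict[str, float], str]  # (feature vector, age-group label)


def stylometric_features(tokens: Sequence[Token]) -> Dict[str, float]:
    """Relative frequencies of the four feature types named in the abstract."""
    n = len(tokens) or 1  # guard against empty texts
    pos = [tag for _, tag in tokens]
    feats: Dict[str, float] = {
        "noun": pos.count("noun") / n,                          # (1) nouns
        "zutto": sum(1 for s, _ in tokens if s == "ずっと") / n,  # (3) adverb
    }
    # (2) は immediately followed by a comma (Japanese 、 or ASCII ,)
    wa_comma = sum(1 for (a, _), (b, _) in zip(tokens, tokens[1:])
                   if a == "は" and b in ("、", ","))
    feats["wa_before_comma"] = wa_comma / n
    # (4) POS bigrams, normalized by the total number of bigrams
    bigrams = Counter(zip(pos, pos[1:]))
    total = sum(bigrams.values()) or 1
    for (a, b), count in bigrams.items():
        feats[f"{a}+{b}"] = count / total
    return feats


def nearest_centroid(train: List[Sample], x: Dict[str, float]) -> str:
    """Hypothetical stand-in classifier (NOT a random forest or SVM)."""
    keys = set(x).union(*(f for f, _ in train))
    sums: Dict[str, Dict[str, float]] = {}
    counts: Counter = Counter()
    for f, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, {})
        for k in keys:  # missing features count as 0.0
            acc[k] = acc.get(k, 0.0) + f.get(k, 0.0)

    def sq_dist(label: str) -> float:
        # squared Euclidean distance from x to the class mean
        return sum((x.get(k, 0.0) - sums[label][k] / counts[label]) ** 2
                   for k in keys)

    return min(sorted(sums), key=sq_dist)  # sorted() makes ties deterministic


def loocv(data: List[Sample],
          classify: Callable[[List[Sample], Dict[str, float]], str]
          ) -> Tuple[float, Dict[str, float]]:
    """Leave-one-out CV: hold out each sample once and train on the rest."""
    preds = [classify(data[:i] + data[i + 1:], data[i][0])
             for i in range(len(data))]
    gold = [label for _, label in data]
    accuracy = sum(p == g for p, g in zip(preds, gold)) / len(data)
    precision: Dict[str, float] = {}
    for label in sorted(set(gold)):
        # precision = true positives / all samples predicted as this label
        hits = [g == label for p, g in zip(preds, gold) if p == label]
        precision[label] = sum(hits) / len(hits) if hits else 0.0
    return accuracy, precision


if __name__ == "__main__":
    text = [("今日", "noun"), ("は", "particle"), ("、", "symbol"),
            ("ずっと", "adverb"), ("雨", "noun")]
    print(stylometric_features(text))
    # Invented one-feature toy data, not data from the study
    toy = [({"x": 0.0}, "20s-40s"), ({"x": 0.1}, "20s-40s"),
           ({"x": 1.0}, "50s-60s"), ({"x": 1.1}, "50s-60s"),
           ({"x": 0.2}, "50s-60s")]
    print(loocv(toy, nearest_centroid))
```

To move closer to the study's setup, one would replace `nearest_centroid` with a random forest or SVM implementation and restrict the feature vector to the selected features; the abstract states that 13 were used but does not list them.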

