Age Group Based Document Classification in Bahasa Indonesia

Dita Pramesti, M. Iqbal D. Putra, Siti Oryza Khairunnisa, Wirarama Wedashwara, Budi Irmawati

doi:10.1109/icadeis49811.2020.9277104

The Internet provides articles that can be categorized for various target readers based on gender, age, hobbies, and so on. To ensure that readers consume articles appropriate to their age group, a method was proposed and training data were collected to classify the articles. This paper reports document classification by age group using a binary classification method for Indonesian documents. The classifier uses term frequency-inverse document frequency (TF-IDF) features with a Multinomial Naive Bayes classifier. The dataset was crowdsourced from three sites, bobo.grid.id, hai.grid.id, and www.detik.com, for three reader age groups: elementary school children, teenagers, and adults. The experiments obtained a precision, recall, and F-score of 0.9406, 0.9341, and 0.9374, respectively. The experiments also showed that the datasets that were not stemmed performed better than those that were stemmed. This suggests that stemming, which is usually applied in document classification, discards some information in Indonesian texts. However, because this behavior did not occur for nouns, our future work is to investigate further the role of affixation in documents for the lower age groups.
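The combination of TF-IDF features and a Multinomial Naive Bayes classifier described above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing parameter alpha, the function names, and the six toy sentences with their labels are all assumptions made for illustration, and no stemming is applied.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed IDF (an illustrative choice): ln((1 + n) / (1 + df)) + 1
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    # Raw term count times IDF; terms unseen in training are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_nb(texts, labels, alpha=1.0):
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    weight = defaultdict(lambda: defaultdict(float))  # class -> term -> summed TF-IDF
    class_count = Counter(labels)
    for tokens, y in zip(docs, labels):
        for t, w in tfidf(tokens, idf).items():
            weight[y][t] += w
    vocab = list(idf)
    model = {"idf": idf, "log_prior": {}, "log_lik": {}}
    for y in class_count:
        total = sum(weight[y].values()) + alpha * len(vocab)
        model["log_prior"][y] = math.log(class_count[y] / len(labels))
        model["log_lik"][y] = {
            t: math.log((weight[y].get(t, 0.0) + alpha) / total) for t in vocab
        }
    return model

def predict(model, text):
    feats = tfidf(tokenize(text), model["idf"])
    scores = {
        y: model["log_prior"][y]
        + sum(w * model["log_lik"][y][t] for t, w in feats.items())
        for y in model["log_prior"]
    }
    return max(scores, key=scores.get)

# Hypothetical toy data, invented for this sketch (not from the paper's dataset).
texts = [
    "kelinci kecil bermain di kebun",
    "dongeng kancil dan buaya",
    "band sekolah tampil di pensi",
    "tips belajar untuk ujian sekolah",
    "pemerintah menaikkan suku bunga",
    "harga saham turun di bursa",
]
labels = ["children", "children", "teen", "teen", "adult", "adult"]

model = train_nb(texts, labels)
print(predict(model, "kancil bermain di kebun"))  # children
print(predict(model, "harga saham turun"))        # adult
```

Because the tokenizer leaves affixed word forms untouched, a sketch like this corresponds to the unstemmed setting; comparing it against a stemmed variant would require adding an Indonesian stemmer before `tokenize` returns.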
