Age Group Based Document Classification in Bahasa Indonesia
Description
The Internet provides articles that can be categorized for various target readers based on gender, age, hobbies, and so on. To ensure that readers consume articles appropriate to their age group, methods and training data have been proposed and collected to classify such articles. This paper reports a document classification based on age groups using a binary classification method for Indonesian documents. The classification uses term frequency-inverse document frequency (TF-IDF) features with a Multinomial Naive Bayes classifier. The dataset was crowdsourced from three sites, bobo.grid.id, hai.grid.id, and www.detik.com, covering three reader age groups: elementary school children, teenagers, and adults. The experiments obtained a precision of 0.9406, a recall of 0.9341, and an F-score of 0.9374. The experiments also showed that datasets that were not stemmed performed better than those that were stemmed. This shows that stemming, which is usually applied in document classification, discards some information in Indonesian texts. However, because this behavior did not occur for nouns, our future work is to investigate further the role of affixation in documents for the lower age groups.
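As a rough illustration of the pipeline the abstract describes (TF-IDF features fed to a Multinomial Naive Bayes classifier), the following is a minimal standard-library sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the smoothing parameter `alpha`, the function names, and the six toy training sentences with their `child` / `teen` / `adult` labels are all assumptions made for this example, and no stemming is applied.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase + whitespace split, no stemming.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    # Raw term frequency times IDF; out-of-vocabulary tokens are dropped.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_nb(vectors, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes over (fractional) TF-IDF weights
    # with additive smoothing controlled by alpha.
    class_docs = Counter(labels)
    weight = defaultdict(Counter)
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            weight[y][t] += w
    model = {}
    for y in class_docs:
        total = sum(weight[y].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[y] / len(labels))
        log_lik = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model

def predict(model, vec):
    # Pick the class with the highest log prior + weighted log likelihood.
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(model, key=score)

# Hypothetical toy corpus (not the paper's data), two documents per age group.
TRAIN = [
    ("dongeng kelinci dan kura kura", "child"),
    ("cerita hewan lucu di kebun", "child"),
    ("konser musik band sekolah", "teen"),
    ("tips gaya musik remaja", "teen"),
    ("berita ekonomi dan politik", "adult"),
    ("harga saham ekonomi naik", "adult"),
]

docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], labels, vocab=set(idf))

def classify(text):
    return predict(model, tfidf(tokenize(text), idf))

if __name__ == "__main__":
    for query in ["cerita dongeng hewan", "band musik", "berita politik ekonomi"]:
        print(query, "->", classify(query))
```

Comparing stemmed and unstemmed input, as the abstract does, would amount to swapping `tokenize` for a version that applies an Indonesian stemmer before splitting; that step is left out here because the abstract does not name the stemmer used.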
Journal
2020 International Conference on Advancement in Data Science, E-learning and Information Systems (ICADEIS) 1-6, 2020-10-20
IEEE