Impact of Role Prompting on Automated Essay Scoring Using GPT Models

Description

Recent advancements in generative AI, particularly in Automated Essay Scoring (AES), have shown great potential, yet their accuracy remains insufficient compared to existing methods. The aim of this study is to explore the impact of role prompting on improving the performance of Large Language Models (LLMs) in AES tasks. In this research, we analyzed 240 essays written by non-native English speakers, extracted from eight prompts of the TOEFL11 corpus. Using three versions each of GPT-3.5 and GPT-4, essays were scored with prompts representing seven different roles, and the results were evaluated against human ratings using Quadratic Weighted Kappa (QWK). The findings indicate that roles presumed to be advantageous did not necessarily enhance AES performance. Moreover, the gpt-4-0613 model demonstrated the highest effectiveness. This study contributes to the ongoing discussion on optimizing LLMs for AES, providing insights into their potential and limitations.
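
For readers unfamiliar with the setup, the sketch below illustrates the general pattern described in the abstract: a GPT model is assigned a role via the system message before scoring an essay, and the resulting scores are compared with human ratings using Quadratic Weighted Kappa. It is a minimal sketch assuming the OpenAI Python SDK and scikit-learn; the role wording, the 1-5 score scale, and the parsing step are illustrative assumptions, not the study's actual prompts or rubric.

```python
# Minimal sketch: role-prompted essay scoring plus QWK evaluation.
# Assumes the OpenAI Python SDK (v1) and scikit-learn; role text,
# score scale, and parsing are illustrative, not the authors' setup.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ROLE = "You are an experienced ESL writing instructor."  # hypothetical role

def score_essay(essay: str, model: str = "gpt-4-0613") -> int:
    """Ask the model, acting under the given role, for a single integer score."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": ROLE},
            {"role": "user", "content": (
                "Score the following TOEFL essay on a scale of 1 (lowest) "
                "to 5 (highest). Reply with the integer only.\n\n" + essay
            )},
        ],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

# Agreement with human ratings via Quadratic Weighted Kappa.
human_scores = [4, 3, 5, 2]   # placeholder human ratings
model_scores = [4, 3, 4, 2]   # placeholder model ratings
qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```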

