Impact of Role Prompting on Automated Essay Scoring Using GPT Models

Description

Recent advancements in generative AI, particularly in Automated Essay Scoring (AES), have shown great potential, yet their accuracy remains insufficient compared to existing methods. The aim of this study is to explore the impact of role prompting on improving the performance of Large Language Models (LLMs) in AES tasks. In this research, we analyzed 240 essays written by non-native English speakers, extracted from eight prompts of the TOEFL11 corpus. Using three versions each of GPT-3.5 and GPT-4, essays were scored with prompts representing seven different roles, and the results were evaluated against human ratings using Quadratic Weighted Kappa (QWK). The findings indicate that roles presumed to be advantageous did not necessarily enhance AES performance. Moreover, the gpt-4-0613 model demonstrated the highest effectiveness. This study contributes to the ongoing discussion on optimizing LLMs for AES, providing insights into their potential and limitations.
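
For readers unfamiliar with the setup, the sketch below illustrates the general pattern described in the abstract: a GPT model is assigned a role via the system message before scoring an essay, and the resulting scores are compared with human ratings using Quadratic Weighted Kappa. It is a minimal sketch assuming the OpenAI Python SDK and scikit-learn; the role wording, the 1-5 score scale, and the parsing step are illustrative assumptions, not the study's actual prompts or rubric.

```python
# Minimal sketch: role-prompted essay scoring plus QWK evaluation.
# Assumes the OpenAI Python SDK (v1) and scikit-learn; role text,
# score scale, and parsing are illustrative, not the authors' setup.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ROLE = "You are an experienced ESL writing instructor."  # hypothetical role

def score_essay(essay: str, model: str = "gpt-4-0613") -> int:
    """Ask the model, acting under the given role, for a single integer score."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": ROLE},
            {"role": "user", "content": (
                "Score the following TOEFL essay on a scale of 1 (lowest) "
                "to 5 (highest). Reply with the integer only.\n\n" + essay
            )},
        ],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

# Agreement with human ratings via Quadratic Weighted Kappa.
human_scores = [4, 3, 5, 2]   # placeholder human ratings
model_scores = [4, 3, 4, 2]   # placeholder model ratings
qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```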

