Hallucination Detection on Code Generation with SelfCheckGPT

Ito Waka, Obara Yui, Sato Miyu, Kuramitsu Kimio

doi:10.2197/ipsjjip.33.487

Description

<p>Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, LLMs are prone to “hallucination,” in which they produce incorrect content or outputs that deviate from the input requirements. SelfCheckGPT is one of the methods designed to detect hallucinations. Its key feature is that it infers the occurrence of hallucinations without requiring reference data or test cases. Although SelfCheckGPT has been evaluated and applied in natural language processing tasks such as text summarization and question answering, its performance in code generation has not yet been explored. In this study, we applied SelfCheckGPT to the HumanEval dataset, a standard benchmark for code generation, and investigated its evaluation performance by comparing it with execution-based evaluations. The results revealed that calculating similarity using BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of code, that is, for detecting hallucinations.</p>
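The idea in the abstract, scoring a generated program by how similar it is to other samples drawn for the same prompt, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the character-level form of EditSim (one minus the length-normalized Levenshtein distance), and the 0.5 threshold are all assumptions made for illustration, and BLEU and ROUGE-L are omitted for brevity.

```python
from statistics import mean
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_sim(a: str, b: str) -> float:
    """Assumed EditSim: 1 minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def consistency_score(main_output: str, samples: Sequence[str]) -> float:
    """Mean similarity between the main output and resampled outputs."""
    if not samples:
        raise ValueError("at least one sample is required")
    return mean(edit_sim(main_output, s) for s in samples)


def flag_hallucination(main_output: str, samples: Sequence[str],
                       threshold: float = 0.5) -> bool:
    """Flag the output when it disagrees with the samples.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return consistency_score(main_output, samples) < threshold
```

In use, `main_output` would be the code under evaluation and `samples` would be additional generations for the same prompt. A low score means the samples disagree with the main output, which this approach treats as a sign of a likely hallucination. No reference solution or test case is needed.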

Published in

Journal of Information Processing

Journal of Information Processing 33 (0), 487-493, 2025

Information Processing Society of Japan


Details

CRID: 1390305201349619072

DOI: 10.2197/ipsjjip.33.487

ISSN: 18826652

Web Site: https://www.jstage.jst.go.jp/article/ipsjjip/33/0/33_487/_pdf

Text language code: en

Data source type

JaLC
Crossref

Abstract license flag: Not permitted
