{"@context":{"@vocab":"https://cir.nii.ac.jp/schema/1.0/","rdfs":"http://www.w3.org/2000/01/rdf-schema#","dc":"http://purl.org/dc/elements/1.1/","dcterms":"http://purl.org/dc/terms/","foaf":"http://xmlns.com/foaf/0.1/","prism":"http://prismstandard.org/namespaces/basic/2.0/","cinii":"http://ci.nii.ac.jp/ns/1.0/","datacite":"https://schema.datacite.org/meta/kernel-4/","ndl":"http://ndl.go.jp/dcndl/terms/","jpcoar":"https://github.com/JPCOAR/schema/blob/master/2.0/"},"@id":"https://cir.nii.ac.jp/crid/1390305201349619072.json","@type":"Article","productIdentifier":[{"identifier":{"@type":"DOI","@value":"10.2197/ipsjjip.33.487"}},{"identifier":{"@type":"URI","@value":"https://www.jstage.jst.go.jp/article/ipsjjip/33/0/33_487/_pdf"}}],"dc:title":[{"@language":"en","@value":"Hallucination Detection on Code Generation with SelfCheckGPT"}],"dc:language":"en","description":[{"type":"abstract","notation":[{"@language":"en","@value":"<p>Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, an LLM encounters a challenge known as “hallucination,” where it produces incorrect content or outputs that deviate from input requirements. SelfCheckGPT is one of the methods designed to detect hallucinations. Its key feature lies in its ability to infer the occurrence of hallucinations without requiring reference data or test cases. Although SelfCheckGPT has been evaluated and applied in natural language processing tasks such as text summarization and question answering, its performance in code generation has not yet been explored. In this study, we applied SelfCheckGPT to the HumanEval dataset, a standard benchmark for code generation, and investigated its evaluation performance by comparing it with execution-based evaluations. 
The results revealed that calculating similarity using BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of code or, in other words, hallucinations.</p>"}],"abstractLicenseFlag":"disallow"}],"creator":[{"@id":"https://cir.nii.ac.jp/crid/1410305201349619075","@type":"Researcher","foaf:name":[{"@language":"en","@value":"Ito Waka"}],"jpcoar:affiliationName":[{"@language":"en","@value":"Graduate School of Science Division of Mathematical and Physical Sciences, Japan Women's University"}]},{"@id":"https://cir.nii.ac.jp/crid/1410305201349619074","@type":"Researcher","foaf:name":[{"@language":"en","@value":"Obara Yui"}],"jpcoar:affiliationName":[{"@language":"en","@value":"Graduate School of Science Division of Mathematical and Physical Sciences, Japan Women's University"}]},{"@id":"https://cir.nii.ac.jp/crid/1410305201349619073","@type":"Researcher","foaf:name":[{"@language":"en","@value":"Sato Miyu"}],"jpcoar:affiliationName":[{"@language":"en","@value":"Graduate School of Science Division of Mathematical and Physical Sciences, Japan Women's University"}]},{"@id":"https://cir.nii.ac.jp/crid/1410305201349619072","@type":"Researcher","foaf:name":[{"@language":"en","@value":"Kuramitsu Kimio"}],"jpcoar:affiliationName":[{"@language":"en","@value":"Department of Mathematics, Physics, and Computer Science, Japan Women's University, Bunkyo"}]}],"publication":{"publicationIdentifier":[{"@type":"EISSN","@value":"18826652"}],"prism:publicationName":[{"@language":"en","@value":"Journal of Information Processing"}],"dc:publisher":[{"@language":"en","@value":"Information Processing Society of Japan"},{"@language":"ja","@value":"一般社団法人 
情報処理学会"}],"prism:publicationDate":"2025","prism:volume":"33","prism:number":"0","prism:startingPage":"487","prism:endingPage":"493"},"reviewed":"false","url":[{"@id":"https://www.jstage.jst.go.jp/article/ipsjjip/33/0/33_487/_pdf"}],"availableAt":"2025","foaf:topic":[{"@id":"https://cir.nii.ac.jp/all?q=LLMs","dc:title":"LLMs"},{"@id":"https://cir.nii.ac.jp/all?q=generative%20AI","dc:title":"generative AI"},{"@id":"https://cir.nii.ac.jp/all?q=code%20generation","dc:title":"code generation"},{"@id":"https://cir.nii.ac.jp/all?q=hallucination","dc:title":"hallucination"},{"@id":"https://cir.nii.ac.jp/all?q=evaluation%20metrics","dc:title":"evaluation metrics"}],"relatedProduct":[{"@id":"https://cir.nii.ac.jp/crid/1360011145753572480","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"DeepBugs: a learning approach to name-based bug detection"}]},{"@id":"https://cir.nii.ac.jp/crid/1360020701022781952","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"CodeBERT: A Pre-Trained Model for Programming and Natural Languages"}]},{"@id":"https://cir.nii.ac.jp/crid/1360022501345155968","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models"}]},{"@id":"https://cir.nii.ac.jp/crid/1360024022340832128","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Out of the BLEU: How should we assess quality of the Code Generation models?"}]},{"@id":"https://cir.nii.ac.jp/crid/1360024025226188928","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation"}]},{"@id":"https://cir.nii.ac.jp/crid/1360298345090256128","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"IntelliCode compose: code generation using 
transformer"}]},{"@id":"https://cir.nii.ac.jp/crid/1360305497316018304","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code"}]},{"@id":"https://cir.nii.ac.jp/crid/1360305497604995456","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Developer Testing in the IDE: Patterns, Beliefs, and Behavior"}]},{"@id":"https://cir.nii.ac.jp/crid/1360579820494762752","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Survey of Hallucination in Natural Language Generation"}]},{"@id":"https://cir.nii.ac.jp/crid/1360586971786245248","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"CodeJudge: Evaluating Code Generation with Large Language Models"}]},{"@id":"https://cir.nii.ac.jp/crid/1360586972544462208","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Using LLMs in Software Requirements Specifications: An Empirical Evaluation"}]},{"@id":"https://cir.nii.ac.jp/crid/1360868448240871680","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs"}]},{"@id":"https://cir.nii.ac.jp/crid/1361699995767595392","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"A Normalized Levenshtein Distance Metric"}]},{"@id":"https://cir.nii.ac.jp/crid/1362544418386190976","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Texygen"}]},{"@id":"https://cir.nii.ac.jp/crid/1364233270606638080","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"BLEU"}]}],"dataSourceIdentifier":[{"@type":"JALC","@value":"oai:japanlinkcenter.org:2014517682"},{"@type":"CROSSREF","@value":"10.2197/ipsjjip.33.487"}]}