{"@context":{"@vocab":"https://cir.nii.ac.jp/schema/1.0/","rdfs":"http://www.w3.org/2000/01/rdf-schema#","dc":"http://purl.org/dc/elements/1.1/","dcterms":"http://purl.org/dc/terms/","foaf":"http://xmlns.com/foaf/0.1/","prism":"http://prismstandard.org/namespaces/basic/2.0/","cinii":"http://ci.nii.ac.jp/ns/1.0/","datacite":"https://schema.datacite.org/meta/kernel-4/","ndl":"http://ndl.go.jp/dcndl/terms/","jpcoar":"https://github.com/JPCOAR/schema/blob/master/2.0/"},"@id":"https://cir.nii.ac.jp/crid/1360025430639525504.json","@type":"Article","productIdentifier":[{"identifier":{"@type":"DOI","@value":"10.1177/20552076241265215"}},{"identifier":{"@type":"URI","@value":"https://journals.sagepub.com/doi/pdf/10.1177/20552076241265215"}},{"identifier":{"@type":"URI","@value":"https://journals.sagepub.com/doi/full-xml/10.1177/20552076241265215"}}],"resourceType":"学術雑誌論文(journal article)","dc:title":[{"@value":"Diagnostic performance of generative artificial intelligences for a series of complex case reports"}],"description":[{"type":"abstract","notation":[{"@value":"<jats:sec>\n                    <jats:title>Background</jats:title>\n                    <jats:p>Diagnostic performance of generative artificial intelligences (AIs) using large language models (LLMs) across comprehensive medical specialties is still unknown.</jats:p>\n                  </jats:sec>\n                  <jats:sec>\n                    <jats:title>Objective</jats:title>\n                    <jats:p>We aimed to evaluate the diagnostic performance of generative AIs using LLMs in complex case series across comprehensive medical fields.</jats:p>\n                  </jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods</jats:title>\n                    <jats:p>\n                      We analyzed published case reports from the\n                      <jats:italic toggle=\"yes\">American Journal of Case Reports</jats:italic>\n                      from January 2022 to March 
2023. We excluded pediatric cases and those primarily focused on management. We utilized three generative AIs to generate the top 10 differential-diagnosis (DDx) lists from case descriptions: the fourth-generation chat generative pre-trained transformer (ChatGPT-4), Google Gemini (previously Bard), and LLM Meta AI 2 (LLaMA2) chatbot. Two independent physicians assessed the inclusion of the final diagnosis in the lists generated by the AIs.\n                    </jats:p>\n                  </jats:sec>\n                  <jats:sec>\n                    <jats:title>Results</jats:title>\n                    <jats:p>\n                      Out of 557 consecutive case reports, 392 were included. The inclusion rates of the final diagnosis within top 10 DDx lists were 86.7% (340/392) for ChatGPT-4, 68.6% (269/392) for Google Gemini, and 54.6% (214/392) for LLaMA2 chatbot. The top diagnoses matched the final diagnoses in 54.6% (214/392) for ChatGPT-4, 31.4% (123/392) for Google Gemini, and 23.0% (90/392) for LLaMA2 chatbot. ChatGPT-4 showed higher diagnostic accuracy than Google Gemini (\n                      <jats:italic toggle=\"yes\">P</jats:italic>\n                       < 0.001) and LLaMA2 chatbot (\n                      <jats:italic toggle=\"yes\">P</jats:italic>\n                       < 0.001). Additionally, Google Gemini outperformed LLaMA2 chatbot within the top 10 DDx lists (\n                      <jats:italic toggle=\"yes\">P</jats:italic>\n                       < 0.001) and as the top diagnosis (\n                      <jats:italic toggle=\"yes\">P</jats:italic>\n                       = 0.010).\n                    </jats:p>\n                  </jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions</jats:title>\n                    <jats:p>This study demonstrated the diagnostic performance of generative AIs including ChatGPT-4, Google Gemini, and LLaMA2 chatbot. 
ChatGPT-4 exhibited higher diagnostic accuracy than the other platforms. These findings suggest the importance of understanding the differences in diagnostic performance among generative AIs, especially in complex case series across comprehensive medical fields, like general medicine.</jats:p>\n                  </jats:sec>"}]}],"creator":[{"@id":"https://cir.nii.ac.jp/crid/1420564276191067904","@type":"Researcher","personIdentifier":[{"@type":"KAKEN_RESEARCHERS","@value":"90810549"},{"@type":"NRID","@value":"1000090810549"},{"@type":"NRID","@value":"9000257904894"},{"@type":"NRID","@value":"9000347077714"},{"@type":"RESEARCHMAP","@value":"https://researchmap.jp/t.hirosawa1983"}],"foaf:name":[{"@value":"Takanobu Hirosawa"}],"jpcoar:affiliationName":[{"@value":"Dokkyo Medical University"}]},{"@id":"https://cir.nii.ac.jp/crid/1380025430639525508","@type":"Researcher","foaf:name":[{"@value":"Yukinori Harada"}],"jpcoar:affiliationName":[{"@value":"Dokkyo Medical University"}]},{"@id":"https://cir.nii.ac.jp/crid/1380025430639525505","@type":"Researcher","foaf:name":[{"@value":"Kazuya Mizuta"}],"jpcoar:affiliationName":[{"@value":"Dokkyo Medical University"}]},{"@id":"https://cir.nii.ac.jp/crid/1380025430639525506","@type":"Researcher","foaf:name":[{"@value":"Tetsu Sakamoto"}],"jpcoar:affiliationName":[{"@value":"Dokkyo Medical University"}]},{"@id":"https://cir.nii.ac.jp/crid/1380025430639525504","@type":"Researcher","foaf:name":[{"@value":"Kazuki Tokumasu"}],"jpcoar:affiliationName":[{"@value":"Okayama University"}]},{"@id":"https://cir.nii.ac.jp/crid/1380025430639525509","@type":"Researcher","foaf:name":[{"@value":"Taro Shimizu"}],"jpcoar:affiliationName":[{"@value":"Dokkyo Medical University"}]}],"publication":{"publicationIdentifier":[{"@type":"PISSN","@value":"20552076"},{"@type":"EISSN","@value":"20552076"}],"prism:publicationName":[{"@value":"DIGITAL HEALTH"}],"dc:publisher":[{"@value":"SAGE 
Publications"}],"prism:publicationDate":"2024-07-21","prism:volume":"10"},"reviewed":"false","dcterms:accessRights":"http://purl.org/coar/access_right/c_abf2","dc:rights":["https://creativecommons.org/licenses/by-nc/4.0/","https://journals.sagepub.com/page/policies/text-and-data-mining-license"],"url":[{"@id":"https://journals.sagepub.com/doi/pdf/10.1177/20552076241265215"},{"@id":"https://journals.sagepub.com/doi/full-xml/10.1177/20552076241265215"}],"createdAt":"2024-07-21","modifiedAt":"2026-01-05","foaf:topic":[{"@id":"https://cir.nii.ac.jp/all?q=Computer%20applications%20to%20medicine.%20Medical%20informatics","dc:title":"Computer applications to medicine. Medical informatics"},{"@id":"https://cir.nii.ac.jp/all?q=R858-859.7","dc:title":"R858-859.7"},{"@id":"https://cir.nii.ac.jp/all?q=Original%20Research%20Article","dc:title":"Original Research Article"}],"project":[{"@id":"https://cir.nii.ac.jp/crid/1040291932575819904","@type":"Project","projectIdentifier":[{"@type":"KAKEN","@value":"22K10421"},{"@type":"JGN","@value":"JP22K10421"},{"@type":"URI","@value":"https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22K10421/"}],"notation":[{"@language":"ja","@value":"オンラインで身体診察を可能にする「リアルタイム遠隔聴診」システムの開発と臨床応用"},{"@language":"en","@value":"Development and clinical implementation of a \"real-time remote auscultation\" system that facilitates online physical examinations."}]},{"@id":"https://cir.nii.ac.jp/crid/1040299749911570176","@type":"Project","projectIdentifier":[{"@type":"KAKEN","@value":"24K13372"},{"@type":"JGN","@value":"JP24K13372"},{"@type":"URI","@value":"https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-24K13372/"}],"notation":[{"@language":"ja","@value":"非典型的な臨床像と診断エラーの関係の分析と応用性の高い対策の立案"},{"@language":"en","@value":"Analyzing the association between atypical presenations and diagnostic errors to develop efficient strategies to address diagnostic 
errors"}]}],"relatedProduct":[{"@id":"https://cir.nii.ac.jp/crid/1360016870443528704","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study"}]},{"@id":"https://cir.nii.ac.jp/crid/1360021390762617728","@type":"Article","resourceType":"学術雑誌論文(journal article)","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Comparative Evaluation of Diagnostic Accuracy Between Google Bard and Physicians"}]},{"@id":"https://cir.nii.ac.jp/crid/1360021393299442432","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Artificial Intelligence and Machine Learning in Clinical Medicine, 2023"}]},{"@id":"https://cir.nii.ac.jp/crid/1360025430194676224","@type":"Article","resourceType":"学術雑誌論文(journal article)","relationType":["isReferencedBy"],"jpcoar:relatedTitle":[{"@value":"Comparative Analysis of Diagnostic Performance: Differential Diagnosis Lists by LLaMA3 Versus LLaMA2 for Case Reports"}]},{"@id":"https://cir.nii.ac.jp/crid/1360025430640615424","@type":"Article","resourceType":"学術雑誌論文(journal article)","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Evaluating ChatGPT-4’s Diagnostic Accuracy: Impact of Visual Data Integration"}]},{"@id":"https://cir.nii.ac.jp/crid/1360025430640624384","@type":"Article","resourceType":"学術雑誌論文(journal article)","relationType":["isReferencedBy"],"jpcoar:relatedTitle":[{"@value":"Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: Cross-Sectional Study"}]},{"@id":"https://cir.nii.ac.jp/crid/1360025438655666176","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Performance of a Web-Based Clinical Diagnosis Support System for 
Internists"}]},{"@id":"https://cir.nii.ac.jp/crid/1360025439365265792","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study"}]},{"@id":"https://cir.nii.ac.jp/crid/1360294647363271936","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Avoiding premature closure and reaching diagnostic accuracy: some key predictive factors"}]},{"@id":"https://cir.nii.ac.jp/crid/1360294647449823616","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"The Effectiveness of Electronic Differential Diagnoses (DDX) Generators: A Systematic Review and Meta-Analysis"}]},{"@id":"https://cir.nii.ac.jp/crid/1360298345004751744","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Large language models in medicine"}]},{"@id":"https://cir.nii.ac.jp/crid/1360298345004883584","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"How AI Responds to Common Lung Cancer Questions: ChatGPT versus Google Bard"}]},{"@id":"https://cir.nii.ac.jp/crid/1360300467680247040","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge"}]},{"@id":"https://cir.nii.ac.jp/crid/1360302865716285568","@type":"Article","resourceType":"学術雑誌論文(journal article)","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Diagnostic excellence in primary care"}]},{"@id":"https://cir.nii.ac.jp/crid/1360306909107191680","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Online Symptom Checkers: Recommendations for a Vignette-Based Clinical Evaluation 
Standard"}]},{"@id":"https://cir.nii.ac.jp/crid/1360306912222975104","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Radiological Differential Diagnoses Based on Cardiovascular and Thoracic Imaging Patterns: Perspectives of Four Large Language Models"}]},{"@id":"https://cir.nii.ac.jp/crid/1360306913718928256","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"The key role of differential diagnosis in diagnosis"}]},{"@id":"https://cir.nii.ac.jp/crid/1360306914350578304","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Utility of ChatGPT in Clinical Practice"}]},{"@id":"https://cir.nii.ac.jp/crid/1360306914401116288","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Specialization in Medicine"}]},{"@id":"https://cir.nii.ac.jp/crid/1360576122008209664","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Differential Diagnosis Generators: an Evaluation of Currently Available Computer Programs"}]},{"@id":"https://cir.nii.ac.jp/crid/1360579820396469376","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models"}]},{"@id":"https://cir.nii.ac.jp/crid/1360579820499108096","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Diagnostic Excellence"}]},{"@id":"https://cir.nii.ac.jp/crid/1360580235943312640","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Artificial intelligence in healthcare: transforming the practice of medicine"}]},{"@id":"https://cir.nii.ac.jp/crid/1360580236809908864","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up 
Evaluation"}]},{"@id":"https://cir.nii.ac.jp/crid/1360580237161970944","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"An overview of clinical decision support systems: benefits, risks, and strategies for success"}]},{"@id":"https://cir.nii.ac.jp/crid/1360588383961336320","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis"}]},{"@id":"https://cir.nii.ac.jp/crid/1360588387221006336","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Effect on diagnostic accuracy of cognitive reasoning tools for the workplace setting: systematic review and meta-analysis"}]},{"@id":"https://cir.nii.ac.jp/crid/1360857596909208192","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Diagnostic accuracy in Family Medicine residents using a clinical decision support system (DXplain): a randomized-controlled trial"}]},{"@id":"https://cir.nii.ac.jp/crid/1360861711976936320","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Five strategies for clinicians to advance diagnostic excellence"}]},{"@id":"https://cir.nii.ac.jp/crid/1360865816813947136","@type":"Article","resourceType":"学術雑誌論文(journal article)","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation"}]},{"@id":"https://cir.nii.ac.jp/crid/1360869864299837440","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Large Language Models Answer Medical Questions Accurately, but Can’t Match Clinicians’ Knowledge"}]},{"@id":"https://cir.nii.ac.jp/crid/1360870230673666944","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Artificial intelligence in healthcare: 
Complementing, not replacing, doctors and healthcare providers"}]},{"@id":"https://cir.nii.ac.jp/crid/1361699994066783360","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"When to use the Bonferroni correction"}]},{"@id":"https://cir.nii.ac.jp/crid/1363951795508084224","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Statistical Methods for Rates and Proportions"}]},{"@id":"https://cir.nii.ac.jp/crid/1370025797874696065","@type":"Product","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"ChatGPT's response consistency: a study on repeated queries of medical examination questions"}]},{"@id":"https://cir.nii.ac.jp/crid/1370025797874696066","@type":"Product","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"A 75-year-old woman with a 5-year history of controlled type 2 diabetes Mellitus presenting with polydipsia and polyuria and a diagnosis of central diabetes insipidus"}]},{"@id":"https://cir.nii.ac.jp/crid/1370025797874696068","@type":"Product","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Performance of ChatGPT, GPT-4, and Google Gemini on a Neurosurgery Oral Boards Preparation Question Bank"}]}],"dataSourceIdentifier":[{"@type":"CROSSREF","@value":"10.1177/20552076241265215"},{"@type":"KAKEN","@value":"PRODUCT-25529449"},{"@type":"KAKEN","@value":"PRODUCT-25770742"},{"@type":"OPENAIRE","@value":"doi_dedup___::43e5816b8ec1d13c19908f31c0b68ead"},{"@type":"CROSSREF","@value":"10.2196/64844_references_DOI_RavWDAMuZAuwZdbfm7aSa63hyKX"},{"@type":"CROSSREF","@value":"10.2196/63010_references_DOI_RavWDAMuZAuwZdbfm7aSa63hyKX"}]}