Performance of ChatGPT and Bard in Self-Assessment Questions for Nephrology Board Renewal

Ryunosuke Noda, Yuto Izaki, Fumiya Kitano, Jun Komatsu, Daisuke Ichikawa, Yugo Shibagaki

doi:10.1007/s10157-023-02451-w

ABSTRACT

Background: Large language models (LLMs) pretrained on vast amounts of data have significantly influenced recent advances in artificial intelligence. While GPT-4 has demonstrated high performance in general medical examinations, its performance in specialised areas such as nephrology is unclear. This study aimed to compare ChatGPT and Bard and their potential clinical applications in nephrology.

Methods: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and Bard. We calculated the overall correct answer rates for the five years, each year, and question categories and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of the nephrology residents.

Results: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively, thus GPT-4 demonstrated significantly higher performance than GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing criteria in three years. GPT-4 demonstrated significantly higher performance in problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. The correct answer rate for GPT-4 was intermediate between the rates for third- and fourth-year nephrology residents.

Conclusions: GPT-4 significantly outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in three of five years. These findings underline the potential applications of LLMs in nephrology as well as their advantages and disadvantages. As LLMs advance, nephrologists must understand their performance and reliability for future applications.
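The correct answer rates quoted in the Results can be reproduced from the raw counts. The sketch below does that arithmetic and, purely as an illustration, runs an unpaired two-proportion z-test on the totals. The abstract does not state which statistical test the authors used, so the z-test, the function names, and the variable names here are assumptions, not the paper's method; a paired test would need per-question results that the abstract does not provide.

```python
import math

TOTAL = 99  # number of questions, from the abstract
correct = {"GPT-3.5": 31, "GPT-4": 54, "Bard": 32}  # counts from the abstract


def rate_percent(n_correct, n_total=TOTAL):
    """Correct answer rate as a percentage, rounded to one decimal place."""
    return round(100 * n_correct / n_total, 1)


def two_proportion_p(x1, x2, n=TOTAL):
    """Two-sided p-value of an unpaired two-proportion z-test.

    Assumption: illustrative only; the abstract does not name the test used.
    Both groups are taken to have the same size n.
    """
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = ((x1 - x2) / n) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


rates = {model: rate_percent(count) for model, count in correct.items()}
p_vs_gpt35 = two_proportion_p(correct["GPT-4"], correct["GPT-3.5"])
p_vs_bard = two_proportion_p(correct["GPT-4"], correct["Bard"])

if __name__ == "__main__":
    for model, rate in rates.items():
        print(f"{model}: {rate}% ({correct[model]}/{TOTAL})")
    print(f"GPT-4 vs GPT-3.5: p = {p_vs_gpt35:.4f}")
    print(f"GPT-4 vs Bard:    p = {p_vs_bard:.4f}")
```

Under this assumed test, both comparisons fall below the 0.01 threshold, which is consistent with the significance levels reported in the abstract.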
