ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate | It was bad at recognizing relationships and needs selective training, researchers say.

L4sBot@lemmy.world · 10 months ago

kromem@lemmy.world · 10 months ago

Because when you use the SotA model and best practices in prompting, it actually can do a lot of things really well, including diagnosing medical cases:

We assessed the performance of the newly released AI GPT-4 in diagnosing complex medical case challenges and compared the success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis

Use of GPT-4 to Diagnose Complex Clinical Cases

The OP study isn’t using GPT-4. It’s using GPT-3.5, which is very dumb. So the finding is less “LLMs can’t diagnose pediatric cases” and more “we don’t know how to do meaningful research on LLMs in medicine.”