Original Article

Evaluation of large language models' diagnostic performance based on ultrasound and mammography reports and images

:70-76
 
       Objective  To evaluate ChatGPT 4 and a fine-tuned Llama 3 model for breast cancer diagnosis, in particular their handling of unstructured reports and images from ultrasound, mammography, and the two examinations combined. Methods  Data were retrospectively collected from 689 patients who underwent both breast ultrasound and mammography. The diagnostic performance of the two models was compared in the text and image modalities, and the effect of breast density on model performance was examined. Results  In the text modality, the fine-tuned Llama 3 model performed well, reaching an accuracy of 91.7% for combined diagnosis and outperforming ChatGPT 4 (71.7%). In the image modality, both models had accuracies below 70%; ChatGPT 4 showed the higher sensitivity (78.3%), while Llama 3 showed high specificity (98.3%). Subgroup analysis indicated that mammography performed better in non-dense breasts, whereas ultrasound was more advantageous in dense breasts. Conclusions  Large language models still require further optimization in medical image processing and multimodal integration, but large language models fine-tuned for the medical domain show potential for handling unstructured clinical text.
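For readers less familiar with the metrics quoted in the Results, the sketch below shows how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix. It is an illustration only: the class name `ConfusionCounts`, the function `diagnostic_metrics`, and the counts are hypothetical and are not taken from the study, whose evaluation code is not given here.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary malignant/benign classification (hypothetical)."""
    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign
    fp: int  # benign cases predicted malignant


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow;
# they are NOT the study's data.
example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

A model can combine high specificity with modest accuracy when it misses many malignant cases, because specificity is computed on benign cases only. This is consistent with the image-modality pattern reported above.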
Publisher information

《广州医药》 official WeChat account