Original Article

Evaluating the Diagnostic Performance of Large Language Models on Ultrasound and Mammography Reports and Images

Pages 70-76

Abstract

Keywords: large language model; breast cancer; ultrasound; mammography

Objective: To evaluate ChatGPT 4 and a fine-tuned Llama 3 model for breast cancer diagnosis, specifically on unstructured reports and images from ultrasound, mammography, and ultrasound combined with mammography.

Methods: Data were retrospectively collected from 689 patients who underwent both breast ultrasound and mammography. The diagnostic performance of the two models was compared in the text and image modalities, and the effect of breast density on model performance was examined.

Results: In the text modality, the fine-tuned Llama 3 model performed best, reaching an accuracy of 91.7% on combined ultrasound and mammography diagnosis, compared with 71.7% for ChatGPT 4. In the image modality, both models had accuracies below 70%; ChatGPT 4 showed higher sensitivity (78.3%), whereas Llama 3 showed higher specificity (98.3%). Subgroup analysis indicated that mammography performed better in non-dense breasts, whereas ultrasound was more advantageous in dense breasts.

Conclusions: Large language models still require further optimization for medical image processing and multimodal integration, but large language models fine-tuned on medical data show potential for handling unstructured clinical text.
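The abstract reports accuracy, sensitivity, and specificity without restating how they are defined. As a reading aid, the following is a minimal sketch of the standard definitions for a binary malignant/benign decision. The class name `ConfusionCounts`, the function `diagnostic_metrics`, and the counts in the usage example are illustrative assumptions; they are not taken from the study and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary decision (hypothetical container, not from the paper)."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, and specificity from confusion counts."""
    positives = c.tp + c.fn  # all truly malignant cases
    negatives = c.tn + c.fp  # all truly benign cases
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malignant and one benign case")
    return {
        "accuracy": (c.tp + c.tn) / (positives + negatives),
        "sensitivity": c.tp / positives,  # true positive rate
        "specificity": c.tn / negatives,  # true negative rate
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

These definitions show why a model can pair high specificity with low overall accuracy, as in the image-modality results: specificity depends only on the benign cases, so a model that rarely predicts malignancy scores well on it while missing many malignant cases.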