Guangzhou Medical Journal (广州医药), 2025, Vol. 56, Issue 10: 1346-1352. DOI: 10.20223/j.cnki.1000-8535.2025.10.004

• Artificial Intelligence and Medicine •

基于ChatGPT-4o与DeepSeek的虚拟标准化患者系统在医学问诊教学中的比较研究

李婕1,2, 梁国强3, 王飞4, 林泽宇5, 陈柏权1,2, 刘雪萍1,2, 孟洋阳1,2   

    1 中山大学附属第六医院教务处(广东广州 510655)
    2 广州市中六黄埔区生物医学创新研究院(广东广州 510799)
    3 中国人民解放军南部战区总医院妇产科(广东广州 510000)
    4 南方医科大学继续教育学院(广东广州 510515)
    5 中山大学附属第六医院肝胆胰脾外科(广东广州 510655)
  • 收稿日期:2025-05-07 出版日期:2025-10-20 发布日期:2025-11-28
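  • Corresponding author: MENG Yangyang, E-mail: mengyy3@mail.sysu.edu.cn
  • Funding:
    Guangdong Higher Education Society, 14th Five-Year Plan 2025 Higher Education Research Project (25GYB003): "Exploring AI-Driven Support Mechanisms for Medical Students' Clinical Reasoning and Innovative Pathways for Teaching Models"; 2024 Sun Yat-sen University Industry-University Cooperative Education Project (241202943180205): "A Standardized Patient (SP)-Based Doctor-Patient Communication Simulation Training Course from a Multidisciplinary Perspective"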

A comparative study of ChatGPT-4o and DeepSeek-based virtual standardized patient systems in medical interview training

LI Jie1,2, LIANG Guoqiang3, WANG Fei4, LIN Zeyu5, CHEN Baiquan1,2, LIU Xueping1,2, MENG Yangyang1,2   

    1 Department of Academic Affairs, the Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou 510655, China
    2 Guangzhou Huangpu Biomedical Innovation Research Institute, the Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou 510799, China
    3 Department of Obstetrics and Gynecology, General Hospital of Southern Theater Command of the Chinese People’s Liberation Army, Guangzhou 510000, China
    4 School of Continuing Education, Southern Medical University, Guangzhou 510515, China
    5 Department of Hepatobiliary, Pancreatic and Spleen Surgery, the Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou 510655, China
  • Received: 2025-05-07 Online: 2025-10-20 Published: 2025-11-28

摘要: 背景 虚拟标准化患者作为医学教育中的新型教学工具,已广泛用于提升学生的临床问诊能力。随着生成式人工智能的快速发展,基于大语言模型(LLMs)构建的VSP系统成为研究热点。然而,目前尚缺乏对不同LLM在模拟患者角色方面表现的系统比较。目的 比较ChatGPT-4o与DeepSeek两种主流LLM在VSP模拟中的适用性,评估其在病史采集、语言自然度、线索引导能力及教学辅助效果等方面的表现差异。方法 采用类实验研究,参与者为某医学院校临床医学专业本科四年级学生,所有参与者均已修完《诊断学》课程,具备基础问诊技能,研究对象共60人,按学号尾数单双分为两组,分别与ChatGPT-4o或DeepSeek驱动的VSP系统进行交互。进行模拟急性阑尾炎问诊,并在完成病史采集后提交诊断判断与体验问卷。结果 ChatGPT-4o在结构化信息整合、线索引导及技术稳定性方面更为优越,而DeepSeek则在语言亲和力与情感回应方面表现更具人文关怀色彩。结论 不同LLM在VSP中的优势方向不同,可根据教学目标进行有针对性地系统选择与设计。未来研究可进一步拓展至不同病种、交互方式及评估维度,以全面评估LLM驱动VSP在医学教育场景下的适应性与教学成效。

关键词: 虚拟标准化患者, 大语言模型, AI, 医患沟通, 生成式人工智能

Abstract: Background Virtual standardized patients (VSPs) have emerged as a novel tool in medical education and are widely used to enhance students’ clinical interviewing skills. With the rapid development of generative artificial intelligence, VSP systems powered by large language models (LLMs) have become a research focus. However, systematic comparisons of how different LLMs perform in simulating patient roles are still lacking. Objective To compare the applicability of two mainstream LLMs, ChatGPT-4o and DeepSeek, in VSP-based medical interview simulation, focusing on their differences in history taking, linguistic naturalness, clue guidance, and teaching support. Methods A quasi-experimental study was conducted with 60 fourth-year undergraduate students in clinical medicine at a medical school. All participants had completed a diagnostics course and possessed basic interviewing skills. Students were assigned to two groups according to whether the last digit of their student ID number was odd or even, and interacted with a VSP system driven by either ChatGPT-4o or DeepSeek. Each participant conducted a text-based simulated interview with a VSP presenting with acute appendicitis and, after completing the history taking, submitted a preliminary diagnosis and a structured satisfaction questionnaire. Results ChatGPT-4o performed better in structured information integration, clue guidance, and technical stability, whereas DeepSeek showed greater linguistic warmth and emotional responsiveness, reflecting a stronger humanistic quality. Conclusions The two models have different strengths within the VSP framework, so system selection and design should be tailored to specific teaching objectives. Future research should extend to other diseases, interaction modalities, and evaluation dimensions to comprehensively assess the adaptability and teaching effectiveness of LLM-driven VSP systems in medical education.

Key words: virtual standardized patients, large language models, artificial intelligence, medical communication, generative AI
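
The allocation rule described in the Methods (grouping by the odd or even last digit of the student ID) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `assign_group` and `allocate`, the example ID format, and in particular the mapping of odd IDs to ChatGPT-4o and even IDs to DeepSeek are assumptions made for illustration, since the abstract does not state which parity went to which model.

```python
from collections import defaultdict

DIGITS = set("0123456789")

# Assumption: the abstract does not say which parity maps to which model.
# Odd -> ChatGPT-4o and even -> DeepSeek is chosen here only for illustration.
ARM_BY_PARITY = {1: "ChatGPT-4o", 0: "DeepSeek"}


def assign_group(student_id: str) -> str:
    """Return the study arm for one student, based on the last digit of the ID."""
    last_char = student_id.strip()[-1:]
    if last_char not in DIGITS:
        raise ValueError(f"student ID must end in a digit: {student_id!r}")
    return ARM_BY_PARITY[int(last_char) % 2]


def allocate(student_ids: list[str]) -> dict[str, list[str]]:
    """Split a list of student IDs into the two arms, preserving input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sid in student_ids:
        groups[assign_group(sid)].append(sid)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical consecutive IDs; real IDs may not split evenly.
    ids = [str(20210000 + i) for i in range(1, 61)]
    for arm, members in allocate(ids).items():
        print(arm, len(members))
```

Because the rule is deterministic rather than random, group sizes depend on how the ID numbers happen to be distributed, which is consistent with the study being described as quasi-experimental rather than randomized.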