Artificial Intelligence and Medicine

A comparative study of ChatGPT-4o- and DeepSeek-based virtual standardized patient systems in medical interview training

Pages 1346-1352
 
Background: Virtual standardized patients (VSPs) are a novel teaching tool in medical education and are widely used to improve students' clinical interviewing skills. With the rapid development of generative artificial intelligence, VSP systems built on large language models (LLMs) have become a research focus. However, systematic comparisons of how different LLMs perform in simulated patient roles are still lacking.
Objective: To compare the suitability of two mainstream LLMs, ChatGPT-4o and DeepSeek, for VSP-based medical interview simulation, and to evaluate their differences in history taking, linguistic naturalness, clue guidance, and teaching support.
Methods: A quasi-experimental study enrolled 60 fourth-year undergraduate students in clinical medicine at a medical school. All participants had completed a diagnostics course and had basic interviewing skills. Students were divided into two groups according to whether the last digit of their student ID number was odd or even, and each group interacted with a VSP system driven by either ChatGPT-4o or DeepSeek. Each participant conducted a text-based simulated interview with a VSP presenting with acute appendicitis and, after completing the history, submitted a preliminary diagnosis and a structured satisfaction questionnaire.
Results: ChatGPT-4o performed better in structured information integration, clue guidance, and technical stability, whereas DeepSeek showed greater linguistic warmth and emotional responsiveness, reflecting a more humanistic communication style.
Conclusions: The two LLMs have different strengths as VSPs, so system selection and design can be tailored to specific teaching objectives. Future research could extend to other diseases, interaction modalities, and evaluation dimensions in order to assess more comprehensively the adaptability and teaching effectiveness of LLM-driven VSPs in medical education.
Publisher information

《广州医药》 (Guangzhou Medical Journal) WeChat official account