广州医药 (Guangzhou Medical Journal), 2023, Vol. 54, Issue 7: 16-24. DOI: 10.3969/j.issn.1000-8535.2023.07.003

• Original Article •

Establishing a hypothyroidism risk prediction model based on the random forest algorithm

杨正霞1, 王和勇1, 贺施琪2, 刘城3, 王天逸4, 张圣辉3, 毛晓健5   

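    1 Department of Electronic Business, South China University of Technology (Guangzhou 510006);
    2 Guangzhou Huayin Medical Laboratory Center Co., Ltd. (Guangzhou 510670);
    3 Department of Cardiology, Guangzhou First People's Hospital, South China University of Technology (Guangzhou 510180);
    4 Guangzhou Medical University (Guangzhou 510660);
    5 Guangzhou Women and Children's Medical Center, Guangzhou Medical University (Guangzhou 510623)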
  • Received: 2023-01-13 Online: 2023-07-20 Published: 2023-08-15
  • Corresponding author: 毛晓健 (MAO Xiaojian), E-mail: xjamao@tom.com

Establishing a hypothyroidism risk prediction model based on random forest algorithm

YANG Zhengxia1, WANG Heyong1, HE Shiqi2, LIU Cheng3, WANG Tianyi4, ZHANG Shenghui3, MAO Xiaojian5

    1 Department of Electronic Business, South China University of Technology, Guangzhou 510006, China;
    2 Guangzhou Huayin Medical Laboratory Center Co., Ltd., Guangzhou 510670, China;
    3 Department of Cardiology, Guangzhou First People's Hospital, South China University of Technology, Guangzhou 510180, China;
    4 Guangzhou Medical University, Guangzhou 510660, China;
    5 Guangzhou Women and Children's Medical Center Affiliated to Guangzhou Medical University, Guangzhou 510623, China
  • Received: 2023-01-13 Online: 2023-07-20 Published: 2023-08-15

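Abstract: Objective To build a risk prediction model for hypothyroidism using the random forest method. Methods From the MIMIC-IV database, 5 735 patients with hypothyroidism were included as the case group and 4 803 patients without hypothyroidism as the control group, and a random forest model was built on these data. Logistic regression, a Bayesian regularized neural network, and XGBoost served as comparison models. The four machine learning models were evaluated by accuracy, F1 score, precision, recall, specificity, and AUC. Results The random forest model achieved an accuracy of 0.85, an F1 score of 0.84, a precision of 0.84, a recall of 0.84, a specificity of 0.86, and an AUC of 0.91. In this model, the 10 most important indicators for diagnosing hypothyroidism were thyroid-stimulating hormone, age, absolute lymphocyte count, red blood cell count, neutrophils, sex, alkaline phosphatase, alanine aminotransferase, absolute eosinophil count, and urea nitrogen. Conclusion The hypothyroidism prediction model built with the random forest method has potential value for the early diagnosis of hypothyroidism.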

Key words: hypothyroidism, random forest, predictive model, MIMIC-IV database

Abstract: Objective To construct a risk prediction model for hypothyroidism based on the random forest method. Methods A total of 5 735 patients with hypothyroidism from the MIMIC-IV database were included as the case group, and 4 803 patients without hypothyroidism were included as the control group. A random forest model was built on these data, and logistic regression, a Bayesian regularized neural network, and XGBoost were used as comparison models. The performance of the four machine learning models was evaluated using accuracy, F1 score, precision, recall, specificity, and AUC. Results The random forest model had an accuracy of 0.85, an F1 score of 0.84, a precision of 0.84, a recall of 0.84, a specificity of 0.86, and an AUC of 0.91. In this model, thyroid-stimulating hormone, age, absolute lymphocyte count, red blood cell count, neutrophils, sex, alkaline phosphatase, alanine aminotransferase, absolute eosinophil count, and blood urea nitrogen were the 10 most important indicators for diagnosing hypothyroidism. Conclusions The hypothyroidism prediction model constructed using the random forest method has potential value for the early diagnosis of hypothyroidism.

Key words: hypothyroidism, random forest, predictive model, MIMIC-IV database
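
For readers less familiar with the six evaluation measures named in the abstract, the following is a minimal Python sketch of how they are conventionally defined for a binary classifier. It is not the authors' code. The function names `binary_metrics` and `auc_from_scores`, the confusion-matrix counts, and the scores are all hypothetical and chosen only for illustration; they are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (positive class = hypothyroidism)."""
    precision = tp / (tp + fp)        # share of predicted positives that are true positives
    recall = tp / (tp + fn)           # sensitivity: share of actual positives detected
    specificity = tn / (tn + fp)      # share of actual negatives correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg) and is
    meant for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, NOT taken from the paper.
metrics = binary_metrics(tp=80, fp=20, tn=90, fn=20)
auc = auc_from_scores([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2])
```

Accuracy, F1 score, precision, recall, and specificity depend on the decision threshold that produced the confusion matrix, whereas AUC is computed from the ranking of predicted scores and is threshold-free.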