Original Article

Establishing a hypothyroidism risk prediction model based on the random forest algorithm

Pages 16-24
 
Objective To construct a risk prediction model for hypothyroidism based on the random forest method. Methods A total of 5,735 patients with hypothyroidism from the MIMIC-IV database were included as the case group and 4,803 patients without hypothyroidism as the control group, and a random forest model was built on these data. Logistic regression, a Bayesian regularized neural network, and XGBoost served as comparison models. The performance of the four machine learning models was evaluated using accuracy, F1 score, precision, recall, specificity, and AUC. Results The random forest model achieved an accuracy of 0.85, an F1 score of 0.84, a precision of 0.84, a recall of 0.84, a specificity of 0.86, and an AUC of 0.91. In this model, the 10 most important indicators for diagnosing hypothyroidism were thyroid-stimulating hormone, age, absolute lymphocyte count, red blood cell count, neutrophils, sex, alkaline phosphatase, alanine aminotransferase, absolute eosinophil count, and blood urea nitrogen. Conclusions The hypothyroidism prediction model built with the random forest method has potential application value for the early diagnosis of hypothyroidism.
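The six evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels (1 = hypothyroidism, 0 = control) and a 0.5 decision threshold. The function names and the eight-sample toy data are hypothetical, chosen only for illustration; they are not the study's data, code, or results.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): true labels and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = roc_auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The AUC here uses the pairwise ranking definition, which is quadratic in the number of samples. For a cohort of roughly 10,000 patients, a sort-based implementation from an established library would be the more practical choice.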