COVID-19 risk stratification among older adults: a machine learning approach to identify personal and health-related risk factors.

Publication date: Jul 29, 2025

The COVID-19 pandemic highlighted the need to understand the factors that influence individuals’ risk perceptions and health behaviors. This study explored how individuals’ knowledge, perception, and health-related issues determine COVID-19 risk by developing a predictive model that classifies individuals into risk categories, combining clustering with model interpretation techniques. To identify distinct COVID-19 risk groups, clustering analysis was applied to demographic, health, and behavioral data. Several machine learning models (CatBoost, XGBoost, Random Forest, Generalized Linear Model (GLM), Decision Tree, H2O Deep Neural Network (DNN), and L2 SVM) were then used to predict the risk classifications. SHAP (SHapley Additive exPlanations) analysis was applied to interpret the contribution of individual features to model predictions. Three distinct risk classes were identified: Class 0, with high knowledge, low-risk factors, and no household COVID-19 diagnosis; Class 1, with health-related issues (e.g., hypertension) and low knowledge; and Class 2, with high knowledge and higher health risks (e.g., hypertension, household COVID-19 diagnosis). L2 SVM achieved the highest accuracy (0.9724), followed by XGBoost (0.9301) and CatBoost (0.9265). SHAP analysis revealed that household hygiene practices and health-related issues, such as hypertension and gastrointestinal symptoms, were key drivers of risk classification. Integrating individuals’ knowledge, perception, and health-related issues into COVID-19 risk assessments enhances predictive accuracy. Public health policies should address both physical and psychological factors to effectively mitigate the spread and impact of COVID-19. Data-driven models may inform future efforts to prioritize resource allocation and improve public health responses for vulnerable populations.
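
To make the interpretation step more concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a small toy risk score. It is an illustration only and not the study's method or data: the feature names echo the abstract, but the scoring function, its weights, and the instance and baseline values are hypothetical, and the SHAP library itself is not used.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names chosen to echo the abstract; not the study's variables.
FEATURES = ["hypertension", "gi_symptoms", "household_hygiene"]


def risk_score(x):
    """Toy risk model with illustrative weights (an assumption, not fitted)."""
    score = 0.5 * x["hypertension"] + 0.3 * x["gi_symptoms"]
    score -= 0.4 * x["household_hygiene"]
    # One interaction term, so the attribution is not trivially additive.
    score += 0.2 * x["hypertension"] * x["gi_symptoms"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of the other features.

    Features outside a coalition are set to their baseline value.
    """
    n = len(FEATURES)
    values = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_f = {
                    f: instance[f] if f in subset else baseline[f]
                    for f in FEATURES
                }
                with_f = dict(without_f)
                with_f[feature] = instance[feature]
                total += weight * (model(with_f) - model(without_f))
        values[feature] = total
    return values


# Hypothetical individual and reference point.
instance = {"hypertension": 1, "gi_symptoms": 1, "household_hygiene": 0}
baseline = {"hypertension": 0, "gi_symptoms": 0, "household_hygiene": 1}

phi = shapley_values(risk_score, instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score for the instance and the score for the baseline, which is the property that makes Shapley-based explanations additive. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.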

Concepts Keywords
Catboost COVID-19
Drivers Health behavior
Forest Machine learning
Gastrointestinal Perception
Predictive learning models

Semantics

Type Source Name
disease MESH COVID-19
disease MESH hypertension
disease MESH Long Covid
pathway REACTOME Reproduction
drug DRUGBANK Coenzyme M
disease MESH infection
drug DRUGBANK Piroxicam
drug DRUGBANK Fenamole
disease IDO symptom
disease IDO algorithm
disease IDO quality
disease IDO process
drug DRUGBANK Flunarizine
drug DRUGBANK Isoxaflutole
disease MESH cancer
disease MESH pulmonary diseases
disease MESH confusion
drug DRUGBANK Saquinavir
disease IDO role
drug DRUGBANK Tretamine
disease IDO history
disease IDO susceptibility
disease MESH lifestyle
disease MESH anxiety
disease MESH cardiovascular disease
disease MESH causality
disease IDO intervention
drug DRUGBANK Ademetionine
disease MESH morbidity
drug DRUGBANK Methyl isocyanate
disease MESH syndrome
drug DRUGBANK Guanosine
drug DRUGBANK Alteplase
drug DRUGBANK Carboxyamidotriazole
disease MESH Death
drug DRUGBANK Trestolone
drug DRUGBANK Etoperidone
disease MESH social vulnerability
disease MESH pulmonary hypertension
