Publication date: Jul 29, 2025
The COVID-19 pandemic highlighted the need to understand the factors influencing individuals’ risk perceptions and health behaviors. This study aimed to explore the roles of individuals’ knowledge, perception, and health-related issues in determining COVID-19 risk by developing a predictive model that classifies individuals into risk categories, incorporating both clustering and model interpretation techniques. To identify distinct COVID-19 risk groups, clustering analysis was applied to demographic, health, and behavioral data. Subsequently, several machine learning models, including CatBoost, XGBoost, Random Forest, Generalized Linear Model (GLM), Decision Tree, H2O Deep Neural Network (DNN), and L2 SVM, were used to predict risk classifications. SHAP (SHapley Additive exPlanations) analysis was applied to interpret the contribution of individual features to model predictions. Three distinct risk classes were identified: Class 0 (high knowledge, low-risk factors, no household COVID-19 diagnosis), Class 1 (health-related issues (e.g., hypertension), low knowledge), and Class 2 (high knowledge, higher health risks (e.g., hypertension, household COVID-19 diagnosis)). L2 SVM achieved the highest accuracy (0.9724), followed by XGBoost (0.9301) and CatBoost (0.9265). SHAP analysis revealed that household hygiene practices and health-related issues, such as hypertension and gastrointestinal symptoms, were key drivers of risk classification. Integrating individuals’ knowledge, perception, and health-related issues into COVID-19 risk assessments enhances predictive accuracy. Public health policies should focus on both physical and psychological factors to effectively mitigate the spread and impact of COVID-19. Data-driven models may inform future efforts to prioritize resource allocation and improve public health responses for vulnerable populations.
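To make the interpretation step concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a single individual. It is only an illustration under stated assumptions: `toy_risk_score`, its coefficients, and the three feature names are hypothetical stand-ins, not the study's model or data, and the SHAP library itself relies on faster model-specific or sampling-based estimators rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value, so each
    value is the average marginal effect of switching a feature from its
    baseline to its observed value, over all coalitions of other features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk_score(features):
    """Hypothetical risk score (assumption, not the study's model)."""
    hypertension, gi_symptoms, hygiene = features
    return (0.4 * hypertension + 0.3 * gi_symptoms - 0.2 * hygiene
            + 0.1 * hypertension * gi_symptoms)


# Hypothetical individual: hypertension and GI symptoms present, poor hygiene.
x = [1.0, 1.0, 0.0]
# Hypothetical reference individual: no conditions, good hygiene.
baseline = [0.0, 0.0, 1.0]

phi = shapley_values(toy_risk_score, x, baseline)
for name, value in zip(["hypertension", "gi_symptoms", "hygiene"], phi):
    print(f"{name}: {value:+.3f}")
```

The per-feature values sum to the difference between the score for the individual and the score for the baseline, which is the additivity property that lets SHAP attribute a prediction to individual features. Enumerating every coalition grows exponentially with the number of features, so this approach is only practical for a handful of inputs.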
Open Access PDF
| Concepts | Keywords |
|---|---|
| Catboost | COVID-19 |
| Drivers | Health behavior |
| Forest | Machine learning |
| Gastrointestinal | Perception |
| | Predictive learning models |
Semantics
| Type | Source | Name |
|---|---|---|
| disease | MESH | COVID-19 |
| disease | MESH | hypertension |
| disease | MESH | Long Covid |
| pathway | REACTOME | Reproduction |
| drug | DRUGBANK | Coenzyme M |
| disease | MESH | infection |
| drug | DRUGBANK | Piroxicam |
| drug | DRUGBANK | Fenamole |
| disease | IDO | symptom |
| disease | IDO | algorithm |
| disease | IDO | quality |
| disease | IDO | process |
| drug | DRUGBANK | Flunarizine |
| drug | DRUGBANK | Isoxaflutole |
| disease | MESH | cancer |
| disease | MESH | pulmonary diseases |
| disease | MESH | confusion |
| drug | DRUGBANK | Saquinavir |
| disease | IDO | role |
| drug | DRUGBANK | Tretamine |
| disease | IDO | history |
| disease | IDO | susceptibility |
| disease | MESH | lifestyle |
| disease | MESH | anxiety |
| disease | MESH | cardiovascular disease |
| disease | MESH | causality |
| disease | IDO | intervention |
| drug | DRUGBANK | Ademetionine |
| disease | MESH | morbidity |
| drug | DRUGBANK | Methyl isocyanate |
| disease | MESH | syndrome |
| drug | DRUGBANK | Guanosine |
| drug | DRUGBANK | Alteplase |
| drug | DRUGBANK | Carboxyamidotriazole |
| disease | MESH | Death |
| drug | DRUGBANK | Trestolone |
| drug | DRUGBANK | Etoperidone |
| disease | MESH | social vulnerability |
| disease | MESH | pulmonary hypertension |