Unsupervised machine learning clustering approach for hospitalized COVID-19 pneumonia patients.

Publication date: Feb 08, 2025

Identification of distinct clinical phenotypes of diseases can guide personalized treatment. This study aimed to classify hospitalized COVID-19 pneumonia subgroups using an unsupervised machine learning approach. We included hospitalized COVID-19 pneumonia patients from July to September 2021. K-means clustering, an unsupervised machine learning method, was performed to identify clinical phenotypes based on clinical and laboratory variables collected within 24 hours of admission. Variables were normalized before clustering to ensure equal contribution to the analysis. The optimal number of clusters was determined using the elbow method and silhouette scores. Cox proportional hazards models were used to compare the risk of intubation and 90-day mortality across the identified clusters. Three clinically distinct clusters were identified among 538 hospitalized COVID-19 pneumonia patients. Cluster 1 (N = 27) consisted predominantly of males and showed significantly elevated serum liver enzymes and LDH levels. Cluster 2 (N = 370) was characterized by lower chest X-ray scores and higher serum albumin levels. Cluster 3 (N = 141) was characterized by older age, diabetes mellitus, higher chest X-ray scores, more severe vital signs, higher creatinine levels, lower hemoglobin levels, lower lymphocyte counts, higher C-reactive protein, higher D-dimer, and higher LDH levels. Compared with cluster 2, cluster 3 was significantly associated with an increased risk of 90-day mortality (HR, 6.24; 95% CI, 2.42-16.09) and intubation (HR, 5.26; 95% CI, 2.37-11.72). In contrast, cluster 1 had a 100% survival rate, with a non-significant increase in intubation risk compared with cluster 2 (HR, 1.40; 95% CI, 0.18-11.04). We identified three distinct clinical phenotypes of COVID-19 pneumonia patients, with cluster 3 associated with an increased risk of respiratory failure and mortality. These findings may guide tailored clinical management strategies.
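
The clustering workflow described in the abstract (normalize the variables, run K-means for several candidate cluster counts, then compare within-cluster sums of squares and silhouette scores) can be illustrated with a minimal standard-library Python sketch. This is not the study's code or data: the three columns (age, CRP, LDH), the group means, the noise levels, and all function names are hypothetical stand-ins, and z-score scaling is assumed because the abstract does not say which normalization was used. A real analysis would more likely rely on an established library.

```python
import math
import random

def zscore(rows):
    """Scale every column to mean 0, SD 1 so no variable dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans_once(rows, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen starting rows."""
    centers = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: dist(r, centers[j])) for r in rows]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    inertia = sum(dist(r, centers[lab]) ** 2 for r, lab in zip(rows, labels))
    return labels, inertia

def kmeans(rows, k, n_init=25, seed=0):
    """Best of n_init random restarts, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((kmeans_once(rows, k, rng) for _ in range(n_init)),
               key=lambda result: result[1])

def silhouette(rows, labels):
    """Mean silhouette width; requires at least two non-empty clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [dist(rows[i], rows[j]) for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster scores 0 by convention
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(dist(rows[i], rows[j]) for j in idxs) / len(idxs)
                for other, idxs in clusters.items() if other != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(rows)

# Synthetic stand-in data: columns are hypothetical (age, CRP, LDH) values,
# drawn around three made-up group means. None of this comes from the study.
data_rng = random.Random(42)
group_means = [(40, 10, 200), (70, 150, 250), (50, 40, 900)]
noise_sd = (3, 3, 30)
raw = [[data_rng.gauss(m, s) for m, s in zip(center, noise_sd)]
       for center in group_means for _ in range(20)]

X = zscore(raw)
results = {k: kmeans(X, k) for k in range(2, 6)}
scores = {k: silhouette(X, labels) for k, (labels, _) in results.items()}
best_k = max(scores, key=scores.get)

for k in sorted(results):
    print(f"k={k}  inertia={results[k][1]:.2f}  silhouette={scores[k]:.3f}")
print("k with highest silhouette:", best_k)
```

In this sketch the inertia column plays the role of the elbow plot (look for the k after which it stops dropping sharply), while the silhouette score gives a second, independent criterion. Without the scaling step, the LDH column, whose raw values are in the hundreds, would dominate the Euclidean distances.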


Concepts: Enzymes; Hemoglobin; July; Males; Pneumonia

Keywords: Aged; Cluster Analysis; Clustering analysis; COVID-19; Female; Hospitalization; Humans; Machine learning; Male; Middle Aged; Mortality; Pneumonia; Proportional Hazards Models; Retrospective Studies; SARS-CoV-2; Unsupervised Machine Learning

Semantics

Type Source Name
disease MESH COVID-19
disease MESH pneumonia
drug DRUGBANK Aspartame
drug DRUGBANK Human Serum Albumin
disease MESH diabetes mellitus
drug DRUGBANK Creatinine
disease MESH respiratory failure
pathway REACTOME Reproduction
drug DRUGBANK Coenzyme M
disease MESH death
disease MESH heart disease
disease MESH obesity
disease MESH cancer
disease MESH chronic kidney disease
disease MESH symptom clusters
drug DRUGBANK Trestolone
disease IDO blood
drug DRUGBANK Oxygen
drug DRUGBANK Urea
drug DRUGBANK Nitrogen
drug DRUGBANK Honey
disease IDO algorithm
drug DRUGBANK Medical air
disease MESH lung diseases
disease MESH Cerebrovascular diseases
disease MESH Chronic renal failure
disease MESH Chest pain
disease MESH Dyspnea
disease MESH Sore throat
disease IDO symptom
disease IDO protein
drug DRUGBANK Alkaline Phosphatase
disease IDO cell
drug DRUGBANK Piroxicam
disease MESH nutritional status
disease MESH inflammation
drug DRUGBANK Fibrinogen Human
disease MESH Comorbidity
drug DRUGBANK Troleandomycin
disease MESH Mody
drug DRUGBANK Guanosine

