From Sequences to Strategies: Early Detection of New SARS-CoV-2 Variants via Genetic Distance to Reduce Hospitalizations

Publication date: Sep 04, 2025

The COVID-19 pandemic highlighted the critical need for robust methods to monitor viral evolution and detect emerging variants of concern (VOCs). Traditional genomic surveillance often lacks predictive power. This study extended an unsupervised machine learning clustering algorithm, based on the Levenshtein distance between SARS-CoV-2 Spike protein sequences, to track and predict variant predominance across six European countries from 2020 to January 2024. We also investigated the influence of genetic distances and containment strategies on hospitalization rates. Sequences were transformed into temporal chains, and growth parameters were extracted via sigmoid fitting. A deep neural network (DNN) was trained to classify emerging chains as likely to become dominant, while a CatBoost model assessed variable importance for predicting weekly hospitalizations in Denmark. Simulations explored modifying the genetic distance of the vaccine strain, containment measures, and the vaccination coverage rate (VCR). Approximately 5,000 sequences per week enabled early chain detection within four weeks. The DNN achieved near-perfect classification of chain predominance within 3-4 weeks of a chain's appearance. Genetic distances within consecutive chains and with vaccine strains were significant predictors of hospitalizations. Simulations suggest that better-matched vaccines or stricter containment measures could reduce hospitalizations. Doubling vaccination coverage alone had minimal effect but yielded additional reductions when combined with strict containment. This integrated framework demonstrates the utility of combining unsupervised and supervised machine learning for real-time tracking and prediction of SARS-CoV-2 variant dynamics and their impact on public health. Our findings underscore the critical role of genetic distances and effective public health interventions in mitigating the burden of emerging variants, supporting timely genomic surveillance and adaptive public health strategies.
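The abstract names two concrete numerical building blocks: the Levenshtein (edit) distance between Spike protein sequences, and a sigmoid fit that extracts growth parameters for each chain. The following is a minimal, stdlib-only Python sketch of both, not the authors' code. The function names, the two-parameter logistic form (asymptote fixed at 1), and fitting by least squares on the logit of the weekly share are assumptions made for illustration; the study may use a different sigmoid parameterization or fitting routine.

```python
import math
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fit_logistic(weeks: Sequence[float], share: Sequence[float]) -> tuple[float, float]:
    """Fit p(t) = 1 / (1 + exp(-r * (t - t0))) by least squares on logit(p).

    Assumption: the chain's share tends to 1 (two-parameter sigmoid), and
    every share lies strictly between 0 and 1. Returns (growth rate r, midpoint t0).
    """
    ys = [math.log(p / (1.0 - p)) for p in share]  # logit linearizes the sigmoid
    n = len(weeks)
    mean_t = sum(weeks) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in weeks)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(weeks, ys))
    r = sxy / sxx
    intercept = mean_y - r * mean_t  # logit(p) = r*t + intercept = r*(t - t0)
    t0 = -intercept / r
    return r, t0
```

For example, `levenshtein("kitten", "sitting")` returns 3. Linearizing with the logit keeps the fit dependency-free, but it weights points near 0 and 1 unevenly; a nonlinear least-squares routine is the usual alternative when shares are noisy.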


