Extracting circumstances of Covid-19 transmission from free text with large language models.

Publication date: Jul 01, 2025

Rapidly identifying the circumstances in which an emerging infectious disease is transmitted is central to mitigation efforts. Here, we explore how large language models (LLMs) can automatically extract such circumstances from free-text descriptions in online surveys, in the context of Covid-19. In a nationwide study conducted online in France, we enrolled 545,958 adults with recent SARS-CoV-2 infection and asked about the circumstances of transmission in both closed-ended and open-ended questions. First, we trained a classification model based on a pretrained LLM to predict one of seven predefined infection contexts (Work, Family, Friends, Sports, Cultural, Religious, Other) from the free-text answers to the open-ended questions. We achieved an unbalanced accuracy of 75%, which increased to 91% when the 43% of responses with the highest entropy were excluded. Second, we used topic modeling to define clusters of transmission circumstances agnostically. This yielded 23 clusters, which agreed with the seven predefined infection contexts but also provided finer detail on previously undefined circumstances of transmission. Our study suggests that LLM-based analysis of free text may reduce the need for closed-ended questions in epidemiological surveys and enable insights into previously unsuspected circumstances of transmission. This approach could accelerate and enrich the acquisition of epidemiological insights in future pandemics.
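The entropy-based filtering mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classifier outputs one probability per infection context for each response and that uncertainty is measured as the Shannon entropy of that distribution; the abstract does not give the exact procedure, so the function names, the toy probabilities, and the labels below are hypothetical and are not the study's data or code.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def accuracy_after_filtering(prob_rows, true_labels, drop_fraction):
    """Discard the drop_fraction most uncertain rows, then score the rest.

    prob_rows: list of per-class probability lists (assumed classifier output).
    true_labels: list of integer class indices.
    drop_fraction: share of responses to exclude, highest entropy first.
    """
    # Sort row indices from most confident (low entropy) to least confident.
    order = sorted(range(len(prob_rows)), key=lambda i: entropy(prob_rows[i]))
    n_keep = len(prob_rows) - int(round(drop_fraction * len(prob_rows)))
    kept = order[:n_keep]
    if not kept:
        raise ValueError("drop_fraction removes every response")
    correct = 0
    for i in kept:
        row = prob_rows[i]
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        if predicted == true_labels[i]:
            correct += 1
    return correct / len(kept)


# Hypothetical toy example with three classes and four responses.
toy_probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.10, 0.80, 0.10],  # confident
    [0.34, 0.33, 0.33],  # very uncertain
]
toy_labels = [0, 1, 1, 2]

print(accuracy_after_filtering(toy_probs, toy_labels, 0.0))
print(accuracy_after_filtering(toy_probs, toy_labels, 0.5))
```

In this toy case the two high-entropy responses are the misclassified ones, so excluding them raises the accuracy on the remaining responses. This is the same trade-off between coverage and accuracy that the abstract reports.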


Concepts: Agnostically; France; Friends; Nationwide

Keywords: Adult; COVID-19; Female; France; Humans; Language; Large Language Models; Male; Middle Aged; SARS-CoV-2; Surveys and Questionnaires

Semantics

Type      Source     Name
disease   MESH       Covid-19
disease   MESH       emerging infectious disease
pathway   REACTOME   SARS-CoV-2 Infection
disease   MESH       infection
disease   MESH       pathogen transmission
drug      DRUGBANK   L-Citrulline
drug      DRUGBANK   Huperzine B
drug      DRUGBANK   Diethylstilbestrol
pathway   REACTOME   Translation
disease   IDO        contact tracing
disease   MESH       infectious diseases
drug      DRUGBANK   Ranitidine
disease   MESH       uncertainty
