Detecting Conversation Topics in Recruitment Calls of African American Participants to the All of Us Research Program Using Machine Learning: Model Development and Validation Study.

Publication date: Jul 17, 2025

Advancements in science and technology can exacerbate health disparities, particularly when a lack of diversity in clinical research limits the benefits of innovation for underrepresented communities. Programs like the All of Us Research Program (AoURP) are actively working to address this issue by including underrepresented populations in biomedical research, promoting equitable participation, and advancing health outcomes for all. African American communities have been particularly underrepresented in clinical research, often because of historical instances of research misconduct, such as the Tuskegee Syphilis Study, which have deeply damaged trust and willingness to participate in research. With the US population becoming increasingly diverse, it is crucial that clinical research studies reflect this diversity to improve health outcomes. However, limited data and small sample sizes in qualitative studies on the inclusion of underrepresented groups hinder progress in this area.

The goal of this paper is to analyze recruitment conversations between research assistants (RAs) and potential participants in the AoURP to identify key topics that influence enrollment. By examining these interactions, we aim to provide insights that can improve engagement strategies and recruitment practices for underrepresented groups in biomedical research.

We conducted an observational, retrospective study using machine learning for content analysis. Specifically, we used structural topic modeling to identify and compare latent topics of conversation in recruitment calls made by Morehouse School of Medicine RAs between February 2021 and April 2022, estimating expected topic proportions in the corpus as a function of enrollment and participation in AoURP.

In total, our model estimated 45 topics, of which 12 were identified as coherent. Notable topics that were more likely to occur in conversations between RAs and participants who enrolled and participated include closing or following up to schedule an appointment; COVID-19 protocols for in-person visits; explaining precision medicine and the need for representation; and working through objections, including concerns about costs, insurance, care changes, and health fears. Topics associated with potential participants who did not enroll include technical challenges and descriptions of physical measurement visits (eg, collection of basic physical data, such as height, weight, and blood pressure).

Using machine learning to identify topical structure and themes with limited human subjectivity is a promising strategy for identifying gaps in, and opportunities to improve, the recruitment of underserved communities into clinical trials.
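To make "expected topic proportions as a function of enrollment" concrete, here is a rough sketch under stated assumptions, not the study's pipeline. It assumes a topic model has already produced a document-topic matrix `theta` (one row per call, rows summing to 1) and regresses each topic's proportion on an intercept plus a binary enrollment indicator with ordinary least squares. With a single 0/1 covariate, the slope equals the difference in mean topic proportion between enrolled and non-enrolled calls, so a positive value corresponds to a topic being "more likely to occur" in calls with participants who enrolled. This simplifies what structural topic modeling does: in STM the covariates also shape the topic-prevalence prior during fitting, and the `stm` R package's `estimateEffect` function additionally propagates uncertainty in the topic proportions. The function name `topic_prevalence_effect` and the toy data are hypothetical.

```python
import numpy as np


def topic_prevalence_effect(theta, enrolled):
    """Estimate how each topic's proportion shifts with enrollment (simplified sketch).

    theta: (n_docs, n_topics) document-topic proportions from a fitted topic model
           (assumed already estimated; rows sum to 1).
    enrolled: (n_docs,) binary indicator, 1 = enrolled, 0 = did not enroll
              (hypothetical coding).
    Returns an (n_topics,) array of OLS slopes on the enrollment indicator.
    """
    theta = np.asarray(theta, dtype=float)
    enrolled = np.asarray(enrolled, dtype=float)
    # Design matrix with an intercept column and the enrollment indicator.
    X = np.column_stack([np.ones_like(enrolled), enrolled])
    # Solve all topics at once: theta is (n_docs, n_topics), so coef is (2, n_topics).
    coef, *_ = np.linalg.lstsq(X, theta, rcond=None)
    # Row 1 holds the slopes: mean(theta | enrolled) - mean(theta | not enrolled).
    return coef[1]


# Toy example with made-up proportions for 4 calls and 3 topics (not study data).
toy_theta = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.4, 0.5],
])
toy_enrolled = np.array([1, 1, 0, 0])
effects = topic_prevalence_effect(toy_theta, toy_enrolled)
for k, e in enumerate(effects):
    print(f"topic {k}: shift in expected proportion when enrolled = {e:+.2f}")
```

In this toy matrix, topic 0 would lean toward enrolled calls, topic 2 toward non-enrolled calls, and topic 1 would show no difference. A real analysis would also report uncertainty intervals rather than point estimates alone.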

Concepts: African, Biomedical, Recruitment, Tuskegee, Underserved
Keywords: Adult, Biomedical Research, clinical trials, Communication, COVID-19, diversity, Female, Humans, Machine Learning, Male, Patient Selection, precision medicine, recruitment, trust, United States, White
