A textual dataset of de-identified health records in Spanish and Catalan for medical entity recognition and anonymization.

Publication date: Jul 01, 2025

The advancement of clinical natural language processing systems is crucial to exploit the wealth of textual data contained in medical records. Representing global health services requires diverse data sources in different languages and from different sites. To this end, we have released CARMEN-I, a corpus of anonymized clinical records from the Hospital Clinic of Barcelona written over a two-year period during the COVID-19 pandemic. In addition to COVID-19 cases of adult patients, CARMEN-I features multiple comorbidities such as cardiovascular conditions, oncology treatments, post-transplant complications, and infectious diseases. This resource is publicly accessible together with detailed annotation guidelines and granular text-bound annotations, generated in a collaborative effort between clinicians, linguists, and engineers, to enable training and evaluation of automatic anonymization systems. Moreover, for information extraction purposes, a subset of 500 records is annotated with six relevant clinical concept classes: diseases, symptoms, procedures, medications, pathogens, and humans.


Concepts: Clinicians, Global, Pandemic, Spanish

Keywords: COVID-19, Data Anonymization, Electronic Health Records, Humans, Medical Records, Natural Language Processing, Spain

Semantics

Type Source Name
disease IDO entity
disease MESH data sources
disease MESH COVID-19 pandemic
disease MESH complications
disease MESH infectious diseases
drug DRUGBANK Coenzyme M
disease MESH kidney failure
disease MESH respiratory diseases
disease MESH malignancies
disease IDO immunosuppression
disease MESH privacy
disease IDO process
drug DRUGBANK Gold
drug DRUGBANK Esomeprazole
disease IDO history
disease MESH death
drug DRUGBANK Chloroquine
drug DRUGBANK Azithromycin
disease MESH Tachypnea
disease IDO symptom
disease MESH Chest pain
drug DRUGBANK Oxygen
drug DRUGBANK Epinephrine
drug DRUGBANK Dobutamine
disease MESH clinical relevance
disease IDO quality
drug DRUGBANK L-Valine
disease IDO country
drug DRUGBANK Indoleacetic acid
pathway REACTOME Translation
disease IDO facility
disease MESH bacterial pneumonia
drug DRUGBANK Acetaminophen
drug DRUGBANK Ivermectin
drug DRUGBANK Amoxicillin
drug DRUGBANK Clavulanic acid
drug DRUGBANK Piperacillin
drug DRUGBANK Pentaerythritol tetranitrate
disease MESH COPD
disease MESH cardiovascular diseases
disease IDO healthcare facility
disease IDO blood
disease IDO intervention
drug DRUGBANK Meticillin
disease MESH weight loss
drug DRUGBANK Phenolphthalein
drug DRUGBANK Alpha-1-proteinase inhibitor
disease MESH uncertainty
drug DRUGBANK Efavirenz
pathway REACTOME Reproduction

