CTGAN-driven synthetic data generation: A multidisciplinary, expert-guided approach (TIMA).

Publication date: Feb 01, 2025

We generated synthetic data from a cohort of 238 adult SARS-CoV-2-positive patients admitted to the University Hospital of Brussels, Belgium, in 2020, using a Conditional Tabular Generative Adversarial Network (CTGAN)-based technique. The aim was to test the performance, representativeness, realism, novelty, and diversity of synthetic data generated from a small patient sample. The Multidisciplinary Approach (TIMA) involves active participation from a medical team throughout the stages of this process. The TIMA committee scrutinized the data for inconsistencies and imposed strict rules for variables the system did not learn. A sensitivity analysis set the training length at 100,000 epochs, after which 10,000 synthetic patient records were generated. The model’s performance was tested using a general-purpose dataset, comparing real and synthetic data. The results indicate that the model is robust, with an average contingency score of 0.94 across variable pairs in the synthetic and real data. Continuous variables had a median correlation similarity score of 0.97. Novelty received the top score of 1. Principal Component Analysis (PCA) on the synthetic values demonstrated diversity, as no pair of patients had a zero or near-zero distance. Notably, in the TIMA committee’s evaluation, synthetic data were recognized as authentic nearly 100% of the time. Our trained model performed well, yielding a synthetic dataset that was highly representative of the original. The synthetic dataset also proved realistic, with high levels of novelty and diversity.
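The abstract reports its evaluation scores without giving their formulas. The sketch below shows one common way such scores are computed, and it is an assumption rather than the authors' confirmed method: contingency similarity as 1 minus the total variation distance between normalized contingency tables, correlation similarity as 1 minus half the absolute difference between Pearson coefficients, novelty as the fraction of synthetic rows with no exact match in the real data, and diversity as the smallest pairwise Euclidean distance among synthetic rows (the PCA projection used in the study is omitted here). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist, sqrt


def contingency_similarity(real_pairs, synth_pairs):
    """Score in [0, 1] for one pair of categorical variables.

    Each argument is a list of (value_a, value_b) tuples, one per patient.
    Assumed definition: 1 - total variation distance between the two
    normalized contingency tables.
    """
    real_freq, synth_freq = Counter(real_pairs), Counter(synth_pairs)
    n_real, n_synth = len(real_pairs), len(synth_pairs)
    cells = set(real_freq) | set(synth_freq)
    tvd = 0.5 * sum(
        abs(real_freq[c] / n_real - synth_freq[c] / n_synth) for c in cells
    )
    return 1.0 - tvd


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlation_similarity(real_x, real_y, synth_x, synth_y):
    """Score in [0, 1] for one pair of continuous variables.

    Assumed definition: 1 - |r_real - r_synth| / 2.
    """
    return 1.0 - abs(pearson(real_x, real_y) - pearson(synth_x, synth_y)) / 2.0


def novelty(real_rows, synth_rows):
    """Fraction of synthetic rows (tuples) that copy no real row exactly."""
    real_set = set(real_rows)
    return sum(row not in real_set for row in synth_rows) / len(synth_rows)


def min_pairwise_distance(rows):
    """Smallest Euclidean distance between any two rows of numeric tuples.

    A value of zero, or close to zero, would flag near-duplicate patients.
    """
    return min(dist(a, b) for a, b in combinations(rows, 2))
```

Under these assumed definitions, an average score such as the reported 0.94 would come from averaging `contingency_similarity` over all variable pairs, and a novelty of 1 would mean that `novelty(real_rows, synth_rows)` found no synthetic row identical to a real one.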

Concepts: Belgium; Biomed; Multidisciplinary; Principal

Keywords: Adult; Algorithms; Belgium; COVID-19; Generative artificial intelligence; Humans; Neural Networks, Computer; Pandemic; Principal Component Analysis; SARS-CoV-2; Synthetic structured data

Semantics

Type     Source    Name
disease  IDO       process
drug     DRUGBANK  Pidolic Acid
disease  MESH      COVID-19

