Publication date: Jun 02, 2025
Clinical artificial intelligence (AI) systems are susceptible to performance degradation due to data shifts, which can lead to erroneous predictions and potential patient harm. Proactively detecting and mitigating these shifts is crucial for maintaining AI effectiveness and safety in clinical practice. The objective of this study was to develop and evaluate a proactive, label-agnostic monitoring pipeline for detecting and mitigating harmful data shifts in clinical AI systems, and to assess transfer learning and continual learning strategies for maintaining model performance.

This prognostic study used electronic health record data for admissions to general internal medicine wards of 7 large hospitals (5 academic and 2 community) in Toronto, Canada, between January 1, 2010, and August 31, 2020. Inpatients (aged ≥18 years) with a hospital stay of at least 24 hours were included. Data analysis was performed from January to August 2022.

The data shifts examined arose from changes in hospital type, critical laboratory assays, patient demographics, admission type, and the COVID-19 pandemic. The primary outcome was predictive performance for all-cause in-hospital mortality within the next 2 weeks, evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Data shifts were detected using a label-agnostic monitoring pipeline employing a black box shift estimator with maximum mean discrepancy testing.

Data were available for 143 049 adult inpatients (mean [SD] age, 67.8 [19.6] years; 50.7% female). Significant data shifts were detected in association with younger age groups, admissions from nursing homes and acute care centers, transfer from community to academic hospitals, and changes in brain natriuretic peptide and D-dimer assays. Transfer learning improved model performance at community hospitals in a hospital type-dependent manner (ΔAUROC [SD], 0.05 [0.03]; ΔAUPRC [SD], 0.06 [0.04]). During the COVID-19 pandemic, drift-triggered continual learning improved overall model performance (ΔAUROC [SD], 0.44 [0.02]; P = .007, Mann-Whitney U test).

In this prognostic study, a proactive, label-agnostic monitoring pipeline detected harmful data shifts for a clinical AI system predicting in-hospital mortality, and transfer learning and drift-triggered continual learning strategies mitigated the resulting performance degradation, maintaining model performance across health care settings. These findings suggest that this approach may support the robust and equitable deployment of clinical AI models. Future research should explore the generalizability of the framework across diverse clinical domains, data modalities, and longer deployment periods to further validate its effectiveness.
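The abstract's monitoring pipeline pairs a black box shift estimator (reasoning about shift from the deployed model's outputs rather than from outcome labels) with maximum mean discrepancy (MMD) testing. The sketch below is a minimal illustration of that general idea, not the authors' implementation: the function names, the RBF kernel, the permutation test, and the `trigger_continual_learning_update` hook are assumptions introduced here for illustration.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Pairwise RBF kernel between two 1-D arrays of model risk scores."""
    d = x[:, None] - y[None, :]
    return np.exp(-gamma * d ** 2)

def mmd2(x, y, gamma):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def mmd_permutation_test(ref_scores, cur_scores, gamma=1.0,
                         n_permutations=1000, seed=0):
    """Two-sample MMD test with a permutation-based p-value.

    Both inputs are 1-D arrays of predicted mortality risks, so the test
    requires no outcome labels (label-agnostic monitoring).
    """
    rng = np.random.default_rng(seed)
    observed = mmd2(ref_scores, cur_scores, gamma)
    pooled = np.concatenate([ref_scores, cur_scores])
    n_ref = len(ref_scores)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd2(pooled[:n_ref], pooled[n_ref:], gamma) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)

# Hypothetical usage: `model` is the deployed mortality model; the two
# windows hold featurised admissions from a reference period and the
# current monitoring period.
# ref_scores = model.predict_proba(reference_window)[:, 1]
# cur_scores = model.predict_proba(current_window)[:, 1]
# mmd_stat, p_value = mmd_permutation_test(ref_scores, cur_scores)
# if p_value < 0.05:
#     trigger_continual_learning_update()  # hypothetical retraining hook
```

In practice the kernel bandwidth (`gamma` here) is often set with a median heuristic over pairwise distances, and the drift threshold and monitoring window length would be tuned to the deployment setting; none of those details are specified in the abstract.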