Publication date: Feb 02, 2025
Binary outcomes in electronic health records (EHR) derived using automated phenotyping algorithms may suffer from phenotyping error, which biases association estimates. Huang et al. [1] proposed the Prior Knowledge-Guided Integrated Likelihood Estimation (PIE) method to mitigate this bias. However, their investigation focused on point estimation without statistical inference, and their simulation-based evaluation of PIE was a proof of concept covering only a limited range of scenarios. This study aims to comprehensively assess PIE’s performance, including (1) how well PIE performs across a wide spectrum of operating characteristics of phenotyping algorithms in real-world scenarios (e.g., low prevalence, low sensitivity, high specificity); (2) beyond point estimation, how much variation the prior distribution introduces into the PIE estimator; and (3) from a hypothesis-testing point of view, whether PIE improves type I error and statistical power relative to the naïve method (i.e., ignoring the phenotyping error). Synthetic data and real-world EHR data from the Children’s Hospital of Philadelphia were used to evaluate PIE. The synthetic data were generated under diverse outcome prevalences, phenotyping algorithm sensitivities, and association effect sizes. Simulation studies compared PIE under different prior distributions with the naïve method, assessing bias, variance, type I error, and power. The real-world analysis compared the performance of PIE and the naïve method in estimating the associations of multiple predictors with COVID-19 infection. PIE exhibited reduced bias compared with the naïve method across varied simulation settings, with comparable type I error and power. The bias reduction achieved by PIE grew as the effect size increased. PIE performed best when the prior distributions aligned closely with the true phenotyping algorithm characteristics.
The impact of prior quality was minor for low-prevalence outcomes but large for common outcomes. In the real-world analysis, PIE maintained relatively accurate estimation across different scenarios, outperforming the naïve approach particularly under large effect sizes. PIE effectively mitigates estimation bias across a wide spectrum of real-world settings, particularly with accurate prior information. Its main benefit lies in bias reduction rather than hypothesis testing. The impact of the prior is small for low-prevalence outcomes.
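To make the source of the bias concrete, the following is a minimal sketch of how non-differential phenotyping error distorts the log odds ratio that the naïve method targets. It is not the PIE method and not the study's simulation design. It assumes a single binary predictor, a logistic model for the true outcome, and a phenotyping algorithm whose sensitivity and specificity do not depend on the predictor. All function names and the parameter values (5% baseline prevalence, sensitivity 0.6, specificity 0.95) are hypothetical choices for illustration.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def observed_prob(p_true, sensitivity, specificity):
    """Probability that the algorithm labels a patient as a case.

    True cases are flagged with probability `sensitivity`;
    true non-cases are flagged with probability 1 - `specificity`.
    """
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)


def naive_log_odds_ratio(intercept, beta, sensitivity, specificity):
    """Large-sample log odds ratio obtained when the error-prone
    outcome is analyzed as if it were the true outcome.

    Assumes (for illustration only) a binary predictor X with
    logit P(Y = 1 | X) = intercept + beta * X.
    """
    p0 = expit(intercept)          # true risk when X = 0
    p1 = expit(intercept + beta)   # true risk when X = 1
    q0 = observed_prob(p0, sensitivity, specificity)
    q1 = observed_prob(p1, sensitivity, specificity)
    return logit(q1) - logit(q0)


if __name__ == "__main__":
    intercept = logit(0.05)  # hypothetical 5% baseline prevalence
    for beta in (0.5, 1.0, 2.0):
        naive = naive_log_odds_ratio(intercept, beta, 0.6, 0.95)
        print(f"true beta = {beta:.1f}, naive target = {naive:.3f}, "
              f"bias = {naive - beta:.3f}")
```

Under these assumptions the naïve target is pulled toward zero, and the absolute bias grows with the true effect size. This matches the setting in which the abstract reports the largest gains from PIE.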
Concepts | Keywords |
---|---|
Algorithms | Association study |
Hospital | Bias reduction |
Lies | Electronic health record |
Philadelphia | Phenotyping error |
Pie | |
Semantics
Type | Source | Name |
---|---|---|
disease | IDO | algorithm |
disease | MESH | COVID-19 |
disease | MESH | infection |
disease | IDO | quality |