Tuesday, October 17, 2017
3:30 pm - 4:30 pm
701 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104
Title: A general framework for selection bias due to missing data in EHR-based research
Abstract: Electronic health records (EHR) data provide unique opportunities for public health research, in part because they typically contain rich information on large populations. Notwithstanding these benefits, EHR-based studies may suffer from a number of sources of bias. Among these, selection bias due to incomplete data is underappreciated in analyses of EHR data. When selection bias is framed as a missing data problem, standard methods are often applied to control for it. In EHR-based studies, however, the provenance of the observed data generally involves the interplay of many clinical decisions made by patients, health care providers, and the health system; thus standard methods fail to capture the complexity of the mechanisms that give rise to the observed data. In this work we use a novel framework for selection bias in EHR-based research that allows a hierarchy of missingness mechanisms to inform an inverse-probability weighted estimator that better aligns with the complex nature of EHR data. We show that this estimator is consistent and asymptotically normal. A key insight from extensive simulations is the bias-variance trade-off in using this framework when the data provenance is functionally misspecified. We use this approach to adjust for selection in an ongoing, multi-site EHR-based study of the effect of bariatric surgery on BMI.