Using the Literature to Identify Confounders

EasyChair Preprint 155 • 5 pages • Date: May 23, 2018

Abstract

We introduce an approach to causal modeling that uses Literature-Based Discovery (LBD) to identify salient domain knowledge in observational data. Causal models represent a marriage between graph theory, probability, and domain knowledge. We hypothesize that the LBD paradigm can be applied to identify variables of interest for the automated construction of causal models of observational data, and that causal models thus informed will improve upon the performance of purely statistical techniques. We evaluated our hypothesis with a pharmacovigilance (PV) use case. In PV, the task is to discriminate between drug/side-effect signals and noise. We analyzed observational clinical data derived from electronic health records (EHR) and constructed causal models. We used logistic regression coefficients as our baseline and calculated estimated controlled direct effect from the LBD-informed causal models. Causal models improved upon unadjusted statistical models by 8.64% using Area under the Curve of the Receiver Operating Characteristic. Improving upon previous work in PV using EHR as the primary data source, our results establish the utility of the approach.

Keyphrases: Adverse Drug Reaction, Electronic Health Record, Predication-based Semantic Indexing, causal model, causality, feature selection, literature-based discovery, observational clinical data