Colloquium Details

Surfacing an Iceberg: Meaningful Use of Clinical Notes Advances Drug Safety

Author: Paea LePendu, Stanford University
Date: June 01, 2012
Location: 220 Deschutes
Host: Dejing Dou


Although relatively rare once a drug is marketed, adverse drug reactions constitute a major cause of morbidity and mortality worldwide. The current state of the art in post-marketing drug surveillance uses large collections of voluntarily submitted reports to detect adverse drug reactions. However, given the limitations of reporting systems, researchers increasingly look toward electronic health records (EHRs) for next-generation signal detection. Yet the promise of "big data" from EHRs is only the tip of an enormous iceberg: the vast majority of a patient's history lies entombed in clinical-textual descriptions. Achievements in natural language processing (NLP) are rapidly closing this gap, and we present a simple approach that quickly transforms the unstructured patient notes taken by doctors, nurses, and other clinicians into a de-identified, temporally ordered, patient-feature matrix that is sorted according to standardized medical terminologies. We demonstrate how to use this high-throughput data to actively monitor for adverse drug reactions in the EHR. Overall, simple methods applied to the text show strong predictive value (72% PPV, 17% FDR). In one particular case that was actually flagged as a false negative, our follow-up analysis reveals that proton-pump inhibitors (PPIs) appear strongly associated with heart attacks, which could be explained by new results (see the companion study by Ghebremariam et al.) on the biological mechanisms of PPIs. We conclude that next-generation, post-marketing drug surveillance using EHRs will benefit from simple methods that surface the enormous quantity and value of unstructured clinical text.


Paea LePendu graduated from the UO CIS program in 2010 and is currently a research scientist at Stanford. His team at Stanford is pioneering efforts in extracting and analyzing knowledge from clinical text, the most underutilized yet most valuable part of electronic healthcare records. The Stanford team won the International Semantic Web Challenge in 2010 based on their work on large-scale, integrated knowledge systems. Paea is a core inventor on numerous patents and disclosures that are currently being licensed by biotech companies interested in advancing patient healthcare through massive data analytics.