OnSIDES database: Extracting adverse drug events from drug labels using natural language processing models

Tanaka, Yutaro; Chen, Hsin Yi; Belloni, Pietro; Gisladottir, Undina; Kefeli, Jenna; Patterson, Jason; Srinivasan, Apoorva; Zietz, Michael; Sirdeshmukh, Gaurav; Berkowitz, Jacob; Larow Brown, Kathleen; Tatonetti, Nicholas P.

doi:10.1016/j.medj.2025.100642

Background: Adverse drug events (ADEs) are the fourth leading cause of death in the US and add billions of dollars annually to healthcare costs. However, few machine-readable databases of ADEs exist, limiting our capacity to study drug safety on a broader, systematic scale. Recent advances in natural language processing methods, such as BERT models, present an opportunity to accurately extract relevant information from unstructured biomedical text.

Methods: We fine-tune a PubMedBERT model to extract ADE terms from text in FDA Structured Product Labels for prescription drugs. Here, we present OnSIDES (on-label side effects resource), a compiled, machine-friendly database of drug-ADE pairs generated with this method. We further use this method to extract pediatric-specific ADEs, serious ADEs from the labels' "Boxed Warnings" section, and ADEs from drug labels of other major nations (the UK, the European Union, and Japan) to build a complementary OnSIDES-INTL database. To illustrate potential applications of OnSIDES, we use the database to predict novel drug targets and indications, analyze enrichment of ADEs across drug classes, and predict novel ADEs from chemical compound structures.

Findings: We achieve an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 at extracting ADEs from the labels' "Adverse Reactions" section. OnSIDES contains over 3.6 million drug-ADE pairs for 3,233 unique drug ingredient combinations extracted from 47,211 labels.

Conclusions: OnSIDES can be used as a comprehensive resource to study and enhance drug safety.

Funding: R35GM131905 to N.P.T.; T32GM145440 to H.Y.C.; and T15LM007079 to U.G., M.Z., and K.L.B.
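
For readers less familiar with the three metrics reported in the Findings, the following is a minimal sketch of how F1, AUROC, and AUPR can be computed for a binary "is this term an ADE?" classifier. It is not the authors' evaluation code. The function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration, and AUPR is approximated here by average precision.

```python
from typing import Sequence


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 at a fixed decision threshold (0.5 is an assumed default)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a
    random negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimate of AUPR.
    Tied scores are ranked in input order (no special tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # Hypothetical data: 1 = candidate term is a true ADE, 0 = it is not.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(f"F1    = {f1_score(toy_labels, toy_scores):.3f}")
    print(f"AUROC = {auroc(toy_labels, toy_scores):.3f}")
    print(f"AUPR  = {average_precision(toy_labels, toy_scores):.3f}")
```

F1 depends on the chosen threshold, whereas AUROC and AUPR summarize ranking quality across all thresholds, which is why the three numbers are reported together.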