Text Mining for Drug Development: Gathering Insights to Support Decision Making

Sherri Matis-Mitchell
Consultant, DataStar Insights
Oxford, PA, USA
Sherrimatismitchell@gmail.com

Abstract: Drug discovery in Pharma R&D is an information-driven process requiring many disparate bits of data from many different sources, both structured and unstructured. Text mining is the key methodology used to extract entities and relationships from unstructured text in the quest for the knowledge needed to bring a safe and effective drug to market and beyond. Much of the insight needed in early drug research, to identify drug target to disease relationships and to progress a potential drug target, comes from published literature and internal reports. Later-stage drug development requires many additional sources of information, including case reports, clinical trials, competitive intelligence and other diverse sources. In this publication, I will present four use cases showing how text mining is used to drive decision making in drug discovery and development, and how it can be used to identify patient insights from sources such as social media.

Keywords: Text Mining; Drug Discovery; Pharma R&D; Social media; drug safety; patient journey

I. INTRO TO DRUG DISCOVERY

Drug discovery began with the use of medicinal plants to treat illness. Later, drugs were discovered by extracting the pharmacologically active compound from natural products. Accompanying the genomic revolution, modern drug discovery methods shifted to the identification of disease-associated genes as potential drug targets, followed by discovery of a compound or biologic that would interact favorably with the target to treat disease. Because of the variability of the human population, a drug can vary in efficacy and safety, so extensive preclinical and clinical testing is required by the regulatory agencies before a drug is launched onto the market. The modern drug discovery process takes place over an average of 10.7 years and at a cost of about 2.6 billion dollars [1,2]. To ensure drug safety, constant post-launch monitoring of a drug or biologic is required. Finally, drugs can and do fail at any stage along the process, costing millions or billions in lost revenue and, in some cases, causing harm or even death.

Drug discovery requires a lot of information to succeed, and asking the right question can go very far in ensuring a quicker and surer outcome. A quicker understanding of disease, better disease targets, the right compound, and earlier identification of potential risk all help make the process more efficient and shorten the timeline to a safe medicine. We need to select the right dose to maximize efficacy and minimize risk, and develop meaningful trials in the right patient populations to ensure success. Finally, even after the drug is launched, companies need to monitor for reports of adverse events that arise following drug treatment, but they also need to understand what patients are saying in order to minimize risk, understand competing therapies, and alleviate any new issues arising. Because much of this information is present in written reports, published literature or study reports, text mining can help wade through the pages and uncover facts and relationships in unstructured text [3].

II. USE CASES FOR TEXT MINING

A. Text Mining in Early Drug Discovery

Many diseases, like diabetes or cancer, arise from a complex series of events involving multiple genes and pathways, but others, including many rare diseases, are associated with a single gene or even a single mutation in that gene. With the help of semantically enriched taxonomies of genes, diseases, drugs and biologics, text mining can identify both old and new relationships between genes and diseases, genes and drugs, and drugs and diseases [4]. New drug to disease relationships can represent potential repositioning opportunities. Most drug projects are designed to cure diseases that affect large populations, but the rare disease patient community is empowered and is pushing for answers, and initiatives like the Orphan Drug Act offer incentives to address these rare diseases. Understanding these less complicated but rare diseases, which often arise from a single gene mutation, can shed light on more complex diseases, and text mining has great utility in this use case [5].

B. Mining for Preclinical and Clinical Drug Safety

The ultimate goal of preclinical testing is to accurately model the drug's safety in animals to predict what will happen in humans. The risk for adverse events can vary across different therapeutic areas, and some drug classes inherently have liabilities for certain adverse events [6]. The tolerance for side effects can also vary across therapeutic areas. Because only a fraction of drugs succeed, there is a large amount of data from failed compounds in unstructured internal reports and study documents as well as in published literature.

In a small number of cases, unsafe medicines have been progressed into human trials and beyond due to the lack of a preclinical safety "signal", and much is being done to prevent this. In the 1990s, a number of drugs were found to cause a life-threatening cardiac arrhythmia caused by QT prolongation and were withdrawn from the market [7]. This has led to testing of all drugs for this liability. One example of how text mining can help is in building a reference compound set for the evaluation of QT prolongation. In 2015, the HESI Pro-Arrhythmia Working Group published on using text mining to identify both human and non-rodent animal studies that assessed QT signal concordance between species and identified drugs that prolonged the QT interval [8]. In this work, text mining was essential to identifying compound to biological effect to species relationships in the published literature for expert review.

C. The Role of Text Mining in Drug Submission

The submission of a new drug to a regulatory agency such as the FDA requires proof that the medicine is safe and effective, as demonstrated by non-clinical testing and clinical trials. The submission package can contain thousands of pages of written material. Text mining can support this process in a number of ways and can positively impact it by saving the time of project teams and potential reviewers.

A real-world example of how text mining can impact the submission process is discussed here. While the example has been stripped of proprietary details, it should still demonstrate a tangible value. In this case, the team was filing for a waiver for additional safety studies. Based on their knowledge of the drug's pharmacology and that of other drugs in the same class, the team felt the waiver was warranted but still needed to convince the regulatory agency. A keyword-based literature search found 3000 full-text documents that needed further review to identify the smaller set of documents relevant to the specific question. The completed review and summary report were due in 7 months, and an outside vendor quoted a figure of 9 months and $180,000 to complete the review. Text mining of the full-text documents and the subsequent review were then completed in 2 months, saving 5 months' time and $180,000. While the monetary impact of text mining and other informatics processes in R&D can be hard to quantify, this example demonstrates a clear value of text mining.

D. Post-launch, Mining Social Media for Patient Insights

Social media is a largely untapped source of information and insights for pharma on therapy efficacy and safety, patient journey, unmet need, and customer reputation. When a patient receives a life-altering disease diagnosis and subsequent treatment, many will turn to social media for support and additional information. As an industry, pharma has been using social media to communicate with patients via channels like Twitter, but this has largely been driven by commercial functions to inform the public. While pharmaceutical companies are using social media to provide product information and promotional materials, they should also be using it to better understand patients' needs and experiences, and to provide additional education, particularly to those with chronic illnesses. One emerging trend is to use text mining to analyze sentiments, identify adverse events and glean insights from social media. Social media conversations can also inform R&D and pharmacovigilance efforts [10]. Social media is here to stay, and pharma should be responsibly engaging in it to get in touch with patients. When companies engage and have the right tools and analytics capabilities, such as text mining, they can gain valuable insight into what patients are saying and use those insights to make better treatments that improve quality of life.

REFERENCES

[1] J. A. DiMasi, H. G. Grabowski, "Tufts CSDD briefing on R&D cost study," 2014. http://csdd.tufts.edu/news/complete_story/pr_tufts_csdd_2014_cost_study
[2] Z. Bian, S. Chen, C. Cheng, J. Wang, H. Xiao, H. Qin, "Developing new drugs from annals of Chinese medicine," Acta Pharmaceutica Sinica B. 2012;2(1):1–7.
[3] R. McEntire, D. Szalkowski, J. Butler, MS Kuo, M Chang, D Freeman, S McQuay, J Patel, M McGlashen, WD Cornell, JJ Xu, "Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development," Drug Discovery Today. 2016;21(5):826–835.
[4] D. Rebholz-Schuhmann, A. Oellrich, R. Hoehndorf, "Text-mining solutions for biomedical research: enabling integrative biology," Nat Rev Genet. 2012 Dec;13(12):829–39.
[5] D Sardana, C Zhu, M Zhang, RC Gudivada, L Yang, AG Jegga, "Drug repositioning for orphan diseases," Brief Bioinform. 2011;12(4):346–356.
[6] MT Cronin, JS Jaworska, JD Walker, MH Comber, CD Watts, AP Worth, "Use of QSARs in international decision-making frameworks to predict health effects of chemical substances," Environ Health Perspect. 2003 Aug;111(10):1391–1401.
[7] CE Pollard, N Abi Gerges, MH Bridgland-Taylor, A Easter, TG Hammond, J-P Valentin, "An introduction to QT interval prolongation and non-clinical approaches to assessing and reducing risk," Br J Pharmacol. 2010 Jan;159(1):12–21.
[8] HM Vargas, AS Bass, J Koerner, S Matis-Mitchell, MK Pugsley, M Skinner, M Burnham, M Bridgland-Taylor, S Pettit, JP Valentin, "Evaluation of drug-induced QT interval prolongation in animal and human studies: a literature review of concordance," Br J Pharmacol. 2015 Aug;172(16):4002–11.
[9] M. Larkin, "Social media for Pharma- an Experts View," 2014. https://www.elsevier.com/connect/social-media-for-pharma-an-experts-view
[10] A Sarker, R Ginn, A Nikfarjam, K O'Connor, K Smith, S Jayaraman, T Upadhaya, G Gonzalez, "Utilizing social media data for pharmacovigilance: A review," J Biomed Inform. 2015 Apr;54:202–12.