Practice-based Evidence in Medicine: Where Information Retrieval Meets Data Mining

Karin M. Verspoor (1, 2)
(1) Department of Computing and Information Systems
(2) Health and Biomedical Informatics Centre
The University of Melbourne
Melbourne, Victoria, Australia
karin.verspoor@unimelb.edu.au

1. INTRODUCTION

A new approach in medical practice is emerging thanks to the increasing availability of large-scale clinical data in electronic form. In practice-based evidence [5, 6], the clinical record is mined to identify patterns of health characteristics, such as diseases that co-occur, side-effects of treatments, or more subtle combinations of patient attributes that might explain a particular health outcome. This approach contrasts with what has been the standard of care in medicine, evidence-based practice, in which treatment decisions are based on (quantitative) evidence derived from targeted research studies, specifically, randomised controlled trials. Advantages of consulting the clinical record for evidence rather than relying solely on structured research include avoiding the selection bias of the inclusion criteria for a clinical trial and monitoring of longer-term outcomes and effects [5]. The two approaches are, of course, complementary: a hypothesis derived from large-scale data mining could in turn form the starting point for the design of a clinical trial to rigorously investigate that hypothesis.

Information retrieval can play an important role in both approaches to collecting medical evidence. However, the use of information retrieval methods in collecting practice-based evidence requires moving away from traditional document-oriented retrieval as the end goal in itself, to viewing that retrieval as an intermediate step towards knowledge discovery and population-scale data mining. Furthermore, it may require the development of more context-specific retrieval strategies, designed to identify specific characteristics of interest and support particular tasks in the medical context.

2. IR AND EVIDENCE-BASED PRACTICE

In evidence-based medicine, collection and meta-analysis of the published literature of clinical trials form the foundation of systematic reviews (e.g., Cochrane Reviews [1]). The production of such reviews has traditionally been done using painstaking exhaustive searches of the literature and human synthesis of published experimental results. It has been argued that automation is both necessary and possible [2, 7]. There is a clear role for information retrieval in this process, to identify publications relevant to a given review, although further structuring of the information within the documents retrieved is also needed [3].

A number of targeted search engines for the published biomedical literature have been developed that aim to improve search effectiveness for biomedical researchers [4]. Several incorporate the results of information extraction, such as named entity recognition for specific relevant entity types (e.g., drugs and diseases), with the objective of enabling concept-based indexing of the literature.

3. IR AND PRACTICE-BASED EVIDENCE

Data mining of electronic health records for medical evidence demands processing of the wealth of clinical data now recorded in natural language text. Transformation of this unstructured data into a structured representation is needed for incorporation of the information it contains into broader data mining. Many transformations can be cast as information retrieval tasks: for instance, identifying patients satisfying particular profiles (e.g., for recruitment into clinical trials or registries), or retrieval of case histories corresponding to specific treatment protocols. Development of general approaches to such tasks will likely require a mix of information retrieval and domain-specific information extraction.

4. CONCLUSION

The boundaries between information retrieval, information extraction, and data mining are blurring; bringing them together, in an activity commonly referred to as text mining, can result in heterogeneous methods that will enable sifting through the entirety of the clinical record, including both its unstructured and structured components. This in turn will enable clinical decision making based on data derived from large populations in the "laboratory" of the natural world.

5. REFERENCES

[1] Cochrane Collaboration. http://www.cochrane.org.
[2] T. Guy et al. The automation of systematic reviews. BMJ, 346, 2013.
[3] S. Kim et al. Automatic classification of sentences to support evidence based medicine. BMC Bioinformatics, 12(Suppl 2):S5, 2011.
[4] Z. Lu. PubMed and beyond: a survey of web tools for searching biomedical literature. Database, baq036, 2011.
[5] T. Pincus and T. Sokka. Evidence-based practice and practice-based evidence. Nat Clin Pract Rheum, 2(3):114–115, 2006.
[6] N. H. Shah. Mining the ultimate phenome repository. Nat Biotech, 31(12):1095–1097, 2013.
[7] I. Shemilt et al. Pinpointing needles in giant haystacks: Use of text mining to reduce impractical screening workload in extremely large scoping reviews. Research Synthesis Methods, 2013. Online preprint.

Copyright is held by the author.
MedIR 2014, July 11, 2014, Gold Coast, Australia.