Leveraging Post-marketing Drug Safety Research through Semantic Technologies: The PharmacoVigilance Signal Detectors Ontology

Vassilis Koutkias¹ and Marie-Christine Jaulent¹

¹ INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, F-75006, Paris, France; Université Paris 13, Sorbonne Paris Cité, LIMICS, (UMR_S 1142), F-93430, Villetaneuse, France
{vasileios.koutkias, marie-christine.jaulent}@inserm.fr

Abstract. Accurate and timely identification of post-marketing drug safety risks (the so-called “signals” in pharmacovigilance) is an important public health issue. While various computational methods have been proposed to analyze the diverse data sources employed for signal detection, effective drug monitoring and surveillance remains a challenge. At the same time, there is an emerging belief that the synthesis of all possible information sources is necessary to achieve further advancements. Aiming to facilitate integrated signal detection by concurrently exploring various data sources via respective analysis methods in a systematic way, we propose the PharmacoVigilance Signal Detectors Ontology (PV-SDO). PV-SDO constitutes the backbone of a semantically-enriched platform for this integration and aims to: (a) semantically harmonize heterogeneous data sources and analysis methods in the field, (b) facilitate their joint exploitation through mappings between reference terminologies that the data sources rely on, and (c) provide an exploitable knowledge base of signal analysis methods, experiments and their outcomes including provenance information. PV-SDO has been populated with a significant number of individuals using data from open-source signal detection method implementations, and assessed via data-driven and logic-based techniques, while an evaluation with experts is currently being conducted with promising results.
Keywords: Pharmacovigilance, Heterogeneous Data Sources, Computational Signal Detection, Semantic Integration, Ontologies.

1 Introduction

Identification of post-marketing drug safety risks is an important public health issue requiring accurate and timely evidence [1]. Pharmacovigilance is the science that studies all the activities related to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem [2]. The detection of the so-called “signals”, which are typically defined as “information on a possible causal relationship between an adverse event and a drug, the relationship being unknown or incompletely documented previously” [1, 3], relies on the analysis of data originating from different sources [4]. These sources vary, spanning from spontaneous reporting systems (SRS) and the literature, to social media platforms. Nevertheless, while various computational methods have been proposed to analyze the diverse data sources employed for signal detection, optimal drug monitoring and surveillance remains an open issue [5]. At the same time, there is an emerging belief that the synthesis of all possible information sources is necessary to achieve further advancements [6, 7].

Aiming to implement signal detection by concurrently exploring various data sources and the respective computational analysis methods in a systematic way, we present in this paper the PharmacoVigilance Signal Detectors Ontology (PV-SDO). First, we provide background information on post-marketing signal detection. Next, we present PV-SDO in terms of its scope, the material employed for its construction, its overall schema, main entities and attributes, referring also to development and evaluation aspects. We finally discuss the contribution that PV-SDO aspires to provide towards more automated, systematic tools for signal detection.
2 Post-marketing Signal Detection

2.1 Rationale

Systematic drug surveillance requires the ability to test all drugs for any outcome. However, the drugs for which surveillance is most crucial are those newly introduced in the market, having a limited safety profile from clinical studies. When a novel, credible signal is detected, it triggers an evaluation procedure performed by regulatory authorities that involves a detailed review of the indicated association. This may determine that the causal relationship is sufficiently supported to warrant some action (e.g. labelling amendment), that the relationship is non-causal, or that it is unclear (thus, further monitoring and/or analysis studies are required). As the volume of raw data for signal detection is enormous and various data sources are being considered for analysis, it is imperative to support signal detection via appropriate analysis methods. Next, we provide background information on the typical data sources that are employed for signal detection, and on the respective methods that are available for their analysis.

2.2 Data Sources: Merits and Limitations

The dominant source of post-marketing drug safety signals is Spontaneous Reporting Systems (SRS), through which individual case safety reports are submitted by healthcare professionals and patients to pharmaceutical companies, regulatory authorities and other bodies. Well-known SRSs are VigiBase™ from the Uppsala Monitoring Centre of WHO, which gathers reports from more than 100 countries [8], and FAERS (the Food and Drug Administration’s Adverse Event Reporting System) in the US [9]. Despite their explicit focus on drug safety, underreporting, inconsistent reporting, bias and latency are important limitations of SRSs [10].

The biomedical literature has often been employed for signal detection, but also as a means to support the evaluation of potential signals obtained from other sources [11].
Although significant, this source is limited by the study context (scope, paper selection protocol, etc.), which reduces the generalizability of the findings.

Observational healthcare data obtained from Electronic Health Record systems and administrative claim databases have been employed quite recently (as secondary use / repurposing) for signal detection [12]. Such data contain more information than spontaneous reports, e.g. drug exposure/non-exposure time and laboratory examination results, and they bring the potential to enable active and real-time surveillance. However, one drawback to their adoption is that findings are difficult to replicate, owing to factors such as the sample size and the population considered.

Patient self-reports about drug concerns/problems that are shared among networked communities using social media (blogs, messaging platforms, etc.) are an emerging source of signals [13], which can provide important insights in some cases, e.g. for drugs used for the treatment of rare diseases. Inherent limitations of patient self-reports for signal detection are the subjective information provided by the patients and the lack of quality control in the reporting.

2.3 Signal Detection Methods: Towards Combinatorial Analysis

For each one of the above data sources, various analysis methods have been proposed. For example, a typical strategy to explore SRS-based data is to pursue statistical investigations for measures of disproportionality, i.e. to assess whether a specific event is reported more frequently with a specific drug than would be expected if it occurred randomly [14]. Examples of computational methods proposed for the analysis of observational data are case-control series, cohort methods and self-controlled studies, to name a few [15].
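As an illustration of the disproportionality idea mentioned above, the following minimal sketch computes two widely used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2×2 table of report counts. The choice of these two measures, the class and function names, and the counts are assumptions made for illustration only; they are not taken from any particular SRS or method implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportCounts:
    """2x2 contingency table of spontaneous reports (hypothetical layout)."""
    drug_event: int    # reports mentioning the drug and the event
    drug_other: int    # reports mentioning the drug, but other events
    other_event: int   # reports mentioning other drugs and the event
    other_other: int   # reports mentioning other drugs and other events


def proportional_reporting_ratio(t: ReportCounts) -> float:
    """PRR: event share among the drug's reports vs. among all other reports."""
    drug_total = t.drug_event + t.drug_other
    other_total = t.other_event + t.other_other
    if drug_total == 0 or other_total == 0 or t.other_event == 0:
        raise ValueError("PRR is undefined for these counts")
    return (t.drug_event / drug_total) / (t.other_event / other_total)


def reporting_odds_ratio(t: ReportCounts) -> float:
    """ROR: odds of the event with the drug vs. odds with other drugs."""
    if t.drug_other == 0 or t.other_event == 0:
        raise ValueError("ROR is undefined for these counts")
    return (t.drug_event * t.other_other) / (t.drug_other * t.other_event)


# Made-up counts, for illustration only.
counts = ReportCounts(drug_event=20, drug_other=180, other_event=100, other_other=9700)
prr = proportional_reporting_ratio(counts)
ror = reporting_odds_ratio(counts)
print(f"PRR = {prr:.2f}, ROR = {ror:.2f}")
```

Values well above 1 suggest that the event is reported disproportionately often with the drug. In practice such measures are typically combined with thresholds or interval estimates before a potential signal is raised, which is the kind of decision rule that a method's parameters would capture.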
For the analysis of biomedical literature and patient self-reports in social media, natural language processing (NLP) techniques have been employed, driven by standard terminologies and controlled vocabularies [16].

Given the inherent limitations of the data sources and the technical limitations of signal detection methods, comparative studies have shown that there is no optimal method for signal detection and that all methods produce a high rate of false-positive indications [17, 18]. On the other hand, it is reasonable to argue that combining information across data sources can lead to more effective and accurate signal detection. This can be achieved either by increasing the evidence and the statistical power of findings, or by facilitating discoveries that would not be possible with the analysis of a single source [19]. However, the heterogeneity and fragmentation characterizing the available data sources and signal detection methods hinder the realization of such a synthesis at large scale. To reach this level, a semantic harmonization layer is required. Thus, we propose PV-SDO as the means for semantic description of signal sources, detection methods and relevant analysis experiments.

3 The PV-SDO Ontology

3.1 Scope and Source Knowledge

The primary aim of PV-SDO is to support the development of an integrated platform for signal detection. The platform aspires to enable the exploration of data originating from different sources with diverse computational signal detection methods in a combinatory fashion [20]. In this regard, the sub-objectives of PV-SDO are manifold, as it shall: (a) classify and annotate through quality attributes (e.g. the coverage, the system organ classes included, the population period, etc.)
the data sources that can be used in the detection of potential signals, either for discovery or filtering; (b) classify currently available signal detection methods, along with the analysis parameters that they offer; (c) embody performance metrics and ranking criteria/measures that can be employed for combinatorial signal detection; (d) support the conduct of analysis experiments involving multiple sources and detection methods, through mappings among the input data and the output of signal detection methods, e.g. drugs in RxNorm¹ vs. ATC² (Anatomical Therapeutic Chemical classification system), health outcomes of interest in ICD³ (International Classification of Diseases) vs. MedDRA⁴ (Medical Dictionary for Regulatory Activities); (e) support the annotation of signal detection analysis experiments with provenance information (i.e. from which dataset, method(s) and parameterization(s) the outcomes were generated) and, overall, (f) provide an exploitable knowledge base of available data sources, signal detection methods, analysis experiments and outcomes.

The primary knowledge sources that were employed for defining PV-SDO were: (a) the scientific literature with major emphasis on papers describing signal detection methods and the results obtained from their use in various analysis experiments, and (b) documentation of signal detection method implementations, such as those available in the PhViD R package [21], and the OMOP⁵ (Observational Medical Outcomes Partnership) methods library [22]. It is important to note that classification schemas related to the field of signal detection in pharmacovigilance are largely missing. In the scope of the literature review that we conducted, we were only able to identify the so-called “taxonomy for monitoring methods within a medical product safety surveillance system” that has been proposed in the Mini-Sentinel project [23]. This taxonomy, however, considers only methods appropriate for the analysis of observational data.
¹ http://www.nlm.nih.gov/research/umls/rxnorm/
² http://www.whocc.no/atc_ddd_index/
³ http://www.who.int/classifications/icd/en/
⁴ http://www.meddra.org/
⁵ http://omop.org/

Fig. 1. An overview of PV-SDO (only major entities and object properties are depicted).

3.2 Overall Schema, Key Entities and Properties

In order to fulfill its aims, PV-SDO includes concepts that are related to: (a) the domain of signal detection (e.g. Drug, Health Outcome, Pharmacovigilance Signal, etc.), and (b) the construction of the integrated platform for signal detection (e.g. Signal Source, Signal Detector, Analysis Experiment, Analysis Experiment Target, etc.). Figure 1 illustrates an overview of PV-SDO. The schema depicts the first-level entities of the hierarchy, some example second- and third-level entities, the main linkage among entities via object properties, as well as example individuals that have been defined. In more detail, Table 1 provides (in alphabetical order) definitions of the key PV-SDO concepts in natural language, including examples. Likewise, Table 2 provides a list of key object properties.

Considering a platform in which Datasets originating from diverse types of Signal Sources and respective Signal Detection Methods will be registered as accessible resources for signal detection, PV-SDO may support a matchmaking mechanism (through ontology querying) between the user’s requirements for a particular analysis and the appropriate resources among those available. These requirements are either implicit or explicit. For example, selecting the Parameter “Time at Risk” for an Analysis Experiment would implicitly result in invoking only Signal Detectors that support this Parameter, while selecting disproportionality-based methods would launch all the available Signal Detectors of that kind by taking advantage of the class-subclass hierarchy. Of course, the user can explicitly define resources to be used for their analysis experiments.
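The matchmaking idea described above can be sketched in plain Python, without an OWL reasoner or a query language. In the envisaged platform this would be done through ontology querying; here the dictionary-based class hierarchy, the Detector class, the registry contents and the assignment of parameters to detectors are all hypothetical and serve only to show how a class-subclass hierarchy and a required Parameter can jointly narrow down the available Signal Detectors.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical fragment of a method class hierarchy (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "DisproportionalityMethod": "SignalDetectionMethod",
    "BCPNN": "DisproportionalityMethod",
    "MGPS": "DisproportionalityMethod",
    "SelfControlledMethod": "SignalDetectionMethod",
}


@dataclass(frozen=True)
class Detector:
    """A registered Signal Detector (names and parameters are made up)."""
    name: str
    method_class: str
    parameters: FrozenSet[str]


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy to test class membership (reflexive, transitive)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def match(detectors: List[Detector],
          method_class: Optional[str] = None,
          parameter: Optional[str] = None) -> List[str]:
    """Return the names of detectors that satisfy the given requirements."""
    return [
        d.name
        for d in detectors
        if (method_class is None or is_a(d.method_class, method_class))
        and (parameter is None or parameter in d.parameters)
    ]


# Illustrative registry; which detector supports which parameter is assumed.
REGISTRY = [
    Detector("BCPNN_45", "BCPNN", frozenset({"Decision rule for signal generation"})),
    Detector("MGPS_103", "MGPS", frozenset({"Decision rule for signal generation"})),
    Detector("SelfControlled_7", "SelfControlledMethod", frozenset({"Time at Risk"})),
]

# Selecting a method class picks up all detectors of its subclasses.
print(match(REGISTRY, method_class="DisproportionalityMethod"))
# Selecting a parameter keeps only detectors that support it.
print(match(REGISTRY, parameter="Time at Risk"))
```

An ontology-backed implementation would obtain the same subclass closure from a reasoner instead of the hand-written is_a walk, which is the main reason for encoding the method classification in the ontology itself.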
PV-SDO can express provenance information regarding a potential signal, as in the following example: Pharmacovigilance Signal⁶: “Drug:Drg_134 – Health Outcome:Acute Myocardial Infarction” is result of Analysis Experiment:AB456⁷, generated by Signal Detector:BCPNN⁸_45 and Signal Detector:MGPS⁹_103, validated by Reference Source:Epocrates¹⁰ and accompanied by Supporting Evidence Source:ChemIDPlus¹¹.

⁶ The data cited in this example are fictive.
⁷ Codes like these are expected to be attributed by the proposed signal detection platform.
⁸ Denotes the Bayesian Confidence Propagation Neural Network, a well-known signal detection method.
⁹ Denotes the Multi-Item Gamma Poisson Shrinkage, a well-known signal detection method.
¹⁰ http://www.epocrates.com/
¹¹ http://chem.sis.nlm.nih.gov/chemidplus/

3.3 Design and Development Aspects

Taking into account the multi-faceted scope of PV-SDO, we carefully analyzed the requirements that it has to fulfill with respect to the application logic that the integrated platform shall support [20]. For the design of PV-SDO we followed an incremental strategy and the post-coordination principle [24]. We also applied common practices in ontology modeling that span from formatting and naming conventions, to version control. In addition, we investigated the potential for ontology reuse, which was nevertheless restricted to the adoption of some property definitions and modeling approaches, e.g. the approach of representing provenance information with the PROV ontology (PROV-O) [25].

A first version of the ontology was outlined, on the basis of which brainstorming sessions took place with ontology engineers from our organization. As a result, where necessary, ambiguities in concept definitions have been addressed in order to clarify the conceptualization of the model. In this regard, PV-SDO evolved and was instantiated with sample data. In the current stage, PV-SDO comprises 1,312 axioms, 101 classes, 34 object properties, 32 data properties and 168 individuals.
PV-SDO has been encoded in OWL2 (Web Ontology Language 2) using Protégé [26] throughout the entire lifecycle of ontology modeling, population and assessment.

Table 1. List of key entities defined in PV-SDO (entities marked in bold).

Entity Name: Description

Analysis Criterion: A constraint/requirement that has been defined either implicitly (through computational inference) or explicitly (by the user of the platform) for an Analysis Experiment. It can be a Dataset, a Parameter, or an Analysis Experiment Target. Example: “Decision Rule for Signal Generation in SRS: > 3 reports”.

Analysis Experiment: A computational process involving the analysis of Datasets through the execution of at least one Signal Detector that is/are being launched according to a set of Analysis Criteria. Analysis Experiments are set by users of the platform.

Analysis Experiment Target: A specific Drug (or Drug Class), Health Outcome, “Drug – Health Outcome” pair or “Drug - Drug - Health Outcome” triplet that is being investigated in the scope of an Analysis Experiment for Pharmacovigilance Signal detection. Example: “Acute Myocardial Infarction - DrugX”.

Dataset: A collection of data originating, in the general case, from a combination of Signal Sources. Example: “FAERS_ASCII_2013q3.zip and PubMed_Extract_2012-14.txt”.

Drug: A pharmaceutical product (or class of products). It may be referred to using standard terminologies/classifications, like ATC and RxNorm. Example: “Heparin group / ATC: B01AB01”.

Health Outcome: An observation in the health condition of a subject. It may be expressed using standard terminologies such as UMLS (Unified Medical Language System), MedDRA (preferred terms), and LOINC (Logical Observation Identifiers Names and Codes) value ranges. Example: “Gastrointestinal bleeding / MedDRA: 10017936, 10005116”.

Parameter: A configurable option that is available for the instantiation and execution of a Signal Detection Method. Examples: “Time at Risk”, “Decision rule for signal generation”.

Performance Metric: Measure for assessing the effectiveness of Signal Detectors. Examples: “Area Under the Curve”, “F-score”.

Pharmacovigilance Signal: A “Drug - Health Outcome” pair or a “Drug - Drug - Health Outcome” triplet indicating a possible causal relation among drug(s) and outcome, generated within an Analysis Experiment by one or more Signal Detectors.

Ranking Criterion: A feature that is used to sort (and implicitly filter) potential Pharmacovigilance Signals generated in the scope of an Analysis Experiment. It is specialized into two types, Domain-specific Ranking Criterion and Computational Ranking Criterion. Examples: Computational Ranking Criterion: “Precision at K”; Domain-specific Ranking Criterion: “Adverse Drug Reaction seriousness”.

Reference Source: An information repository that is used for validating the novelty of a potential Pharmacovigilance Signal.

Signal Detection Method: An implementation of a computational method aiming to identify potential Pharmacovigilance Signals via its application to data originating from an appropriate type of Signal Source. A Signal Detection Method offers a set of Parameters that can be set for its fine-tuning. Example: “The BCPNN (Bayesian Confidence Propagation Neural Network) implementation contained in [21]”.

Signal Detector: The concrete runtime instantiation of a Signal Detection Method, according to specific Parameter values and input Dataset(s). Example: “The MGPS (Multi-Item Gamma Poisson Shrinkage) implementation contained in [21] with its default parameter values applied on the French, national SRS database”.

Signal Source: A data repository that can be explored for Pharmacovigilance Signal detection. It has specific characteristics with respect to the type of data that it offers (structured/unstructured), the quality of its data, the type of Signal Detection Methods that can be used for its analysis, etc. Examples: “Spontaneous Reporting System”, “Electronic Health Record system”, “Bibliographic database”, “Social media platform”.

Supporting Evidence Source: Information source that can either facilitate the understanding, or complement the indications of Pharmacovigilance Signals generated by an Analysis Experiment. Examples: “Drugbank¹²”, “ChemIDPlus”.

¹² http://www.drugbank.ca/

4 PV-SDO Evaluation

Evaluation is an integral part of ontology design and development. While various approaches and methods have been proposed in the literature, there is no single best or preferred evaluation methodology/approach in general. The selection relies primarily on the purpose/focus of the evaluation per se and the application scope of the ontology. Brank et al. proposed four types of ontology evaluation approaches, namely, “gold standard based”, “application-based”, “data-driven” and “assessment by humans” [27]. For the evaluation of PV-SDO the “gold standard based” approach cannot be employed, since no standard exists that covers its scope and purpose. The “application-based” approach could be used at a later stage, once the envisaged integrated platform for signal detection is fully operational. Thus, in the current stage we employed: (a) the “data-driven” approach and (b) “assessment by humans”. Details about these methods are provided in the following. In addition, we employed the reasoning mechanisms embodied in Protégé to assess PV-SDO for its logical consistency and the correctness of its taxonomy, as well as to obtain inferred types.

Table 2. List of indicative properties defined in PV-SDO (properties marked in bold).
Object Properties (label, domain and range)
A Signal Detection Method can analyze multiple Signal Sources
A Dataset originates from Signal Source(s)
A Pharmacovigilance Signal concerns a “Drug - Health Outcome” pair or a “Drug - Drug - Health Outcome” triplet
A Signal Detection Method is implemented for a Computing Environment
A Pharmacovigilance Signal is generated by Signal Detector(s)
A Signal Detector has parameter value Parameter
A Signal Detector instantiates a Signal Detection Method
A Signal Detector is used in Analysis Experiment(s)
An Analysis Experiment may have Analysis Experiment Target
An Analysis Experiment uses for ranking Ranking Criterion
A Pharmacovigilance Signal is validated by Reference Source(s)

Datatype Properties (label, domain and range)
(Clinical Narrative or Literature Source or Observational Data Source or SRS Data Source) encodes conditions in {"ICD9", …}
Observational Data Source has population size integer
Drug has ATC code / has OMOP code / has RxNORM code / has SNOMED-CT¹³ code / has UMLS code string
(Dataset or Signal Detection Method or Signal Source) has license {"Apache License Version 2.0", "GPL-2", …}
(Dataset) has format {"JSON (JavaScript Object Notation)", "OMOP CDM (Common Data Model) v.4.0", "OMOP CDM v3.0", …}
Analysis Experiment has result XMLLiteral
Analysis Experiment was launched/terminated on dateTimeStamp

4.1 Data-driven Assessment

In the “data-driven” assessment we investigated whether PV-SDO is sufficient to describe signal detection methods which were not part of the source knowledge that we employed in our design. In particular, we explored the potential of populating PV-SDO with methods that are not included in the PhViD R package [21], or in the methods library and the analysis results obtained in the 2011-2012 Experiments of the OMOP project [22]. For example, we elaborated on instantiating PV-SDO for a recently published signal detection method, namely, vigiRank [28].
In the vigiRank algorithm, a signal detection measure originating from disproportionality analysis is used as an input parameter, a case that had not been taken into account in PV-SDO beforehand. Thus, appropriate revisions have been applied in PV-SDO. Overall, while PV-SDO is still evolving, the data-driven assessment demonstrated that there is a solid basis for the description of various types of signal detection methods.

¹³ Systematized Nomenclature of Medicine--Clinical Terms: http://ihtsdo.org/snomed-ct/

4.2 Human Assessment

The human assessment involves an international, anonymous survey in which experts are invited to provide their feedback on PV-SDO through an online questionnaire. The invitees have been organized in two groups: (a) Experts in the field of knowledge engineering and ontologies, expected to provide feedback on the survey setup and its comprehension, besides the pure ontology modeling aspects. (b) Experts in the field of signal detection with know-how in the design and development of computational signal detection methods. These experts have been identified from their participation in relevant scientific publications. Ten knowledge engineers and 15 signal detection experts with different levels of awareness concerning the scope of our work have been invited. For the moment, the survey with the knowledge engineers has been concluded (response rate: 80%), while the survey with signal detection experts is still in progress.

The questionnaire was structured in sections containing:
- Part I with questions concerning the main concepts definition in PV-SDO;
- Part II with questions on object property definitions;
- Part III with questions on data property definitions;
- Part IV referring to the major parts of the class hierarchy that was defined in PV-SDO, and
- Part V referring to an overall assessment of PV-SDO with respect to its usefulness and fulfillment of its aims.
The most important remarks of the survey with the knowledge engineers group are summarized below:
- Part I: The overall agreement with the definitions of concepts was 72%. In the remaining 28% of the cases, the experts hesitated to express an opinion for some definitions (selected the “I don’t know” answer), as they did not know the exact details underlying signal detection. Regarding the definition of Pharmacovigilance Signal, one expert commented that a signal could rather be a mathematical variable (or set of variables) the value of which (under the circumstances considered) suggest(s) a possible causal relation between drug(s) and health outcome.
- Part II: The overall agreement with the provided definitions of object properties was 78%. Again, in the rest of the cases, the experts hesitated to express an opinion. For example, concerning the relation “A Signal Detection Method can analyze multiple Signal Sources”, there was a comment that this depends on what the term “multiple Signal Sources” exactly means. Also, some experts had difficulty distinguishing between the concepts Dataset and Signal Source.
- Part III: The overall agreement with the provided definitions of data properties was 96%, and no significant objections were expressed.
- Part IV: The overall agreement concerning the defined hierarchy was 67%. In the remaining 33% the experts expressed their hesitation due to their lack of knowledge in the domain of signal detection. Among the important remarks, one expert proposed to distinguish between private SRS signal sources (e.g. the SRSs maintained by pharmaceutical companies) and public ones, which is indeed applicable.
- Part V: Experts ranked with 4.5/5 the statement “PV-SDO provides the basis to semantically describe signal detection methods and analysis experiments”, while they were quite neutral in their answers regarding the comprehension of PV-SDO.
They were clearly positive (4/5) about the statement “The concepts/properties of PV-SDO cover the required scope”.

4.3 Revisions

Based on the evaluation outcomes up to now, the main corrective actions that have been applied in PV-SDO involved:
1) Addressing ambiguous domain and range definitions, as indicated by the knowledge engineers assessment group.
2) Revising misleading class labels and property names.
3) Assessing all the annotations that were included in PV-SDO for clarity and comprehension.

For the assessment phase with the participation of signal detection experts, we adopted a shorter, more targeted version of the questionnaire. We particularly focused on receiving feedback on the organization of the Parameter class and the Signal Detection Methods classification, which appeared to be the most specialized aspects for the knowledge engineers to assess. This group corresponds to the ultimate user target for our evaluation, since they represent providers of the signal detection methods that our platform aims to integrate, as well as potential end-users of the platform.

5 Discussion and Conclusions

Accurate and timely signal detection poses important research challenges. As new data sources, such as observational databases and even social media platforms, are being considered as useful in providing evidence for pharmacovigilance, the field has lately been very active [4]. Despite the promising results, it has become evident that each data source exhibits specific potential and limitations and, similarly, each computational detection method has strengths and weaknesses [5, 17, 18]. Thus, combined signal detection strategies are emerging [6, 7]. In order to leverage such strategies at large scale, harmonization among the underlying information models corresponding to the respective data sources and signal detection methods is required.
Information Technologies (IT) play an important role towards this aim by developing and applying appropriate data and knowledge engineering methods. Aligned with combined signal detection strategies, we propose PV-SDO as the backbone for the development of such a semantically-enriched, integrated IT platform.

PV-SDO is the first attempt to construct a formal model to describe data sources and signal detection methods in terms of their types, underlying computational model, input, output, analysis parameters, requirements of use, strengths and weaknesses. To the best of our knowledge, the only systematic effort to categorize signal detection methods is the taxonomy of observational analysis methods elaborated in the scope of Mini-Sentinel [23]. Interestingly, the Mini-Sentinel taxonomy has been employed to design a tool (available in a spreadsheet form) for guiding researchers in the appropriate selection of signal detection methods. Notably, it remains questionable to what extent this selection process can be generic, since most of the findings supporting this approach originate from empirical studies. Nevertheless, we aim to explore this perspective and embody such a feature for the users of our platform, in order to support them in the selection of the appropriate signal detection method(s), given their analysis scope, e.g. drug of interest, health outcome of interest, available data for analysis etc. This feature will be implemented via semantic rules that will be introduced in PV-SDO.

PV-SDO will be enriched during our platform implementation phase, taking into account the forthcoming evaluation outcomes. It has been uploaded in BioPortal¹⁴ [29], currently in private access.
Our plan is to make it publicly available right after the conclusion of its evaluation and the revisions, in order to pursue synergies with other teams developing ontologies in the domain of pharmacovigilance, but also to enable its maintenance and evolution as a reference resource for describing signal detection.

The availability of drug-related datasets through semantic Web standard formats (e.g. via Bio2RDF [30] and the EBI RDF platform [31]) and open drug safety data (like FAERS data published through the openFDA initiative [32]) illustrates that there is a strong potential for application of semantic technologies in the field of drug safety. Given the fact that signal detection is a very active research field, PV-SDO constitutes a novel contribution in the domain of computational signal detection, aspiring to reinforce existing efforts in a systematic way.

¹⁴ http://bioportal.bioontology.org/ontologies/PV-SDO

Acknowledgments. This research was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme FP7/2007-2013 under REA grant agreement n° 330422 – the SAFER project. We would like to thank the knowledge engineers and signal detection experts who participated in the evaluation of PV-SDO.

References

1. Institute of Medicine: Preventing Medication Errors. The National Academies Press, Washington DC (2007)
2. World Health Organization: A practical handbook on the pharmacovigilance of antimalarial medicines. Geneva, Switzerland (2008)
3. Lindquist, M.: The Need for Definitions in Pharmacovigilance. Drug Saf. 30(10) (2007) 825–830
4. Harpaz, R., DuMouchel, W., Shah, N.H., Madigan, D., Ryan, P., Friedman, C.: Novel data mining methodologies for adverse drug event discovery and analysis. Clin. Pharmacol. Ther. 91(6) (2012) 1010–1021
5. Hauben, M., Norén, G.N.: A decade of data mining and still counting. Drug Saf. 33 (2010) 527–534
6. Harpaz, R., et al.: Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions. JAMIA 20 (2013) 413–419
7. Liu, M., et al.: Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. JAMIA 19 (2012) e28–e35
8. The VigiBase™ Web site: http://www.umc-products.com/, last access: September 30, 2014.
9. The FDA Adverse Event Reporting System (FAERS) Web site: http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/default.htm, last access: September 30, 2014.
10. Goldman, S.A.: Limitations and strengths of spontaneous reports data. Clin. Ther. 20(Suppl C) (1998) C40–C44
11. Shang, N., Xu, H., Rindflesch, T.C., Cohen, T.: Identifying plausible adverse drug reactions using knowledge extracted from the literature. J. Biomed. Inform. DOI: 10.1016/j.jbi.2014.07.011 [ahead of print]
12. Studying the Science of Observational Research: Empirical findings from the Observational Medical Outcomes Partnership. Drug Saf. 36(Suppl 1) (2013)
13. Freifeld, C.C., et al.: Digital drug safety surveillance: monitoring pharmaceutical products in Twitter. Drug Saf. 37 (2014) 343–350
14. Hauben, M., Bate, A.: Decision support methods for the detection of adverse events in post-marketing data. Drug Discov. Today 14 (2009) 343–357
15. Suling, M., Pigeot, I.: Signal detection and monitoring based on longitudinal healthcare data. Pharmaceutics 4 (2012) 607–640
16. Harpaz, R., et al.: Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Saf. 37 (2014) 777–790
17. van Holle, L., Bauchau, V.: Signal detection on spontaneous reports of adverse events following immunisation: a comparison of the performance of a disproportionality-based algorithm and a time-to-onset-based algorithm. Pharmacoepidemiol. Drug Saf. 23 (2014) 178–185
18. Ryan, P.B., Madigan, D., Stang, P.E., Overhage, J.M., Racoosin, J.A., Hartzema, A.G.: Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the observational medical outcomes partnership. Stat. Med. 31 (2012) 4401–4415
19. Harpaz, R., et al.: Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions. JAMIA 20 (2013) 413–419
20. Koutkias, V., Jaulent, M.-C.: An agent-based approach for integrated pharmacovigilance signal detection. In Proc. of the Multi-Agent Systems for Healthcare (MASH) Workshop, 13th Int. Conf. on Autonomous Agents & Multiagent Systems (AAMAS), Paris, France, May 6, 2014.
21. Ahmed, I., Poncet, A.: PhViD: an R package for PharmacoVigilance signal Detection. R package version 1.0.6. (2013). Available at: http://cran.r-project.org/web/packages/PhViD/, last access: September 30, 2014.
22. The Observational Medical Outcomes Partnership (OMOP) Methods Library: http://omop.org/MethodsLibrary, last access: September 30, 2014.
23. Gagne, J.J., et al.: Taxonomy for Monitoring Methods within a Medical Product Safety Surveillance System: Year Two Report of the Mini-Sentinel Taxonomy Project Workgroup. Available at: http://www.mini-sentinel.org/work_products/Statistical_Methods/Mini-Sentinel_Methods_Taxonomy-Year-2-Report.pdf, last access: September 30, 2014.
24. Rector, A., Iannone, L.: Lexically suggest, logically define: quality assurance of the use of qualifiers and expected results of post-coordination in SNOMED CT. J. Biomed. Inform. 45 (2012) 199–209
25. W3C Working Group Note 30 April 2013: PROV Overview: An Overview of the PROV Family of Documents. Groth, P., Moreau, L. (Eds.). Available at: http://www.w3.org/TR/prov-overview/, last access: September 30, 2014
26. The Protégé knowledge modeling tool: Available at: http://protege.stanford.edu/, last access: September 30, 2014
27.
Brank, J., Grobelnik, M., Mladenić, D.: A survey of ontology evaluation techniques. In: Proc. of Conf. on Data Mining and Data Warehouses (SiKDD), 2005, Ljubljana, Slovenia. 28. Caster, O., Juhlin, K., Watson, S., Norén, G.N.: Improved statistical signal detection in pharmacovigilance by combining multiple strength-of-evidence aspects in vigiRank. Drug Saf. 37 (2014) 617–628 29. Whetzel, P.L., et al.: BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 39(Web Server issue) (2011) W541–W545 30. The Bio2RDF Web site: http://bio2rdf.org/, last access: September 30, 2014 31. Jupp, S., et al.: The EBI RDF platform: linked open data for the life sciences. Bioinformatics 30 (2014) 1338–1339. 32. The openFDA Web site: https://open.fda.gov/, last access: September 30, 2014