GOCI: An Ontology-Driven Search and Curation Infrastructure for the NHGRI GWAS Catalog

Danielle Welter1,*, Tony Burdett1, Lucia Hindorff2, Heather Junkins2, Jackie MacArthur1 and Helen Parkinson1

1 EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK
2 Office of Population Genomics, National Human Genome Research Institute, NIH, Bethesda

1 INTRODUCTION

We present the GWAS Ontology and Curation Infrastructure (GOCI), a collection of modules and features for improving the curation, trait organization and querying of the NHGRI GWAS catalog (Hindorff et al., 2009). Augmenting the catalogue's phenotypic traits with the semantic framework of an ontology will increase the range of possible catalogue queries, facilitate the creation of a dynamic version of the iconic GWAS diagram and accelerate the integration of catalogue data with other sources. GOCI also includes a tracking system to support the complex curation process and to safeguard the extremely high quality of curation for which the catalogue is known.

2 ONTOLOGY

Until recently, the phenotypic traits in the GWAS catalogue were available only as an unstructured flat list, partially mapped to MeSH (Rogers, 1963). To formalise the trait representation, the GWAS traits were integrated into the Experimental Factor Ontology (EFO) (Malone et al., 2010). Representing GWAS traits in an expressive knowledge representation language such as OWL will allow much richer queries over the GWAS catalogue. Choosing an established ontology such as EFO assures the long-term maintenance of these terms and opens the potential for future integration of the GWAS catalogue with other resources that already consume EFO. Much of EFO's existing coverage meets the catalogue's need to describe diverse concepts, ranging from diseases to measurements to complex, often context-dependent phenotypes. Like MeSH, EFO contains disease categories, but it also contains phenotypic descriptions, compound treatments and other kinds of concepts. EFO's policy of reusing classes from other ontologies also ensures that longer-term integration issues are accounted for.

At the start of the integration process, around 20% of all GWAS traits were already described in EFO. New traits are added either by importing appropriate classes from other ontologies or, where no appropriate class can be identified in a reference ontology, by creating them directly in EFO. Full coverage of existing GWAS traits is expected to be achieved by September 2012. A provision for adding further traits to EFO in the future is under development, in line with ongoing EFO development that requires the same provision for some of EFO's other use cases.

3 GWAS DIAGRAM

The GWAS catalogue produces a quarterly diagram of all SNP-trait associations mapped onto their chromosomal locations. Because the number of SNP-trait associations has increased considerably since the first version of the diagram, and the colour-coded legend now lists a large number of different phenotypes, it is almost impossible to identify traits by visually analysing the diagram, which is currently available only in static PDF or PowerPoint format.

A key feature of GOCI is a novel approach to automating the creation of the GWAS diagram using an ontology and Scalable Vector Graphics (SVG), an XML-based language for describing geometric objects. This makes it possible to create an up-to-date, dynamic diagram that can be filtered and searched at different levels of granularity and by different criteria, including trait, chromosomal region and time. Users can zoom in on chromosomes to see all SNP-trait associations for a given region. SNP-trait associations are also interactive: they provide summary information on mouse-over and are clickable, allowing the user to proceed from an association to the catalogue entry and to the publication.

4 TRACKING SYSTEM AND AUTOMATION

Finally, GOCI contains an online tracking system to support the highly complex curation process, as well as other support tools such as a batch loader for uploading a set of SNPs to the catalogue from a spreadsheet.

REFERENCES

Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S. and Manolio, T.A. (2009) Potential Etiologic and Functional Implications of Genome-Wide Association Loci of Human Diseases and Traits. Proc. Natl. Acad. Sci. USA, 106, 9362–9367.

Malone, J., Holloway, E., Adamusiak, T., Kapushesky, M., Zheng, J., Kolesnikov, N., Zhukova, A., Brazma, A. and Parkinson, H. (2010) Modeling Sample Variables with an Experimental Factor Ontology. Bioinformatics, 26, 1112–1118.

Rogers, F.B. (1963) Medical Subject Headings. Bull. Med. Libr. Assoc., 51, 114–116.

* To whom correspondence should be addressed: dwelter@ebi.ac.uk
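To make the ontology-based querying of Section 2 concrete, here is a minimal Python sketch of why a class hierarchy supports richer queries than a flat trait list. It is not GOCI or EFO code: the trait labels, the `PARENTS` hierarchy, the placeholder identifiers `snp_A` to `snp_D` and the helper functions `descendants` and `query` are all hypothetical. A query for a broad class returns every association annotated to that class or to any of its subclasses.

```python
from collections import deque

# Hypothetical is-a hierarchy (child -> parents). Labels are illustrative,
# not real EFO classes; a production system would presumably reason over OWL.
PARENTS = {
    "type 2 diabetes": ["diabetes mellitus"],
    "type 1 diabetes": ["diabetes mellitus", "autoimmune disease"],
    "diabetes mellitus": ["metabolic disease"],
    "rheumatoid arthritis": ["autoimmune disease"],
    "fasting blood glucose measurement": ["measurement"],
}

# Hypothetical SNP-trait associations with placeholder SNP identifiers.
ASSOCIATIONS = [
    ("snp_A", "type 2 diabetes"),
    ("snp_B", "type 1 diabetes"),
    ("snp_C", "rheumatoid arthritis"),
    ("snp_D", "fasting blood glucose measurement"),
]


def descendants(root):
    """Return root plus every class that reaches it through is-a links (BFS)."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def query(trait_class):
    """Associations annotated with trait_class or any of its subclasses."""
    wanted = descendants(trait_class)
    return [snp for snp, trait in ASSOCIATIONS if trait in wanted]


print(query("metabolic disease"))   # ['snp_A', 'snp_B']
print(query("autoimmune disease"))  # ['snp_B', 'snp_C']
```

In this toy example `snp_B` is returned by both queries because "type 1 diabetes" has two parent classes, a relationship that an unstructured flat list of traits cannot express.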
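The SVG-based diagram of Section 3 can be illustrated with a small sketch using Python's standard `xml.etree.ElementTree`. This is a hypothetical minimal example, not GOCI's implementation. It assumes that each association carries a chromosome, a base-pair position and a top-level ontology category, and that the chromosome lengths, colours, pixel layout and `#snp` link targets are placeholders. Each chromosome becomes a scaled bar, each association a circle coloured by category. A `<title>` child provides mouse-over text, and an enclosing `<a>` makes the circle clickable.

```python
import xml.etree.ElementTree as ET

# Placeholder chromosome lengths (bp) and category colours; illustrative only.
CHROM_LENGTHS = {"1": 250_000_000, "2": 240_000_000}
CATEGORY_COLOURS = {"metabolic disease": "#d62728", "measurement": "#1f77b4"}
BAR_HEIGHT, BAR_WIDTH, SPACING, TOP = 400, 20, 60, 20  # layout in pixels


def build_diagram(associations):
    """associations: iterable of (snp, trait, category, chrom, position_bp)."""
    longest = max(CHROM_LENGTHS.values())
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(SPACING * (len(CHROM_LENGTHS) + 1)),
                     height=str(BAR_HEIGHT + 2 * TOP))
    x_of = {}
    # One grey bar per chromosome, scaled relative to the longest one.
    for i, (chrom, length) in enumerate(CHROM_LENGTHS.items(), start=1):
        x_of[chrom] = SPACING * i
        ET.SubElement(svg, "rect", x=str(SPACING * i - BAR_WIDTH // 2), y=str(TOP),
                      width=str(BAR_WIDTH),
                      height=f"{BAR_HEIGHT * length / longest:.1f}",
                      fill="#eeeeee", stroke="#999999")
    # One circle per association, beside its chromosome at the scaled position.
    for snp, trait, category, chrom, pos in associations:
        # Hypothetical link target (plain href is SVG 2 syntax); a real diagram
        # would link to the catalogue entry.
        link = ET.SubElement(svg, "a", href=f"#{snp}")
        dot = ET.SubElement(link, "circle", cx=str(x_of[chrom] + BAR_WIDTH),
                            cy=f"{TOP + BAR_HEIGHT * pos / longest:.1f}", r="4",
                            fill=CATEGORY_COLOURS.get(category, "#7f7f7f"))
        # Browsers typically show an SVG <title> child as a mouse-over tooltip.
        ET.SubElement(dot, "title").text = f"{snp}: {trait}"
    return ET.tostring(svg, encoding="unicode")


print(build_diagram([
    ("snp_A", "type 2 diabetes", "metabolic disease", "1", 125_000_000),
    ("snp_D", "fasting blood glucose measurement", "measurement", "2", 10_000_000),
]))
```

Because the output is plain XML, filtering by trait, region or time reduces to choosing which associations to pass in before rendering. Zooming can be handled by the viewer, since SVG scales without loss.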
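Section 4 mentions a batch loader that uploads SNPs from a spreadsheet. The sketch below shows one plausible validation step before such an upload. The spreadsheet format is not described in the text, so this example assumes a CSV export with hypothetical columns `snp`, `trait` and `p_value`. The function name `load_batch` and the validation rules are also assumptions. Valid rows are accepted and the rest are returned with a reason so that a curator can review them.

```python
import csv
import io
import re

RSID = re.compile(r"rs\d+")  # dbSNP reference SNP identifier, e.g. rs123


def load_batch(csv_text):
    """Split rows into (accepted, rejected); rejected holds (row_no, reason)."""
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        snp = (row.get("snp") or "").strip()
        trait = (row.get("trait") or "").strip()
        if not RSID.fullmatch(snp):
            rejected.append((row_no, f"invalid SNP identifier {snp!r}"))
            continue
        if not trait:
            rejected.append((row_no, "missing trait"))
            continue
        try:
            p_value = float(row.get("p_value") or "")
        except ValueError:
            rejected.append((row_no, "p_value is not a number"))
            continue
        if not 0 < p_value <= 1:  # also rejects NaN
            rejected.append((row_no, "p_value out of range"))
            continue
        accepted.append({"snp": snp, "trait": trait, "p_value": p_value})
    return accepted, rejected
```

Separating rejected rows rather than failing on the first error lets a curator correct a whole spreadsheet in one pass. That choice is an assumption about workflow, not a description of GOCI.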