OWLmaker: An Application for Generating OWL Files from Tabular Text

Jie Zheng (a), John Judkins (b), Christian Stoeckert (a)

(a) Department of Genetics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
(b) Department of Biology, University of Pennsylvania, Philadelphia, PA, USA

Abstract

Tools have already been developed that generate an OWL file from an input table containing the information needed to build an ontology. However, conversion using these tools is time-consuming if each term in the input file must be manually assigned either an existing IRI from an external ontology or a new IRI. We developed OWLmaker to generate an RDF/XML format OWL file from tabular text, with the option of automating IRI assignment with reference to an existing ontology.

Keywords:
Table to OWL format conversion; ontology development; automation

Introduction

Part of our standard procedure for the EuPathDB project [http://eupathdb.org/] is to create an RDF/XML format OWL file for each study whose results we intend to incorporate, starting from a spreadsheet of variables with extensive annotations used by data providers. An OWL file may be built and edited one term at a time using ontology development tools such as Stanford University's Protégé [1]. The Cellfie plugin for Protégé (based on MappingMaster [2]) automates this process by producing OWL from a spreadsheet, but it requires the use of its own mapping language [2]. Ontorat [3] is a web application that generates an OWL file from tabular input, with customizable settings to provide annotation properties and automatically generate new IRIs. It improves upon similar tools in that it requires neither installation nor learning a separate language. ROBOT's "template" command also has this functionality [4].

Even with these tools available, the overall process of converting tabular text to OWL can still be complex and time-consuming to learn. A tool that provides similar functionality while automating more of the conversion process would be preferable for the EuPathDB project. One such automating feature would be the ability to assign an existing IRI in a specified external ontology to a matching term in the input file based on the term's label. The aforementioned tools are also more focused on axiom creation than on annotation creation, even though the development of many ontologies, such as the EuPathDB ontology, involves generating new terms with associated annotations (e.g. definition, definition source, term editor). OWLmaker was created to address this need and is available at [https://github.com/EuPath-ontology/OWLmaker].

Methods

OWLmaker is a JAR file that works with a tab-delimited setting file to convert the input table to RDF/XML format OWL. Before executing the application, the user may need to adjust both the input file and the setting file. The input file can be in either tab-delimited or CSV format and should have one row per term identifying the term's label, parent IRI, and parent label, so that the hierarchy is complete in the output OWL file. (The term IRI can also be entered manually, if required.) To attach annotation properties to terms in the output file, the user provides one column of values for each property, with the name of the property as the column header. An optional annotation value is added to the OWL file only if it is present in the term's row.

The setting file is arranged in tab-delimited format with two columns: the left column names the parameters used by the application and the right column contains their values. The values can be adjusted by the user and include the names of the input and output files, the ontology IRI, and the IRI prefix for new terms. The user also provides the column numbers for the term's IRI, label, parent label, and parent IRI, as well as the name and IRI of each annotation property.

The setting, input, and output files of a simple example, shown in Figures 1-3, are available from the "test" directory in the project's GitHub repository [https://github.com/EuPath-ontology/OWLmaker/tree/master/test]. Here a small OWL file based on the EuPath ontology is used as the external ontology. A URI for the external ontology is provided in this example, but a filename with file path can also be used. An input file and setting file are also provided so that, when OWLmaker is run, big_ontology.owl is created.

IRI | Label | parentLabel | parentIRI | definition | definition source
 | presence of Ancylostomatoidea by qPCR | categorical measurement datum | http://purl.obolibrary.org/obo/OBI_0000938 | a categorical measurement datum that specifies whether Ancylostomatoidea was detected by a real time polymerase chain reaction assay | EuPathDB
 | presence of Ancylostoma duodenale by qPCR | categorical measurement datum | http://purl.obolibrary.org/obo/OBI_0000938 | a categorical measurement datum that specifies whether Ancylostoma duodenale was detected by a real time polymerase chain reaction assay | EuPathDB
 | presence of Ascaris lumbricoides by qPCR | categorical measurement datum | http://purl.obolibrary.org/obo/OBI_0000938 | a categorical measurement datum that specifies whether Ascaris lumbricoides was detected by a real time polymerase chain reaction assay | EuPathDB
http://purl.obolibrary.org/obo/EUPATH_0010611 | presence of Astrovirus by ELISA | categorical measurement datum | http://purl.obolibrary.org/obo/OBI_0000938 | a categorical measurement datum that specifies whether Astrovirus was detected by an enzyme-linked immunosorbent assay | EuPathDB
 | presence of Astrovirus by RT-PCR | categorical measurement datum | http://purl.obolibrary.org/obo/OBI_0000938 | a categorical measurement datum that specifies whether Astrovirus was detected by a reverse transcription polymerase chain reaction assay | EuPathDB
 | categorical measurement datum | measurement datum | | |
 | measurement datum | data item | | |
 | data item | information content entity | | |
 | information content entity | generically dependent continuant | | |
 | generically dependent continuant | continuant | | |
 | continuant | entity | | |
 | entity | | | |

Figure 1 – Example Input File

path                      /path/to/input/file/
input file                new_terms.txt
output file               big_ontology.owl
ontology IRI              http://example.com/big_ontology.owl
IRI base                  http://example.com/
prefix                    EX
start ID                  10001
external ontology file    https://raw.githubusercontent.com/EuPath-ontology/OWLmaker/master/test/small_ontology.owl
label position            2
IRI position              1
parent label position     3
parent IRI position       4
annotation property       definition|IAO_0000115
annotation property       definition source|IAO_0000119

Figure 2 – Example Setting File

Results

Using the input and setting files in our example, OWLmaker generates a new IRI for each term unless its label matches a term label in the external ontology. Explicit IRIs in the input file are also automatically assigned.

Figure 3 – Screenshots displaying differences between external ontology (left) and output (right)

Discussion

The application has been shown to reliably output the desired OWL file from correctly populated input and setting files. The output file displays the complete hierarchy correctly in Protégé, and all annotations are properly attributed. OWLmaker fulfills its focus, which is the creation of annotated classes rather than annotated individuals or properties.

Since tools are available to merge two OWL files into one, the output of OWLmaker can be used to add new terms to an existing ontology. Since a SPARQL query can generate a conversion file from an OWL file, combining OWLmaker with software supporting SPARQL also allows an ontologist to keep an ontology in both tabular and OWL formats and convert between them easily. This way, an ontologist can send an ontology as a CSV file to collaborators, who can then populate a column with values to attach a new annotation property to terms. The modified CSV can then be converted back to OWL with the new annotation values included.

Future work may include removing unnecessary warnings generated upon execution of the application. Generation of IDs could be improved as well: we could remove the seven-digit restriction on IDs and prevent the application from generating a new ID that is already assigned to a term in the external ontology. Future work could also allow the application to accept not only tab-delimited or comma-separated files as input, but also proprietary formats such as Excel files or Google Sheets.

Conclusions

OWLmaker does not need to be installed (although it requires Java) and requires access only to the websites specified in the setting file. Because a user of OWLmaker can have the application both search an external ontology for the IRIs of matching term labels and generate new IRIs as specified, a complete OWL file for an ontology can be generated in fewer steps than with similar tools. For these reasons, we conclude that OWLmaker usefully supplements existing similar tools for converting tabular text to OWL.

Acknowledgements

We thank Mark A. Miller for testing the software and providing valuable feedback. This work was supported by NIH HHSN272201400030C.

Address for correspondence

Jie Zheng, jiezheng@pennmedicine.upenn.edu

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

References

1. Noy NF, Crubézy M, Fergerson RW, Knublauch H, Tu SW, Vendetti J, Musen MA. Protégé-2000: An open-source ontology-development and knowledge-acquisition environment. AMIA Annu Symp Proc. 2003;2003:953.
2. O'Connor MJ, Halaschek-Wiener C, Musen MA. Mapping Master: a Language for Mapping Spreadsheets to OWL. The Semantic Web – ISWC 2010: Springer; 2010. p. 194-208.
3. Xiang Z, Zheng J, Lin Y, He Y. Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns: J Biomed Sem; 2015.
4. Tauber R, Balhoff JP, Douglass E, Mungall CJ, Overton JA. Standardizing Ontology Workflows Using ROBOT. CEUR Workshop Proc. 2018;2285.
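
Illustrative sketch

The setting-file layout and the IRI assignment behavior described in the Methods and Results can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not OWLmaker's actual Java implementation: the function names parse_settings and assign_iris are hypothetical, the external ontology is stood in for by a plain label-to-IRI dictionary with a made-up example.org IRI, the precedence (explicit IRI, then label match, then new IRI) is assumed, and the new-IRI pattern "IRI base + prefix + underscore + seven-digit ID" is an assumption inferred from the seven-digit restriction mentioned in the Discussion.

```python
import csv
import io


def parse_settings(text):
    """Parse a two-column, tab-delimited setting file into a dict.

    The 'annotation property' key may repeat, so its 'name|ID' values
    are collected into a list of (name, id) pairs.
    """
    settings = {"annotation property": []}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed lines
        key, value = row[0].strip(), row[1].strip()
        if key == "annotation property":
            name, _, prop_id = value.partition("|")
            settings[key].append((name, prop_id))
        else:
            settings[key] = value
    return settings


def assign_iris(terms, external_labels, settings):
    """Return {label: IRI} for a list of (explicit_iri, label) pairs.

    Assumed precedence: explicit IRI from the input file, then a label
    match in the external ontology, then a newly generated IRI.
    """
    next_id = int(settings["start ID"])
    assigned = {}
    for explicit_iri, label in terms:
        if explicit_iri:
            assigned[label] = explicit_iri
        elif label in external_labels:
            assigned[label] = external_labels[label]
        else:
            # Assumed pattern: <IRI base><prefix>_<seven-digit ID>
            assigned[label] = "{}{}_{:07d}".format(
                settings["IRI base"], settings["prefix"], next_id
            )
            next_id += 1
    return assigned


# Setting values taken from Figure 2 (subset).
SETTINGS_TEXT = "\n".join([
    "IRI base\thttp://example.com/",
    "prefix\tEX",
    "start ID\t10001",
    "annotation property\tdefinition|IAO_0000115",
    "annotation property\tdefinition source|IAO_0000119",
])

# Hypothetical stand-in for labels found in the external ontology.
EXTERNAL_LABELS = {
    "presence of Ascaris lumbricoides by qPCR": "http://example.org/EXT_0000001",
}

# (explicit IRI, label) pairs taken from Figure 1.
TERMS = [
    ("", "presence of Ancylostomatoidea by qPCR"),
    ("http://purl.obolibrary.org/obo/EUPATH_0010611",
     "presence of Astrovirus by ELISA"),
    ("", "presence of Ascaris lumbricoides by qPCR"),
]

if __name__ == "__main__":
    for label, iri in assign_iris(
        TERMS, EXTERNAL_LABELS, parse_settings(SETTINGS_TEXT)
    ).items():
        print(label, "->", iri)
```

Keeping the lookup as a plain dictionary keeps the sketch self-contained; a real implementation would read the labels from the external OWL file named in the setting file.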