<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Automated Exploration of Ontology Repositories</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ondřej</forename><surname>Zamazal</surname></persName>
							<email>ondrej.zamazal@vse.cz</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Economics</orgName>
								<address>
									<addrLine>W. Churchill Sq. 4</addrLine>
									<postCode>130 67</postCode>
									<settlement>Prague 3</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vojtěch</forename><surname>Svátek</surname></persName>
							<email>svatek@vse.cz</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Economics</orgName>
								<address>
									<addrLine>W. Churchill Sq. 4</addrLine>
									<postCode>130 67</postCode>
									<settlement>Prague 3</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Automated Exploration of Ontology Repositories</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3D7535711E152E6E348957186B3FBE8E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T15:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Motivation</head><p>Choosing an adequate ontology repository is an important prerequisite for finding an ontology to be reused or adapted for a concrete use case. As the repositories are mostly affiliated with particular communities within the semantic web, understanding the typical features of the ontologies in each of them is also helpful for designers of ontology management tools.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Overall Process, Metrics and Results</head><p>Our ontology exploration process consists of ontology collection, materialization and metrics computation; finally, the resulting metrics are explored using the R language<ref type="foot" target="#foot_0">1</ref> to automatically generate a summary report in the form of tables. To automate the collection phase, we relied partly on Ontohub,<ref type="foot" target="#foot_1">2</ref> an open ontology repository that mirrors several other repositories. The materialization step stores the ontologies in a database and decomposes them into entities, names, relations, imported ontologies and head nouns. We use the OWL-API<ref type="foot" target="#foot_2">3</ref> to manipulate the ontologies.</p><p>We considered metrics related to four aspects of ontologies.<ref type="foot" target="#foot_3">4</ref> Logical and structural metrics include, e.g., the numbers of different types of entities and axioms. We also categorize ontologies into complexity bins (as in <ref type="bibr" target="#b1">[2,</ref> <ref type="bibr" target="#b2">3]</ref>). The naming aspect reflects basic properties of class names (local fragments of URIs, or labels): their length, their capitalization and the concatenation symbol or technique used, i.e., a hyphen, underscore, camel case or dot (as in <ref type="bibr" target="#b0">[1]</ref>). For the annotation aspect, we compute the proportions of RDFS annotations.</p><p>We explored ontologies from five prominent ontology repositories (Table <ref type="table" target="#tab_0">1</ref> shows only a few selected metrics). 
However, due to parsing problems and the unavailability of some ontologies or their imports, we could not collect all ontologies from the repositories. BioPortal<ref type="foot" target="#foot_4">5</ref> is a web portal providing access to a library of well-curated biomedical ontologies via RESTful services. It includes ontologies from another ontology repository, the OBO Foundry.<ref type="foot" target="#foot_5">6</ref> We collected BioPortal ontologies using its Ontohub mirror, in which only ontologies smaller than 5MB (342 of the total) are available.<ref type="bibr" target="#b3">7</ref> The Dumontier lab ontologies<ref type="foot">8</ref> are biological ontologies aimed at knowledge representation and reasoning. They are highly interconnected (with many mutual imports). LOV<ref type="foot">9</ref> is a well-curated collection of linked open vocabularies used in the Linked Data Cloud. The Protégé ontology library mostly contains ontologies developed in the Protégé editor. As there is no programmatic access to the library, we downloaded its ontologies manually. It turned out that of the 93 ontologies listed (excluding the Dumontier ontologies, to which the library also links), 43% were not available. Finally, the TONES repository (accessed through its Ontohub mirror of 207 ontologies, of which we collected 88%) contains ontologies from various domains, although many of them were designed for testing purposes.</p></div>
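<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the naming aspect more concrete, the following Python sketch classifies a class name by its concatenation technique (hyphen, underscore, dot or camel case), checks its capitalization, and computes the share of names using each technique, i.e., quantities of the kind summarized in the camel-case and underscore rows of Table 1. The function names, the camel-case regular expression and the order in which the rules are tried are assumptions made for this illustration; they are not taken from the authors' tool, which uses the OWL-API.</p>
<p>
```python
import re

# Assumed rule: a lower-case letter or digit directly followed by an
# upper-case letter signals camel case (e.g. "MedicalProcedure").
CAMEL_CASE = re.compile(r"[a-z0-9][A-Z]")


def naming_technique(name):
    """Classify the concatenation technique of a class name.

    Rules are tried in a fixed, assumed order, so a mixed name such as
    "has_PartOf" is counted once, as "underscore".
    """
    if "_" in name:
        return "underscore"
    if "-" in name:
        return "hyphen"
    if "." in name:
        return "dot"
    if CAMEL_CASE.search(name):
        return "camel"
    return "none"


def starts_with_capital(name):
    """Capitalization check: does the name begin with an upper-case letter?"""
    return name[:1].isupper()


def technique_proportions(names):
    """Return the share of names using each technique, as fractions of all names."""
    if not names:
        return {}
    counts = {}
    for name in names:
        technique = naming_technique(name)
        counts[technique] = counts.get(technique, 0) + 1
    return {technique: count / len(names) for technique, count in counts.items()}


names = ["MedicalProcedure", "medical_procedure", "Medical-Procedure", "Person"]
# Prints {'camel': 0.25, 'underscore': 0.25, 'hyphen': 0.25, 'none': 0.25}
print(technique_proportions(names))
```
</p>
<p>Trying one rule after another assigns each name exactly one technique; a variant that counts every technique present in a name would report different proportions for mixed names.</p></div>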
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Future Work</head><p>We plan to run this analysis periodically and to extend it with more repositories (preferably via Ontohub) and more metrics. We also want to keep the ontology exploration services available through a web interface<ref type="bibr" target="#b4">10</ref>, where users could request, on the one hand, the latest summaries of particular repositories and, on the other hand, particular ontologies or ontologies meeting given criteria.</p><p>Ondřej Zamazal has been supported by CSF grant no. 14-14076P; this research is also supported by UEP IGA project F4/34/2014 (IG407024).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Selected metrics. Average (Avg) is either the mean or the median, whichever is more representative. The minimum is omitted since it is always zero. The maximum is omitted for ratios because it is nearly always 100%. The largest value across all repositories is in bold.</figDesc><table>
<row><cell>Metrics (June 2014 snapshot)</cell><cell></cell><cell>BioPortal</cell><cell>Dumontier</cell><cell>LOV</cell><cell>Protégé</cell><cell>TONES</cell></row>
<row><cell>Ontologies processed</cell><cell></cell><cell>254</cell><cell>70</cell><cell>353</cell><cell>41</cell><cell>183</cell></row>
<row><cell>Percentage of all</cell><cell></cell><cell>74%</cell><cell>95%</cell><cell>83%</cell><cell>44%</cell><cell>88%</cell></row>
<row><cell>Complex class using existential restriction</cell><cell>Avg</cell><cell>57%</cell><cell>28%</cell><cell>7%</cell><cell>14%</cell><cell>43%</cell></row>
<row><cell>Complex class as superclass</cell><cell>Avg</cell><cell>74%</cell><cell>35%</cell><cell>69%</cell><cell>57%</cell><cell>67%</cell></row>
<row><cell>Branching</cell><cell>Avg</cell><cell>0.88</cell><cell>0.55</cell><cell>0.48</cell><cell>0.61</cell><cell>0.79</cell></row>
<row><cell></cell><cell>Max</cell><cell>2.39</cell><cell>1.09</cell><cell>1.77</cell><cell>1.43</cell><cell>1.78</cell></row>
<row><cell>Multiple inheritance</cell><cell>Avg</cell><cell>32</cell><cell>0</cell><cell>1</cell><cell>6</cell><cell>12</cell></row>
<row><cell></cell><cell>Max</cell><cell>1877</cell><cell>254</cell><cell>321</cell><cell>497</cell><cell>24800</cell></row>
<row><cell>Annotation as label</cell><cell>Avg</cell><cell>38%</cell><cell>51%</cell><cell>32%</cell><cell>13%</cell><cell>38%</cell></row>
<row><cell>Annotation as comment</cell><cell>Avg</cell><cell>1%</cell><cell>37%</cell><cell>25%</cell><cell>49%</cell><cell>37%</cell></row>
<row><cell>Camel-case technique</cell><cell>Avg</cell><cell>15%</cell><cell>61%</cell><cell>39%</cell><cell>28%</cell><cell>29%</cell></row>
<row><cell>Underscore technique</cell><cell>Avg</cell><cell>54%</cell><cell>0%</cell><cell>0%</cell><cell>23%</cell><cell>36%</cell></row>
</table></figure>
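<div xmlns="http://www.tei-c.org/ns/1.0"><p>The "Percentage of all" row can be cross-checked against the totals stated in the text: 342 BioPortal ontologies available on the Ontohub mirror, 93 ontologies in the Protégé library listing and 207 ontologies in the TONES Ontohub mirror. The following Python sketch divides the processed counts from Table 1 by these totals and rounds to whole percentages. Dumontier and LOV are left out because their totals are not stated, and the function name coverage is hypothetical.</p>
<p>
```python
# Processed counts are taken from Table 1; totals are the figures stated in the
# text. Dumontier and LOV are omitted because their totals are not given.
PROCESSED = {"BioPortal": 254, "Protégé": 41, "TONES": 183}
TOTALS = {"BioPortal": 342, "Protégé": 93, "TONES": 207}


def coverage(processed, totals):
    """Return the processed share of each repository as a rounded percentage."""
    return {repo: round(100 * processed[repo] / totals[repo]) for repo in processed}


if __name__ == "__main__":
    # Prints {'BioPortal': 74, 'Protégé': 44, 'TONES': 88}, matching Table 1.
    print(coverage(PROCESSED, TOTALS))
```
</p>
<p>For Protégé, 43% of the 93 listed ontologies were unavailable, which leaves roughly 53; the gap between these and the 41 processed ontologies is presumably due to the parsing and import problems mentioned above.</p></div>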
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.r-project.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://ontohub.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://owlapi.sourceforge.net/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Due to space limitations, the full list of metrics and the complete results are available on the supplementary web page: http://owl.vse.cz:8080/MetricsExploration/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://bioportal.bioontology.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://obofoundry.org/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A Survey of Identifiers and Labels in OWL Ontologies</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A A M</forename><surname>Manaf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bechhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stevens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">OWLED</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A Snapshot of the OWL Web</title>
		<author>
			<persName><forename type="first">N</forename><surname>Matentzoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bail</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A Survey of the Web Ontology Landscape</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m">To overcome the 5MB limitation, we gathered BioPortal ontologies directly via its RESTful services</title>
				<imprint/>
	</monogr>
	<note>Corresponding ontology metrics are available on the supplementary web page.</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="http://owl.vse.cz:8080/MetricsExploration/" />
		<title level="m">A sample service providing metrics for a given ontology</title>
				<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
