=Paper=
{{Paper
|id=Vol-440/paper-7
|storemode=property
|title=Ontology-based Technologies - Technology Transfer from Bioinformatics?
|pdfUrl=https://ceur-ws.org/Vol-440/paper7.pdf
|volume=Vol-440
|dblpUrl=https://dblp.org/rec/conf/oic/Neuhaus08
}}
==Ontology-based Technologies - Technology Transfer from Bioinformatics?==
Ontology-based technologies — Technology
transfer from bioinformatics?
Fabian Neuhaus, NIST
I. INTRODUCTION

In the call for papers for OIC 2008, the description of the conference contains the following optimistic outlook:

New approaches are required to enable greater flexibility, precision, timeliness and automation of analysis in response to rapidly evolving threats. Ontology-based technology as applied in the areas such as bioinformatics has demonstrated the possibility of gains along all of these dimensions. The time is ripe to extend these gains to other spheres.

Ontology-based technologies clearly offer great potential for the intelligence community. In this paper I will discuss whether the intelligence community could adopt technologies that have proven successful in bioinformatics. For this purpose we have to consider how biologists apply these technologies and how their needs differ from the needs of the intelligence community.

II. KINDS OF KNOWLEDGE

Biologists have been very successful at representing biological knowledge in a machine-readable form with the help of ontology-based technologies. However, we should not take for granted that the technologies that work for biologists would be appropriate for the intelligence community, because the kind of knowledge gathered by the intelligence community differs in important respects from biological knowledge. While the intelligence community is interested in individual people and organizations, biologists produce scientific knowledge that consists of more or less general laws. Even in cases where biologists use terms from ontologies to describe the results of individual experiments, these results are formulated as laws; for example, ‘if a fruit fly has the mutation x, then the fly will have red eyes’. Biologists are interested in the properties of individual animals or plants only if these properties might provide evidence for or against a general hypothesis. For this reason, it is usually not necessary, and often not even possible, for biologists to keep track of the individual entities that they are experimenting with; e.g., no biologist would care to uniquely identify the individual fruit flies of a population, let alone the individual RNA molecules in a particular sample. In contrast, for the intelligence community it is crucial to identify individual persons of interest, to keep track of them over time, and to gather information about them. Furthermore, producing and testing general hypotheses is not the primary purpose of the intelligence community.

III. REASONING WITH INSTANCES

Most biological ontologies are written either in the OBO Flat File Format [1] or in OWL DL [2]. These ontologies are used primarily as controlled vocabularies; so far the use of biological ontologies for automatic reasoning has been surprisingly limited. However, even when biologists reason with the content of their ontologies, their needs typically differ from those of the intelligence community. Biologists are interested in type-level reasoning (so-called ‘TBox reasoning’); the intelligence community is primarily interested in instance-level reasoning (so-called ‘ABox reasoning’). For example, a biologist might be interested in the query ‘What types of mutation lead to red eyes in fruit flies?’ but would never enter the query ‘Find all the fruit flies that have red eyes’. The reason is, of course, that biologists do not care about individual fruit flies, and they do not keep track of the individual animals.

In contrast, analysts in the intelligence community are interested primarily in instance-level queries about individual people and organizations and their properties and relations. For example, a typical query might be ‘Find all people known to be members of Hamas who currently reside in Paris and have been in Tehran in the last three years’. Since instance-level reasoning is irrelevant for biologists, the OBO-format, which is the knowledge representation language that has been tailored to their needs, does not even allow assertions about instances. Consequently, none of the tools based on it support instance-level reasoning. Ontologies that are written in OWL DL can be used with reasoners like Pellet or Racer¹, which support instance-level reasoning. However, in spite of impressive performance improvements, as of 2008 these reasoners are not able to cope with the large-scale instance-level reasoning (ABox reasoning) that would be required by the intelligence community [3], [4], [5], [6].

¹ Any mention of commercial products or companies is for information only and does not imply recommendation or endorsement by the author or the National Institute of Standards and Technology.

IV. TIME

Another difference between biological knowledge and the knowledge gathered by the intelligence community is related to time. Biological laws (and other natural laws) are timeless in the following sense: if a law like ‘if a fruit fly has the mutation x, then the fly will have red eyes’ is true, then it is true not only now but at any other time as well. Of course, this does not mean that biologists do not care about change over time. Evolutionary biology is strongly concerned with the changes of DNA that give rise to new species, and developmental biologists study the processes and changes that lead from fertilization to an adult organism. But while the individual organism changes over time during its development
(e.g., today’s caterpillar is tomorrow’s butterfly), the truth-value of statements about development in biological ontologies (e.g., ‘The pupal stage follows the larval stage’) does not change over time. As a result, biologists have no need to express that a statement is true only with respect to a given time.

In contrast, much of the knowledge the intelligence community needs to represent is time-relative. For this reason, the knowledge representation languages used by biologists turn out not to meet the needs of the intelligence community. For example, it would be trivial to express a statement like ‘All leaders of Hamas are located in the Gaza strip’ in the OBO-format or in OWL DL, but there is no straightforward way to express ‘All leaders of Hamas are located in the Gaza strip on August 27, 2008.’ The OBO-format cannot express statements about instances at all. OWL DL can, but the same problem arises for statements about instances: e.g., there is no straightforward way to express ‘John was married to Sue in 2004 and John is married to Anne in 2008’ in OWL DL.²

² Note that it is possible to represent statements whose truth-values change over time in OWL DL, but the resulting ontologies are rather convoluted, and, at least in my opinion, OWL DL is a poor choice for ontologies that are intended to support reasoning with this kind of statement.

V. SOURCES

Biology, like any evolving science, contains competing theories that are inconsistent with each other. To maintain consistency, biologists limit the scope of their ontologies to textbook knowledge, i.e., knowledge that has been vetted by the community and is considered part of the scientific consensus. Obviously, this approach would not work for the intelligence community, which has to deal with conflicting information from unreliable sources. For this reason, it is crucial for the intelligence community to represent not only the information itself but also the sources of the information. A knowledge representation language suitable for the intelligence community would enable the representation of statements like ‘Source x claims that Khaled Mashal will be in Tehran on August 17th or 19th’. One major advantage of representing sources and the information they provide within the same formalism is that the sources are treated as first-class citizens in the knowledge base. They can then be used in queries like ‘Are there two independent sources who claim that Khaled Mashal will be in Tehran?’ or ‘Do source x and source y provide inconsistent information?’

The representation of and reasoning about sources of information is far beyond the scope of both the OBO-format and OWL DL. It is possible to stretch the boundaries of first-order logic so that one can represent information about sources. However, the resulting ontology is rather convoluted. In my experiments with Prover9 (a first-order logic reasoner [7]), the reasoner had difficulty answering even fairly simple queries. A knowledge representation language that is designed to handle this kind of expression is the IKRIS Knowledge Language (IKL), an extension of the Common Logic Interchange Format [8], [9], [10]. Unfortunately, there are no reasoning engines for IKL available at this time.

VI. CONCLUSION

Biologists developed some skills when they adopted ontology-based technologies that might be relevant for the intelligence community: techniques to build and maintain large-scale ontologies, evaluation methodologies, and general design principles for ontologies. However, biologists and the intelligence community deal with very different kinds of knowledge and create ontologies for different purposes. Thus the lessons that the intelligence community can learn from biologists will be limited. (i) The knowledge representation languages used by biologists do not meet the needs of the intelligence community. OWL DL is more expressive than the OBO-format, but it offers no straightforward way to deal with time-relative statements and no way to reason over the sources of statements. For these reasons OWL DL is still not expressive enough. (ii) Existing OWL DL reasoners are not able to handle the amount of instance-level reasoning that the intelligence community requires. (iii) Since the tools developed for biologists work with ontologies either in the OBO-format or in OWL DL, it follows that these tools will not be useful for the work of the intelligence community.

REFERENCES

[1] J. Day-Richter. The OBO Flat File Format specification, version 1.2. http://www.geneontology.org/GO.format.obo-1_2.shtml
[2] P.F. Patel-Schneider, P. Hayes, I. Horrocks. OWL Web Ontology Language semantics and abstract syntax. http://www.w3.org/TR/owl-semantics/
[3] Z. Pan. Benchmarking DL reasoners using realistic ontologies. http://www.mindswap.org/2005/OWLWorkshop/sub6.pdf
[4] E. Sirin, B. Parsia, B.C. Grau, A. Kalyanpur, Y. Katz. Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, vol. 5, issue 2, 2007, 51-53.
[5] Racer Systems. Release Notes for RacerPro 1.9.2 beta. http://www.sts.tu-harburg.de/%7Er.f.moeller/racer/Racer-1-9-2-beta-Release-Notes/release-notes-1-9-2.html
[6] J. Bock, P. Haase, Q. Ji, R. Volz. Benchmarking OWL reasoners. In F. van Harmelen, A. Herzig, P. Hitzler, Z. Lin, R. Piskac, G. Qi (eds.): Proceedings of the Workshop on Advancing Reasoning on the Web: Scalability and Commonsense, 2008.
[7] http://www.cs.unm.edu/~mccune/mace4
[8] P. Hayes, C. Menzel. IKL Specification Document. http://www.ihmc.us/users/phayes/IKL/SPEC/SPEC.html
[9] P. Hayes. IKL Guide. http://www.ihmc.us/users/phayes/IKL/GUIDE/GUIDE.html
[10] ISO/IEC 24707. Information technology – Common Logic (CL): a framework for a family of logic-based languages.