=Paper= {{Paper |id=Vol-440/paper-7 |storemode=property |title=Ontology-based Technologies - Technology Transfer from Bioinformatics? |pdfUrl=https://ceur-ws.org/Vol-440/paper7.pdf |volume=Vol-440 |dblpUrl=https://dblp.org/rec/conf/oic/Neuhaus08 }} ==Ontology-based Technologies - Technology Transfer from Bioinformatics?== https://ceur-ws.org/Vol-440/paper7.pdf
          Ontology-based technologies — Technology
                 transfer from bioinformatics?
                                                       Fabian Neuhaus, NIST



I. INTRODUCTION

In the call for papers for OIC 2008, the description of the conference contains the following optimistic outlook:

      New approaches are required to enable greater flexibility, precision, timeliness and automation of analysis in response to rapidly evolving threats. Ontology-based technology as applied in the areas such as bioinformatics has demonstrated the possibility of gains along all of these dimensions. The time is ripe to extend these gains to other spheres.

Ontology-based technologies clearly offer great potential for the intelligence community. In this paper I will discuss whether the intelligence community could adopt technologies that have proven successful in bioinformatics. For this purpose we have to consider how biologists apply these technologies and how their needs differ from the needs of the intelligence community.

II. KINDS OF KNOWLEDGE

Biologists have been very successful at representing biological knowledge in a machine-readable form with the help of ontology-based technologies. However, we should not take for granted that the technologies that work for biologists would be appropriate for the intelligence community, because the kind of knowledge gathered by the intelligence community differs in important respects from biological knowledge. While the intelligence community is interested in individual people and organizations, biologists produce scientific knowledge that consists of more or less general laws. Even in cases where biologists use terms from ontologies to describe the results of individual experiments, these results are formulated as laws; for example, laws like ‘if a fruit fly has the mutation x, then the fly will have red eyes’. Biologists are only interested in the properties of individual animals or plants if these properties might provide evidence for or against a general hypothesis. For this reason, it is usually not necessary, and often not even possible, for biologists to keep track of the individual entities that they are experimenting with; e.g., no biologist would care to uniquely identify the individual fruit flies of a population, let alone the individual RNA molecules in a particular sample. In contrast, for the intelligence community it is crucial to identify individual persons of interest, to keep track of them over time, and to gather information about them. Furthermore, it is not the primary purpose of the intelligence community to produce and test general hypotheses.

III. REASONING WITH INSTANCES

Most biological ontologies are written either in the OBO Flat File Format [1] or in OWL DL [2]. These ontologies are used primarily as controlled vocabularies; so far the use of biological ontologies for automatic reasoning has been surprisingly limited. However, even when biologists reason with the content of their ontologies, their needs typically differ from those of the intelligence community. Biologists are interested in type-level reasoning (so-called ‘TBox reasoning’); the intelligence community is primarily interested in instance-level reasoning (so-called ‘ABox reasoning’). For example, a biologist might be interested in the query ‘What types of mutation lead to red eyes in fruit flies?’ but a biologist would never enter the query ‘Find all the fruit flies that have red eyes’. The reason is, of course, that biologists do not care about individual fruit flies and do not keep track of the individual animals.

In contrast, analysts in the intelligence community are interested primarily in instance-level queries about individual people and organizations and their properties and relations. For example, a typical query might be ‘Find all people who are known to be members of Hamas, currently reside in Paris, and have been in Tehran in the last three years’. Since instance-level reasoning is irrelevant for biologists, the OBO-format, which is the knowledge representation language that has been tailored to their needs, does not even allow assertions about instances. Consequently, none of the tools based on it supports instance-level reasoning. Ontologies that are written in OWL DL can be used with reasoners like Pellet or Racer¹, which support instance-level reasoning. However, in spite of impressive performance improvements, as of 2008 these reasoners are not able to cope with the large-scale instance-level reasoning (ABox reasoning) that would be required by the intelligence community [3], [4], [5], [6].

¹ Any mention of commercial products or companies is for information only and does not imply recommendation or endorsement by the author or the National Institute of Standards and Technology.

IV. TIME

Another difference between biological knowledge and the knowledge gathered by the intelligence community is related to time. Biological laws (and other natural laws) are timeless in the following sense: if a law like ‘if a fruit fly has the mutation x, then the fly will have red eyes’ is true, then it is true not only now but also at any other time. Of course, this does not mean that biologists do not care about change over time. Evolutionary biology is strongly concerned with the changes of DNA that give rise to new species, and developmental biologists study the processes and changes that lead from fertilization to an adult organism. But while the individual organism changes over time during its development
(e.g., today’s caterpillar is tomorrow’s butterfly), the truth-value of statements about development in biological ontologies (e.g., ‘The pupal stage follows the larval stage’) does not change over time. As a result, biologists have no need to express that a statement is true only with respect to a given time.

In contrast, much of the knowledge the intelligence community needs to represent is time-relative. For this reason, the knowledge representation languages used by biologists do not meet the needs of the intelligence community. For example, it would be trivial to express a statement like ‘All leaders of Hamas are located in the Gaza strip’ in the OBO-format or in OWL DL, but there is no straightforward way to express ‘All leaders of Hamas are located in the Gaza strip on August 27, 2008.’ The OBO-format cannot express statements about instances, but in OWL DL the same problem arises for statements about instances: e.g., there is no straightforward way to express ‘John was married to Sue in 2004 and John is married to Anne in 2008’ in OWL DL.²

V. SOURCES

Biology, like any evolving science, contains competing theories that are inconsistent with each other. To maintain consistency, biologists limit the scope of their ontologies to textbook knowledge, that is, knowledge that has been vetted by the community and is considered part of the scientific consensus. Obviously, this approach would not work for the intelligence community, which has to deal with conflicting information from unreliable sources. For this reason, it is crucial for the intelligence community to represent not only the information itself but also the sources of the information. A knowledge representation language suitable for the intelligence community would enable the representation of statements like ‘Source x claims that Khaled Mashal will be in Tehran on August 17th or 19th’. One major advantage of representing sources and the information they provide within the same formalism is that the sources are treated as first-class citizens in the knowledge base and can be used in queries like: ‘Are there two independent sources who claim that Khaled Mashal will be in Tehran?’ or ‘Do source x and source y provide inconsistent information?’

The representation of, and the reasoning about, sources of information is far beyond the scope of the OBO-format as well as OWL DL. It is possible to stretch the boundaries of first-order logic so that one can represent information about sources. However, the resulting ontology is rather convoluted, and my experiments with Prover9 (a first-order logic reasoner [7]) showed that, as a result, the reasoner had difficulty answering even fairly simple queries. A knowledge representation language that is designed to handle this kind of expression is the IKRIS Knowledge Language (IKL), an extension of the Common Logic Interchange Format [8], [9], [10]. Unfortunately, there are no reasoning engines for IKL available at this time.

VI. CONCLUSION

There are some skills that biologists have developed when they adopted ontology-based technologies that might be relevant for the intelligence community: techniques to build and maintain large-scale ontologies, evaluation methodologies, and general design principles for ontologies. However, biologists and the intelligence community deal with very different kinds of knowledge and create ontologies for different purposes. Thus the lessons that the intelligence community can learn from biologists will be limited: (i) The knowledge representation languages used by biologists do not meet the needs of the intelligence community. OWL DL is more expressive than the OBO-format, but since OWL DL offers no straightforward way to deal with time-relative statements and no way to reason over the sources of statements, OWL DL is still not expressive enough. (ii) Existing OWL DL reasoners are not able to handle the amount of instance-level reasoning that the intelligence community requires. (iii) Since the tools developed for biologists work with ontologies either in the OBO-format or in OWL DL, it follows that these tools will not be useful for the work of the intelligence community.

REFERENCES

[1] J. Day-Richter. The OBO Flat File Format specification, version 1.2. http://www.geneontology.org/GO.format.obo-1_2.shtml
[2] P.F. Patel-Schneider, P. Hayes, I. Horrocks. OWL Web Ontology Language semantics and abstract syntax. http://www.w3.org/TR/owl-semantics/
[3] Z. Pan. Benchmarking DL reasoners using realistic ontologies. http://www.mindswap.org/2005/OWLWorkshop/sub6.pdf
[4] E. Sirin, B. Parsia, B.C. Grau, A. Kalyanpur, Y. Katz. Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, vol 5, issue 2, 2007, 51-53.
[5] Racer Systems. Release Notes for RacerPro 1.9.2 beta. http://www.sts.tu-harburg.de/%7Er.f.moeller/racer/Racer-1-9-2-beta-Release-Notes/release-notes-1-9-2.html
[6] J. Bock, P. Haase, Q. Ji, R. Volz. Benchmarking OWL reasoners. In F. van Harmelen, A. Herzig, P. Hitzler, Z. Lin, R. Piskac, G. Qi: Proceedings of the Workshop on Advancing Reasoning on the Web: Scalability and Commonsense, 2008.
[7] http://www.cs.unm.edu/~mccune/mace4
[8] P. Hayes, C. Menzel. IKL Specification Document. http://www.ihmc.us/users/phayes/IKL/SPEC/SPEC.html
[9] P. Hayes. IKL Guide. http://www.ihmc.us/users/phayes/IKL/GUIDE/GUIDE.html
[10] ISO/IEC 24707. Information technology – Common Logic (CL): a framework for a family of logic-based languages.
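
To make the two source-related queries from Section V concrete, the following is a minimal sketch in plain Python rather than in any ontology language. It is not how IKL, OWL DL or any reasoner represents sources. The `Claim` record, both helper functions, and the placeholder names (`source_x`, `person_p`, `location_on_day_d`, `city_1`, and so on) are hypothetical. The sketch assumes that distinct source identifiers count as independent sources, and that two claims are inconsistent only when they assign different values to a predicate declared single-valued. Disjunctive claims such as ‘on August 17th or 19th’ are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement together with the source that asserted it (hypothetical record)."""
    source: str
    subject: str
    predicate: str
    value: str


def sources_claiming(claims, subject, predicate, value):
    """Return the set of sources that assert exactly this statement."""
    return {
        c.source
        for c in claims
        if (c.subject, c.predicate, c.value) == (subject, predicate, value)
    }


def conflicting_claims(claims, source_a, source_b, single_valued):
    """Return pairs of claims in which the two sources give different values
    for the same subject and the same single-valued predicate."""
    by_a = [c for c in claims if c.source == source_a]
    by_b = [c for c in claims if c.source == source_b]
    return [
        (a, b)
        for a in by_a
        for b in by_b
        if a.subject == b.subject
        and a.predicate == b.predicate
        and a.predicate in single_valued
        and a.value != b.value
    ]


# Placeholder data: three sources report where person_p is on some day d.
claims = [
    Claim("source_x", "person_p", "location_on_day_d", "city_1"),
    Claim("source_y", "person_p", "location_on_day_d", "city_1"),
    Claim("source_z", "person_p", "location_on_day_d", "city_2"),
]

# 'Are there two independent sources who claim that person_p is in city_1?'
confirming = sources_claiming(claims, "person_p", "location_on_day_d", "city_1")
print(len(confirming) >= 2)

# 'Do source_x and source_z provide inconsistent information?'
conflicts = conflicting_claims(
    claims, "source_x", "source_z", {"location_on_day_d"}
)
print(bool(conflicts))
```

Because every claim carries its source as an ordinary field, both queries reduce to filtering on that field. This is the sense in which sources are first-class citizens; a logic-based language would additionally have to support inference over such attributed statements, which this sketch does not attempt.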

² Note that it is possible to represent statements whose truth-values change over time in OWL DL, but the resulting ontologies are rather convoluted, and, at least in my opinion, OWL DL is a poor choice for ontologies that are intended to support reasoning with this kind of statement.