<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hitchhiker's Guide to Ontology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabian M. Suchanek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Télécom Paris &amp; Institut Polytechnique de Paris</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A knowledge base (KB) is a computer-processable collection of knowledge about the world. In its simplest variant, a KB takes the form of a labeled graph, where the nodes are entities (such as people, organizations, and geographical locations), and the edges represent the links between these entities in the real world (such as who was born where, which organization is headed by whom, which city is the capital of which country, etc.). The formal definition of the categories and the relations of a KB is called an ontology<sup>1</sup>. Knowledge bases provide the background knowledge for different artificial intelligence applications, ranging from personal assistants to Web search, question answering, and text analysis. In particular, KBs are useful in information retrieval (IR), where they serve for structured search and entity disambiguation. Research has made extraordinary progress in the automated construction of KBs, and today's KBs contain billions of entities [1]. Nevertheless, KBs are still far from perfect. In this keynote talk, I outline several challenges in the construction and maintenance of KBs, and show how our research group approached them.</p>
        <p>Construction of KBs. KBs used to be built by hand. In 2008, our YAGO knowledge base [2] was one of the first large knowledge bases that were constructed automatically. While the first version of YAGO fed from Wikipedia, the newest version, YAGO 4 [3], feeds from Wikidata. YAGO 4 cleans up the taxonomy of Wikidata (by replacing it by the one from schema.org), gives entities and relations readable identifiers, and applies schema constraints. This cleans the data, and an OWL DL reasoner can actually run on this KB. We have also ventured beyond Wikidata and Wikipedia, by extracting commercial products from the Web [4]. Our work harvests universal product codes from the Web and builds a shallow KB on top. In such scenarios, one often encounters the problem of entity linking: Given the mention of an entity on a Web page, map it to any of the predefined entities from a catalog. We found that this problem can be solved by a rather lightweight neural architecture [5], which works just as well as heavier solutions such as transformers.</p>
        <p>Completion of KBs. KBs are usually highly incomplete. We have worked on this problem along several axes: Our AMIE system [10] can find rules such as If two people are married, they usually live in the same city [11]. Such rules can then be used to predict missing facts. We have also developed methods to determine whether a fact is missing in the first place [12]. Another work [13] can determine whether an attribute (such as hasParent) appears with all entities of a class (say, Person) in the real world, even if it does not in the KB. Finally, we have developed methods to estimate how many entities of a class are missing [14].</p>
        <p>Querying KBs. KBs can be pretty large, and thus they are usually loaded into a triple store (database) in order to query them. However, for large KBs, even this loading can take hours, and if we want to launch only a single query, the loading is an overhead. We have developed an approach that transforms a Datalog or SPARQL query into bash commands [15]. These can be executed directly on the files of the KB (in TSV format), thus bypassing the triple store completely. Another work [16] is concerned with querying KBs that can be accessed only by predefined functions. We show that for a certain class of functions and queries, it is decidable whether a query can be answered by an orchestration of these functions.</p>
        <p>Applying KBs. We have applied KBs for the purposes of combinatorial creativity [17] and to the digital humanities [18]. With the help of YAGO, we can, e.g., trace the life expectancy over the centuries, drilled down by gender or country of birth. In an attempt to bring semantic understanding to a very different domain, we look into explaining the decisions of a black box machine learning model by help of several decision trees [19].</p>
        <p>Extension of KBs. KBs usually contain mainly binary links between entities, a knowledge representation known as RDF. In our NoRDF project [6], we aim to enrich KBs by beliefs, claims, events, causes, and entire stories. We have so far mainly surveyed the existing literature: how to deal with non-named entities [7], how to deal with vague expressions [8], and how to assess whether transformers can reason on natural language [9].</p>
        <p>Conclusion. Knowledge Bases are a fascinating and useful domain of research. Many challenges have been overcome recently, and many new ones are awaiting us.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>DESIRES 2021 – 2nd International Conference on Design of Experimental Search &amp; Information REtrieval Systems, September 15–18, 2021, Padua, Italy</p>
      <p>Email: suchanek@telecom-paris.fr (F. M. Suchanek)</p>
      <p>URL: https://suchanek.name (F. M. Suchanek)</p>
      <p>© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.</p>
      <p><sup>1</sup> The title uses this word instead of “knowledge base” to rhyme with a book title by Douglas Adams [42].</p>
      <p>Acknowledgments</p>
      <p>Partially funded by ANR-20-CHIA-0012-01 (“NoRDF”).</p>
      <p>References</p>
      <p>[1] G. Weikum, L. Dong, S. Razniewski, F. M. Suchanek, Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases, in: Foundations and Trends in Databases, 2021.</p>
      <p>[2] F. M. Suchanek, G. Kasneci, G. Weikum, Yago - A Core of Semantic Knowledge, in: WWW, 2007.</p>
      <p>[3] T. P. Tanon, G. Weikum, F. M. Suchanek, YAGO 4: A Reason-able Knowledge Base, in: ESWC, 2020.</p>
      <p>[4] A. Talaika, J. A. Biega, A. Amarilli, F. M. Suchanek, IBEX: Harvesting Entities from the Web Using Unique Identifiers, in: WebDB workshop, 2015.</p>
      <p>[5] L. Chen, G. Varoquaux, F. M. Suchanek, A Lightweight Neural Model for Biomedical Entity Linking, in: AAAI, 2021.</p>
      <p>[6] F. M. Suchanek, The Need to Move Beyond Triples, in: Text2Story workshop, 2020.</p>
      <p>[7] P.-H. Paris, F. M. Suchanek, Non-named entities - the silent majority, in: ESWC short paper track, 2021.</p>
      <p>[8] P.-H. Paris, S. E. Aoud, F. M. Suchanek, The Vagueness of Vagueness in Noun Phrases, in: AKBC short paper track, 2021.</p>
      <p>[9] C. Helwe, C. Clavel, F. M. Suchanek, Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning, in: AKBC short paper track, 2021.</p>
      <p>[10] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases, in: WWW, 2013.</p>
      <p>[11] F. M. Suchanek, J. Lajus, A. Boschin, G. Weikum, Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases, in: RW invited paper, 2019.</p>
      <p>[12] L. Galárraga, S. Razniewski, A. Amarilli, F. M. Suchanek, Predicting Completeness in Knowledge Bases, in: WSDM, 2017.</p>
      <p>[13] J. Lajus, F. M. Suchanek, Are All People Married? Determining Obligatory Attributes in Knowledge Bases, in: WWW, 2018.</p>
      <p>[14] A. Soulet, A. Giacometti, B. Markhof, F. M. Suchanek, Representativeness of Knowledge Bases with the Generalized Benford’s Law, in: ISWC, 2018.</p>
      <p>[15] T. Rebele, T. P. Tanon, F. M. Suchanek, Bash Datalog: Answering Datalog Queries with Unix Shell Commands, in: ISWC, 2018.</p>
      <p>[16] J. Romero, N. Preda, A. Amarilli, F. M. Suchanek, Equivalent Rewritings on Path Views with Binding Patterns, in: ESWC, 2020.</p>
      <p>[17] F. M. Suchanek, C. Menard, M. Bienvenu, C. Chapellier, Can you imagine... a language for combinatorial creativity?, in: ISWC, 2016.</p>
      <p>[18] T. Rebele, A. Nekoei, F. M. Suchanek, Using YAGO for the Humanities, in: WHISE workshop, 2017.</p>
      <p>[19] N. Radulović, A. Bifet, F. M. Suchanek, Confident Interpretations of Black Box Classifiers, in: IJCNN, 2021.</p>
      <p>[42] D. Adams, The Hitchhiker’s Guide to the Galaxy, 1979.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>