A Hitchhiker’s Guide to Ontology
Fabian M. Suchanek
Télécom Paris & Institut Polytechnique de Paris, France


DESIRES 2021 – 2nd International Conference on Design of Experimental Search &
Information REtrieval Systems, September 15–18, 2021, Padua, Italy
Email: suchanek@telecom-paris.fr (F. M. Suchanek)
URL: https://suchanek.name (F. M. Suchanek)
© 2021 Copyright for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

A knowledge base (KB) is a computer-processable collection of knowledge about the
world. In its simplest variant, a KB takes the form of a labeled graph, where the
nodes are entities (such as people, organizations, and geographical locations), and
the edges represent the links between these entities in the real world (such as who
was born where, which organization is headed by whom, which city is the capital of
which country, etc.). The formal definition of the categories and the relations of
a KB is called an ontology¹. Knowledge bases provide the background knowledge for
different artificial intelligence applications, ranging from personal assistants to
Web search, question answering, and text analysis. In particular, KBs are useful in
information retrieval (IR), where they serve for structured search and entity
disambiguation. Research has made extraordinary progress in the automated
construction of KBs, and today’s KBs contain billions of entities [1]. Nevertheless,
KBs are still far from perfect. In this keynote talk, I outline several challenges
in the construction and maintenance of KBs, and show how our research group
approached them.

Construction of KBs. KBs used to be built by hand. In 2008, our YAGO knowledge base
[2] was one of the first large knowledge bases that were constructed automatically.
While the first version of YAGO fed from Wikipedia, the newest version, YAGO 4 [3],
feeds from Wikidata. YAGO 4 cleans up the taxonomy of Wikidata (by replacing it with
the one from schema.org), gives entities and relations readable identifiers, and
applies schema constraints. This cleans the data, and an OWL DL reasoner can
actually run on this KB. We have also ventured beyond Wikidata and Wikipedia, by
extracting commercial products from the Web [4]. Our work harvests universal product
codes from the Web and builds a shallow KB on top. In such scenarios, one often
encounters the problem of entity linking: Given the mention of an entity on a Web
page, map it to one of the predefined entities from a catalog. We found that this
problem can be solved by a rather lightweight neural architecture [5], which works
just as well as heavier solutions such as transformers.

Completion of KBs. KBs are usually highly incomplete. We have worked on this problem
along several axes: Our AMIE system [10] can find rules such as If two people are
married, they usually live in the same city [11]. Such rules can then be used to
predict missing facts. We have also developed methods to determine whether a fact is
missing in the first place [12]. Another work [13] can determine whether an
attribute (such as hasParent) appears with all entities of a class (say, Person) in
the real world, even if it does not in the KB. Finally, we have developed methods to
estimate how many entities of a class are missing [14].

Querying KBs. KBs can be very large, and thus they are usually loaded into a triple
store (database) in order to query them. However, for large KBs, even this loading
can take hours, and if we want to launch only a single query, the loading is an
overhead. We have developed an approach that transforms a Datalog or SPARQL query
into bash commands [15]. These can be executed directly on the files of the KB (in
TSV format), thus bypassing the triple store completely. Another work [16] is
concerned with querying KBs that can be accessed only by predefined functions. We
show that for a certain class of functions and queries, it is decidable whether a
query can be answered by an orchestration of these functions.

Applying KBs. We have applied KBs for the purposes of combinatorial creativity [17]
and to the digital humanities [18]. With the help of YAGO, we can, e.g., trace the
life expectancy over the centuries, drilled down by gender or country of birth. In
an attempt to bring semantic understanding to a very different domain, we look into
explaining the decisions of a black box machine learning model with the help of
several decision trees [19].

Extension of KBs. KBs usually contain mainly binary links between entities, a
knowledge representation known as RDF. In our NoRDF project [6], we aim to enrich
KBs with beliefs, claims, events, causes, and entire stories. We have so far mainly
surveyed the existing literature: how to deal with non-named entities [7], how to
deal with vague expressions [8], and how to assess whether transformers can reason
on natural language [9].

Conclusion. Knowledge Bases are a fascinating and useful domain of research. Many
challenges have been overcome recently, and many new ones are awaiting us.
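
As a small illustration of the rule-based completion described under "Completion of KBs", the following sketch represents a KB as a set of (subject, relation, object) triples and applies the example rule "if two people are married, they usually live in the same city" to predict a missing fact. It is a toy under stated assumptions, not the AMIE system: the entities, the relation names marriedTo and livesIn, and the helper functions are all hypothetical, and the rule is scored with plain standard confidence, which may differ from the measure used in [10].

```python
# A toy KB as a set of (subject, relation, object) triples.
# All entities, facts, and relation names are invented for illustration.
KB = {
    ("alice", "marriedTo", "bob"),
    ("carol", "marriedTo", "dave"),
    ("erin", "marriedTo", "frank"),
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "paris"),
    ("carol", "livesIn", "rome"),
    ("dave", "livesIn", "rome"),
    ("erin", "livesIn", "berlin"),
    # Note: no livesIn fact for frank; the KB is incomplete here.
}


def objects(kb, subject, relation):
    """Return all objects o such that (subject, relation, o) is in the KB."""
    return {o for s, r, o in kb if s == subject and r == relation}


def rule_instances(kb):
    """Yield (x, y, city) for every match of the rule body:
    marriedTo(x, y) and livesIn(x, city)."""
    for x, relation, y in kb:
        if relation != "marriedTo":
            continue
        for city in objects(kb, x, "livesIn"):
            yield x, y, city


def predict_missing_facts(kb):
    """Apply the rule marriedTo(x, y) and livesIn(x, c) => livesIn(y, c)
    and return the predicted facts that are not yet in the KB."""
    return {
        (y, "livesIn", city)
        for _, y, city in rule_instances(kb)
        if (y, "livesIn", city) not in kb
    }


def support_and_confidence(kb):
    """Support: number of body matches whose predicted head is in the KB.
    Standard confidence: support divided by the number of body matches."""
    matches = list(rule_instances(kb))
    support = sum((y, "livesIn", city) in kb for _, y, city in matches)
    confidence = support / len(matches) if matches else 0.0
    return support, confidence


if __name__ == "__main__":
    print(predict_missing_facts(KB))
    print(support_and_confidence(KB))
```

On this toy data, the rule body matches three couples and the head holds for two of them, so the rule predicts one new fact, the residence of frank. Standard confidence treats that absent fact as a counterexample, which is a pessimistic choice for an incomplete KB.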

Acknowledgments
Partially funded by ANR-20-CHIA-0012-01 (“NoRDF”).

¹ The title uses this word instead of “knowledge base” to rhyme with a book title
by Douglas Adams [42].

References

 [1] G. Weikum, L. Dong, S. Razniewski, F. M. Suchanek,
     Machine Knowledge: Creation and Curation of
     Comprehensive Knowledge Bases, in: Foundations
     and Trends in Databases, 2021.
 [2] F. M. Suchanek, G. Kasneci, G. Weikum, Yago - A
     Core of Semantic Knowledge, in: WWW, 2007.
 [3] T. P. Tanon, G. Weikum, F. M. Suchanek, YAGO 4:
     A Reason-able Knowledge Base, in: ESWC, 2020.
 [4] A. Talaika, J. A. Biega, A. Amarilli, F. M. Suchanek,
     IBEX: Harvesting Entities from the Web Using
     Unique Identifiers, in: WebDB workshop, 2015.
 [5] L. Chen, G. Varoquaux, F. M. Suchanek, A
     Lightweight Neural Model for Biomedical Entity
     Linking, in: AAAI, 2021.
 [6] F. M. Suchanek, The Need to Move Beyond Triples,
     in: Text2Story workshop, 2020.
 [7] P.-H. Paris, F. M. Suchanek, Non-named entities
     - the silent majority, in: ESWC short paper track,
     2021.
 [8] P.-H. Paris, S. E. Aoud, F. M. Suchanek, The
     Vagueness of Vagueness in Noun Phrases, in: AKBC
     short paper track, 2021.
 [9] C. Helwe, C. Clavel, F. M. Suchanek, Reasoning
     with Transformer-based Models: Deep Learning,
     but Shallow Reasoning, in: AKBC short paper track,
     2021.
[10] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek,
     AMIE: Association Rule Mining under Incomplete
     Evidence in Ontological Knowledge Bases, in:
     WWW, 2013.
[11] F. M. Suchanek, J. Lajus, A. Boschin, G. Weikum,
     Knowledge Representation and Rule Mining in
     Entity-Centric Knowledge Bases, in: RW invited
     paper, 2019.
[12] L. Galárraga, S. Razniewski, A. Amarilli, F. M.
     Suchanek, Predicting Completeness in Knowledge
     Bases, in: WSDM, 2017.
[13] J. Lajus, F. M. Suchanek, Are All People Married?
     Determining Obligatory Attributes in Knowledge
     Bases, in: WWW, 2018.
[14] A. Soulet, A. Giacometti, B. Markhoff, F. M.
     Suchanek, Representativeness of Knowledge Bases
     with the Generalized Benford’s Law, in: ISWC,
     2018.
[15] T. Rebele, T. P. Tanon, F. M. Suchanek, Bash
     Datalog: Answering Datalog Queries with Unix Shell
     Commands, in: ISWC, 2018.
[16] J. Romero, N. Preda, A. Amarilli, F. M. Suchanek,
     Equivalent Rewritings on Path Views with Binding
     Patterns, in: ESWC, 2020.
[17] F. M. Suchanek, C. Menard, M. Bienvenu, C.
     Chapellier, Can you imagine... a language for
     combinatorial creativity?, in: ISWC, 2016.
[18] T. Rebele, A. Nekoei, F. M. Suchanek, Using YAGO
     for the Humanities, in: WHISE workshop, 2017.
[19] N. Radulović, A. Bifet, F. M. Suchanek, Confident
     Interpretations of Black Box Classifiers, in: IJCNN,
     2021.
[42] D. Adams, The Hitchhiker’s Guide to the Galaxy,
     1979.