A Hitchhiker’s Guide to Ontology

Fabian M. Suchanek
Télécom Paris & Institut Polytechnique de Paris, France

A knowledge base (KB) is a computer-processable collection of knowledge about the world. In its simplest variant, a KB takes the form of a labeled graph, where the nodes are entities (such as people, organizations, and geographical locations), and the edges represent the links between these entities in the real world (such as who was born where, which organization is headed by whom, which city is the capital of which country, etc.). The formal definition of the categories and the relations of a KB is called an ontology¹. Knowledge bases provide the background knowledge for different artificial intelligence applications, ranging from personal assistants to Web search, question answering, and text analysis. In particular, KBs are useful in information retrieval (IR), where they serve for structured search and entity disambiguation. Research has made extraordinary progress in the automated construction of KBs, and today’s KBs contain billions of entities [1]. Nevertheless, KBs are still far from perfect. In this keynote talk, I outline several challenges in the construction and maintenance of KBs, and show how our research group approached them.

Construction of KBs. KBs used to be built by hand. In 2008, our YAGO knowledge base [2] was one of the first large knowledge bases that were constructed automatically. While the first version of YAGO fed from Wikipedia, the newest version, YAGO 4 [3], feeds from Wikidata. YAGO 4 cleans up the taxonomy of Wikidata (by replacing it with the one from schema.org), gives entities and relations readable identifiers, and applies schema constraints. This cleans the data, and an OWL DL reasoner can actually run on this KB. We have also ventured beyond Wikidata and Wikipedia, by extracting commercial products from the Web [4]. Our work harvests universal product codes from the Web and builds a shallow KB on top. In such scenarios, one often encounters the problem of entity linking: Given the mention of an entity on a Web page, map it to any of the predefined entities from a catalog. We found that this problem can be solved by a rather lightweight neural architecture [5], which works just as well as heavier solutions such as transformers.

Completion of KBs. KBs are usually highly incomplete. We have worked on this problem along several axes: Our AMIE system [10] can find rules such as "If two people are married, they usually live in the same city" [11]. Such rules can then be used to predict missing facts. We have also developed methods to determine whether a fact is missing in the first place [12]. Another work [13] can determine whether an attribute (such as hasParent) appears with all entities of a class (say, Person) in the real world, even if it does not in the KB. Finally, we have developed methods to estimate how many entities of a class are missing [14].

Querying KBs. KBs can be quite large, and thus they are usually loaded into a triple store (database) in order to query them. However, for large KBs, even this loading can take hours, and if we want to launch only a single query, the loading is an overhead. We have developed an approach that transforms a Datalog or SPARQL query into bash commands [15]. These can be executed directly on the files of the KB (in TSV format), thus bypassing the triple store completely. Another work [16] is concerned with querying KBs that can be accessed only by predefined functions. We show that for a certain class of functions and queries, it is decidable whether a query can be answered by an orchestration of these functions.

Applying KBs. We have applied KBs for the purposes of combinatorial creativity [17] and to the digital humanities [18]. With the help of YAGO, we can, e.g., trace the life expectancy over the centuries, broken down by gender or country of birth. In an attempt to bring semantic understanding to a very different domain, we look into explaining the decisions of a black box machine learning model with the help of several decision trees [19].

Extension of KBs. KBs usually contain mainly binary links between entities, a knowledge representation known as RDF. In our NoRDF project [6], we aim to enrich KBs with beliefs, claims, events, causes, and entire stories. We have so far mainly surveyed the existing literature: how to deal with non-named entities [7], how to deal with vague expressions [8], and how to assess whether transformers can reason on natural language [9].

Conclusion. Knowledge Bases are a fascinating and useful domain of research. Many challenges have been overcome recently, and many new ones are awaiting us.

¹ The title uses this word instead of “knowledge base” to rhyme with a book title by Douglas Adams [42].

DESIRES 2021: 2nd International Conference on Design of Experimental Search & Information REtrieval Systems, September 15–18, 2021, Padua, Italy
suchanek@telecom-paris.fr (F. M. Suchanek)
https://suchanek.name (F. M. Suchanek)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Acknowledgments

Partially funded by ANR-20-CHIA-0012-01 (“NoRDF”).

References

[1] G. Weikum, L. Dong, S. Razniewski, F. M. Suchanek, Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases, in: Foundations and Trends in Databases, 2021.
[2] F. M. Suchanek, G. Kasneci, G. Weikum, Yago - A Core of Semantic Knowledge, in: WWW, 2007.
[3] T. P. Tanon, G. Weikum, F. M. Suchanek, YAGO 4: A Reason-able Knowledge Base, in: ESWC, 2020.
[4] A. Talaika, J. A. Biega, A. Amarilli, F. M. Suchanek, IBEX: Harvesting Entities from the Web Using Unique Identifiers, in: WebDB workshop, 2015.
[5] L. Chen, G. Varoquaux, F. M. Suchanek, A Lightweight Neural Model for Biomedical Entity Linking, in: AAAI, 2021.
[6] F. M. Suchanek, The Need to Move Beyond Triples, in: Text2Story workshop, 2020.
[7] P.-H. Paris, F. M. Suchanek, Non-named entities - the silent majority, in: ESWC short paper track, 2021.
[8] P.-H. Paris, S. E. Aoud, F. M. Suchanek, The Vagueness of Vagueness in Noun Phrases, in: AKBC short paper track, 2021.
[9] C. Helwe, C. Clavel, F. M. Suchanek, Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning, in: AKBC short paper track, 2021.
[10] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases, in: WWW, 2013.
[11] F. M. Suchanek, J. Lajus, A. Boschin, G. Weikum, Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases, in: RW invited paper, 2019.
[12] L. Galárraga, S. Razniewski, A. Amarilli, F. M. Suchanek, Predicting Completeness in Knowledge Bases, in: WSDM, 2017.
[13] J. Lajus, F. M. Suchanek, Are All People Married? Determining Obligatory Attributes in Knowledge Bases, in: WWW, 2018.
[14] A. Soulet, A. Giacometti, B. Markhoff, F. M. Suchanek, Representativeness of Knowledge Bases with the Generalized Benford’s Law, in: ISWC, 2018.
[15] T. Rebele, T. P. Tanon, F. M. Suchanek, Bash Datalog: Answering Datalog Queries with Unix Shell Commands, in: ISWC, 2018.
[16] J. Romero, N. Preda, A. Amarilli, F. M. Suchanek, Equivalent Rewritings on Path Views with Binding Patterns, in: ESWC, 2020.
[17] F. M. Suchanek, C. Menard, M. Bienvenu, C. Chapellier, Can you imagine... a language for combinatorial creativity?, in: ISWC, 2016.
[18] T. Rebele, A. Nekoei, F. M. Suchanek, Using YAGO for the Humanities, in: WHISE workshop, 2017.
[19] N. Radulović, A. Bifet, F. M. Suchanek, Confident Interpretations of Black Box Classifiers, in: IJCNN, 2021.
[42] Douglas Adams, The Hitchhiker’s Guide to the Galaxy, 1979.
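
Illustration. The view of a KB as a labeled graph, and the use of rules to predict missing facts described under Completion of KBs, can be made concrete on a toy example. The following is a minimal sketch and not the AMIE implementation: the KB, the relation names marriedTo and livesIn, and all function names are invented for illustration, and the score is the plain fraction of rule predictions that the KB confirms, which may differ from the measures used in [10].

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object): one labeled edge

# Toy KB invented for this illustration; it is not data from YAGO or Wikidata.
TOY_KB: Set[Triple] = {
    ("alice", "marriedTo", "bob"),
    ("bob", "marriedTo", "alice"),
    ("carol", "marriedTo", "dave"),
    ("dave", "marriedTo", "carol"),
    ("alice", "livesIn", "Paris"),
    ("bob", "livesIn", "Paris"),
    ("carol", "livesIn", "Padua"),
    # No livesIn fact for dave: the KB is incomplete here.
}


def objects_of(kb: Set[Triple], subject: str, relation: str) -> Set[str]:
    """All objects o such that (subject, relation, o) is in the KB."""
    return {o for (s, r, o) in kb if s == subject and r == relation}


def rule_predictions(kb: Set[Triple]) -> Set[Triple]:
    """Facts predicted by the hypothetical rule
    marriedTo(x, y) AND livesIn(x, c) => livesIn(y, c)."""
    return {
        (y, "livesIn", c)
        for (x, relation, y) in kb
        if relation == "marriedTo"
        for c in objects_of(kb, x, "livesIn")
    }


def rule_stats(kb: Set[Triple]):
    """Return (support, confidence, missing) for the rule above.

    support:    number of predicted facts that are already in the KB
    confidence: support divided by the number of predicted facts
    missing:    predicted facts that the KB does not contain
    """
    predicted = rule_predictions(kb)
    support = len(predicted & kb)
    confidence = support / len(predicted) if predicted else 0.0
    return support, confidence, predicted - kb


if __name__ == "__main__":
    support, confidence, missing = rule_stats(TOY_KB)
    print(support, round(confidence, 2), sorted(missing))
```

On this toy KB, the rule makes three predictions, two of which the KB confirms, and it proposes one candidate fact, namely that dave lives in Padua. Under this plain score, the absent fact counts against the rule although it may simply be missing from the KB, which hints at why rule mining on incomplete KBs needs some care.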