Debiasing Knowledge Graphs: Why Female Presidents are not like Female Popes

Krzysztof Janowicz, Bo Yan, Blake Regalia, Rui Zhu, and Gengchen Mai

STKO Lab, University of California, Santa Barbara, US
{janowicz,boyan,blake,ruizhu,gengchen mai}@geog.ucsb.edu

Abstract. Bias, be it in sampling or judgment, is not a new topic. However, with the increasing use of data, and of models trained on them, in almost all areas of everyday life, the topic is rapidly gaining relevance for the broad public. Even more, the opportunistic reuse of data (traces) that characterizes today's data science calls for new ways to understand and mitigate the effects of biases. Here, we discuss biases in the context of Linked Data, ontologies, and reasoning services and point to the need for both technical and social solutions. We believe that debiasing knowledge graphs will become a pressing issue as these graphs rapidly enter everyday life. This is a provocative topic, not only from a technical perspective but also because it will force us as a Semantic Web community to discuss whether we want to debias in the first place and who gets a say in how to do so.

Keywords: Bias, Knowledge Graphs, Machine Learning, Ontologies

1 Introduction and Motivation

The bias-variance dilemma describes the struggle of trying to minimize the two main sources of error that plague machine learning algorithms in their attempt to generalize beyond their training data. While this trade-off is well known, it has only recently reached broad public attention due to the mainstream adoption of machine learning methods in everyday electronics and spectacular failures of their learned models, such as categorizing black people as gorillas or auto-suggesting to kill all Jews in typeahead search.

While these cases generate the most public outcry, a different type of error may have subtle but more serious long-term consequences, namely representational bias, such as Google's image search for "CEO" depicting mostly males. Even if this might still reflect today's reality and history, we would not want an intelligent system to learn this and consequently, for instance, recommend doctor as a career choice for men and nurse for women [1]. These kinds of biases are not merely a technological challenge; they are also a social issue.

There are ongoing discussions within the machine learning community about how to address representational biases (among other kinds), but these discussions have not yet reached the Knowledge Graph and Semantic Web communities, despite representational issues being at their core. We believe that the heterogeneity of the Linked Data cloud, i.e., its decentralized nature and contributions from many sources and cultures, can offer a certain degree of protection against biases, but this will come at the cost of increased variance.

Biases in knowledge graphs (KGs), as well as potential means to address them, differ from those in linguistic models or image classification. Instead of a model learning the meaning of a term from the contexts in which it occurs in a large corpus, or learning to classify buildings from millions of labeled images, KGs are sparse in the sense that only a small number of triples are available per entity. These triples are statements about the world, not usage patterns.
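As a minimal illustration of this sparsity (assuming the public DBpedia SPARQL endpoint and the Python SPARQLWrapper package, neither of which is required by the argument), the following sketch counts how many triples describe a single, arbitrarily chosen entity:

# Minimal sketch: count the triples that mention one DBpedia entity.
# Assumes the public DBpedia endpoint is reachable and SPARQLWrapper is installed.
from SPARQLWrapper import SPARQLWrapper, JSON

ENTITY = "http://dbpedia.org/resource/Abraham_Lincoln"  # illustrative choice

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery(f"""
    SELECT (COUNT(*) AS ?n) WHERE {{
        {{ <{ENTITY}> ?p ?o }}   # entity as subject
        UNION
        {{ ?s ?p <{ENTITY}> }}   # entity as object
    }}
""")
count = sparql.query().convert()["results"]["bindings"][0]["n"]["value"]
print(f"{ENTITY} is described by {count} triples")

Even for a prominent entity, such a count is orders of magnitude smaller than the number of sentences mentioning the same entity in a Web-scale corpus, which is the kind of signal a language model would learn from.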
Finally, it is important to recognize that the term bias has different meanings across communities, implying that models can be unbiased in a machine learning sense yet simultaneously show substantial bias in a cultural context, e.g., when the training data do not reflect evolving social consensus; we address the latter kind of bias here.

In this vision paper, we describe biases from three different perspectives, namely those arising from the available data, those embedded in ontologies (i.e., the schema level), and finally those that result from drawing inferences. We illustrate each type with a small experiment.

Fig. 1. Coverage of DBpedia (en) in contrast to population density.

2 Data Bias

The sheer size of the Linked Data cloud may lead to the impression that it is safe from selection biases such as sampling bias. However, the data available to both open and commercial knowledge graphs today is, in fact, highly biased. Coverage serves as an illustrative example of the underlying problem. Figure 1 plots more than one million geographically-located entities from DBpedia (en) in red. These entities, most of them places such as Ford's Theater, link actors such as Abraham Lincoln to events such as his assassination and to objects such as Booth's weapon. Consequently, the global coverage of places reveals much about the underlying distribution of non-spatial data as well. The map also shows LandScan-based estimates of population density. As can be seen, coverage is not uniform. For Europe, Japan, Australia, and the US, marker density strongly correlates with population density, whereas this trend breaks down for large parts of Asia, Africa, and South America. At first, one may suspect that these results merely reflect the use of the English DBpedia version; however, the resulting pattern largely remains the same when comparing other language versions and data sources such as GeoNames, social media postings, or even government data [7]. (A sketch of how such geo-located entities can be retrieved is given at the end of this section.)

Put more provocatively, we know a lot about the western world, and we do so from a western perspective. Even more, most of what we know about other regions and cultures comes from this same western perspective. This is an artifact of history, differences in cultures of data sharing, availability of free governmental data, financial resources, and so forth. The effects are similar to what has been discussed in the machine learning community with respect to biases in word embeddings or in image search and tagging. This is not a minor issue. It translates into biased knowledge graph embeddings that increase the dissimilarity to less prototypical cases, it affects recommender systems and question answering, and it leads to rules being learned from biased training data.

As a substantial part of the Linked Data technology stack is based on open standards and is available as open-source implementations, we hope that a more diverse set of contributors will form around it, thereby making the Linked Data cloud more robust to biases, financial incentives, and so forth.
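To make the coverage analysis behind Figure 1 more tangible, the sketch below shows one way geo-located entities can be retrieved from DBpedia; the endpoint, the WGS84 vocabulary, the result limit, and the SPARQLWrapper package are illustrative assumptions rather than the exact pipeline used for the figure.

# Sketch: retrieve coordinates of geo-located DBpedia entities (illustrative only,
# not the exact pipeline behind Figure 1).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?place ?lat ?long WHERE {
        ?place geo:lat ?lat ;
               geo:long ?long .
    }
    LIMIT 10000
""")

bindings = sparql.query().convert()["results"]["bindings"]
points = [(float(b["lat"]["value"]), float(b["long"]["value"])) for b in bindings]
print(len(points), "coordinate pairs retrieved")
# Plotting these (lat, long) pairs over a population-density raster such as LandScan
# yields a coverage map in the spirit of Figure 1.

Retrieving all of the more than one million geo-located entities would require paging through the results, but the query structure remains the same.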
3 Schema Bias

In addition to data bias, knowledge graphs encounter another type of bias at the schema/ontology level. Most ontologies are developed in a top-down manner with application needs in mind or, in the case of top-level ontologies, with certain philosophical stances in mind. They are typically defined by a group of engineers in collaboration with domain experts, and thus also implicitly reflect the worldviews and biases of the development team. Such ontologies will likely contain most of the well-known human biases and heuristics, such as anthropocentric thinking. The increasing use of bottom-up techniques such as machine learning to derive axioms/rules from data will not mitigate these problems, as the resulting models will fall victim to the data biases discussed before. In addition, ontologies may be affected by other biases such as so-called encoding bias [4].

Anthropocentric thinking and application needs are, for instance, at play in many of the vocabularies used to describe Points of Interest. These vocabularies typically contain dozens or even hundreds of classes for various sub-classes of restaurants, bars, and music venues, but only a handful of classes for natural features such as rivers. Consider, for example, the Place branch of the DBpedia ontology (2015); although it contains 168 classes, the average in- and out-degree of each class is only 0.994 when the class hierarchy is viewed as a network. The average path length is 2.18, suggesting that the hierarchy is relatively shallow and flat. Running PageRank with a damping factor of 0.85, more than 75% of the classes have a near-zero PageRank score, meaning that much of the ontology's expressivity is concentrated in less than 25% of its classes. Simply put, the relative importance of classes is unevenly distributed in the ontology despite its general-purpose character.

It is worth noting that many biases are not directly encoded in an ontology but only become visible when comparing multiple ontologies together with their respective datasets. For example, DBpedia, GeoNames, and the Getty Thesaurus of Geographic Names (TGN) all contain a Theater class. From a data-driven perspective, one may assume that computing (spatial) statistics for all members of this class, such as intensity, interaction, and point patterns, would yield similar results across datasets [8]. However, this is not the case, and these indicators show very distinct patterns. The reason is that GeoNames aims at containing all currently existing theaters, DBpedia contains culturally or historically relevant theaters, and TGN contains those that are significant for works of art. Put differently, the dissimilarity in the extensions of these classes is an expression of the implicit biases across them despite their common name. We believe that the multitude of ontologies developed for the Semantic Web is a strength rather than a weakness, as their diversity may help to mitigate some of the issues outlined above.

4 Inferential Bias

Another potential source of bias arises at the inferencing level, e.g., during reasoning, querying, or rule learning. To start with a simple but easily overlooked example, the results of a SPARQL query depend on the entailment regime (e.g., simple vs. RDFS entailment). However, the configuration of a particular query endpoint is outside the control of the data creator or ontology engineer, and thus one may find that multiple endpoints utilizing the same ontologies and datasets, e.g., local copies of DBpedia, yield different results for the same SPARQL query.

Here we focus on another aspect, namely the case of learning a (correct) model that collides with social consensus. Consider, for example, the use of association rule mining to infer new rules from knowledge graphs [2]. For our experiment we extracted all popes, US 5-star generals, and US presidents from DBpedia. These entities have one aspect in common: they are all male (a triple we had to add to our graph for this example, as it is not present in DBpedia).
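For concreteness, the following sketch illustrates one way such entities and the manually added gender statements could be assembled into a small graph for rule mining; the specific DBpedia classes and Wikipedia categories used below (dbo:Pope, Presidents_of_the_United_States, United_States_Army_five-star_generals) are assumptions and may not match the exact extraction we performed.

# Sketch: gather popes, US presidents, and US 5-star generals from DBpedia and
# build a small local graph for rule mining. The classes/categories below are
# illustrative assumptions, not necessarily the original extraction.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

queries = {
    "Pope": "SELECT ?x WHERE { ?x a <http://dbpedia.org/ontology/Pope> }",
    "USPresident": """SELECT ?x WHERE {
        ?x <http://purl.org/dc/terms/subject>
           <http://dbpedia.org/resource/Category:Presidents_of_the_United_States> }""",
    "US5StarGeneral": """SELECT ?x WHERE {
        ?x <http://purl.org/dc/terms/subject>
           <http://dbpedia.org/resource/Category:United_States_Army_five-star_generals> }""",
}

triples = []  # (subject, predicate, object) statements for the local graph
for label, query in queries.items():
    sparql.setQuery(query)
    for b in sparql.query().convert()["results"]["bindings"]:
        entity = b["x"]["value"]
        triples.append((entity, "type", label))
        # Gender is not asserted in DBpedia for these entities, so the
        # corresponding triples are added by hand, as noted above.
        triples.append((entity, "gender", "male"))

print(len(triples), "triples prepared as input for a rule miner such as AMIE+")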
Running AMIE+ over this graph results in numerous rules, and the following three have very high confidence scores due to a lack of negative examples: (1) if X is a pope, X is male; (2) if X is a US 5-star general, X is male; and (3) if X is a US president, X is male. While these rules may be perceived as controversial, they are all correct. The provocative point we are trying to make here is that they are correct for different reasons. The first case is true by definition of the concept Pope; hence, the learned rule is suitable for its application. The second case is an enumerated class, as US 5-star general is a historical military rank no longer in use. Hence, while the rank is not exclusive to men, the rule still applies to all cases, despite not being useful since it will never generate new triples. The third case is the most controversial, as it holds for the entire training data but does not align with social consensus. We do not want KG-based question answering systems to suggest to users that only men can become US presidents, in the same way that today's systems recommend that women become nurses.

5 Conclusions

In this vision paper, we highlighted the need to establish the debiasing of knowledge graphs as a novel research theme for the Semantic Web / Knowledge Graph community, one that differs from current mainstream research. We illustrated multiple sources of bias using small experiments. We believe that the topic is provocative as it walks the fine line between social responsibility and censorship. Risk mainly arises from the fact that debiasing itself is not a neutral task but is based on social norms that may differ across countries. Will we develop methods that can be used for censorship and manipulation? As far as the time horizon is concerned, we believe that this will become as pressing an issue for the Semantic Web community as it currently is in machine learning, and that it should be openly addressed in workshops or panels. Finally, the topic may also be approached from a Web Science [5] perspective as well as by considering the interplay of trust and provenance [6, 3].

References

1. Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V., Kalai, A.T.: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In: Advances in Neural Information Processing Systems. pp. 4349–4357 (2016)
2. Galárraga, L.A., Teflioudi, C., Hose, K., Suchanek, F.: AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In: Proceedings of the 22nd International Conference on World Wide Web. pp. 413–422. ACM (2013)
3. Golbeck, J., Parsia, B., Hendler, J.: Trust networks on the semantic web. In: International Workshop on Cooperative Information Agents. pp. 238–249. Springer (2003)
4. Gruber, T.R.: Toward principles for the design of ontologies used for knowledge sharing? International Journal of Human-Computer Studies 43(5-6), 907–928 (1995)
5. Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., Weitzner, D.: Web science: An interdisciplinary approach to understanding the web. Communications of the ACM 51(7), 60–69 (2008)
6. Janowicz, K.: Trust and provenance: You can't have one without the other. Tech. rep., Institute for Geoinformatics, University of Muenster, Germany (2009)
7. Janowicz, K., Hu, Y., McKenzie, G., Gao, S., Regalia, B., Mai, G., Zhu, R., Adams, B., Taylor, K.: Moon landing or safari? A study of systematic errors and their causes in geographic linked data. In: GIScience 2016. pp. 275–290. Springer (2016)
8. Zhu, R., Hu, Y., Janowicz, K., McKenzie, G.: Spatial signatures for geographic feature types: Examining gazetteer ontologies using spatial statistics. Transactions in GIS 20(3), 333–355 (2016)