Towards Leveraging Commonsense Knowledge for Autonomous Driving

Sreyasi Nag Chowdhury1,2, Ruwan Wickramarachchi3, Mohamed H. Gad-Elrab1, Daria Stepanova1, and Cory Henson4

1 Bosch Center for Artificial Intelligence, Renningen, Germany
2 Max Planck Institute for Informatics, Germany
3 AI Institute, University of South Carolina, USA
4 Bosch Research and Technology Center, USA
1 {firstname.lastname}@de.bosch.com, 3 ruwan@email.sc.edu, 4 {firstname.lastname}@us.bosch.com

Abstract. The rapid development of autonomous vehicles has enabled the collection of huge amounts of multimodal road traffic data, resulting in large knowledge graphs for autonomous driving. These knowledge graphs, which typically store factual statements such as “scene 1 includes parking area”, have proved to be useful complements to computer vision models, especially for tasks like object prediction in a scene. However, they do not capture inter-object commonsense relationships (e.g., “car is smaller than truck” or “car accelerates faster than bicycle”), on which human drivers rely subconsciously when making decisions. Existing commonsense repositories mainly target general-purpose domains, and their coverage of driving scenarios is very limited, which prevents a straightforward integration. To bridge this gap, we extend existing autonomous driving knowledge graphs with commonsense knowledge and empirically demonstrate the benefits of this extension for two downstream applications: object prediction and explainable scene clustering.

Keywords: Autonomous Driving, Commonsense Knowledge Graphs

1 Introduction

Autonomous Driving (AD) has recently emerged as an exciting research area within AI owing to its vast application potential and interdisciplinary nature. The rapid development of AD has enabled the collection and analysis of huge amounts of multimodal road traffic data extracted from semi-structured resources or learned automatically from visual driving scenes.
Enriching such data with a schema has led to the construction of knowledge graphs (KGs) for AD [15, 2, 16]. The existing knowledge repositories for AD contain millions of facts in the form of ⟨subject, relation, object⟩ triples describing entities occurring in AD scenes, e.g., ⟨scene1, includes, car⟩, and mutual relations among them, e.g., ⟨car, located at, parkingArea⟩. KGs for AD naturally complement visual and decision-making systems and have been successfully exploited, in particular via KG embeddings [14], in many applications, e.g., data search [2] and scene similarity [15].

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

However, current KGs for AD, which primarily store factual data extracted from observations (e.g., ⟨pedestrian, located at, crossing⟩), contain few commonsense assertions describing, among others, comparative properties of objects, for example, ⟨car, accelerates faster than, bicycle⟩. Yet human drivers’ ability to make seamless decisions is largely attributed to such commonsense knowledge (CSK). CSK-enriched KGs could therefore greatly support sophisticated decision making in AD scenarios; e.g., the CSK facts ⟨dirt road, has property, muddy⟩, ⟨rain, causes, mud⟩, and ⟨mud, causes, wheel jam⟩ can lead to a decision to avoid parking on dirt roads on rainy days, or on cloudy days when rain is predicted. Since, to the best of our knowledge, no KGs for AD systematically incorporate commonsense relations, the goal of our work is to bridge this gap by enriching knowledge graphs for autonomous driving with CSK assertions.

The usage of CSK in AD scenarios can be divided into two sequential subproblems: the acquisition of relevant domain-specific CSK, and its integration into autonomous driving frameworks.
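As a rough illustration of how the dirt road example could feed a decision, the Python sketch below chains the `causes` triples transitively and applies a hand-written parking rule. The rule and the function names `causal_consequences` and `should_avoid_parking` are hypothetical; the paper does not prescribe a particular reasoning procedure, and relation names are treated here as plain strings.

```python
from collections import defaultdict

# CSK facts from the dirt road example, as (subject, relation, object) triples.
CSK = [
    ("dirt road", "has property", "muddy"),
    ("rain", "causes", "mud"),
    ("mud", "causes", "wheel jam"),
]


def causal_consequences(triples, start):
    """All nodes reachable from `start` by following 'causes' edges transitively."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        if rel == "causes":
            graph[subj].append(obj)
    reached, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached


def should_avoid_parking(triples, location, weather):
    """Hypothetical rule: avoid a location that can be muddy when the weather
    causes mud and, transitively, a wheel jam."""
    muddy_prone = (location, "has property", "muddy") in triples
    effects = causal_consequences(triples, weather)
    return muddy_prone and "mud" in effects and "wheel jam" in effects


print(should_avoid_parking(CSK, "dirt road", "rain"))  # True
print(should_avoid_parking(CSK, "dirt road", "sun"))   # False
```

None of these triples describes a particular scene; they are the kind of general knowledge that observation-derived AD KGs lack.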
The existing domain-agnostic commonsense KGs (e.g., ConceptNet [11], WebChild [13], and Quasimodo [9]) mainly contain assertions about everyday objects and scenes, and the fraction of them related specifically to autonomous driving is rather limited. In [8], a KG for smart city applications was extended with the parts of WebChild related exclusively to transportation, but the resulting dataset is limited in scope, containing CSK relations over only 17 manually specified object classes. Gathering CSK for AD from annotated image captions or description datasets such as MSCOCO [6], Conceptual Captions [10], Visual Genome [5], and Visual Storytelling [3] yields assertions with spatial relations (e.g., located near); however, such resources are not directly useful for extracting other physical commonsense relations (e.g., faster than). The integration of reasoning methods into AD applications has recently been targeted by, e.g., exploiting answer set programming for visual sense-making [12]. However, to the best of our knowledge, no attempts at the systematic collection of CSK for AD and its integration exist.

In this preliminary work, we address the limitations of existing KGs for AD by enriching them with commonsense facts from a wide range of state-of-the-art domain-agnostic CSK repositories. More specifically, the salient contributions of our work are as follows1:
• We have enhanced the existing KG for AD [15], based on PandaSet2, by linking it to assertions from four popular domain-agnostic CSK repositories: ConceptNet [11], WebChild [13], Quasimodo [9], and CSKG [4].
• We study the effect of enhancing the KG with CSK on two tasks: scene entity prediction, following [15], and explainable scene clustering [1].

2 Commonsense Knowledge for Autonomous Driving

CSK Acquisition. Defining a domain is a challenging problem that we do not address in this work.
As a starting point, the 38 road entities identified in the PandaSet dataset are used as seeds. We then expand these entities with their synonyms from WordNet [7], considering only noun senses that contain the original entity, which yields 36 additional seeds. The expanded set of entities (referred to as Features of Interest, or FoIs) is then used to collect an AD-CSK slice from three existing CSK knowledge graphs, namely ConceptNet [11], WebChild [13], and Quasimodo [9], as well as from a consolidated source named Commonsense Knowledge Graph (CSKG) [4]. CSKG integrates seven sources: ConceptNet, Visual Genome (a dataset of annotated images for visual CSK), ATOMIC (a procedural source), Wikidata, and three lexical sources, namely WordNet, Roget, and FrameNet. In total, there are 98,738 CSK assertions (⟨subject, relation, object⟩ triples) in which either the subject or the object is a FoI, involving 7,217 unique CSK relations. For our experiments, we work with the subset of these assertions in which both the subject and the object are FoIs. This subset consists of ∼1,000 assertions with ∼600 unique relations.

1 The dataset and results are available at https://github.com/boschresearch/ad_cskg
2 https://scale.com/open-datasets/pandaset

Table 1: Examples of CSK triples for AD
Subject      Relation                    Object
Bus          is more efficient than      Car
Motorcycle   are more popular than       Bus
Pedicab      conceptually related to     Bicycle
Standing     located near                Motorcycle
Pedestrian   are more susceptible than   Car
Walking      related to                  Pedestrian

Table 2: Link prediction
Metric   Setting    HolE     TransE
MRR      w/o CSK    0.9266   0.4133
MRR      w/ CSK     0.9357   0.4353
Hits@1   w/o CSK    0.8729   0.2747
Hits@1   w/ CSK     0.8874   0.3123
Hits@3   w/o CSK    0.9796   0.4582
Hits@3   w/ CSK     0.9846   0.4613

Removal of Synonymous CSK Relations via Clustering. Since AD-CSK is collected from various sources, it naturally contains many synonymous relations, which need to be canonicalized.
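To illustrate how such synonymous relations can be grouped, here is a minimal Python sketch of density-based clustering (DBSCAN) over cosine distances between relation embeddings. It assumes small hand-made three-dimensional vectors in place of real Word2Vec embeddings; the names `cosine_distance` and `dbscan` and the values `eps=0.05` and `min_samples=2` are illustrative choices, not the settings used in this work.

```python
import math

# Hypothetical 3-d "embeddings" for relation phrases; real Word2Vec vectors
# would be much higher-dimensional (e.g., averaged word vectors per phrase).
relations = {
    "located near": [1.0, 0.1, 0.0],
    "is near": [0.95, 0.15, 0.0],
    "close to": [0.9, 0.2, 0.05],
    "faster than": [0.0, 1.0, 0.1],
    "quicker than": [0.05, 0.95, 0.1],
    "is a": [0.1, 0.0, 1.0],
}


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(points, eps, min_samples):
    """Plain DBSCAN over cosine distance; returns one label per point, -1 = noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # The point itself (distance 0) counts towards min_samples.
        return [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


names = list(relations)
labels = dbscan([relations[r] for r in names], eps=0.05, min_samples=2)
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
# {0: ['located near', 'is near', 'close to'], 1: ['faster than', 'quicker than'], -1: ['is a']}
```

Relations sharing a label would be candidates for merging into one canonical relation before any manual refinement; label -1 marks a relation left unclustered. In practice, a library implementation such as scikit-learn's `DBSCAN(metric="cosine")` would replace the hand-written loop.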
We use the DBSCAN algorithm to cluster semantically similar relations based on the cosine similarities of their Word2Vec embeddings. The resulting relation clusters are then refined manually. In this way, 440 of the 600 relations are grouped into 60 dominant relation clusters. Table 1 shows example assertions from our curated AD-CSK knowledge graph.

CSK Integration. Our AD-CSK KG can then be integrated into existing KGs. For our experiments, we enrich the AD KG [15] constructed from PandaSet, which consists of 8,343 scenes. Although our PandaSet-specific AD-CSK KG contains only ∼1,000 triples, at the scene level a single CSK relation can connect millions of entities. For example, the most frequent CSK relation (located near) connects 8,589,486 entities, adding as many edges to the CSK-enriched driving scene KG.

3 Results

We evaluate the KG embeddings learned with and without the harvested CSK assertions on two downstream tasks: scene entity prediction and explainable scene clustering. For both tasks, embeddings learned from the CSK-enriched KG outperform those learned from the KG without CSK. This provides first evidence of the benefits of CSK integration for AD applications.

Table 3: Quality of the explainable clusters learned using ExCut
              Cluster quality   Explanation quality
              Silh. width       Cov@1    Exc@1    WRA@1
without CSK   0.0423            0.8539   0.2362   0.0157
with CSK      0.1103            0.8713   0.2991   0.0698

Table 4: Sample explainable clusters
Cluster   Explanation (without CSK)
C1        e1: ⟨X, partOf, scene_123⟩
C2        e2: ⟨X, includes, Y⟩, ⟨Y, type, Motorcycle⟩
C3        e3: ⟨X, includes, PedestrianWithObject⟩, ⟨X, includes, TrafficCone⟩
Cluster   Explanation (with CSK)
C1        e1: ⟨X, includes, TrafficCone⟩, ⟨X, includes, Pedestrian⟩
C2        e2: ⟨X, includes_ent_slower_than, Car⟩
C3        e3: ⟨X, includes_ent_located_near, Walking⟩, ⟨X, includes, VehicleWithoutRider⟩

Scene Entity Prediction.
An important application for AD is predicting missing entities based on the road entities already observed. For example, if a ball rolls out in front of a car in a residential area, there is a high probability of children suddenly appearing in the scene. In [15], this entity prediction task was reformulated as a link prediction problem, which aims to derive potentially missing links in a KG from learned embeddings. Table 2 shows the results on the PandaSet KG for two popular embedding models3, TransE and HolE. Link prediction based on the embeddings learned from the CSK-enhanced KG yields better results on all reported metrics (MRR, Hits@1, and Hits@3) for both models.

Explainable Clustering. Intuitively, adding CSK to KGs for AD should provide a better understanding of the scene context, which is not necessarily visual. To assess this hypothesis, we adapt the explainable clustering approach ExCut [1] and examine whether adding CSK relations helps to cluster a given set of scenes based on their semantic similarity. More specifically, we cluster a set of 824 driving scenes based on KG embeddings computed with HolE, and learn explanations for the discovered clusters from the underlying AD KG with and without the introduced CSK relations. Table 3 reports the quality of the clusters with and without CSK, measured by the average silhouette width. In addition, we report the average quality of the top explanation, assessed using cluster coverage (Cov), exclusiveness (Exc), and weighted relative accuracy (WRA) as described in [1]. The CSK relations improve the quality of both the clusters and their explanations. Table 4 shows sample explanations derived from the KG with and without CSK: the explanations derived from the CSK-enriched KG contain more general relations among scene entities.
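Both evaluations rest on standard measures. The Python sketch below, which is not the AmpliGraph or ExCut code behind the reported numbers, scores candidate tails with the TransE function (negative L2 distance between h + r and t), turns the scores into raw 1-based ranks, computes MRR and Hits@k from those ranks, and computes the average silhouette width used for cluster quality. All vectors are hypothetical two-dimensional toy values; HolE scoring and the ExCut explanation measures (Cov, Exc, WRA) are left out.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility of (h, r, t): negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(entities, rel, head, true_tail):
    """Raw 1-based rank of `true_tail` among all entities as tail candidates.
    (Published results usually use the 'filtered' setting, omitted here.)"""
    scores = {e: transe_score(entities[head], rel, vec) for e, vec in entities.items()}
    return 1 + sum(1 for s in scores.values() if s > scores[true_tail])


def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_silhouette(points, labels):
    """Average silhouette width; needs >= 2 clusters, singletons score 0."""
    widths = []
    for i, p in enumerate(points):
        same = [euclidean(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            widths.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical toy embeddings (not learned from PandaSet).
entities = {"scene1": (0.0, 0.0), "car": (1.0, 0.0),
            "truck": (0.0, 1.0), "bicycle": (1.0, 1.0)}
includes = (1.0, 0.0)
print(tail_rank(entities, includes, "scene1", "car"))    # 1
print(tail_rank(entities, includes, "scene1", "truck"))  # 4

ranks = [1, 2, 1, 5]
print(mrr(ranks), hits_at(ranks, 1), hits_at(ranks, 3))  # 0.675 0.5 0.75

scenes = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
print(round(mean_silhouette(scenes, [0, 0, 1, 1]), 4))  # 0.802
```

Swapping `transe_score` for another scoring function (such as HolE's) leaves the ranking and metric code unchanged.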
3 Available in the AmpliGraph library: https://github.com/Accenture/AmpliGraph

4 Conclusion

This preliminary work studies the role of commonsense knowledge (CSK) in autonomous driving (AD) applications. To this end, we have constructed a domain-specific CSK knowledge graph (KG) and integrated it into an existing AD KG. To verify the usefulness of the resulting CSK-enriched KG, we exploit KG embeddings for two tasks: Scene Entity Prediction, which involves predicting the presence of entities in a scene that were not detected by computer vision tools, and Explainable Clustering, which clusters scenes based on their semantic similarity and generates explanations for the clusters. The preliminary results are promising and demonstrate the importance of integrating CSK into AD applications.

References
1. Gad-Elrab, M.H., Stepanova, D., Tran, T., Adel, H., Weikum, G.: ExCut: Explainable embedding-based clustering over knowledge graphs. In: ISWC (2020)
2. Henson, C., Schmid, S., Tran, A.T., Karatzoglou, A.: Using a knowledge graph of scenes to enable search of autonomous driving data. In: ISWC Satellites (2019)
3. Huang, T.K., Ferraro, F., Mostafazadeh, N., Misra, I., Agrawal, A., Devlin, J., Girshick, R.B., He, X., Kohli, P., Batra, D., Zitnick, C.L., Parikh, D., Vanderwende, L., Galley, M., Mitchell, M.: Visual storytelling. CoRR (2016)
4. Ilievski, F., Szekely, P.A., Cheng, J., et al.: Consolidating commonsense knowledge. CoRR (2020)
5. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., et al.: Visual Genome. Int. J. Comput. Vis. (2017)
6. Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: ECCV (2014)
7. Miller, G.A., Hristea, F.: WordNet nouns: Classes and instances. Comput. Linguistics 32(1), 1–3 (2006)
8. Persaud, P., Varde, A.S., Robila, S.A.: Enhancing autonomous vehicles with commonsense: Smart mobility in smart cities. In: ICTAI (2017)
9. Romero, J., Razniewski, S., Pal, K., Pan, J.Z., Sakhadeo, A., Weikum, G.: Commonsense properties from query logs and QA forums. In: CIKM. ACM (2019)
10. Sharma, P., Ding, N., Goodman, S., et al.: Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In: ACL (2018)
11. Speer, R., Havasi, C.: ConceptNet 5: A large semantic network for relational knowledge. In: The People’s Web Meets NLP. Theory and Applications of NLP, Springer (2013)
12. Suchan, J., Bhatt, M., Varadarajan, S.: Commonsense visual sensemaking for autonomous driving. CoRR (2020)
13. Tandon, N., de Melo, G., Suchanek, F.M., Weikum, G.: WebChild: Harvesting and organizing commonsense knowledge from the web. In: WSDM. ACM (2014)
14. Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: A survey of approaches and applications. IEEE Trans. Knowl. Data Eng. (2017)
15. Wickramarachchi, R., Henson, C.A., Sheth, A.P.: An evaluation of KG embeddings for AD data. In: AAAI Symposium: Combining ML with KE (1) (2020)
16. Zhao, L., Ichise, R., Mita, S., Sasaki, Y.: Core ontologies for safe autonomous driving. In: ISWC Posters & Demos (2015)