=Paper=
{{Paper
|id=Vol-2003/NeSy17_paper8
|storemode=property
|title=Learning with Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-2003/NeSy17_paper8.pdf
|volume=Vol-2003
|authors=Volker Tresp,Yunpu Ma,Stephan Baier
|dblpUrl=https://dblp.org/rec/conf/nesy/TrespMB17
}}
==Learning with Knowledge Graphs==
Volker Tresp (1,2), Yunpu Ma (1,2), Stephan Baier (2)

(1) Siemens AG, Corporate Technology, Munich, Germany
(2) Ludwig-Maximilians-Universität München, Munich, Germany

Abstract. In recent years a number of large-scale triple-oriented knowledge graphs have been generated. They are being used in research and in applications to support search, text understanding and question answering. Knowledge graphs pose new challenges for machine learning, and research groups have developed novel statistical models that can be used to compress knowledge graphs, to derive implicit facts, and to detect errors in the knowledge graph. In this paper we describe the concept of triple-oriented knowledge graphs and corresponding learning approaches. We also discuss episodic knowledge graphs, which are able to represent temporal data; learning with episodic data can be the basis for decision support systems, e.g., in a clinical context. Finally, we discuss how knowledge graphs can support perception by mapping subsymbolic sensory inputs, such as images, to semantic triples. A particular feature of our approach is that perception, episodic memory and semantic memory are highly interconnected and that, in a cognitive interpretation, all rely on the same brain structures.

===1 Semantic Knowledge Graphs===

A technical realization of a semantic memory is a knowledge graph (KG), which is a triple-oriented knowledge representation: a labelled link implies a (subject, predicate, object) statement, where subject and object are entities represented as nodes in the graph, and where the predicate labels the link from subject to object. Large KGs have been developed that support search, text understanding and question answering [8].

A KG can be represented as a tensor which maps indices to true or false,

$$ s, p, o \mapsto Q \quad \text{with } Q \in \{T, F\}, $$

where $s \in \{1, \ldots, N\}$ and $o \in \{1, \ldots, N\}$ are indices for the $N$ entities used as subject and object, and where $p \in \{1, \ldots, R\}$ is the index for the predicate. A statistical model for a KG can be obtained by a tensor model of the form

$$ s, p, o \mapsto a_{e(s)}, a_p, a_{e(o)} \mapsto P. \qquad (1) $$

Here $e(s)$ and $e(o)$ are the entities associated with subject and object, respectively. The indices are first mapped to their latent representations $a_{e(s)}, a_p, a_{e(o)}$, which are then mapped to a probability $P \in [0, 1]$. $P((s, p, o) = T \mid a_{e(s)}, a_p, a_{e(o)})$ represents the Bernoulli probability that the triple $(s, p, o)$ is true, and, when normalized across all triples, $P(s, p, o \mid a_{e(s)}, a_p, a_{e(o)})$ stands for the categorical probability that the triple $(s, p, o)$ is selected as an answer in a query process. A number of mathematical models have been developed for the mapping in Equation 1 (see [7]). A representative example is the RESCAL model [6], which is a constrained Tucker2 tensor model.

===2 Episodic Knowledge Graphs===

Whereas a semantic KG model reflects the state of the world, e.g., of a clinic and its patients, observations and actions describe factual knowledge about discrete events. Generalizing the semantic KG, an episodic KG can be represented as a 4-way tensor with time index $t$ as the map $s, p, o, t \mapsto Q$. A statistical model for an episodic KG can be obtained by a 4-way tensor model of the form

$$ s, p, o, t \mapsto a_{e(s)}, a_p, a_{e(o)}, a_t \mapsto P \qquad (2) $$

where $a_t$ is the latent representation for time index $t$.
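To make the scoring functions in Equations (1) and (2) concrete, the following minimal sketch implements a RESCAL-style bilinear score for Equation (1) and one possible 4-way extension for Equation (2). All dimensions, variable names (`A`, `Rp`, `At`) and the random initialization are our own illustrative choices; in particular, the paper does not fix how $a_t$ enters the 4-way model, so the elementwise modulation below is only an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (our own choice, not from the paper):
# N entities, R predicates, T time steps, latent dimension d.
N, R, T, d = 100, 20, 50, 16

A  = rng.normal(scale=0.1, size=(N, d))     # entity embeddings a_{e(i)}
Rp = rng.normal(scale=0.1, size=(R, d, d))  # one d x d core matrix per predicate (RESCAL)
At = rng.normal(scale=0.1, size=(T, d))     # time embeddings a_t

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rescal_prob(s, p, o):
    """Equation (1): combine the latent representations bilinearly,
    a_{e(s)}^T R_p a_{e(o)}, and squash to a Bernoulli probability in [0, 1]."""
    return sigmoid(A[s] @ Rp[p] @ A[o])

def episodic_prob(s, p, o, t):
    """A 4-way score in the spirit of Equation (2). The paper leaves the
    functional form open; modulating the object embedding elementwise with
    the time embedding a_t is only one simple assumption."""
    return sigmoid(A[s] @ Rp[p] @ (A[o] * At[t]))

# Hypothetical index tuples, purely for illustration.
print(rescal_prob(0, 1, 2))       # P((s, p, o) = T)
print(episodic_prob(0, 1, 2, 3))  # P((s, p, o, t) = T)
```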
The basis for the tight link between the different memory functions is the “unique representation hypothesis”, which states that an entity has a unique latent representation in a technical application, but possibly also in the human brain [9]. As discussed in [11, 5], both the episodic KG and the semantic KG might rely on the same representations; in particular, it was proposed that the semantic KG can be derived from the episodic KG by a marginalization operation. Thus, where an episodic fact might represent that “Jack, wasDiagnosed, Diabetes, on Jan 15”, the derived semantic fact would be “Jack, hasDisease, Diabetes”. In [3, 4], medical decision systems are described that combine semantic and episodic tensor representations of data with recurrent neural network predictive models.

===3 Perception===

The tensor models permit generalization, i.e., the prediction of the probability of triples which were not known to be true in the data. This is especially important in perception, which we propose can be thought of as the mapping of subsymbolic sensory inputs to a semantic description in the form of a set of triples describing and explaining the sensory inputs. These triples then become part of episodic memory. Let $u_{t,1}, \ldots, u_{t,c}, \ldots, u_{t,C}$ be the content of the sensory buffers at time $t$. We propose that this sensory input can predict the latent representation for time in the form of a map

$$ u_{t,1}, \ldots, u_{t,c}, \ldots, u_{t,C} \mapsto a_t. $$

This map $a_t(u_t, w)$ might be modelled by a deep neural network with weights $w$. Perceptual decoding then produces likely triples from the probability distribution (a generalized nonlinear model) using $P(s, p, o; a_{e(s)}, a_p, a_{e(o)}, a_t(u_t, w))$. An episodic memory would simply store $a_t$, and memorizing simply means restoring a past $a_t$, which can then be decoded as described [9, 10]. A semantic memory uses the marginalization approach described in Section 2. As another approach, there is the option to use $P(s, p, o)$ or $P(s, p, o, t)$ as a semantic prior in sensory decoding. This was the basis for approaches to extract triples from Web sources [2] and for the extraction of triples from images [1].
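The perceptual decoding step can likewise be sketched in a few lines: a small stand-in network maps the sensory buffer $u_t$ to $a_t$, all candidate triples are scored, and the normalized categorical distribution over triples (as in Section 1) is used to read out the most likely ones. The network architecture, the way $a_t$ is combined with the entity embeddings, and all names are hypothetical; the paper only states that a deep network with weights $w$ produces $a_t(u_t, w)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions (our own choices): N entities, R predicates,
# latent dimension d, sensory buffer size C.
N, R, d, C = 100, 20, 16, 64

A  = rng.normal(scale=0.1, size=(N, d))     # entity embeddings a_{e(i)}
Rp = rng.normal(scale=0.1, size=(R, d, d))  # RESCAL-style predicate matrices

# A one-hidden-layer stand-in for the deep network a_t(u_t, w);
# the paper only states that some deep network with weights w is used.
W1 = rng.normal(scale=0.1, size=(C, 32))
W2 = rng.normal(scale=0.1, size=(32, d))

def a_t_from_sensors(u_t):
    """Map the sensory buffer u_t = (u_{t,1}, ..., u_{t,C}) to the latent
    time representation a_t (Section 3)."""
    return np.tanh(u_t @ W1) @ W2

def decode(u_t, k=3):
    """Perceptual decoding: compute a_t from the sensors, score every
    candidate triple (s, p, o), normalize across all triples (the
    categorical view of Section 1), and return the k most likely index
    tuples. As in the episodic sketch above, a_t is assumed to modulate
    the object embedding elementwise."""
    a_t = a_t_from_sensors(u_t)
    scores = np.einsum('sd,pde,oe->spo', A, Rp, A * a_t)  # shape (N, R, N)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                  # categorical over all triples
    top = np.argsort(probs, axis=None)[::-1][:k]
    return [np.unravel_index(i, probs.shape) for i in top]

u_t = rng.normal(size=C)  # stand-in for image or other sensor features
print(decode(u_t))        # k most likely (s, p, o) index tuples
```

Marginalizing the resulting probabilities over the time indices $t$ would then yield the semantic-memory view described in Section 2.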
===References===

[1] Stephan Baier, Yunpu Ma, and Volker Tresp. Improving visual relationship detection using semantic modeling of scene descriptions. In ISWC, 2017.

[2] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion. In KDD, 2014.

[3] Cristóbal Esteban, Danilo Schmidt, Denis Krompaß, and Volker Tresp. Predicting sequences of clinical events by using a personalized temporal latent embedding model. In ICHI, 2015.

[4] Yinchong Yang, Volker Tresp, and Peter Fasching. Predictive modeling of therapy decisions in metastatic breast cancer with recurrent neural network encoder and multinomial hierarchical regression decoder. In ICHI, 2017.

[5] Yunpu Ma, Volker Tresp, and Erik Daxberger. Embedding models for episodic memory. Submitted, 2017.

[6] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A Three-Way Model for Collective Learning. In ICML, 2011.

[7] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs: From multi-relational link prediction to automated knowledge graph construction. Proceedings of the IEEE, 2015.

[8] Amit Singhal. Introducing the Knowledge Graph: things, not strings. Official Google Blog, 2012.

[9] Volker Tresp, Cristóbal Esteban, Yinchong Yang, Stephan Baier, and Denis Krompaß. Learning with memory embeddings. NIPS 2015 Workshop (extended TR); arXiv:1511.07972, 2015.

[10] Volker Tresp, Yunpu Ma, and Stephan Baier. Tensor memories. In CCN, 2017.

[11] Volker Tresp, Yunpu Ma, Stephan Baier, and Yinchong Yang. Embedding learning for declarative memories. In ESWC, 2017.