=Paper=
{{Paper
|id=Vol-1963/paper594
|storemode=property
|title=Predicting Human Associations with Graph Patterns Learned from Linked Data
|pdfUrl=https://ceur-ws.org/Vol-1963/paper594.pdf
|volume=Vol-1963
|authors=Jörn Hees,Rouven Bauer,Joachim Folz,Damian Borth,Andreas Dengel
|dblpUrl=https://dblp.org/rec/conf/semweb/HeesBFBD17
}}
==Predicting Human Associations with Graph Patterns Learned from Linked Data==
Predicting Human Associations with Graph Patterns Learned from Linked Data Jörn Hees1,2 , Rouven Bauer1,2 , Joachim Folz1,2 , Damian Borth1,2 , and Andreas Dengel1,2 1 Computer Science Department, University of Kaiserslautern, Germany 2 Knowledge Management Department, DFKI GmbH, Kaiserslautern, Germany {joern.hees,rouven.bauer,joachim.folz,damian.borth,andreas.dengel}@dfki.de Abstract The datasets provided by the Linked Data community cur- rently form the world’s largest, freely available, decentralised and inter- linked knowledge bases. However, to be able to benefit from this knowl- edge in a specific use-case, one typically needs to understand the mod- elling of the knowledge and formulate appropriate SPARQL queries. In order to ease this process, we developed an evolutionary algorithm that learns such SPARQL queries (graph patterns) for pairwise relations between source and target entities. Given a training list of source-target- pairs, our algorithm learns a predictive model, which given a new source entity predicts target entities analogously to the training examples. In this demo paper we present a high level overview over our graph pattern learner and show its application to simulate human associations (e.g., “fish - water”). In the demo users can choose a semantic entity (e.g., dbr:Fish) as stimulus and let the learned model predict human- like responses (e.g., dbr:Water). 1 Introduction In recent years, many large, machine accessible and interlinked RDF datasets have emerged from the Semantic Web [1] and its Linked Data [2] movement. The datasets are prominently depicted as the LOD Cloud3 and form the currently largest openly available representation of machine accessible knowledge. Due to its encyclopaedic nature DBpedia4 [3] has become one of the most interlinked and central datasets of the LOD Cloud. Despite the availability of all this knowledge, actually using it typically re- quires non-trivial up-front work: SPARQL queries need to be formulated to ex- tract relevant knowledge for the given use-case. Hence, we developed a graph pattern learning algorithm [4] that can help to learn such SPARQL queries. While several other systems exist that learn SPARQL queries (e.g., AutoSPARQL [7], kretr [8]), they typically focus on learn- ing a single query for a simple list of entities. Our algorithm differs from these 3 http://lod-cloud.net/ 4 http://dbpedia.org Endpoint Trained Model Application Phase (Demo) New Data Graph Patterns Prediction ?target ?score ?source dbr:Fishing 4.2 dbr:Fish Fusion Model dbr:Animal 2.1 ... ... Training Data Graph Pattern ?source ?target Learner dbr:Dog dbr:Cat dbr:Summer dbr:Winter Pattern Learner dbr:Paris dbr:France ... ... Fusion Training Training Phase SPARQL Endpoint Application Phase Trained Model Prediction New Data Graph Patterns ?target ?score dbr:Fishing 4.2 ?source dbr:Animal 2.1 dbr:Fish Fusion Model ... ... Figure 1: Graph Pattern Learner System Overview in two main aspects: (i) it learns an ensemble model that can (and will) consist of many queries and (ii) it doesn’t try to learn queries that reproduce a given list of entities, but it learns queries that represent a relation R between entity source-target-pairs (s, t) ∈ R. By learning queries for a relation between pairs of entities, the graph pattern learner can generate a predictive model, that given a new source entity s0 can predict targets {t0 |(s0 , t0 ) ∈ R}. In this demo paper we show one such predictive trained model that has been generated by our graph pattern learner as detailed in the following sections. 2 Graph Pattern Learner System Overview Figure 1 shows a high level system overview of the graph pattern learner. The system is divided into training and application phase and resembles a classic machine learning outline. The training phase mainly consists of the graph pat- tern learner that is given its training data in form of source-target-pairs and a SPARQL endpoint from which to learn patterns. For this demo, the training data consisted of 655 pairs of “semantic associ- ations” 5 [5], corresponding to ≈ 22500 human associations from the Edinburgh Associative Thesaurus (EAT) [6]. The SPARQL endpoint for this demo is a lo- cal LOD mirror loaded with ≈ 8G triples (amongst others: DBpedia 2015-10, Wikidata, Freebase, BabelNet, DBLP, YAGO Labels)6 . In all brevity7 , the graph pattern learner is an evolutionary algorithm which finds SPARQL queries that are good predictors: Each pattern consists of a SPARQL Basic Graph Pattern (BGP) which contains at least a ?source and a ?target variable. A pattern is good if it fulfils many of the training source- target-pairs (ASK queries where ?source and ?target are bound correspondingly) and it doesn’t generate much noise (SELECT ?target queries where ?source is bound only return few irrelevant values for ?target). Each of the patterns learned by the pattern learner is a predictor: We can execute a SPARQL SELECT ?target query in which we bind the ?source variable 5 Datasets available at http://w3id.org/associations/#datasets. 6 For further details, see set-up instructions at: https://joernhees.de/blog/2015/ 11/23/setting-up-a-linked-data-mirror 7 See [4] for details on the training and evaluation. (a) Stimulus auto-complete input-box (c) Filtered graph patterns highlighting (b) Fused Prediction Results those that generated the target dbr: Fishing Figure 2: Main Demo Components to an arbitrary source s0 (e.g. with a VALUES (?source) {(s0 )} block). This results in one list of target candidates per pattern that need to be ranked to yield the set of predicted targets. The fusion training component uses several strategies to generate (late) fusion models for this purpose. In order to fuse such resulting target-lists for a provided new source node, the graph pattern learner includes a fusion training component that generates late fusion machine learning models. The fusion models vary in complexity from ba- sic to full-fledged machine learning models themselves. For example, we provide basic target-occurrence ranking over all queries (called “target occurrences”) po- tentially normalised so that each pattern only has a total vote of 1 (called “preci- sions”). As full machine learning models, we provide amongst others KNN, SVM, Logistic Regression and RankSVM models estimating relevance based on target candidate vectors wrt. the generating queries. In the demo the user can switch between these fusion models with a simple drop-down as shown in Section 3. After training, the set of graph patterns and fusion model form a predictive trained model that is used in the application phase. Given a new source node (e.g. dbr:Fish) the trained model uses all learned graph patterns to issue SELECT ?target queries in which the ?source variable is bound to the new source node against the SPARQL endpoint, and fuses the individual target result lists into an overall ranked list of target predictions. 3 Demo The main screen of our online demo8 starts with the stimulus auto-complete input-box on top (Figure 2a) asking the user to enter a stimulus. The auto- 8 https://w3id.org/associations/gp_learner/demo/predict.html complete is realised via the Wikipedia OpenSearch API9 , allowing a fuzzy search for matching Wikipedia Articles, including the resolving of redirects. After selecting one of the Wikipedia articles from the auto-complete box, the URI is transformed to the corresponding DBpedia resource and the prediction started. The fused prediction results are then presented in the “Fused Prediction” tab (Figure 2b), in which the user can provide feedback about the generated targets (logged and used for future improvements). The user can also click the explain button to gain insight on why a target was predicted. All graph patterns that played a role in predicting this target will be highlighted and expanded in the “Graph Patterns” tab (Figure 2c). 4 Conclusion In this demo paper we presented a high level overview over the graph pattern learner and show its application to simulate human associations. The algorithm, used datasets and interactive visualisation of the results are available online. https://w3id.org/associations/gp_learner/demo/predict.html. This work was supported by the University of Kaiserslautern CS PhD scholarship program and the BMBF project MOM (Grant 01IW15002). References 1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American 284(5), 34–43 (may 2001) 2. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems 5(3), 1–22 (jan 2009) 3. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web 7(3), 154–165 (sep 2009) 4. Hees, J., Bauer, R., Folz, J., Borth, D., Dengel, A.: An Evolutionary Algorithm to Learn SPARQL Queries for Source-Target-Pairs. In: Knowledge Engineering and Knowledge Management - EKAW 2016. vol. 10024, pp. 337–352. Springer LNCS, Bologna, Italy (nov 2016) 5. Hees, J., Bauer, R., Folz, J., Borth, D., Dengel, A.: Edinburgh associative thesaurus as RDF and DBpedia mapping. In: The Semantic Web - ESWC 2016 SE. vol. 9989 LNCS, pp. 17–20. Springer LNCS, Heraklion, Crete, Greece (may 2016) 6. Kiss, G., Armstrong, C., Milroy, R., Piper, J.: An associative thesaurus of English and its computer analysis. In: The Computer and Literary Studies, pp. 153–165. Edinburgh University Press, Edinburgh, UK (1973) 7. Lehmann, J., Bühmann, L.: AutoSPARQL: Let users query your knowledge base. In: The Semantic Web: Research and Applications - ESWC 2011. LNCS, vol. 6643, pp. 63–79. Springer, Heraklion, Crete, Greece (2011) 8. Potoniec, J.: An On-Line Learning to Query System. In: Proc. of the ISWC 2016 Posters & Demonstrations Track. vol. 1690. CEUR-WS.org, Kobe, Japan (2016) 9 https://www.mediawiki.org/wiki/API:Opensearch