<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Constructing demand-driven Wikidata Subsets</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Alexander University Erlangen-Nürnberg</institution>
          ,
          <addr-line>Nurnberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer Institute for Integrated Circuits IIS</institution>
          ,
          <addr-line>Nurnberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A Wikidata subset is a selected subset from the set of all triples included in Wikidata. We approach the task of creating such a subset with a demand-driven approach that satisfies a present use case. The introduced algorithm constructs triples in multiple steps starting from a seed of URIs. Different input options for the seed, the sequence of construction steps, and a filter enable adaptations to the use case. A formal description and matching SPARQL queries complement the algorithm. Hospital data provides a running example.</p>
      </abstract>
      <kwd-group>
        <kwd>Wikidata</kwd>
        <kwd>Subset</kwd>
        <kwd>Subgraph</kwd>
        <kwd>Neighbourhood</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Reducing Wikidata to a smaller demand-driven subset brings multiple advantages. These include faster results (answers) to queries (questions) of machines (humans), an easier overview of available information, and the possibility of copies on regular-sized machines. Beghaeiraveri et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Mimouni et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] list extended advantages of subsets/subgraphs.
      </p>
      <p>Challenges in building a subset include the volume of Wikidata's data and the resulting possibilities for subsets. A total number of triples N results in 2^N possible subsets, as any number of up to N triples can be included in the subset in any combination. The task is to find a subset out of these based on a use case. As it is not feasible to create all subsets and choose afterwards, the task changes to finding a starting point (the seed) in the data and from there drawing a line separating included and excluded data. The seed thus could be any set of Wikidata items, including none and all.</p>
      <p>The subset must fit the demands of a use case. The subset mustn't exclude relevant data, so as not to miss any information, which may lead to wrong conclusions. The subset also mustn't include unnecessary information, to reduce computation costs when working with the subset and to increase the clarity of its content. In summary, we look for a minimal subset that still covers our demand (a minimal full-covering subset).</p>
      <p>Another challenge is how to describe the need of the use case for the subset. We need enough detail to fulfill the criteria of the last paragraph while offering a feasible expression. Furthermore, the algorithm that creates the subset has to process the expression in a way that the subset fulfills it.</p>
      <p>Currently available subsets of Wikidata don't contain the domain data our use cases demand. Existing approaches to subsetting don't offer the depth of subsets we need for our use cases. Therefore, a new solution for demand-driven Wikidata subsets is required.</p>
      <p>The main contribution of this paper is an algorithm to construct a demand-driven Wikidata subset including formal descriptions (see section 6.1). The necessary steps are explained using a running example of hospital data (see section 3). Additionally, we show how the SPARQL query language (see https://www.w3.org/TR/sparql11-query/, last accessed 29 July 2021) can be used to implement the steps (see section 6).</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Wikidata has a webpage on subsetting (available at https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas/Subsetting, last accessed 29 July 2021) that collects efforts and ideas on the subject. The content focuses on stability and reusability of subsets, a focus we don't share (see section 4).</p>
      <p>Downloadable subsets from Wikidata itself exist in the form of dumps (available at https://www.wikidata.org/wiki/Wikidata:Database_download, last accessed 29 July 2021) but aren't domain-specific. Instead, Wikidata refers to WDumper (available at https://wdumps.toolforge.org/, last accessed 29 July 2021) for custom dumps that can be domain-specific.</p>
      <p>
        WDumper is a service that provides a subset of Wikidata according to various filters. However, it doesn't support variables as SPARQL queries do, for example. Therefore, we can only query the direct neighbourhood of known Wikidata items selected by filters. This doesn't comply with the demands of our use cases, as we also look for data we are not yet aware of and thus can't select with filters. An evaluation of WDumper by Beghaeiraveri et al. confirms that the service can be used in some use cases but also has limitations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>DBpedia is similar to Wikidata and offers various download options (available at https://www.dbpedia.org/resources/, last accessed 29 July 2021), but none of them is domain-specific.</p>
      <p>
        The Concise Bounded Description (CBD) of a resource "is a subgraph consisting of those statements which together constitute a focused body of knowledge about the resource" [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We pursue the concept of a focused body of knowledge, but not regarding a single resource but rather a set of resources of a domain. This includes triples in a flexible depth from the starting node.
      </p>
      <p>
        Beghaeiraveri et al. introduce the Topical Subset, "a set of entities related to a particular topic and the relationships between them" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Similar to the subsets we want to create, a Topical Subset includes data in the neighbourhood of a selected seed with a focus on a domain. However, a Topical Subset doesn't include triples in a flexible depth around the seed and thus doesn't construct a subset around the seed but rather gathers data about the seed. Therefore, the subsets our algorithm creates include Topical Subsets but not vice versa.
      </p>
      <p>
        Mimouni et al. present an algorithm to build a Context Graph [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Their algorithm has the same basic structure as our algorithm shown in section 6.1; nevertheless, differences exist. The Context Graph algorithm has the neighbourhood depth as an input parameter, while we propose a more flexible construction tuple. Furthermore, their algorithm filters entities provided in the input, while our approach includes a more general filter function that may exclude entities, properties, and classes that are discovered by the algorithm at runtime. These differences enable our algorithm to be adjustable to the demands of a use case.
      </p>
      <p>
        A research area with similarities to our approach to Wikidata subsets is focused Web crawlers. Olston and Najork describe the basic algorithm: "Given a set of seed Uniform Resource Locators (URLs), a crawler downloads all the Web pages addressed by the URLs, extracts the hyperlinks contained in the pages, and iteratively downloads the Web pages addressed by these hyperlinks." [9, p. 178]. Our approach starts with a set of seed Uniform Resource Identifiers (URIs), downloads all Wikidata triples containing the URIs, extracts newly found URIs, and iteratively downloads the triples containing these URIs. Focused Web crawlers look to minimize the number of accessed Web pages and maximize the percentage of relevant pages among those [
        <xref ref-type="bibr" rid="ref1 ref9">9, 1</xref>
        ]. We share these goals for subsets, as we want minimal but relevant data. To accomplish this, focused Web crawlers filter and rate every URL using various metrics and then add them to a set of yet-to-be-downloaded URLs called the frontier F. Each iteration, the top-rated URL in F is downloaded and processed, which again adds new URLs to F [
        <xref ref-type="bibr" rid="ref4 ref9">9, 4</xref>
        ]. We use filters to exclude URIs we found. However, we don't iterate over single URLs in F but rather the complete set Fi, which results in a completely new set of URIs Fi+1 for the next iteration. Therefore, we also don't rate the URIs.
      </p>
      <p>
        Another related research area is link traversal query execution. The main idea is to follow URIs found when querying distributed RDF data while executing said query [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Similar to focused Web crawlers and our approach, a frontier of yet-to-be-processed URIs exists.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Running Example</title>
      <p>We use the graph in figure 1 as an example of a Wikidata subset throughout this paper. It shows simplified data from Wikidata on the hospital Charité. The rounded vertices represent Wikidata items (e.g. Q162684) that we present by their (sometimes shortened) labels (e.g. Charité) instead of their URIs (e.g. https://www.wikidata.org/wiki/Q162684). We use labels to increase readability; the real identifier of an item remains its URI, which the original triples contain. The angular vertices represent literals that are concrete data values (e.g. 1958). Charité is the seed of the example subset.</p>
      <p>The displayed subset shows some concepts of this paper: From the seed, inbound and outbound triples in varying distances contain data relevant for the use case and are thus included in the subset. When we iterate to construct the subset, we have to keep an eye on already visited URIs because multiple paths may exist from the seed to an URI (see Berlin in fig. 1). Not all data in the adjacency of the seed is relevant to the use case (see (CBF namedAfter BF) in fig. 1). Therefore, we filter the triples before we add them to the subset.</p>
    </sec>
    <sec id="sec-4">
      <title>Approaches</title>
      <p>Two general approaches come to mind when thinking about how and why to create a Wikidata subset: (1) stable defined subsets, (2) subsets by acute demand. We shortly discuss both.</p>
      <p>(1) An organization or group creates stable defined subsets and provides them to the public, maybe on the Wikidata website. Advantages include: The user of the subset (whoever that might be) does not necessarily need to know how to build a subset or be familiar with Wikidata content, as the creation is done and the download easy. The subset can also be used to normalize the contained Wikidata classes by analyzing instances and developing a recommendation for minimal properties. Disadvantages include: The available subsets don't comply with the demands of individual use cases. Open questions include: Must all data belong to a subset? Must data belong to only one subset? What is a fair number of subsets considering relevance and minimalism? Effort: A large one-time effort to create all subsets (ignoring updates).</p>
      <p>(2) An algorithm that creates a subset by acute demand is available. Every time a subset is required, it is created as needed. Advantages include: With the demand known, the subset can be minimal. Additionally, the subsets are always relevant because they only exist if there is an acute demand. Disadvantages include: The users have to create a subset themselves. Therefore, they need time as well as knowledge of their demand, the domain of the subset, and Wikidata. Open questions include: How to express the demand of a use case? How to fit the subset to the demand? Effort: Pay as you use (after a tool is available).</p>
      <p>When considering the use cases from our project work, the second approach is the better fit. In this way, we receive useful (minimal and full-covering) results in a quicker fashion. Reusability of created subsets is thus not the focus of our work.</p>
    </sec>
    <sec id="sec-5">
      <title>Preliminaries</title>
      <p>Let G be a set of RDF triples T, U be the set of URIs, B be the set of blank nodes, and L be the set of literals. Each triple T = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L) consists of a subject s, a predicate p, and an object o according to the RDF specification (see https://www.w3.org/TR/rdf11-concepts/, last accessed 30 July 2021). The subject s ∈ (U ∪ B) may be an URI or a blank node. The predicate p ∈ U is always an URI. The object o ∈ (U ∪ B ∪ L) may be an URI, a blank node, or a literal. G also represents a labeled directed multigraph, but we continue to view it as a triple set in this text.</p>
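      <p>The triple typing above can be illustrated with a small sketch (illustrative Python, not part of the paper; the explicit sets and the helper name is_valid_triple are assumptions made for the example):</p>

```python
# Illustrative check of the triple typing T = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L),
# with U (URIs), B (blank nodes), and L (literals) given as explicit sets.

def is_valid_triple(t, U, B, L):
    s, p, o = t
    # subject: URI or blank node; predicate: URI; object: URI, blank node, or literal
    return s in (U | B) and p in U and o in (U | B | L)

U = {"wd:Q162684", "wdt:P571", "wd:Q64"}
B = {"_:b0"}
L = {"1958"}
```

      <p>With these sets, ("wd:Q162684", "wdt:P571", "1958") is a valid triple, while a triple with the literal "1958" in the subject position is not.</p>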
      <p>Let W ⊆ G be the set of all triples in Wikidata. Let S ⊆ W be a Wikidata subset. Let Q ⊆ U be the set of all Wikidata items. Let F ⊆ Q be the set of URIs that are part of the frontier in an algorithm.</p>
    </sec>
    <sec id="sec-6">
      <title>Solution Concept</title>
      <p>In RDF, data is described through its relations. Therefore, all triples containing a certain URI describe the semantics of that URI. Additionally, data of interest for one domain is usually connected. Consequently, selected major URIs of interest and their surroundings build domain data. This fits our goal of a demand-driven subset, as the demand usually covers one domain. The surroundings of a seed of URIs can be constructed through multiple iterations over adjacent URIs (neighbours) with increasing distance to the seed. In summary, a seed of selected URIs and data connected to it in varying distance in the form of triples build a subset (see fig. 1). Different options regarding the seed, the neighbourhood, and filters enable us to adjust the subset to the demand of a use case, as shown in the following sections.</p>
      <sec id="sec-6-1">
        <title>Basic Algorithm</title>
        <p>We define the function that constructs a Wikidata subset S as</p>
        <p>S = subset(W, C, seed(), filter())   (1)</p>
        <p>with the set of all triples in Wikidata W, the construction tuple C = (c0, ..., cn), the seed function, and the filter function as inputs. The basic algorithm is shown in the following.</p>
        <p>1. Select a set of URIs F0 ⊆ Q as the seed (F0 = seed(W)).
2. For each construction step ci ∈ C,
(a) get all triples Si ⊆ W that contain an URI f ∈ Fi of the frontier Fi according to the construction step ci (Si = construct(W, Fi, ci)).
(b) filter the triples from set Si to a set S'i ⊆ Si (S'i = filter(Si)).
(c) add S'i to the desired subset S ⊆ W (S = S ∪ S'i).
(d) add Fi to the set of visited URIs V (V = V ∪ Fi).
(e) if ci+1 exists: get the set of URIs Fi+1 ⊆ Q that are adjacent to the URIs Fi ⊆ Q according to the construction step ci (Fi+1 = frontier(S'i, ci, Fi, V)).
3. Export the complete subset S.</p>
        <p>How these steps look in detail and how they combine is the focus of the following sections.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Seed Considerations</title>
        <p>The seed acts as a starting point for the subset construction. It consists of a set of URIs F0 ⊆ Q. F0 will be the frontier of the first construction step that gets triples from Wikidata that contain an URI u ∈ F0. The function F0 = seed(W) is provided as an input to the algorithm to enable maximal flexibility.</p>
        <p>One possibility to express seed() is a SPARQL SELECT query that runs on the Wikidata set W and returns F0. A SPARQL query provides suitable possibilities to define the seed. In the example of figure 1, the SPARQL query for German hospitals shown in listing 1.1 returns Charité ∈ F0 as one URI of the seed.</p>
        <p>Listing 1.1: SPARQL query for German hospitals that returns the seed F0
SELECT ?hospital
FROM W
WHERE {
  ?hospital wdt:P31 wd:Q16917 ;  # instanceOf hospital
            wdt:P17 wd:Q183 .   # country Germany
}</p>
        <p>A basic possibility would be to provide F0 in seed() by hand. This is a feasible option if only a few known URIs are relevant for the use case. In our example, the domain could include not all German hospitals but rather only the Charité, so F0 = {Charité}.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Construction/Triple Considerations</title>
        <p>In terms of RDF triples, we define the neighbourhood of a URI u in a set of triples G as the subset N^RDF_G(u) ⊆ G containing all triples T(u, p, o) ∈ G and T(s, p, u) ∈ G that have u as subject or object. In these triples, all URIs adjacent to the URI u in G are included. The neighbourhood N^RDF_G(U) of a set of URIs U is defined as the union of the neighbourhoods of each URI u ∈ U. An example from figure 1 is provided at the end of this section.</p>
        <p>
          Because triples build a directed graph, we can further distinguish the adjacent URIs of an URI u between predecessors and successors. The triples that contain u as object and a predecessor are inbound triples. They provide "secondary knowledge" about u [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The triples that contain u as subject and a successor are outbound triples. They provide "primary knowledge" about u [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. We define the subset of inbound triples of a URI u from a set of triples G as the subset N^-RDF_G(u) ⊆ G containing all triples T(s, p, u) ∈ G that have u as object. We define the subset of outbound triples of a URI u from a set of triples G as the subset N^+RDF_G(u) ⊆ G containing all triples T(u, p, o) ∈ G that have u as subject. The sets of inbound triples N^-RDF_G(U) ⊆ G and outbound triples N^+RDF_G(U) ⊆ G of a set of URIs U are defined as the union of the appropriate sets of each URI u ∈ U.
        </p>
        <p>Over multiple iterations, we look at the neighbourhood of the given frontier Fi to construct the subset S. The input of the subset must express a sequence of construction steps adequately. Let C = (c0, c1, ..., cn) be a tuple with ci ∈ {↔, ←, →} that represents the sequence of construction steps. For every iteration i, the expansion ci is executed on the frontier Fi, resulting in the partial subset Si ⊆ W that adds to the complete subset S. The construction step ci may construct the whole neighbourhood (↔), the set of inbound triples (←), or the set of outbound triples (→). The size n of the tuple C expresses the number of iterations and the maximum depth of triples around the seed.</p>
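        <p>The three construction options can be sketched directly from the definitions above (illustrative Python over (s, p, o) tuples; the function names are assumptions for the example):</p>

```python
# Sketch of the construction options from the tuple C: inbound (←),
# outbound (→), and the full neighbourhood (↔).

def inbound(G, uris):
    """N^-RDF_G: all triples with an URI u ∈ uris as object."""
    return {t for t in G if t[2] in uris}

def outbound(G, uris):
    """N^+RDF_G: all triples with an URI u ∈ uris as subject."""
    return {t for t in G if t[0] in uris}

def neighbourhood(G, uris):
    """N^RDF_G = N^-RDF_G ∪ N^+RDF_G."""
    return inbound(G, uris) | outbound(G, uris)

G = {
    ("KfP", "partOf", "Charité"),
    ("Charité", "locatedIn", "Berlin"),
    ("Berlin", "contains", "Mitte"),
}
S0 = neighbourhood(G, {"Charité"})  # the two triples touching Charité
```

        <p>For a set of URIs, the union over its members falls out of the set comprehension for free, matching the definition of N^RDF_G(U).</p>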
        <p>The dedicated function Si = construct(W, Fi, ci) specifies to:</p>
        <p>Si = N^RDF_W(Fi) if ci = ↔, N^-RDF_W(Fi) if ci = ←, N^+RDF_W(Fi) if ci = →   (2)</p>
        <p>Regarding an implementation, several options to access and query Wikidata exist (see https://www.wikidata.org/wiki/Wikidata:Data_access, last accessed 2 August 2021). Available are per-item access with dereferenceable URIs, a SPARQL endpoint, and an endpoint for Linked Data Fragments. We won't evaluate these options, as the implementation is not a focus of this paper. However, to give another expression of the constructed triples, the listings 1.2 and 1.3 show the SPARQL CONSTRUCT queries to construct the inbound triples (ci = ←) and the outbound triples (ci = →) respectively. Both queries together construct the neighbourhood (ci = ↔).</p>
        <p>Listing 1.2: SPARQL query that constructs the inbound triples of an URI u
CONSTRUCT
WHERE { GRAPH W { ?s ?p $u } . }</p>
        <p>Listing 1.3: SPARQL query that constructs the outbound triples of an URI u
CONSTRUCT
WHERE { GRAPH W { $u ?p ?o } . }</p>
        <p>In the example of figure 1, the first construction step c0 = ↔ returned the inbound and outbound triples of the seed F0 = {Charité}. Therefore, the first partial subset is S0 = N^RDF_W(Charité) and returns the triples:
{(KfP partOf Charité), (KfN partOf Charité),
(Charité subsidiary CBF), (Charité locatedIn Berlin),
(Charité subsidiary VK)}</p>
      </sec>
      <sec id="sec-6-4">
        <title>Filter Considerations</title>
        <p>When we construct the neighbourhood of a URI, we receive all triples connected with that URI. This data quickly accumulates to a sizable subset. However, one of our goals for the subset is a smaller size. Additionally, we look for a minimal and full-covering subset. Thus, the algorithm has a filter function that can exclude triples that don't contain relevant information. It reduces a subset Si to a filtered subset S'i ⊆ Si.</p>
        <p>Because there are plenty of options to filter triples and the focus of filters varies between different use cases, the filter function S'i = filter(Si) is provided as an input to the algorithm to enable maximal flexibility.</p>
        <p>Some basic filter options to exclude triples are language-tagged literals for selected languages and triples using wikibase:Statements, as they model duplicate information. Further possibilities are filters by Wikidata's property hierarchy or a filter for designated classes.</p>
        <p>In this paper, however, we propose the idea of a filter for rarely used properties. We argue that triples with rarely used properties provide relatively little value for later analysis (like the namedAfter property in fig. 1). If we analyse hundreds or thousands of instances of one class, we can exclude properties that are only used by one or two instances. This may not sound like much, but a few properties filtered for many URIs add up. Additionally, the filter function also reduces data volume in future iterations because every removed triple also means one less adjacent URI for the next construction, so the frontier Fi+1 shrinks (see section 6.5).</p>
        <p>Let A^+px ⊆ Si be the set of triples A^+px = {T = (u, px, o) | T ∈ Si and u ∈ Fi} with the URIs u ∈ Fi and a property px. Let A^-px ⊆ Si be the set of triples A^-px = {T = (s, px, u) | T ∈ Si and u ∈ Fi} with the URIs u ∈ Fi and a property px. Let σ(px) = |Apx| / |F| be the fraction of URIs with the property px from all URIs in the frontier F. Let θ0 be the threshold value any property px must reach (σ(px) ≥ θ0) so that Apx is not removed from Si in order to build S'i. For the construction of inbound triples (ci = ←), only A^-px is relevant, as |A^+px| = 0. For the construction of outbound triples (ci = →), only A^+px is relevant, as |A^-px| = 0. For the construction of the neighbourhood (ci = ↔), A^+px and A^-px are relevant and calculated separately.</p>
        <p>The seed F0 of a subset probably contains Wikidata items (URIs) of only a few classes. URIs of the same class likely have similar properties, so σ(px) tends to be relatively large for most properties. However, when we iteratively construct the neighbourhood, the classes of the URIs u ∈ Fi will quickly differentiate and σ(px) decrease. Therefore, a constant value for θ0 might not be the best solution. Instead, there are arguments to both decrease and increase it with each construction step. We might decrease θi to not exclude too many triples that could provide relevant information. We might also increase θi to remove even more triples that could provide irrelevant information. Both can be done in an additive (θi+1 = θi ± δ) or multiplicative (θi+1 = θi · δ) way. The decision to increase or decrease θi is in its consequence a decision for a denser, maybe minimal subset or a full-covering subset that doesn't want to miss out on relevant data. The values θ0 and δ are set by input parameters of the filter function. It is also possible to select θ0 = δ = 0 to disable the property filter.</p>
        <p>With the proposed filter for rare properties, the function S'i = filter(Si) specifies to:</p>
        <p>S'i = Si \ {T = (s, px, o) | σ(px) &lt; θi(θ0, δ)}   (3)</p>
      </sec>
      <sec id="sec-6-5">
        <title>Frontier Considerations</title>
        <p>
          With the data that describes the URIs of the frontier Fi collected and filtered (subset S'i), the next construction step is due. Therefore, we need to select a new set of URIs Fi+1. Because we build a subset from a seed through the construction of the surroundings of the seed, the new frontier Fi+1 consists of URIs found in the last construction step. These new URIs are adjacent to the last frontier Fi and can therefore be described using the set of adjacent URIs N_{S'i}(Fi) from graph theory [
          <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
          ].
        </p>
        <p>However, not all adjacent URIs are proper for Fi+1. Let's say we expand through the complete neighbourhood. In the example of figure 1, we reach the URI Berlin with the first construction step. Therefore, the URI is part of the following frontier (Berlin ∈ Fi+1). The URI CBF is also included in that frontier (CBF ∈ Fi+1). After another construction step, we reach Berlin again in the outbound triple from CBF. This would mean that we process the neighbourhood of Berlin twice. Thus, we must only select unvisited URIs for the new set of URIs Fi+1. The set of visited URIs V collects all newly visited URIs after every construction step. V can then be excluded from Fi+1. In consequence, the sets Fi and Fi+1 are disjoint: Fi ∩ Fi+1 = ∅.</p>
        <p>We can also omit literals from the set Fi+1 because they are never the subject of a triple and we are not interested in URIs with identical literal values as objects (e.g. the year 1958 in fig. 1). Additionally, we can omit blank nodes because they can't be queried.</p>
        <p>The example in figure 1 shows another scenario we have to consider for Fi+1. The displayed graph represents the subset we would like to receive as a result of our algorithm. The URI Charité serves as seed (F0 = {Charité}). Initially, we are interested in all data related to the seed, so we construct the complete neighbourhood in the first step (c0 = ↔). However, out of all the URIs we discover with this construction, we know we only want to continue the construction in the next step with the successors, because the outbound relations of the seed are more important for our use case. In this case, the frontier must only consider the URIs N^+_{S'i}(Fi) succeeding the set of URIs Fi. The preceding URIs N^-_{S'i}(Fi) are the opposite option. To include this information in the input, we add two options for the sequence of expansion steps C: only select predecessors (↔-) or successors (↔+) after constructing the complete neighbourhood. Therefore, the expansion options are ci ∈ {↔, ↔-, ↔+, ←, →}.</p>
        <p>For a construction of inbound (ci = ←) or outbound (ci = →) triples, the frontier is also defined by N^-_{S'i}(Fi) or N^+_{S'i}(Fi) respectively, because only those adjacent URIs exist.</p>
        <p>In conclusion, the function Fi+1 = frontier(S'i, ci, Fi, V) specifies to:</p>
        <p>Fi+1 = (N_{S'i}(Fi) ∩ U) \ V if ci ∈ {↔}; (N^-_{S'i}(Fi) ∩ U) \ V if ci ∈ {↔-, ←}; (N^+_{S'i}(Fi) ∩ U) \ V if ci ∈ {↔+, →}   (4)</p>
        <p>The listings 1.4, 1.5, and 1.6 show SPARQL SELECT queries that select all URIs u that are adjacent (ci ∈ {↔}) / predecessors (ci ∈ {↔-, ←}) / successors (ci ∈ {↔+, →}) to an URI f ∈ Fi respectively. The SPARQL queries don't exclude visited URIs u ∈ V.</p>
        <p>Listing 1.4: SPARQL query that selects URIs u adjacent to f ∈ Fi for the frontier Fi+1 from S'i
SELECT ?u
FROM S'i
WHERE { { $f ?p ?u . } UNION { ?u ?p $f . }
  FILTER ( isURI(?u) ) }</p>
        <p>Listing 1.5: SPARQL query that selects URIs u that are predecessors of f ∈ Fi for the frontier Fi+1 from S'i
SELECT ?u
FROM S'i
WHERE { ?u ?p $f .
  FILTER ( isURI(?u) ) }</p>
        <p>Listing 1.6: SPARQL query that selects URIs u that are successors of f ∈ Fi for the frontier Fi+1 from S'i
SELECT ?u
FROM S'i
WHERE { $f ?p ?u .
  FILTER ( isURI(?u) ) }</p>
        <p>In the example of figure 1, the first construction step c0 = ↔+ results in the frontier F1 = {CBF, Berlin, VK} (see section 6.3 for S0). The following step only constructs outbound triples (c1 = →). The corresponding subset S1 = N^+RDF_W(F1) returns the triples:
{(CBF locatedIn Berlin), (CBF inception 1958),
(CBF namedAfter BF), (Berlin population 3644826),
(Berlin contains Mitte), (VK locatedIn Mitte),
(VK inception 1906)}</p>
        <p>The resulting frontier would be F2 = {BF, Mitte}; however, the subset from figure 1 is already completed with C = (↔+, →).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <sec id="sec-7-1">
        <title>Summary</title>
        <p>The paper introduces an algorithm S = subset(W, C, seed(), filter()) to create a demand-driven Wikidata subset. To do so, we construct triples in multiple steps starting from a seed. Available options ci ∈ {↔, ↔-, ↔+, ←, →} that sequence the construction offer flexibility. A filter reduces the number of triples to exclude irrelevant data.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Future Work</title>
        <p>We are currently in the process of implementing the introduced algorithm. With
a working implementation, we plan to construct a Wikidata subset of the more
challenging domain of microelectronics and their supply chains.</p>
        <p>Once we have created multiple subsets, we need an evaluation to compare them. The clustering coefficient from graph theory seems like one possible parameter.</p>
        <p>One advantage of Wikidata is its links to other data sources on the web. With techniques from link traversal query execution, we could follow the links and include data we find in the subset.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Batsakis, S., Petrakis, E.G., Milios, E.: Improving the performance of focused web crawlers. Data &amp; Knowledge Engineering 68(10), 1001-1013 (Oct 2009). https://doi.org/10.1016/j.datak.2009.04.002</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Beghaeiraveri, S.A.H., Gray, A.J.G., McNeill, F.J.: Experiences of Using WDumper to Create Topical Subsets from Wikidata. In: Proceedings of the 2nd International Workshop on Knowledge Graph Construction co-located with the 18th Extended Semantic Web Conference (ESWC 2021). p. 15 (Jun 2021)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Bondy</surname>, <given-names>J.A.</given-names></string-name>,
          <string-name><surname>Murty</surname>, <given-names>U.S.R.</given-names></string-name>:
          <source>Graph Theory with Applications</source>.
          Elsevier, New York, 5th edn. (<year>1982</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Chakrabarti</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>van den Berg</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Dom</surname>, <given-names>B.</given-names></string-name>:
          <article-title>Focused crawling: a new approach to topic-specific Web resource discovery</article-title>.
          <source>Computer Networks</source>
          <volume>31</volume>(<issue>11-16</issue>),
          <fpage>1623</fpage>-<lpage>1640</lpage> (May <year>1999</year>).
          https://doi.org/10.1016/S1389-1286(99)00052-3
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Dundar</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Aytac</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Kilic</surname>, <given-names>E.</given-names></string-name>:
          <article-title>The common-neighbourhood of a graph</article-title>.
          <source>Boletim da Sociedade Paranaense de Matematica</source>
          <volume>35</volume>(<issue>1</issue>),
          <fpage>23</fpage>-<lpage>32</lpage> (<year>2017</year>).
          https://doi.org/10.5269/bspm.v35i1.22464
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Hartig</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Bizer</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Freytag</surname>, <given-names>J.C.</given-names></string-name>:
          <article-title>Executing SPARQL Queries over the Web of Linked Data</article-title>.
          In:
          <string-name><surname>Bernstein</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Karger</surname>, <given-names>D.R.</given-names></string-name>,
          <string-name><surname>Heath</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Feigenbaum</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Maynard</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Motta</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Thirunarayan</surname>, <given-names>K.</given-names></string-name>
          (eds.)
          <source>The Semantic Web - ISWC 2009. Lecture Notes in Computer Science</source>,
          vol. <volume>5823</volume>,
          pp. <fpage>293</fpage>-<lpage>309</lpage>.
          Springer, Berlin, Heidelberg (<year>2009</year>).
          https://doi.org/10.1007/978-3-642-04930-9_19
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Hartig</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Freytag</surname>, <given-names>J.C.</given-names></string-name>:
          <article-title>Foundations of Traversal Based Query Execution over Linked Data (Extended Version)</article-title>.
          <source>In: HT '12: Proceedings of the 23rd ACM conference on Hypertext and social media</source>.
          pp. <fpage>43</fpage>-<lpage>52</lpage> (Jun <year>2012</year>).
          https://doi.org/10.1145/2309996.2310005
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Mimouni</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Moissinac</surname>, <given-names>J.C.</given-names></string-name>,
          <string-name><surname>Vu</surname>, <given-names>A.T.</given-names></string-name>:
          <article-title>Domain Specific Knowledge Graph Embedding for Analogical Link Discovery</article-title>.
          <source>International Journal On Advances in Intelligent Systems</source>
          <volume>13</volume>(<issue>1&amp;2</issue>),
          <fpage>140</fpage>-<lpage>150</lpage> (Jun <year>2020</year>),
          https://hal-cnrs.archives-ouvertes.fr/hal-03052226
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Olston</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Najork</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Web Crawling</article-title>.
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>4</volume>(<issue>3</issue>),
          <fpage>175</fpage>-<lpage>246</lpage> (<year>2010</year>).
          https://doi.org/10.1561/1500000017
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Stickler</surname>, <given-names>P.</given-names></string-name>:
          <article-title>CBD - Concise Bounded Description</article-title>
          (Jun <year>2005</year>), https://www.w3.org/Submission/CBD/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>