<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">KeySearchWiki: An Automatically Generated Dataset for Keyword Search over Wikidata</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Leila</forename><surname>Feddoul</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Heinz Nixdorf Chair for Distributed Information Systems</orgName>
								<orgName type="institution">Friedrich Schiller University Jena</orgName>
								<address>
									<settlement>Jena</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Institute of Data Science</orgName>
								<orgName type="institution">German Aerospace Center DLR</orgName>
								<address>
									<settlement>Jena</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Frank</forename><surname>Löffler</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Heinz Nixdorf Chair for Distributed Information Systems</orgName>
								<orgName type="institution">Friedrich Schiller University Jena</orgName>
								<address>
									<settlement>Jena</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Competence Center for Digital Research</orgName>
								<orgName type="institution">Michael Stifel Center</orgName>
								<address>
									<settlement>Jena</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sirko</forename><surname>Schindler</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Institute of Data Science</orgName>
								<orgName type="institution">German Aerospace Center DLR</orgName>
								<address>
									<settlement>Jena</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">KeySearchWiki: An Automatically Generated Dataset for Keyword Search over Wikidata</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">5895AAD8438C85AB630F8A8CD3AE94CD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Keyword Search</term>
					<term>Knowledge Graph</term>
					<term>Wikidata</term>
					<term>Wikipedia</term>
					<term>Dataset</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Keyword search is an intuitive method to access knowledge graphs without requiring technical expertise or knowledge of the underlying data schema. In this context, various methods for keyword search over knowledge graphs have been developed. However, only few evaluation datasets have been created, mostly based on a time-consuming manual generation. We present KeySearchWiki, an automatically generated dataset for keyword search over Wikidata, containing over 16 thousand queries and their relevant results. It is based on Wikidata and Wikipedia set categories which are refined and combined to derive more complex queries. We explain the dataset generation workflow, highlight some dataset characteristics, present experiments using baseline retrieval methods, and evaluate the accuracy of relevant results.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Knowledge graphs (KGs) have become an undisputed source of semantic knowledge for various tasks, e.g., Question Answering (QA) or Entity Linking <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Hence, techniques that simplify access for end-users are in great demand. Keyword Search over Knowledge Graphs (KSKG) is a familiar method enabling information retrieval. KSKG systems generally attempt to answer a user query by retrieving graph connections between query keywords. In general, the output of interest of KSKG systems is a set of uniquely identified relevant entities. Recently, KSKG research has produced a wide range of methods <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>Benchmarks for effectiveness evaluation play a key role in enhancing systems and enabling inter-system comparison. They provide a target KG, user queries, and corresponding results together with their relevance judgments (RJs), e.g., using binary or 3-point scales. Queries are often either manually crafted <ref type="bibr" target="#b10">[11]</ref> or manually selected from search engine query logs <ref type="bibr" target="#b11">[12]</ref>, which results in small datasets. Relevant results are generally provided by pooling a subset of systems' top results, which are then judged via crowd-sourcing <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b13">14]</ref>. This approach is time-consuming and depends on systems' results. To the best of our knowledge, there is only one dataset specifically for KSKG <ref type="bibr" target="#b14">[15]</ref>. Its focus was not on creating queries, but on mapping relevant results from previous evaluation campaigns to DBpedia <ref type="bibr" target="#b15">[16]</ref> entities.</p><note place="foot">Wikidata'23: Wikidata workshop at ISWC 2023. Contact: leila.feddoul@uni-jena.de (L. Feddoul); frank.loeffler@uni-jena.de (F. Löffler); sirko.schindler@dlr.de (S. Schindler). ORCID: 0000-0001-8896-8208 (L. Feddoul); 0000-0001-6643-6323 (F. Löffler); 0000-0002-0964-4457 (S. Schindler).</note><p>In this paper, we present KeySearchWiki, an automatically generated dataset for keyword search over Wikidata <ref type="bibr" target="#b16">[17]</ref>. We focus on Type Search (TS) <ref type="bibr" target="#b17">[18]</ref> with queries retrieving entities of a specific type (target), e.g., Paul Auster novels with novels as a target. This relates to common real-world scenarios: (1) users explicitly mentioning the target in traditional search engines, (2) users selecting a target category, e.g., books in an online shop, or (3) search systems providing access to only a single type of results, e.g., portals offering access to datasets. To the best of our knowledge, this is the first automatically generated, large-scale, and diverse dataset that also includes complex queries. The general idea is to leverage Wikipedia set categories <ref type="foot" target="#foot_0">2</ref> that are mapped to Wikidata (e.g., Category:American television directors (Q8032156)) as a source of queries and their members as relevant entities. KeySearchWiki is more closely related to human-curated datasets than purely synthetic ones. The queries represent an actual information need as witnessed by the manually maintained, corresponding Wikipedia categories. Furthermore, our approach is both multilingual (all Wikipedia languages are considered) and hierarchical (exploiting the Wikipedia category hierarchy). 
We summarize the key contributions as follows:</p><p>• We present a workflow for the automatic generation of the KeySearchWiki dataset. The source code for the dataset generation is publicly available. • We introduce KeySearchWiki, a diverse dataset consisting of 16,605 queries of different complexity levels together with their relevant entities. • We provide for each query an annotated version that tags each query term with its corresponding Wikidata identifier. Mappings between natural language queries and corresponding Wikidata entities can, e.g., be used to evaluate entity linking systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>In addition to challenges, several datasets have been created to support the evaluation of KSKG approaches and related topics such as QA. Table <ref type="table" target="#tab_0">1</ref> summarizes major existing datasets/challenges. They vary with respect to the task, the target data and its format, size, query creation method, source of relevant entities, RJs types, and RJs source. We distinguish between four types of tasks:</p><p>• Entity Search (ES) <ref type="bibr" target="#b17">[18]</ref>: finding a specific entity, e.g., University of Phoenix in SemSearch Challenge 2010's Entity Search Track (SemSearch2010 ES). • Type Search (TS) <ref type="bibr" target="#b17">[18]</ref>: finding a (ranked) list of entities having a specific type, e.g., Paul Auster novels from the Entity Ranking Task of The INitiative for the Evaluation of XML retrieval (INEX2009 ER). • Ad-Hoc Search (AHS): finding a (ranked) list of entities that are described with a set of random keywords (could include ES and TS style queries but also other random ones).</p><p>ES, TS, and AHS are directly related to KSKG. We include datasets for QA over KGs, as they share characteristics with KSKG datasets and have been adapted to evaluate KSKG systems (e.g., in <ref type="bibr" target="#b2">[3]</ref>). KSKG systems require a target KG represented as triples. Most of the described datasets (cf. Table <ref type="table" target="#tab_0">1</ref>) provide underlying data in RDF format. INEX campaigns usually focus on XML retrieval, but organized a Linked Data (LD) Track to close the gap with the Semantic Web. The target data provided by INEX2012 LD was in a semantically annotated XML format: Wikipedia-LOD v1.1<ref type="foot" target="#foot_1">3</ref> using DBpedia and YAGO <ref type="bibr" target="#b22">[23]</ref> annotations. 
In INEX2013 LD, different dataset collections allowed for various retrieval techniques: XML + RDF (English Wikipedia + DBpedia and YAGO), semantically annotated XML (Wikipedia-LOD v2.0<ref type="foot" target="#foot_2">4</ref>), and text (extracted from Wikipedia-LOD v2.0). However, INEX LD tracks mainly target textual data, while KSKG systems work on data represented by KGs.</p><p>Queries are often either manually crafted by humans <ref type="bibr" target="#b10">[11]</ref> or collected from previous campaigns where source queries were also manually created <ref type="bibr" target="#b14">[15]</ref>. The manual approach is time-consuming and requires effort not only for query creation but also for finding interested volunteers. This impacts the size of the dataset, since it results in a small number of queries, usually fifty <ref type="foot" target="#foot_3">5</ref> to one hundred queries per dataset. Furthermore, if the dataset is created in the context of a project that also aims at developing a KSKG system, manual query creation increases the risk of designing biased queries that favor one's own approach, especially if this is done by researchers directly related to the project. Other works select queries manually from search engine query logs <ref type="bibr" target="#b11">[12]</ref>. They argue that log queries are more realistic and representative of user needs. However, users often try to overcome the limitations of a search engine by adapting to its capabilities and by avoiding complex queries that involve relations between different entities <ref type="bibr" target="#b23">[24]</ref>. In addition, query logs are often not in line with the specified underlying data (e.g., KGs). This shortcoming requires additional effort to select queries that have at least some answers within the considered data. Datasets should also contain queries with unambiguous intentions. This is important for judging whether a potential entity is relevant to the query or not. 
Thus, another step is the selection of queries whose intentions could be derived. LC-QuAD 2.0 <ref type="bibr" target="#b21">[22]</ref> is, to the best of our knowledge, the only dataset that applies a semi-automatic approach for query creation and thus has the largest size (30,000 queries). SPARQL queries are automatically generated, transformed to template questions, and finally verbalized into natural language questions. However, QA datasets are not initially geared towards KSKG tasks. Hence, their usage requires pre-processing and selection of suitable queries. Another approach to evaluate KSKG systems is the use of randomly generated queries (arbitrary combinations of keywords appearing in the data source). This is generally not a good practice, since resulting queries would not reflect real information needs <ref type="bibr" target="#b24">[25]</ref>.</p><p>All challenges (besides QA) use runs <ref type="foot" target="#foot_4">6</ref> submitted by participants as a source of relevant entities. They pool a subset of top results and assess them either via crowd-sourcing (e.g., MTurk<ref type="foot" target="#foot_5">7</ref>) or by the participants themselves <ref type="bibr" target="#b18">[19]</ref>. This depends on the participating systems, though, and provides no independent list of relevant entities. QA datasets use either automatically or manually created SPARQL queries to generate the list of relevant entities. Here, results do not need to be judged, and only relevant entities are presented. However, we believe that relevant entities could originate from different SPARQL queries depending on the underlying KG. Using a single SPARQL query may omit some potentially relevant entities. KSKG datasets should also contain complex queries. We distinguish between two degrees of complexity. 
Multi-keyword: queries that contain more than one keyword; and multi-hop (for TS): queries in which there is no direct relation between target and keywords<ref type="foot" target="#foot_6">8</ref>. Most of the listed datasets contain multi-keyword queries. However, none of them explicitly claims to provide multi-hop queries. For QA datasets this could be verified since SPARQL queries are provided.</p><p>All datasets provide queries as mere strings. Systems thus have to deal with keyword/target to knowledge graph entity mapping. Providing semantically annotated queries is useful as well, since it allows the queries to be used for evaluating entity linking systems. Another significant quality for such datasets is diversity, which means avoiding similar queries (e.g., cities in Germany and cities in France). It is difficult to programmatically verify the diversity of a dataset, so researchers have to rely on the claims made by its creators (e.g., LC-QuAD <ref type="bibr" target="#b25">[26]</ref> claims to avoid generating similar queries). Methods for constructing synthetic benchmarks are proposed in <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b27">28]</ref>. They follow exactly the steps that a KSKG/QA system would perform to solve the task. In our view, this self-reference (evaluating one system with the output of another) defeats the idea of an objective evaluation and is not based on human judgment.</p><p>We conclude that there is a lack of established evaluation datasets dedicated to KSKG. We overcome the previously described shortcomings by proposing the first (1) automatically generated, (2) complex, (3) large-scale, (4) diverse dataset to evaluate KSKG systems on the TS Task over Wikidata, and providing semantically annotated queries. We propose an innovative approach by using Wikidata/Wikipedia set categories, existing human-edited sources of relevant entities and TS queries. 
We leverage the multilingual and hierarchical nature of Wikipedia set category pages to improve the completeness of relevant entities. Our automated approach does not mirror the steps of KSKG systems, but automatically extracts and combines information from manually curated resources to reduce the additional manual effort. Consequently, we consider KeySearchWiki more closely related to human-curated datasets than purely synthetic ones.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dataset Generation Workflow</head><p>KeySearchWiki specifically focuses on the TS Task. Our primary goal is to lower manual effort by proposing a fully automated workflow for dataset generation. A further goal is the creation of complex and diverse queries. We observe that Wikidata set categories exhibit TS-like characteristics (cf. Figure <ref type="figure" target="#fig_0">1</ref>). Most set categories have a property category contains (P4224) providing the type (target) of entities contained and additional qualifiers (keywords). This provides the building blocks to construct TS-like queries. Links to corresponding Wikipedia pages can provide relevant entities, as they represent human-curated collections of Wikipedia articles (Wikidata entities) for these categories. Figure <ref type="figure" target="#fig_1">2</ref> depicts the dataset generation workflow, which is detailed in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Candidate Generation and Cleaning</head><p>The pipeline starts by generating candidate entries (queries and their relevant entities). Set categories are retrieved from Wikidata in one of two ways: (1) sending SPARQL queries to the Wikidata public endpoint<ref type="foot" target="#foot_7">9</ref> or (2) parsing a Wikidata JSON dump<ref type="foot" target="#foot_8">10</ref>. Both options retrieve set categories with additional information such as category contains (P4224) and its qualifiers, which are used to determine target and keywords, respectively. Next, for each available language, the Wikipedia subcategory hierarchy is explored in a breadth-first manner to retrieve member pages and their corresponding Wikidata entities. Again, these may be retrieved online using the MediaWiki API<ref type="foot" target="#foot_9">11</ref> or offline from a local database built from SQL dumps for all needed languages<ref type="foot" target="#foot_10">12</ref>. For each subcategory, we perform a type check: if fewer than 50% of its members are instances of the target or any of its subclasses<ref type="foot" target="#foot_11">13</ref>, traversal of this branch is stopped. The output of this phase is a list of raw entries. In a cleaning phase, we then remove entries without target or keywords, with more than one keyword/target (ambiguous), without relevant entities, or with a keyword having either an unknown value or no label. The resulting intermediate entries act as input for the two following branches.</p></div>
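<div xmlns="http://www.tei-c.org/ns/1.0"><p>The traversal with its type check can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the function name collect_members, the dictionary-based inputs, and the callable is_target_instance are hypothetical. The sketch also assumes that members of the root category are accepted without a type check, that an empty subcategory ends its branch, and that all members of a passing subcategory are kept.</p>
<p><code lang="python"><![CDATA[
```python
from collections import deque


def collect_members(root, subcategories, members, is_target_instance, threshold=0.5):
    """Breadth-first traversal of a category hierarchy with a type check.

    subcategories: dict mapping a category to its direct subcategories.
    members: dict mapping a category to the entity ids of its member pages.
    is_target_instance: callable returning True if an entity is an instance
        of the target or of one of its subclasses.
    """
    relevant = set(members.get(root, ()))  # assumption: root members are kept as-is
    queue = deque(subcategories.get(root, ()))
    visited = {root}
    while queue:
        category = queue.popleft()
        if category in visited:
            continue  # category graphs may contain cycles
        visited.add(category)
        entities = members.get(category, ())
        matching = [e for e in entities if is_target_instance(e)]
        # Stop this branch if fewer than `threshold` of the members have the
        # expected type (assumption: an empty subcategory also stops it).
        if not entities or len(matching) / len(entities) < threshold:
            continue
        relevant.update(entities)
        queue.extend(subcategories.get(category, ()))
    return relevant
```
]]></code></p>
<p>Using a queue rather than recursion keeps the exploration level by level, so a failed type check prunes a whole subtree before any of its descendants are fetched.</p></div>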
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Native Query Generation</head><p>This phase contains a single operation, Intermediate Entry Filtering. We define two criteria: (1) number of relevant entities (#RE) of intermediate entries and (2) number of keywords/target (#Concepts) of corresponding queries <ref type="foot" target="#foot_12">14</ref> . We keep entries whose #RE is at least equal to two, since TS aims at retrieving a list of entities that could be ranked afterwards. Furthermore, we only keep queries having #Concepts below 7. The rationale is to reflect real-world user behavior, where generally a small number of keywords is used <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b29">30]</ref>.</p></div>
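<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two filtering criteria amount to a simple predicate. The sketch below is illustrative only: the entry layout and the name keep_entry are hypothetical, and #Concepts is assumed here to be the number of keywords plus one for the single target.</p>
<p><code lang="python"><![CDATA[
```python
def keep_entry(entry, min_relevant=2, max_concepts=7):
    """Return True if an intermediate entry passes both filtering criteria."""
    # Assumption: #Concepts counts all keywords plus the single target.
    concepts = len(entry["keywords"]) + 1
    enough_results = len(entry["relevant"]) >= min_relevant  # #RE of at least two
    short_enough = concepts < max_concepts  # #Concepts below 7
    return enough_results and short_enough
```
]]></code></p></div>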
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Complex Query Generation</head><p>At this stage, two types of complex queries are constructed: multi-keyword and multi-hop queries<note place="foot" n="15">Based on Figure <ref type="figure" target="#fig_0">1</ref>, native queries could be seen as 1-hop queries since they include keywords that are directly related to the target (e.g., entities of type human (Q5) directly related to television director (Q2059704) via the property occupation (P106)).</note>.</p><p>Multi-keyword. The process for multi-keyword entry generation is illustrated in Figure <ref type="figure" target="#fig_2">3</ref> based on an example iteration from the actual generation pipeline. The pipeline takes the intermediate entries as input. For the sake of simplicity, consider having three intermediate entries. Each entry corresponds to a set category given by a Wikidata IRI (e.g., Category:Computer programmers (Q6624060)) and contains a query and a set of relevant entities. Similarly, relevant entities are represented by Wikidata IRIs. For convenience, we use simple identifiers in the illustration (e.g., entry1, iri1). Each query is given by a target and a set of keywords (e.g., query1: "Programmer" "human" corresponds to instances of human that are described by the keyword Programmer). Multi-keyword queries combine a number of queries that have the same target to create a new query (e.g., query1 and query2 result in a new query "programmer" "University of Houston" "human"). To detect possible combinations, a RelevantEntities-Entry Inverted Index is created in step ①. This index aggregates entries that share at least one relevant entity. Separate indices are created for each target to group only compatible queries. From those indices, we select elements containing at least two different entries as Possible Entry Combinations in step ②. The New Entry Construction step ③ involves the creation of new queries and the new relevant entity sets. New queries are created by merging the keywords and maintaining the shared target: "programmer" "University of Houston" "human". The new relevant entity set is the intersection of relevant entities of the involved entries: (iri1, iri2). Multi-keyword entries are then filtered in step ④ based on the criteria defined in Subsection 3.2.</p><p>Multi-hop. The steps for multi-hop entry generation are explained in Figure <ref type="figure" target="#fig_3">4</ref>, also using a real iteration. The pipeline takes the intermediate entries as input. We consider four intermediate entries as an example and use simple identifiers for the sake of convenience (e.g., entry1, CH (first two letters of entity label), or iri1). The algorithm traverses all intermediate entries by applying steps ① to ④. We consider one iteration (entry1). Multi-hop queries (2 hops for now) link two entries where a relevant entity of one query is equal to a keyword of another query. For example, from query1 "World Music Awards" "human" and query2 "EL" "album", we can derive a new query "World Music Awards" "album". In contrast to query1 and query2, in the new query there is no direct relation between target and keywords, i.e., album and World Music Awards. A system needs to use another intermediate entity (e.g., EL, an artist who won a World Music Award) to connect keyword(s) and target. The Transitive Entry Linking ① links Relevant Entities of the Current Entry (CERE) to other entries using one of those relevant entities as keyword. Then an Entry Clustering ② groups linked entities by target and keywords different from the CERE. With this, new entries can be constructed. The new multi-hop query is built in step ③ by merging cluster keys (album) and current entry keywords (World Music Awards, album). Clusters with the same target as the current entry are removed since they generate the same query (if the cluster has no keywords), or multi-keyword queries. Relevant entities of the new entry are derived from the union of relevant entities in the corresponding cluster. For example, "World Music Awards" "albums" are albums from all human artists that won the World Music Awards (here, CH, MI and EL). Applying the same algorithm recursively yields queries of more than two hops. For now, we limit the number of hops to two, though. The last step is Filtering ④ using the criteria of Subsection 3.2 (#RE and #Concepts) as well as coverage. We define the coverage as $\text{coverage} = \frac{\text{cluster size}}{\text{CERE size}}$, where $\text{cluster size}$ is the number of entries in the respective cluster and $\text{CERE size}$ is the number of relevant entities of the current entry. This metric represents the completeness of the relevant entity set with regard to the new query. In the example of Figure <ref type="figure" target="#fig_3">4</ref>, the coverage of the new query is $\frac{2}{3} \approx 0.66$ (in the actual dataset, the query World Music Awards album has $\text{coverage} = 0.44$ with $\text{cluster size} = 109$ and $\text{CERE size} = 250$). A coverage of 1 is not reached here as no linking with relevant entity MI was found, i.e., the entry MI album does not exist among the input entries. In general, this indicates a missing set category for this combination. We empirically derived a minimum coverage requirement of 0.1 after analyzing the distribution for all multi-hop queries.</p></div>
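<div xmlns="http://www.tei-c.org/ns/1.0"><p>Both generation procedures can be summarized in a short Python sketch. It is a simplified reading of the steps described in this subsection, not the published code: the function names, the dictionary layout of entries (target, keywords, relevant), and the decision to omit the #RE and #Concepts filters are assumptions made for illustration.</p>
<p><code lang="python"><![CDATA[
```python
from collections import defaultdict


def multi_keyword_entries(entries):
    """entries: dict id -> {"target": str, "keywords": set, "relevant": set}."""
    # Step 1: one inverted index per target, mapping entity -> entry ids.
    index = defaultdict(lambda: defaultdict(set))
    for entry_id, entry in entries.items():
        for entity in entry["relevant"]:
            index[entry["target"]][entity].add(entry_id)
    # Step 2: index elements with at least two entries are possible combinations.
    combinations = {
        (target, frozenset(ids))
        for target, by_entity in index.items()
        for ids in by_entity.values()
        if len(ids) >= 2
    }
    # Step 3: merge keywords, keep the shared target, intersect relevant entities.
    new_entries = []
    for target, ids in combinations:
        keywords = set().union(*(entries[i]["keywords"] for i in ids))
        relevant = set.intersection(*(set(entries[i]["relevant"]) for i in ids))
        new_entries.append({"target": target, "keywords": keywords, "relevant": relevant})
    return new_entries


def multi_hop_entries(entries, min_coverage=0.1):
    """Derive 2-hop entries; coverage = cluster size / CERE size."""
    by_keyword = defaultdict(set)
    for entry_id, entry in entries.items():
        for keyword in entry["keywords"]:
            by_keyword[keyword].add(entry_id)
    new_entries = []
    for current in entries.values():
        cere = set(current["relevant"])  # relevant entities of the current entry
        # Steps 1 and 2: link CERE to entries using them as keyword, then
        # cluster the linked entries by (target, keywords outside the CERE).
        clusters = defaultdict(set)
        for entity in cere:
            for linked_id in by_keyword.get(entity, ()):
                linked = entries[linked_id]
                key = (linked["target"], frozenset(set(linked["keywords"]) - cere))
                clusters[key].add(linked_id)
        for (target, extra_keywords), linked_ids in clusters.items():
            if target == current["target"]:
                continue  # same query again or a multi-keyword query
            coverage = len(linked_ids) / len(cere)
            if coverage < min_coverage:
                continue  # step 4, coverage criterion only
            # Step 3: merge keywords and take the union of relevant entities.
            relevant = set().union(*(entries[i]["relevant"] for i in linked_ids))
            new_entries.append({
                "target": target,
                "keywords": set(current["keywords"]) | extra_keywords,
                "relevant": relevant,
                "coverage": coverage,
            })
    return new_entries
```
]]></code></p>
<p>With toy data mirroring the World Music Awards example (three awarded artists, two of which have an album entry), the sketch yields a single new entry with target album and a coverage of 2/3.</p></div>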
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Entry Selection</head><p>This step aims to ensure the diversity of generated entries. From sets of structurally highly similar queries, one representative is chosen while others are discarded (e.g., "California State University, Fullerton" "human" is semantically highly similar to "University of Houston" "human"). We define the query signature as: &lt;Target&gt; &lt;Keyword-Types&gt; (e.g., signature of "University of Houston" "human" is &lt;human (Q5)&gt; &lt;university (Q3918), public educational institution of US (Q23002039)&gt;). The three types of entries are merged (native/multi-keyword/multi-hop) and grouped by their signature. From each group, one representative of each entry type is selected.</p><p>For native/multi-keyword entries, a pseudo-random <ref type="foot" target="#foot_13">16</ref> selection is performed, whereas from multi-hop entries, the one with the highest coverage is selected.</p></div>
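<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection step can be illustrated with the following sketch. The names select_entries and keyword_types, the entry layout, and the use of a seeded random.Random as the pseudo-random source are assumptions for illustration; the published pipeline may choose representatives differently.</p>
<p><code lang="python"><![CDATA[
```python
import random
from collections import defaultdict


def select_entries(entries, keyword_types, seed=0):
    """Pick one representative per (signature, entry type) group.

    entries: list of dicts with "type" ("native", "multi-keyword" or
        "multi-hop"), "target", "keywords" and, for multi-hop, "coverage".
    keyword_types: dict mapping a keyword to the set of its types.
    """
    rng = random.Random(seed)  # assumption: a seeded generator for repeatability
    groups = defaultdict(list)
    for entry in entries:
        # Signature: the target plus the types of all keywords.
        types = frozenset(t for k in entry["keywords"] for t in keyword_types[k])
        groups[(entry["target"], types, entry["type"])].append(entry)
    selected = []
    for (_, _, entry_type), group in groups.items():
        if entry_type == "multi-hop":
            # Multi-hop: keep the entry with the highest coverage.
            selected.append(max(group, key=lambda e: e["coverage"]))
        else:
            # Native and multi-keyword: pseudo-random choice.
            selected.append(rng.choice(group))
    return selected
```
]]></code></p></div>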
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Dataset Characteristics and Availability</head><p>The current dataset version is generated using the Wikidata JSON dump and Wikipedia SQL dumps of 2021-09-20<ref type="foot" target="#foot_14">17</ref>. The final KeySearchWiki dataset consists of 16,605 final entries (native: 1,138, multi-keyword: 15,354, multi-hop: 113) with 3,899,135 unique relevant entities. It involves 73 different targets, 2,797 unique keywords, and 739 different keyword types. Human (Q5) is the most frequent target (13,260), followed by album (Q482994) (1,815), video game (Q7889) (757), and song (Q7366) (303). Other insights about the dataset are documented on GitHub<ref type="foot" target="#foot_15">18</ref>.</p><p>The source code for KeySearchWiki is publicly available <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b31">32]</ref> under an MIT License, including a description of the dataset, its usage and characteristics, examples, and steps needed to reproduce it. We publish our data on Zenodo <ref type="bibr" target="#b32">[33]</ref> under a CC-BY 4.0 License to ensure persistent and public access to all resources. The current dataset is provided in both TREC<ref type="foot" target="#foot_16">19</ref> and JSON formats. The TREC format represents relevant entities and their RJs as follows<ref type="foot" target="#foot_17">20</ref>: &lt;queryID&gt; 0 &lt;RelevantEntityIRI&gt; &lt;judgment&gt;. Queries are in a separate text file, following the format in DBpedia-Entity v2 <ref type="bibr" target="#b14">[15]</ref>: &lt;queryID&gt; &lt;query&gt;. We provide two types of query files: one where queries are given by labels (e.g., &lt;MK79540&gt; &lt;programmer University of Houston human&gt;) and a second with entity IRIs (e.g., &lt;MK79540&gt; &lt;Q5482740 Q1472358 Q5&gt;). The latter can be directly used by systems that omit a preceding entity linking step. 
We also provide an additional list of queries that was partially adjusted (naturalized) to better reflect natural language query formulation, for example, by transforming the query diplomat Germany 20th century human into diplomat Germany 20th century. This is done by removing the target from the query if one of its keywords is a descendant of the target via subclass of (P279). In the previous example, diplomat is in the subclass hierarchy of human. Following this process, 1,826 queries were adjusted and the whole list was provided using the same format: &lt;queryID&gt; &lt;query&gt;. The provided data contains:</p><p>• KeySearchWiki-JSON - the final dataset in JSON format.</p><p>• KeySearchWiki-queries-label - a text file containing the 16,605 queries, each line containing space-separated queryID and query text (labels). • KeySearchWiki-queries-iri - a text file containing the 16,605 queries, each line containing space-separated queryID and IRIs of query elements. • KeySearchWiki-queries-naturalized - a text file with all 16,605 queries, including 1,826 adjusted queries, each line containing space-separated queryID and query text (labels). • KeySearchWiki-qrels-trec - a text file containing relevant entities in TREC format.</p><p>• KeySearchWiki-cache <ref type="bibr" target="#b33">[34]</ref> - a collection of SQLite database files containing all the data retrieved from the Wikidata JSON dump and Wikipedia SQL dumps of 2021-09-20.</p><p>Users can update KeySearchWiki anytime by running our code on a new dump of Wikidata and Wikipedia. We plan to periodically publish new dataset releases.</p></div>
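<div xmlns="http://www.tei-c.org/ns/1.0"><p>The naturalization rule can be sketched as follows. The function names and the dictionary subclass_of, which stands in for subclass of (P279) statements, are hypothetical, and the class hierarchy used in any call is toy data rather than actual Wikidata content.</p>
<p><code lang="python"><![CDATA[
```python
def is_descendant(cls, ancestor, subclass_of):
    """True if `cls` reaches `ancestor` by following subclass-of edges."""
    stack = list(subclass_of.get(cls, ()))  # start at the direct superclasses
    seen = set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(node)
        stack.extend(subclass_of.get(node, ()))
    return False


def naturalize(keywords, target, subclass_of):
    """Drop the target if some keyword already implies it."""
    if any(is_descendant(k, target, subclass_of) for k in keywords):
        return list(keywords)
    return list(keywords) + [target]
```
]]></code></p>
<p>For instance, with a toy hierarchy in which diplomat is a subclass of person and person a subclass of human, the call naturalize(["diplomat", "Germany", "20th century"], "human", ...) returns the keywords without the trailing target.</p></div>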
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiments and Evaluation</head><p>We use KeySearchWiki to evaluate Elas4RDF <ref type="bibr" target="#b34">[35]</ref>, a system based on indexing the textual information of entities. Elas4RDF relies on triple-based indexing using Elasticsearch<ref type="foot" target="#foot_18">21</ref>. We use the best-performing approach where each triple is represented by a document with the following fields: subject/predicate/object keywords<ref type="foot" target="#foot_19">22</ref>, description of IRI object/subject, and label of IRI object/subject. Object fields are given higher weight. We use the Elas4RDF-index Service<ref type="foot" target="#foot_20">23</ref> to create an index of Wikidata entity triples. We perform our experiments on a subset of queries, since considering all dataset queries would imply indexing triples involving all Wikidata entities. To keep the indexing time reasonable, we select queries with one of the top-10 targets and index only triples involving Wikidata entities that are instances of the target itself or of any of its subclasses. This way we keep 99% of the queries (only 112 queries discarded) across all types (native: 1,037, multi-keyword: 15,343, multi-hop: 113). In total, 146,211,253 triples were indexed with an index size of 16.8 GB. We evaluate four ranking methods provided by Elasticsearch with default settings as in <ref type="bibr" target="#b34">[35]</ref>: BM25 <ref type="bibr" target="#b35">[36]</ref>, DFR <ref type="bibr" target="#b36">[37]</ref>, LM Dirichlet <ref type="bibr" target="#b37">[38]</ref>, and LM Jelinek-Mercer <ref type="bibr" target="#b37">[38]</ref>. For each baseline (run), the Elas4RDF-search Service<ref type="foot" target="#foot_21">24</ref> is used to retrieve the results which are then written in TREC format: &lt;queryID&gt; Q0 &lt;RetrievedEntityIRI&gt; &lt;rank&gt; &lt;score&gt; &lt;runID&gt;. 
The second column is unused and should always be "Q0". Table <ref type="table" target="#tab_3">2</ref> summarizes the experiment results. We use Mean Average Precision (MAP) and Precision at rank 10 (P@10), considering the top-1000 results. We notice that the different query types reflect various degrees of difficulty. This corresponds to our intention of adding complex queries. Native queries are less challenging and thus achieve better results across retrieval methods. These queries usually involve keywords that are directly related, and hence their textual information mostly occurs within the triples of the same entity. Complex queries are more difficult and show poor performance in general. Even though multi-keyword queries still involve directly related keywords, they tend to be longer and thus seem more challenging. Here, performance drops by ∼0.18 points for both MAP and P@10 compared to native queries. Multi-hop queries are more difficult than their multi-keyword counterparts, as they involve keywords that are not directly related, and thus their textual information does not occur within triples of the same entity. This lowers the performance by ∼0.19 points compared to native queries and by ∼0.01 points compared to the multi-keyword ones. The results reveal no noticeable difference between the retrieval methods. We only notice an improvement of ∼0.01 to 0.05 points in P@10 when using BM25 for native queries. A more detailed investigation is beyond the scope of this paper. Overall, the performance of the ranking methods over KeySearchWiki is in line with other published results (e.g., in <ref type="bibr" target="#b14">[15]</ref> and <ref type="bibr" target="#b38">[39]</ref>). Further details about experiment data preparation, indexing, and the experimental setup are provided in the dataset's GitHub repository <ref type="bibr" target="#b30">[31]</ref>. 
Runs, experiment results, queries, and relevance judgments are published on Zenodo <ref type="bibr" target="#b39">[40]</ref>.</p><p>Evaluation. We evaluate the accuracy of relevant entities in KeySearchWiki by using existing SPARQL queries as a baseline and comparing KeySearchWiki's relevant entities with the results of these queries. Evaluation scripts and results are available on GitHub. Some Wikidata set categories are linked to SPARQL queries via the property Wikidata SPARQL query equivalent (P3921). These queries retrieve results corresponding to the set category and are handcrafted by humans. They can be considered another source of relevant entities (baseline) that can be used for comparison and verification of the relevant entities provided by KeySearchWiki. We extract native entries that contain such SPARQL queries (67 native entries) and manually verify whether the corresponding SPARQL queries correctly represent the information need expressed by the set category. One query was excluded, which leaves 66 queries for this evaluation. For the selected queries, we calculate the Precision and Recall of the KeySearchWiki entities with respect to the SPARQL query results <ref type="bibr" target="#b40">[41]</ref>. The results reveal that KeySearchWiki captures most of the relevant entities retrieved by SPARQL, resulting in an Average Recall of ∼0.70 and an Average Precision of ∼0.54. A more detailed analysis of the results is provided on GitHub<ref type="foot" target="#foot_22">25</ref>.</p></div>
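<div xmlns="http://www.tei-c.org/ns/1.0"><p>The comparison against the SPARQL baseline can be illustrated with a short sketch. It is not the authors' evaluation script: the function names precision_recall and average_scores, the placeholder entity IDs, the handling of empty sets, and the reading of the reported averages as plain means over the evaluated queries are all assumptions of this example.</p><p>
```python
from typing import Dict, Set, Tuple


def precision_recall(re_wiki: Set[str], re_sparql: Set[str]) -> Tuple[float, float]:
    """Precision and recall of KeySearchWiki entities w.r.t. SPARQL results."""
    overlap = len(re_wiki.intersection(re_sparql))
    # Returning 0.0 for an empty set is an assumption of this sketch.
    precision = overlap / len(re_wiki) if re_wiki else 0.0
    recall = overlap / len(re_sparql) if re_sparql else 0.0
    return precision, recall


def average_scores(entries: Dict[str, Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Mean per-query precision and recall; each entry is (RE_WIKI, RE_SPARQL)."""
    scores = [precision_recall(wiki, sparql) for wiki, sparql in entries.values()]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


if __name__ == "__main__":
    # Placeholder query IDs and entity IDs, for illustration only.
    toy = {
        "q1": ({"Q1", "Q2"}, {"Q1"}),
        "q2": ({"Q3"}, {"Q3", "Q4"}),
    }
    print(average_scores(toy))
```
</p><p>In this reading, precision is the share of KeySearchWiki entities that the SPARQL query also returns, and recall is the share of SPARQL results that KeySearchWiki contains.</p></div>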
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Limitations</head><p>In the following, we describe the limitations of KeySearchWiki.</p><p>Matching between data sources and the target KG. In general, even though each Wikipedia page has a corresponding Wikidata entity, the two sources do not always match with respect to the knowledge they contain. This is because both are maintained mostly independently by volunteers. Despite the overlap and collaboration between both communities, the information in both projects will probably continue to differ in the foreseeable future. Other benchmarks also suffer from this, especially those that collect queries independently of the actual KG (e.g., from logs). We attempt to mitigate the effects by using closely related sources (Wikipedia and Wikidata) for queries and relevant entities.</p><p>Completeness of relevant results. Depending on the approach, completeness is rather hard to achieve for benchmarks of reasonable size. Human relevance judgments may be feasible for smaller datasets, but fail for larger ones that are built using pooling <ref type="bibr" target="#b41">[42]</ref>. We try to increase completeness by considering all Wikipedia languages and traversing the category hierarchy. Furthermore, KeySearchWiki uses Wikipedia as the source of relevant entities, so that Wikidata entities with missing semantic descriptions are also included.</p><p>Evaluation of approaches exploiting Wikipedia categories. Systems following KeySearchWiki's strategy of exploiting Wikipedia categories may achieve close-to-perfect scores. However, the task of KSKG assumes only two inputs: a query and a target KG. Additional sources alter this task and result in systems that depend heavily on a particular KG. In such scenarios (systems using Wikipedia categories), the dataset should not be used, in order to avoid bias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion and Future Work</head><p>We introduced KeySearchWiki, a fully automatically generated, complex, large-scale, and diverse dataset for evaluating keyword search systems over Wikidata. We leverage Wikidata and Wikipedia set categories as data sources for both relevant entities and queries. We gather relevant entities by carefully navigating the hierarchy of Wikipedia set categories in all available languages. In the future, we plan to extend the dataset by generalizing to Wikimedia categories (Q4167836), which are superclasses of the currently used set categories. This will allow us to increase the number of dataset entries and to generate more high-coverage multi-hop entries.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Category:American television directors (Q8032156).</figDesc><graphic coords="5,131.19,127.20,116.44,71.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Dataset generation workflow.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Example pipeline for multi-keyword entries generation.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Example pipeline for multi-hop entries generation.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Overview of existing datasets/challenges. Wikipedia-LOD v1.1, sem. XML: semantically annotated XML, rel.: present only relevant entities, collec.: collected)</figDesc><table><row><cell>(QC: Query Creation,</cell><cell>RE</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>Experiment results of the baseline methods.</figDesc><table><row><cell>Method</cell><cell cols="2">Native</cell><cell cols="2">Multi-keyword</cell><cell cols="2">Multi-hop</cell></row><row><cell></cell><cell>MAP</cell><cell>P@10</cell><cell>MAP</cell><cell>P@10</cell><cell>MAP</cell><cell>P@10</cell></row><row><cell>BM25</cell><cell>0.211</cell><cell>0.225</cell><cell>0.025</cell><cell>0.039</cell><cell>0.014</cell><cell>0.032</cell></row><row><cell>DFR</cell><cell>0.209</cell><cell>0.211</cell><cell>0.023</cell><cell>0.029</cell><cell>0.015</cell><cell>0.024</cell></row><row><cell>LM Dirichlet</cell><cell>0.182</cell><cell>0.180</cell><cell>0.020</cell><cell>0.025</cell><cell>0.015</cell><cell>0.018</cell></row><row><cell>LM Jelinek-Mercer</cell><cell>0.212</cell><cell>0.215</cell><cell>0.023</cell><cell>0.029</cell><cell>0.018</cell><cell>0.022</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head></head><label></label><figDesc>Precision and Recall of KeySearchWiki entities $\{RE_{WIKI}\}$ with respect to SPARQL query results $\{RE_{SPARQL}\}$ [41].</figDesc><table><row><cell>$Precision = \frac{|\{RE_{SPARQL}\} \cap \{RE_{WIKI}\}|}{|\{RE_{WIKI}\}|}$, $Recall = \frac{|\{RE_{SPARQL}\} \cap \{RE_{WIKI}\}|}{|\{RE_{SPARQL}\}|}$</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://en.wikipedia.org/wiki/Wikipedia:Categorization#Set_categories</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://inex.mmci.uni-saarland.de/tracks/lod/2012/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://inex.mmci.uni-saarland.de/tracks/lod/2013/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">Established minimum for evaluating retrieval systems<ref type="bibr" target="#b23">[24]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">Output (ranked) list of relevant entities produced by the participating systems.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://www.mturk.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">The corresponding SPARQL query consists of two connected triples (e.g., ?target ns:relation1 ?iri2 . ?iri2 ns:relation2 ?keyword).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">https://query.wikidata.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_8">https://dumps.wikimedia.org/wikidatawiki/entities/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_9">https://www.mediawiki.org/wiki/API:Main_page</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_10">Three SQL dumps for each language: categorylinks, page, and page_props.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_11">SPARQL: ?entity wdt:P31/wdt:P279* ?target</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_12">#Concepts is always less than or equal to #Words (e.g., for the query "University of Houston" "human", #Concepts = 2 and #Words = 4).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_13">Entries in a group are sorted by queryID. Then the first element is selected to ensure a deterministic behavior for reproducibility.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="17" xml:id="foot_14">A version in which the Wikidata "Wikimedia set categories (Q59542487)" had not yet been merged with their initial superclass "Wikimedia categories (Q4167836)". https://github.com/fusion-jena/KeySearchWiki/blob/master/README.md#remark</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="18" xml:id="foot_15">https://github.com/fusion-jena/KeySearchWiki/tree/master/docs#dataset-characteristics</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="19" xml:id="foot_16">https://trec.nist.gov/data/qrels_eng/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="20" xml:id="foot_17">The second field is unused and set to 0 according to the TREC qrels format.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="21" xml:id="foot_18">https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="22" xml:id="foot_19">Represented by the literal value. If a triple component is not a literal, the IRI's namespace part is removed and the remainder is tokenized into keywords.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="23" xml:id="foot_20">https://github.com/SemanticAccessAndRetrieval/Elas4RDF-index (adapted to Wikidata)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="24" xml:id="foot_21">https://github.com/SemanticAccessAndRetrieval/Elas4RDF-search (adapted to Wikidata)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="25" xml:id="foot_22">https://github.com/fusion-jena/KeySearchWiki/tree/master/docs#evaluation-results</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Acknowledgments</head><p>This work has been funded by the German Aerospace Center (DLR). We thank Prof. Dr. Birgitta König-Ries for guidance and feedback.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Entity-Oriented Search</title>
		<title level="s">The Information Retrieval Series</title>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-93935-3</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>Springer</publisher>
			<biblScope unit="volume">39</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A survey of question answering over knowledge base</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Feng</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-15-1956-7_8</idno>
	</analytic>
	<monogr>
		<title level="m">Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding</title>
				<meeting><address><addrLine>Singapore, Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="86" to="97" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Search text to retrieve graphs: A scalable RDF keyword-based search system</title>
		<author>
			<persName><forename type="first">D</forename><surname>Dosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Silvello</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2020.2966823</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="14089" to="14111" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Semantics-driven keyword search over knowledge graphs</title>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2798/paper3.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Doctoral Consortium at ISWC 2020 co-located with 19th International Semantic Web Conference (ISWC 2020)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Doctoral Consortium at ISWC 2020 co-located with 19th International Semantic Web Conference (ISWC 2020)<address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-11-03">November 3rd, 2020</date>
			<biblScope unit="volume">2798</biblScope>
			<biblScope unit="page" from="17" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Efficient keyword search over graph-structured data based on minimal covered r-cliques</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ghanbarpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Niknafs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Naderi</surname></persName>
		</author>
		<idno type="DOI">10.1631/FITEE.1800133</idno>
	</analytic>
	<monogr>
		<title level="j">Frontiers Inf. Technol. Electron. Eng</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="448" to="464" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Keyword search over knowledge graphs via static and dynamic hub labelings</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kharlamov</surname></persName>
		</author>
		<idno type="DOI">10.1145/3366423.3380110</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The Web Conference 2020</title>
				<meeting>The Web Conference 2020<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="235" to="245" />
		</imprint>
	</monogr>
	<note>WWW &apos;20</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Novel node importance measures to improve keyword search over RDF graphs</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Menendez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Casanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A P</forename><surname>Paes Leme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Boughanem</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-27618-8_11</idno>
	</analytic>
	<monogr>
		<title level="m">Database and Expert Systems Applications</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="143" to="158" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Keyword search over RDF graphs using WordNet</title>
		<author>
			<persName><forename type="first">M</forename><surname>Rihany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Kedad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lopes</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2343/paper15.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Conference on Big Data and Cyber-Security Intelligence, BDCSIntell 2018</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the 1st International Conference on Big Data and Cyber-Security Intelligence, BDCSIntell 2018<address><addrLine>Hadath, Lebanon</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">December 13-15, 2018</date>
			<biblScope unit="volume">2343</biblScope>
			<biblScope unit="page" from="75" to="82" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Keyword search on RDF graphs -a query graph assembly approach</title>
		<author>
			<persName><forename type="first">S</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">X</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.1145/3132847.3132957</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM &apos;17</title>
				<meeting>the 2017 ACM on Conference on Information and Knowledge Management, CIKM &apos;17<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="227" to="236" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Constructing target-aware results for keyword search on knowledge graphs</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Shan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.datak.2017.02.001</idno>
		<ptr target="https://doi.org/10.1016/j.datak.2017.02.001" />
	</analytic>
	<monogr>
		<title level="j">Data &amp; Knowledge Engineering</title>
		<imprint>
			<biblScope unit="volume">110</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of the INEX 2012 linked data track</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">R</forename><surname>Camps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marx</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Theobald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gurajada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mishra</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-1178/CLEF2012wn-INEX-WangEt2012.pdf" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2012 Evaluation Labs and Workshop, Online Working Notes</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Rome, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">September 17-20, 2012</date>
			<biblScope unit="volume">1178</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Evaluating ad-hoc object retrieval</title>
		<author>
			<persName><forename type="first">H</forename><surname>Halpin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Herzig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Blanco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pound</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">T</forename><surname>Tran</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-666/paper9.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Workshop on Evaluation of Semantic Technologies (IWEST 2010)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the International Workshop on Evaluation of Semantic Technologies (IWEST 2010)<address><addrLine>Shanghai, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-11-08">November 8, 2010</date>
			<biblScope unit="volume">666</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Entity search evaluation over structured web data</title>
		<author>
			<persName><forename type="first">R</forename><surname>Blanco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Halpin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Herzig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pound</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Workshop on Entity-Oriented Search at SIGIR 2011</title>
				<meeting>the 1st International Workshop on Entity-Oriented Search at SIGIR 2011<address><addrLine>Beijing, China, TU Delft</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011-07-28">28.07.2011</date>
			<biblScope unit="page" from="65" to="71" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Bellot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doucet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Geva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gurajada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kazai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Koolen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Moriceau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mothe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Preminger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sanjuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schenkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tannier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Theobald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Trappett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-40802-1_27</idno>
		<title level="m">Information Access Evaluation. Multilinguality, Multimodality, and Visualization</title>
				<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="269" to="281" />
		</imprint>
	</monogr>
	<note>Overview of INEX 2013</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">DBpedia-Entity v2: A test collection for entity search</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hasibi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Nikolaev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Bratsberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kotov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<idno type="DOI">10.1145/3077136.3080751</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;17</title>
				<meeting>the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;17<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1265" to="1268" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">DBpedia -a large-scale, multilingual knowledge base extracted from Wikipedia</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jentzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kontokostas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Kleef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<idno type="DOI">10.3233/sw-140134</idno>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="167" to="195" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Wikidata: A free collaborative knowledgebase</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
		<idno type="DOI">10.1145/2629489</idno>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Ad-hoc object retrieval in the web of data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pound</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zaragoza</surname></persName>
		</author>
		<idno type="DOI">10.1145/1772690.1772769</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th International Conference on World Wide Web, WWW &apos;10</title>
				<meeting>the 19th International Conference on World Wide Web, WWW &apos;10<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="771" to="780" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Overview of the INEX 2009 entity ranking track</title>
		<author>
			<persName><forename type="first">G</forename><surname>Demartini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Iofciu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>de Vries</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-14556-8_26</idno>
	</analytic>
	<monogr>
		<title level="m">Focused Retrieval and Evaluation</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="254" to="264" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Harth</surname></persName>
		</author>
		<ptr target="http://km.aifb.kit.edu/projects/btc-2009/" />
		<title level="m">Billion Triples Challenge data set</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">9th challenge on question answering over linked data (QALD-9)</title>
		<author>
			<persName><forename type="first">R</forename><surname>Usbeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">H</forename><surname>Gusmita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Saleem</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2241/paper-06.pdf" />
	</analytic>
	<monogr>
		<title level="m">Joint proceedings of the 4th Workshop on Semantic Deep Learning (SemDeep-4) and NLIWoD4: Natural Language Interfaces for the Web of Data (NLIWOD-4) and 9th Question Answering over Linked Data challenge (QALD-9) co-located with 17th International Semantic Web Conference (ISWC 2018)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Monterey, California, United States of America</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">October 8th-9th, 2018</date>
			<biblScope unit="volume">2241</biblScope>
			<biblScope unit="page" from="58" to="64" />
		</imprint>
	</monogr>
	<note>invited paper</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">LC-QuAD 2.0: A large dataset for complex question answering over Wikidata and DBpedia</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dubey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abdelkawi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-30796-7_5</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2019</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="69" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">YAGO: A multilingual knowledge base from Wikipedia, Wordnet, and Geonames</title>
		<author>
			<persName><forename type="first">T</forename><surname>Rebele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hoffart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Biega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kuzey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-46547-0_19</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2016</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="177" to="185" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A framework for evaluating database keyword search strategies</title>
		<author>
			<persName><forename type="first">J</forename><surname>Coffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Weaver</surname></persName>
		</author>
		<idno type="DOI">10.1145/1871437.1871531</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM &apos;10</title>
				<meeting>the 19th ACM International Conference on Information and Knowledge Management, CIKM &apos;10<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="729" to="738" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Introduction to Information Retrieval</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Cambridge University Press</publisher>
			<pubPlace>USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">LC-QuAD: A corpus for complex question answering over knowledge graphs</title>
		<author>
			<persName><forename type="first">P</forename><surname>Trivedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Maheshwari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dubey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-68204-4_22</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2017</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="210" to="218" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Automatic construction of benchmarks for RDF keyword search systems evaluation</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Neves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A P P</forename><surname>Leme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">T</forename><surname>Izquierdo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Casanova</surname></persName>
		</author>
		<idno type="DOI">10.5220/0010519401260137</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd International Conference on Enterprise Information Systems, ICEIS 2021</title>
				<meeting>the 23rd International Conference on Enterprise Information Systems, ICEIS 2021</meeting>
		<imprint>
			<publisher>SCITEPRESS</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="126" to="137" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Maestro: Automatic generation of comprehensive benchmarks for question answering over knowledge graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Orogat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>El-Roby</surname></persName>
		</author>
		<idno type="DOI">10.1145/3589322</idno>
	</analytic>
	<monogr>
		<title level="j">Proc. ACM Manag. Data</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">24</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Searching the web: The public and their queries</title>
		<author>
			<persName><forename type="first">A</forename><surname>Spink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wolfram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B J</forename><surname>Jansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Saracevic</surname></persName>
		</author>
		<idno type="DOI">10.1002/1097-4571(2000)9999:9999&lt;::AID-ASI1591&gt;3.0.CO;2-R</idno>
	</analytic>
	<monogr>
		<title level="j">J. Am. Soc. Inf. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="226" to="234" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">A picture of search</title>
		<author>
			<persName><forename type="first">G</forename><surname>Pass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Torgeson</surname></persName>
		</author>
		<idno type="DOI">10.1145/1146847.1146848</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale &apos;06</title>
				<meeting>the 1st International Conference on Scalable Information Systems, InfoScale &apos;06<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<ptr target="https://github.com/fusion-jena/KeySearchWiki" />
		<title level="m">fusion-jena/KeySearchWiki</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.8016819</idno>
	</analytic>
	<monogr>
		<title level="j">fusion-jena/KeySearchWiki</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Löffler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.6010301</idno>
	</analytic>
	<monogr>
		<title level="j">KeySearchWiki</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">KeySearchWiki-cache</title>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Löffler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.5752018</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Keyword search over RDF using document-centric information retrieval systems</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kadilierakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papadakos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-49461-2_8</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="121" to="137" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">The probabilistic relevance framework: BM25 and beyond</title>
		<author>
			<persName><forename type="first">S</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zaragoza</surname></persName>
		</author>
		<idno type="DOI">10.1561/1500000019</idno>
	</analytic>
	<monogr>
		<title level="j">Found. Trends Inf. Retr</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="333" to="389" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Probabilistic models of information retrieval based on measuring the divergence from randomness</title>
		<author>
			<persName><forename type="first">G</forename><surname>Amati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Van Rijsbergen</surname></persName>
		</author>
		<idno type="DOI">10.1145/582415.582416</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="357" to="389" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">A study of smoothing methods for language models applied to ad hoc information retrieval</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lafferty</surname></persName>
		</author>
		<idno type="DOI">10.1145/383952.384019</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;01</title>
				<meeting>the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;01<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="334" to="342" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">A test collection for entity search in DBpedia</title>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Neumayer</surname></persName>
		</author>
		<idno type="DOI">10.1145/2484028.2484165</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;13</title>
				<meeting>the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;13<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="737" to="740" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Löffler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.6010349</idno>
		<title level="m">KeySearchWiki-experiments</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Analysis of consistency between Wikidata and Wikipedia categories</title>
		<author>
			<persName><forename type="first">L</forename><surname>Feddoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Löffler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3262/paper4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Wikidata Workshop 2022 co-located with the 21st International Semantic Web Conference (ISWC2022), Virtual Event</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the 3rd Wikidata Workshop 2022 co-located with the 21st International Semantic Web Conference (ISWC2022), Virtual Event<address><addrLine>Hangzhou, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022-10">October 2022</date>
			<biblScope unit="volume">3262</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Retrieval evaluation with incomplete information</title>
		<author>
			<persName><forename type="first">C</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Voorhees</surname></persName>
		</author>
		<idno type="DOI">10.1145/1008992.1009000</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;04</title>
				<meeting>the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;04<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="25" to="32" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
