<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DBLP-QuAD: A Question Answering Dataset over the DBLP Scholarly Knowledge Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Debayan Banerjee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sushil Awale</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Usbeck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Biemann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universität Hamburg</institution>
          ,
          <addr-line>Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>37</fpage>
      <lpage>51</lpage>
      <abstract>
        <p>In this work we create a question answering dataset over the DBLP scholarly knowledge graph (KG). DBLP is an on-line reference for bibliographic information on major computer science publications that indexes over 4.4 million publications published by more than 2.2 million authors. Our dataset consists of 10,000 question answer pairs with the corresponding SPARQL queries which can be executed over the DBLP KG to fetch the correct answer. DBLP-QuAD is the largest scholarly question answering dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>Scholarly Knowledge Graph</kwd>
        <kwd>DBLP</kwd>
        <kwd>Dataset</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        KGQA datasets exist [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, not all datasets contain a mapping of natural language
questions to a logical form (e.g. SPARQL, λ-calculus, S-expression). Some simply contain the
question and the eventual answer. Such datasets cannot be used to train models for the task of
semantic parsing.
      </p>
      <p>In this work, we present a KGQA dataset called DBLP-QuAD, which consists of 10,000
questions with corresponding SPARQL queries. Question creation begins with
human-written templates, from which we then machine-generate further questions.
DBLP-QuAD contains a variety of simple and complex questions and also tests the
compositional generalisation of models. DBLP-QuAD is the largest scholarly KGQA dataset being
made available to the public.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        ORKG-QA benchmark [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is the first scholarly KGQA dataset grounded to ORKG. The dataset was
prepared using the ORKG API and focuses on the content of academic publications structured
in comparison tables. The dataset is relatively small in size with only 100 question-answer pairs
covering only 100 research publications.
      </p>
      <p>
        Several other QA datasets exist, both for IR-based QA [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] and KGQA [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] approaches.
Several different approaches have been used to generate KGQA datasets. These
approaches range from fully manual to fully machine-generated. Most datasets, however, lie in between and
use a combination of manual and automated processes.
      </p>
      <p>Datasets can be clearly separated into those that contain logical forms and those that
do not. Datasets without logical forms can be crowdsourced and are therefore
generally large. Crowdsourcing is generally not possible for annotating logical forms
because the task requires considerable domain expertise, and such experts are hard to find on
crowdsourcing platforms. We focus on datasets that contain logical forms.</p>
      <p>
        Free917 and QALD [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] datasets were created manually by domain experts; however, their
sizes are relatively small (917 and 806 questions respectively).
      </p>
      <p>
        WebQuestionsSP and ComplexWebQuestions [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] are developed using existing datasets.
WebQuestionsSP is a semantic parsing dataset developed by using questions from WebQuestions
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Yih et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] developed a dialogue-like user interface which allowed five expert human
annotators to annotate the data in stages.
      </p>
      <p>ComplexWebQuestions is a collection of 34,689 complex questions paired with answers and
SPARQL queries grounded to the Freebase KG. The dataset builds on WebQuestionsSP by sampling
question-query pairs from that dataset and automatically generating questions and complex
SPARQL queries with composition, conjunction, superlative, and comparative functions. The
machine-generated questions are manually rewritten as natural questions and validated by 200
AMT crowd workers.</p>
      <p>
        The OVERNIGHT (ON) approach is a semantic parsing dataset generation framework
introduced by Wang et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In this approach, the question-logical form pairs are collected in a
three-step process. First, the logical forms are generated from a KG. Second, the
logical forms are converted automatically into canonical questions. These canonical questions
are grammatically incorrect but still carry the intended semantic meaning. Lastly, the canonical
questions are converted into natural forms via crowdsourcing. The following
datasets were developed using this approach.
      </p>
      <p>
        GraphQuestions [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] consists of 5,166 natural questions accompanied by two paraphrases of
the original question, an answer, and a valid SPARQL query grounded against the Freebase KG.
GraphQuestions uses a semi-automated three-step algorithm to generate the natural questions
for the KG.
      </p>
      <p>
        LC-QuAD 1.0 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is another semantic parsing dataset, built for the DBpedia KG. LC-QuAD 1.0
is larger, with 5,000 natural language English questions and corresponding
SPARQL queries. The generation process starts with a set of manually created SPARQL
query templates, a list of seed entities, and a whitelist of predicates. Using the list of seed
entities, two-hop subgraphs are extracted from DBpedia. The SPARQL query templates contain
placeholders for both entities and predicates, which are instantiated using triples from the
subgraph. These SPARQL queries are then used to instantiate natural question templates, which
form the base for manual paraphrasing by humans.
      </p>
      <p>
        LC-QuAD 2.0 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] is the second iteration of LC-QuAD 1.0 with 30,000 questions, their
paraphrases and their corresponding SPARQL queries compatible with both Wikidata and
DBpedia KGs. Similar to LC-QuAD 1.0, in LC-QuAD 2.0 a sub-graph is generated using seed
entities and a SPARQL query template is selected based on whitelist predicates. Then, the query
template is instantiated using the sub-graph. Next, a template question is generated from the
SPARQL query which is then verbalised and paraphrased by AMT crowd workers. LC-QuAD
2.0 has more questions and more variation compared to LC-QuAD 1.0 with paraphrases to the
natural questions.
      </p>
      <p>
        GrailQA [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] extends the approach in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to generate 64,331 question-S-expression pairs
grounded to the Freebase Commons KG. Here, S-expressions are linearized forms of graph
queries. Query templates extracted from graph queries generated from the KG are used to
generate canonical logical forms grounded to compatible entities. A graduate student then
checks whether each canonical logical form represents a plausible user query. Next,
another graduate student annotates each validated canonical logical form with a canonical question.
Finally, 6,685 Amazon Mechanical Turk workers write five natural paraphrases for each canonical
question, which are further validated by multiple independent crowd workers.
      </p>
      <p>
        KQA Pro [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is a large collection of 117,000 complex questions paired with SPARQL queries
for the Wikidata KG. The KQA Pro dataset also follows the OVERNIGHT approach: first,
facts are extracted from the KG. Next, canonical questions are generated with corresponding
SPARQL queries, ten answer choices and a golden answer. The canonical questions are then
converted into natural language with paraphrases using crowdsourcing.
      </p>
      <p>
        CFQ [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] (Compositional Freebase Questions) is a semantic parsing dataset developed
entirely with synthetic generation. It consists of simple natural language questions
with corresponding SPARQL queries against the Freebase KG. CFQ contains 239,357 English
questions, which are generated together with a corresponding logical form using hand-crafted
grammar and inference rules. Next, resolution rules are used to map the logical forms to SPARQL
queries. The CFQ dataset was specifically designed to measure compositional generalization.
      </p>
      <p>In this work, we loosely follow the OVERNIGHT approach to create a large scholarly KGQA
dataset for the DBLP KG.</p>
    </sec>
    <sec id="sec-3">
      <title>3. DBLP KG</title>
      <p>
        DBLP, which used to stand for Data Bases and Logic Programming, was created in 1993 by
Michael Ley at the University of Trier, Germany [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The service was originally designed as a
bibliographic database for research papers and proceedings from the fields of database systems
and logic programming. Over time, the service has grown in size and scope, and today includes
bibliographic information on a wide range of topics within the field of computer science. The
DBLP RDF data models a person-publication graph shown in Figure 1.
      </p>
      <p>The DBLP KG contains two main entities, Person and Publication, whereas other metadata
such as journals, conferences and author affiliations are currently only string literals.
Henceforth, we use the terms person and creator interchangeably. At the time of its release, the RDF
dump consisted of 2,941,316 person entities, 6,010,605 publication entities, and 252,573,199
RDF triples. DBLP currently does not provide a SPARQL endpoint, but the RDF dump can be
downloaded and a local SPARQL endpoint such as Virtuoso Server can be set up to run SPARQL
queries against the DBLP KG.</p>
      <p>The live RDF data model on the DBLP website follows the schema shown in Figure 1. However,
the RDF snapshots available for download have the coCreatorWith and authorOf predicates
missing. Although these predicates are missing, the authoredBy predicate can be used to derive
the missing relations. DBLP-QuAD is based on the DBLP KG schema of the downloadable RDF
graph.</p>
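      <p>As a rough illustration of how the missing relations can be derived from authoredBy, the sketch below builds two SPARQL query strings. The schema namespace and the person URI are assumptions made for this example and should be checked against the RDF dump in use.</p>
      <preformat><![CDATA[
```python
# Assumed DBLP schema namespace; verify against the downloaded RDF dump.
DBLP = "https://dblp.org/rdf/schema#"


def papers_of(person_uri):
    """authorOf(person, paper), expressed through the stored authoredBy edge."""
    return (
        "SELECT DISTINCT ?paper WHERE { "
        f"?paper <{DBLP}authoredBy> <{person_uri}> }}"
    )


def coauthors_of(person_uri):
    """coCreatorWith(person, other): both appear as authors of the same paper."""
    return (
        "SELECT DISTINCT ?other WHERE { "
        f"?paper <{DBLP}authoredBy> <{person_uri}> . "
        f"?paper <{DBLP}authoredBy> ?other . "
        f"FILTER(?other != <{person_uri}>) }}"
    )


# Hypothetical person URI, used only to show the generated text.
PERSON = "https://dblp.org/pid/00/0000"
print(papers_of(PERSON))
print(coauthors_of(PERSON))
```
]]></preformat>
      <p>Both queries traverse authoredBy from the publication side, which is the only direction present in the downloadable snapshot.</p>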
    </sec>
    <sec id="sec-4">
      <title>4. Dataset Generation Framework</title>
      <p>In this work, the aim is to generate a large variety of scholarly questions and corresponding
SPARQL query pairs for the DBLP KG. Initially, we create a small set of templates T, each containing a SPARQL
query template s<sub>t</sub> and a few semantically equivalent natural language question templates Q<sub>t</sub>.
The question and query templates are written to cover a wide range of
user information needs about scholarly metadata while remaining answerable with a SPARQL query
against the DBLP KG. Next, we synthetically generate a large set of question-query pairs (q, s)
suitable for training a neural network semantic parser.</p>
      <p>The core of the dataset generation framework is instantiating the
templates with literals from subgraphs sampled from the KG. Moreover, to capture the different
ways a human might write these literal values, we randomly mix in different
augmentations of their textual representations. The dataset generation workflow is shown in
Figure 2.</p>
      <sec id="sec-4-1">
        <title>4.1. Templates</title>
        <p>The dataset generation process starts with the creation of a template set. After
carefully analyzing the ontology of the DBLP KG, we manually wrote 98 pairs of valid SPARQL
query templates and sets of semantically equivalent natural language question templates. The
template set was written by one author and verified for correctness by another author. The
query and question templates contain placeholder markers instead of URIs, entity surface
forms or literals. For example, in Figure 2 (Section 1), the SPARQL query template includes the
placeholders ?c1 and [VENUE] for the DBLP person URI and the venue literal respectively. Similarly,
the question templates include the placeholders [CREATOR_NAME] and [VENUE] for the creator
name and the venue literal respectively. The template set covers the two entities creator and
publication, and additionally the foreign entity bibtex type. It also covers the 11
different predicates of the DBLP KG.</p>
        <p>The template set consists of template tuples. A template tuple t = (s<sub>t</sub>, Q<sub>t</sub>, E<sub>t</sub>, P<sub>t</sub>) is composed
of a SPARQL query template s<sub>t</sub>, a set of semantically equivalent natural language question
templates Q<sub>t</sub>, a set of entity placeholders E<sub>t</sub> and a set of predicates P<sub>t</sub> used in s<sub>t</sub>. We also
add a Boolean indicating whether the query template is temporal and another Boolean
indicating whether the template may be used while generating the train dataset. Each template
tuple contains between four and seven paraphrased question templates, offering wide linguistic
diversity. While most of the question templates use a "Wh-" question keyword, we also include
instruction-style paraphrases.</p>
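        <p>One way to picture a template tuple is as a small record type. The following is a minimal sketch: all field names, the placeholder spellings, and the hyp: predicate names are illustrative assumptions and do not reflect the released dataset's schema.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateTuple:
    """A template tuple with its two Boolean flags (field names are assumed)."""

    query_template: str            # SPARQL query template with placeholders
    question_templates: List[str]  # semantically equivalent paraphrases
    entities: List[str]            # entity placeholders used in the query
    predicates: List[str]          # predicates used in the query
    temporal: bool = False         # True if the query template is temporal
    test_only: bool = False        # True if withheld when generating train


# Hypothetical example; predicates use a made-up "hyp:" prefix.
example = TemplateTuple(
    query_template=(
        "SELECT DISTINCT ?answer WHERE { "
        "?answer <hyp:authoredBy> ?c1 . ?answer <hyp:publishedIn> '[VENUE]' }"
    ),
    question_templates=[
        "Which papers did [CREATOR_NAME] publish in [VENUE]?",
        "List the papers that [CREATOR_NAME] published in [VENUE].",
        "What did [CREATOR_NAME] publish in [VENUE]?",
        "Show the publications of [CREATOR_NAME] in [VENUE].",
    ],
    entities=["?c1"],
    predicates=["<hyp:authoredBy>", "<hyp:publishedIn>"],
)
```
]]></preformat>
        <p>Keeping the two flags on the tuple itself makes it easy to filter templates per split before sampling.</p>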
        <p>We group the template tuples by entity group ε (creator-focused or publication-focused) and further group
them by query type δ. There are 10 different query types: Single Fact, Multiple
Facts, Boolean, Negation, Double Negation, Double Intent, Union, Count,
Superlative/Comparative, and Disambiguation. The query types are discussed in Section 4.6 with examples.
The distribution of templates per entity and query type is shown in Table 1. During dataset
generation, for each data instance we sample a template tuple from the template set using
stratified sampling, maintaining an equal distribution of entity types and query types.</p>
        <p>Table 1. Query types: Single Fact, Multiple Facts, Boolean, Negation, Double Negation, Double Intent, Union, Count, Superlative/Comparative, Disambiguation; Total.</p>
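        <p>The equal split across entity groups and query types can be worked out with a few lines of arithmetic. This sketch uses the dataset size of 10,000 and the ten query types named in the text; the function name and the plan dictionary are illustrative.</p>
        <preformat><![CDATA[
```python
ENTITY_GROUPS = ["creator", "publication"]
QUERY_TYPES = [
    "Single Fact", "Multiple Facts", "Boolean", "Negation", "Double Negation",
    "Double Intent", "Union", "Count", "Superlative/Comparative",
    "Disambiguation",
]


def per_stratum_quota(total, entity_groups, query_types):
    """Instances per (entity group, query type) cell when strata are equal."""
    cells = len(entity_groups) * len(query_types)
    if total % cells != 0:
        raise ValueError("total must divide evenly across all strata")
    return total // cells


quota = per_stratum_quota(10_000, ENTITY_GROUPS, QUERY_TYPES)
# One entry per stratum, each with the same target count.
plan = {(e, q): quota for e in ENTITY_GROUPS for q in QUERY_TYPES}
```
]]></preformat>
        <p>With two entity groups and ten query types there are 20 strata, so each stratum receives 500 instances and each query type receives 1,000.</p>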
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Subgraph generation</title>
        <p>The second part of the dataset generation framework is subgraph generation. Given a graph
G = (V, E), where V are the vertices and E are the edges, we draw a subgraph g = (V′, E′) where
V′ ⊂ V and E′ ⊂ E. For the DBLP KG, V are the creator and publication entity URIs or literals, and
E are the predicates of the entities.</p>
        <p>The subgraph generation process starts with random sampling of a publication entity v<sub>i</sub> from
the DBLP KG. We only draw from the set of publication entities because the RDF snapshot available
for download lacks the authorOf and coCreatorWith predicates for creator entities. As
such, a subgraph centered on a creator entity would not have end vertices that can be expanded
further. With the sampled publication entity v<sub>i</sub>, we iterate through all its predicates to extract
creator entities v′ as well as the literal values. We further expand the creator entities and extract
their literal values to form a two-hop subgraph g = (V′, E′), as shown in Figure 2 (Section 2).</p>
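        <p>The two-hop expansion described above can be sketched on a toy list of triples. Every identifier and predicate name below is invented for illustration; a real implementation would query the DBLP KG rather than an in-memory list.</p>
        <preformat><![CDATA[
```python
import random

# Toy triples (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("pub:1", "title", "A Toy Paper"),
    ("pub:1", "yearOfPublication", "2020"),
    ("pub:1", "authoredBy", "pers:a"),
    ("pub:1", "authoredBy", "pers:b"),
    ("pers:a", "name", "Ada Example"),
    ("pers:b", "name", "Bob Example"),
    ("pub:2", "title", "Another Toy Paper"),
    ("pub:2", "authoredBy", "pers:a"),
]


def outgoing(triples, subject):
    """All triples whose subject is the given node."""
    return [t for t in triples if t[0] == subject]


def sample_subgraph(triples, rng):
    """Pick a random publication and expand it by two hops."""
    publications = sorted({s for s, _, _ in triples if s.startswith("pub:")})
    seed = rng.choice(publications)
    first_hop = outgoing(triples, seed)                    # literals + creators
    creators = [o for _, p, o in first_hop if p == "authoredBy"]
    second_hop = [t for c in creators for t in outgoing(triples, c)]
    return seed, first_hop + second_hop


seed, subgraph = sample_subgraph(TRIPLES, random.Random(0))
```
]]></preformat>
        <p>Starting from a publication guarantees that the first hop reaches creators, whose literals then form the second hop.</p>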
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Template Instantiation</title>
        <p>Using the generated subgraph and the sampled template tuple, the template tuple is instantiated
with entity URIs and literal values from the subgraph. In the instantiation process, a placeholder
marker in a string is replaced by the corresponding text representation.</p>
        <p>For the SPARQL query template s<sub>t</sub>, we instantiate the creator/publication placeholder markers
with DBLP creator/publication entity URIs, or with literal values for affiliations and conferences or
journals, to create a valid SPARQL query s that returns answers when run against the DBLP KG
SPARQL endpoint.</p>
        <p>For the natural language question templates, we randomly sample two templates
q<sub>t1</sub>, q<sub>t2</sub> ∈ Q<sub>t</sub> and instantiate each using only the literal values from the
subgraph to form one main natural language question q<sub>1</sub> and one natural language question
paraphrase q<sub>2</sub>. In natural language, humans can write the literal strings in various forms. Hence,
to introduce this linguistic variation, we randomly mix in alternate string representations of
these literal values in both natural language questions. The data augmentation process allows
us to add heuristically manipulated alternate literal representations to the natural questions. An
example of an instantiated template is shown in Figure 2 (Section 3).</p>
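        <p>Instantiation amounts to replacing placeholder markers with text. The sketch below assumes bracketed markers for literals and a ?c1 variable for the creator; these spellings, the hyp: predicates, and the person URI are hypothetical.</p>
        <preformat><![CDATA[
```python
def instantiate(template, values):
    """Replace each placeholder marker with its text representation."""
    text = template
    for marker, value in values.items():
        # Plain substring replacement; markers must not be prefixes of
        # one another (e.g. ?c1 and ?c10) for this to be safe.
        text = text.replace(marker, value)
    return text


question_template = "Which papers did [CREATOR_NAME] publish in [VENUE]?"
query_template = (
    "SELECT DISTINCT ?answer WHERE { "
    "?answer <hyp:authoredBy> ?c1 . ?answer <hyp:publishedIn> '[VENUE]' }"
)

# Questions receive only literal values; queries also receive entity URIs.
question = instantiate(
    question_template, {"[CREATOR_NAME]": "Ada Example", "[VENUE]": "ECIR"}
)
query = instantiate(
    query_template, {"?c1": "<hyp:person/ada>", "[VENUE]": "ECIR"}
)
```
]]></preformat>
        <p>The same function serves both the query and the question; only the dictionary of values differs.</p>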
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Data Augmentation</title>
        <p>For the template instantiation process, we perform simple string manipulations to generate
alternate literal representations. Then, we randomly select between the original literal
representation and the alternate representation to instantiate the natural language questions. For
each literal type, we apply different string manipulation techniques, which we describe below.</p>
        <p>Names: For names, we generate four different alternatives that involve switching parts of the name
or keeping only initials. Consider the name John William Smith, for which we
produce Smith, John William; J. William Smith; John W. Smith; and Smith, J. William.</p>
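        <p>The four name alternatives can be reproduced with basic string handling. This is a minimal sketch that assumes a name with at least a first, middle and last part; how shorter names are handled is not specified in the text.</p>
        <preformat><![CDATA[
```python
def name_variants(full_name):
    """Return four alternative spellings of a 'First Middle Last' name."""
    parts = full_name.split()
    if len(parts) < 3:
        raise ValueError("this sketch expects first, middle and last name")
    first, middle, last = parts[0], parts[1:-1], parts[-1]
    middle_full = " ".join(middle)
    middle_initials = " ".join(m[0] + "." for m in middle)
    first_initial = first[0] + "."
    return [
        f"{last}, {first} {middle_full}",          # surname first
        f"{first_initial} {middle_full} {last}",   # first name as initial
        f"{first} {middle_initials} {last}",       # middle name as initial
        f"{last}, {first_initial} {middle_full}",  # surname first + initial
    ]


variants = name_variants("John William Smith")
```
]]></preformat>
        <p>For John William Smith this yields the four forms listed in the paragraph above, in the same order.</p>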
        <p>Venues: A venue can be written in either its short form or its full form, for example
ECIR or European Conference on Information Retrieval. DBLP stores venues in their short
form. To obtain the full venue names, we use a selected list of conferences and journals
(http://portal.core.edu.au/conf-ranks/?search=&amp;by=all&amp;source=CORE2021&amp;sort=atitle&amp;page=1)
that pairs each short form with its full form.</p>
        <p>Duration: About 20% of the templates contain temporal queries, and some of them require
dummy numbers to represent a duration. For example, the question "In the last five years, which
papers did Mante S. Nieuwland publish?" uses the dummy value five. We randomly select between
the numerical representation and the textual representation of the dummy duration value.</p>
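        <p>The random choice between digit and word forms of a duration could look like the following sketch. The word table and the fallback for larger numbers are assumptions; the text does not state which range of values is used.</p>
        <preformat><![CDATA[
```python
import random

# Small lookup table; the range covered here is an assumption.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}


def render_duration(years, rng):
    """Randomly choose between the digit form and the word form."""
    word = NUMBER_WORDS.get(years)
    if word is None:
        return str(years)  # no word form known: keep the digits
    return rng.choice([str(years), word])


rng = random.Random(7)
question = (
    f"In the last {render_duration(5, rng)} years, "
    "which papers did Mante S. Nieuwland publish?"
)
```
]]></preformat>
        <p>Passing the random generator in explicitly keeps the augmentation reproducible when a seed is fixed.</p>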
        <p>Affiliation: In natural language questions, usually only the institution name is used to refer
to the affiliation of an author. However, the DBLP KG uses the full address of an institution,
including city and country name. Hence, we extract the institution names using regular expressions and
randomly select between the institution name and the full institution address in the instantiation
process.</p>
        <p>Keywords: For disambiguation queries, we do not use the full title of a publication but rather
a part of it, obtained by extracting keywords. For this purpose, we use spaCy’s Matcher API to extract
noun phrases from the title.</p>
        <p>Algorithm 1: Dataset Generation Process</p>
        <preformat>GenerateDataset(T, x, N, G)
  inputs: template set T; dataset split to generate x; size of dataset to generate N;
          KG to sample subgraphs from G
  output: dataset D
  D ← ∅
  n ← (N / |ε|) / |δ|
  foreach e ∈ ε do
    foreach q ∈ δ do
      i ← 0
      T_eq ← T[e][q]
      if x == train then
        T_eq ← Filter(T_eq, test_only == False)
      while i &lt; n do
        g1, g2 ← SampleSubgraph(G, 2)
        t ← random.sample(T_eq)
        d ← Instantiate(t, g1, g2, x)
        answer ← Query(d)
        if answer then
          D ← D ∪ {d}
          i ← i + 1
  return D</preformat>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Dataset Generation</title>
        <p>For each data instance d<sub>i</sub>, we sample two subgraphs (SampleSubgraph(G, 2)) and instantiate a
template tuple t (Instantiate(t, g<sub>1</sub>, g<sub>2</sub>, x)). We sample two subgraphs because some template tuples need
to be instantiated with two publication titles. Each data instance d<sub>i</sub> = (s<sub>i</sub>, q<sub>i1</sub>, q<sub>i2</sub>, E<sub>i</sub>, P<sub>i</sub>, y<sub>i</sub>, z<sub>i</sub>)
comprises a valid SPARQL query s<sub>i</sub>, one main natural language question q<sub>i1</sub>, one semantically
equivalent paraphrase of the main question q<sub>i2</sub>, a list of entities E<sub>i</sub> used in s<sub>i</sub>, a list of predicates
P<sub>i</sub> used in s<sub>i</sub>, a Boolean y<sub>i</sub> indicating whether the SPARQL query is temporal, and another
Boolean z<sub>i</sub> indicating whether the SPARQL query is found only in the valid and test sets. We
generate an equal number n of questions for each entity group ε, divided equally among the query
types δ.</p>
        <p>
          To foster a focus on generalization ability, we manually marked 20 template tuples to withhold
during generation of the train set. However, we use all the template tuples in the generation
of the valid and test sets. Furthermore, we also withhold two question templates when generating
train questions but use all question templates when generating the valid and test sets. This
controlled generation process allows us to withhold some entity classes, predicates and
paraphrases from the train set. Our aim with this control is to create a scholarly KGQA dataset that
facilitates the development of KGQA models that adhere to i.i.d., compositional, and zero-shot [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
generalization.
        </p>
        <p>Further, we validate each data instance by running its SPARQL query against the DBLP
KG via a Virtuoso SPARQL endpoint. We filter out data instances for which the SPARQL query
is invalid or generates a blank response. A SPARQL query may generate a blank response if the
generated subgraphs have missing literal values. In the DBLP KG, some of the entities have
missing literals for predicates such as primaryAffiliation, orcid, and wikidata. Additionally,
we store the answers produced by the SPARQL query against the DBLP KG, formatted
according to https://www.w3.org/TR/sparql11-results-json/. The dataset generation process is
summarized in Algorithm 1.</p>
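        <p>A blank-response check over the SPARQL 1.1 JSON results format might be sketched as follows. The exact filtering rule used during generation is not given in the text, so this is one plausible reading: ASK results always count as answers, and SELECT results need at least one non-empty binding.</p>
        <preformat><![CDATA[
```python
def has_answer(response):
    """True if a SPARQL 1.1 JSON result object carries an answer."""
    if "boolean" in response:
        # ASK queries return {"head": {}, "boolean": true|false};
        # both truth values are legitimate answers.
        return True
    bindings = response.get("results", {}).get("bindings", [])
    # An empty dict is a row with every variable unbound: treat as blank.
    return any(row for row in bindings)


select_ok = {
    "head": {"vars": ["answer"]},
    "results": {"bindings": [
        {"answer": {"type": "literal", "value": "2014"}}
    ]},
}
select_blank = {"head": {"vars": ["answer"]}, "results": {"bindings": []}}
ask_false = {"head": {}, "boolean": False}
```
]]></preformat>
        <p>Instances for which has_answer returns False would be discarded and regenerated from a fresh subgraph.</p>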
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Types of Questions</title>
        <p>The dataset is composed of the following question types. The examples shown here are
handpicked from the dataset.</p>
        <p>• Single fact: These questions can be answered using a single fact. For example, “What
year was ‘SIRA: SNR-Aware Intra-Frame Rate Adaptation’ published?”
• Multiple facts: These questions require connecting two or more facts to answer. For
example, “In SIGCSE, which paper written by Darina Dicheva with Dichev, Christo was
published?”
• Boolean: These questions ask whether a given fact is true or false. We can also add
negation keywords to negate the questions. For example, “Does Szeider, Stefan have an
ORCID?”
• Negation: These questions require negating the answer to a Boolean question. For
example, “Did M. Hachani not publish in ICCP?”
• Double negation: These questions require negating the answer to a Boolean question twice,
which restores the original answer. For example, “Wasn’t the paper ‘Multi-Task Feature Selection on Multiple
Networks via Maximum Flows’ not published in 2014?”
• Count: These questions ask for the number of occurrences of a fact. For example, “Count
the authors of ‘Optimal Symmetry Breaking for Graph Problems’ who have Carnegie
Mellon University as their primary affiliation.”
• Superlative/Comparative: Superlative questions ask about the maximum or minimum
for a subject, and comparative questions compare values between two subjects. We place
both types in one group. For example, “Who has published the most papers among
the authors of ‘k-Pareto optimality for many-objective genetic optimization’?”
• Union: These questions cover a single intent but for multiple subjects at the same time. For
example, “List all the papers that Pitas, Konstantinos published in ICML and ISCAS.”
• Double intent: These questions pose two user intentions, usually about the same subject. For
example, “In which venue was the paper ‘Interactive Knowledge Distillation for image
classification’ published and when?”
• Disambiguation: These questions require identifying the correct subject in the question. For
example, “Which author with the name Li published the paper about Buck power
converters?”</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Dataset Statistics</title>
      <p>DBLP-QuAD consists of 10,000 unique question-query pairs grouped into train, valid and test
sets with a ratio of 7:1:2. The dataset covers 13,348 creators and publications, and 11 predicates
of the DBLP KG. For each query type in Table 1, the dataset includes 1,000 question-query pairs,
divided equally between creator-focused and publication-focused questions. Additionally, among
the questions in DBLP-QuAD, 2,350 are temporal questions.</p>
      <p>Linguistic Diversity. In DBLP-QuAD, a natural language question has an average
length of 17.32 words and 114.1 characters. Similarly, a SPARQL
query has an average vocabulary length of 12.65 and an average character length of 249.48.
Between the natural language question paraphrases, the average Jaccard similarities for unigrams
and bigrams are 0.62 and 0.47 (with standard deviations of 0.22 and 0.24) respectively. The
average Levenshtein edit distance between them is 32.99 (with a standard deviation of 23.12).
We believe these metrics indicate a decent level of linguistic diversity.</p>
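      <p>The two similarity measures named above can be computed as in the following sketch. The tokenization (lowercasing and whitespace splitting) and the character-level edit distance are assumptions, since the text does not specify these details.</p>
      <preformat><![CDATA[
```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) over a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b, n=1):
    """Jaccard similarity between the n-gram sets of two questions."""
    x, y = ngrams(a.lower().split(), n), ngrams(b.lower().split(), n)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```
]]></preformat>
      <p>Applied to each question and its paraphrase, averaging jaccard with n=1 and n=2 and averaging levenshtein would give statistics of the kind reported here.</p>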
      <p>Entity Linking. DBLP-QuAD also poses a challenging entity linking problem because of the data
augmentation performed on literals during the generation process. The augmented literals are more
realistic and natural representations of the entity surface forms and literals than the
entries in the KG.</p>
      <p>Generalization. In the valid set 18.9% and in the test set 19.3% of instances were generated
using the withheld templates. Hence, these SPARQL query templates and natural language
question templates are unique to the valid and test sets. Table 2 shows the percentage of questions
with different levels of generalization in the valid and test sets of the dataset.</p>
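      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption>
          <p>Percentage of questions with different levels of generalization in the valid and test sets.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Dataset</th><th>I.I.D</th><th>Compositional</th><th>Zero-shot</th></tr>
          </thead>
          <tbody>
            <tr><td>Valid</td><td>82.8%</td><td>13.6%</td><td>3.6%</td></tr>
            <tr><td>Test</td><td>81.2%</td><td>15.1%</td><td>3.8%</td></tr>
          </tbody>
        </table>
      </table-wrap>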
    </sec>
    <sec id="sec-6">
      <title>6. Semantic Parsing Baseline</title>
      <p>
        To lay the foundation for future work on DBLP-QuAD, we also release baselines using the
recent work by Banerjee et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], where a pre-trained T5 model [25] is fine-tuned on the
LC-QuAD 2.0 dataset.
      </p>
      <p>
        Following Banerjee et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], we assume the entities and the relations are linked, and only
focus on query building. We formulate the source as shown in Figure 3, where for each natural
language question a prefix “parse text to SPARQL query:” is added. The source string is
further concatenated with entity URIs and relation schema URIs separated by a special token
[SEP]. The target text is the corresponding SPARQL query, which is padded with the tokens
&lt;s&gt; and &lt;/s&gt;. We also make use of the sentinel tokens provided by T5 to represent the
DBLP prefixes (e.g. &lt;extra_id_1&gt; denotes the prefix https://dblp.org/pid/), SPARQL vocabulary and
symbols. This step helps the T5 tokenizer to correctly split the target text during inference.
      </p>
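      <p>The source and target formatting can be sketched as below. The prefix and the single sentinel mapping come from the text; the separator token, the padding tokens, the spacing, and the order of URIs are assumptions, and Figure 3 remains the authoritative layout.</p>
      <preformat><![CDATA[
```python
PREFIX = "parse text to SPARQL query:"
# Only this mapping is stated in the text; a full table would hold more.
SENTINELS = {"https://dblp.org/pid/": "<extra_id_1>"}


def build_source(question, entity_uris, relation_uris, sep="[SEP]"):
    """Prefix the question and append linked URIs, joined by a separator.

    The separator string and the URI order are assumptions.
    """
    linked = list(entity_uris) + list(relation_uris)
    return f" {sep} ".join([f"{PREFIX} {question}"] + linked)


def encode_target(sparql, start="<s>", end="</s>"):
    """Swap known URI prefixes for sentinels and pad the query.

    The padding tokens are passed in because their exact form is assumed.
    """
    for prefix, sentinel in SENTINELS.items():
        sparql = sparql.replace(prefix, sentinel)
    return f"{start} {sparql} {end}"
```
]]></preformat>
      <p>Replacing long URI prefixes with single sentinel tokens shortens the target sequence and avoids the tokenizer splitting URIs into many pieces.</p>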
      <p>We fine-tune T5-Base and T5-Small on DBLP-QuAD train set with a learning rate of 1e-4 for
5 epochs with an input as well as output text length of 512 and batch size of 4.</p>
      <sec id="sec-6-1">
        <title>6.1. Experiment Results</title>
        <p>We report the performance of the baseline model on the DBLP-QuAD test set. First, we
report the exact match between the gold and the generated SPARQL query. For the
exact-match accuracy, we compare the generated and the gold query token by token after removing
whitespace. Next, for each SPARQL query in the test set, we run both the gold query and the
query generated by the T5 baseline models against a Virtuoso SPARQL endpoint to fetch answers
from the DBLP KG. Based on the answers collected, we report the F1 score. The results are
reported in Table 3.</p>
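        <p>The two metrics can be illustrated with a short sketch. It compares whitespace-free query strings for the exact match and treats answers as sets for F1; both choices are simplifying assumptions and may differ from the evaluation script used for the reported results.</p>
        <preformat><![CDATA[
```python
def exact_match(gold_query, predicted_query):
    """Compare two queries after deleting all whitespace."""
    def strip(query):
        return "".join(query.split())
    return strip(gold_query) == strip(predicted_query)


def answer_f1(gold_answers, predicted_answers):
    """Set-based F1 between gold and predicted answers of one question."""
    gold, pred = set(gold_answers), set(predicted_answers)
    if not gold and not pred:
        return 1.0  # both empty: treated as full agreement (assumption)
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```
]]></preformat>
        <p>Exact match is strict about query form, whereas answer F1 gives credit to a differently written query that still retrieves the right answers.</p>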
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Limitations</title>
      <p>
        One of the drawbacks of our dataset generation framework is that natural questions are
synthetically generated. (CFQ [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] has a similar limitation.) Although the question templates were
human-written, only two people (authors of the paper) worked on the creation of the question
templates, which were not crowdsourced from a larger group of researchers. Additionally, the questions
are generated by drawing data from a KG. Hence, the questions may not perfectly reflect the
distribution of user information needs. However, the machine-generation process allows for
programmatic configuration of the questions, setting question characteristics, and controlling
dataset size. We use this advantage to programmatically augment text representations
and to generate a large scholarly KGQA dataset with complex SPARQL queries.</p>
      <p>Table 3. Evaluation metrics: Exact-match Accuracy, F1 Score.</p>
      <p>Second, in generating the valid and test sets, we use an additional 19 template tuples, which
account for about 20% of the template set. Therefore, the syntactic structure of 80% of the
generated data in valid and test would already be seen in the train set, resulting in test leakage.
However, to limit the leakage on this 80% of the data, we withhold two question templates when generating
the train set. Moreover, the data augmentation steps carried out also add challenges in
the valid and test sets.</p>
      <p>Another shortcoming of DBLP-QuAD is that the paper titles do not perfectly reflect user
behavior. When users ask a question, they do not type in the full paper title, and some
papers are popularly known by a different short name. For example, the papers “Language
Models are Few-shot Learners” and “BERT: Pre-training of Deep Bidirectional Transformers
for Language Understanding” are also known as “GPT-3” and “BERT” respectively. This is a
challenging entity linking problem which requires further investigation. Despite these
shortcomings, we believe this large scholarly KGQA dataset will stimulate more research interest in scholarly
KGQA.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>In this work, we presented a new KGQA dataset called DBLP-QuAD. The dataset is the largest
scholarly KGQA dataset with corresponding SPARQL queries. The dataset contains a wide
variety of questions and query types and we present the data generation framework and baseline
results. We hope this dataset proves to be a valuable resource for the community.</p>
      <p>As future work, we would like to build a robust question answering system for scholarly data
using this dataset.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Acknowledgements</title>
      <p>This research was supported by grants from NVIDIA and utilized 2 x NVIDIA RTX A5000 24GB GPUs.
Furthermore, we acknowledge the financial support from the Federal Ministry for Economic
Affairs and Energy of Germany in the project CoyPu (project number 01MK21007[G]) and
the German Research Foundation in the project NFDI4DS (project number 460234259). This
research is additionally funded by the “Idea and Venture Fund” research grant by Universität
Hamburg, which is part of the Excellence Strategy of the Federal and State Governments.
</p>
      <p>3477495.3531841. arXiv:2204.12793.
[25] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu,
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, J.
Mach. Learn. Res. 21 (2020) 1–67.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data</source>
          , ACM
          ,
          <year>2008</year>
          , pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , et al.,
          <article-title>DBpedia - A Large-Scale, Multilingual Knowledge Base Extracted from Wikipedia</article-title>
          ,
          <source>Semantic Web</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A Free Collaborative Knowledge Base</article-title>
          ,
          <source>Communications of the ACM</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Höffner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>AskNow: A Framework for Natural Language Query Formalization in SPARQL</article-title>
          , in: H. Sack, E. Blomqvist, M. d'Aquin, C. Ghidini, S. P. Ponzetto, C. Lange (Eds.),
          <source>The Semantic Web. Latest Advances and New Domains</source>
          , Springer International Publishing, Cham,
          <year>2016</year>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lukovnikov</surname>
          </string-name>
          , G. Maheshwari,
          <string-name>
            <given-names>P.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <article-title>Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1907.09361. doi:10.48550/ARXIV.1907.09361.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Perevalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Both</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>2998</fpage>
          -
          <lpage>3007</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.321.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Jaradeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>Question answering on scholarly knowledge graphs</article-title>
          ,
          <source>in: International Conference on Theory and Practice of Digital Libraries</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Know what you don't know: Unanswerable questions for SQuAD</article-title>
          ,
          <source>arXiv preprint arXiv:1806.03822</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kwiatkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palomaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Redfield</surname>
          </string-name>
          , M. Collins,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alberti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Epstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , et al.,
          <article-title>Natural questions: a benchmark for question answering research</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>453</fpage>
          -
          <lpage>466</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          , G. Maheshwari,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs</article-title>
          , in: C. d'Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudré-Mauroux, J. Sequeda, C. Lange, J. Heflin (Eds.),
          <source>The Semantic Web - ISWC 2017</source>
          , volume
          <volume>10588</volume>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>210</fpage>
          -
          <lpage>218</lpage>
          . doi:10.1007/978-3-319-68204-4_22.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Aji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Safari</surname>
          </string-name>
          ,
          <article-title>Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering</article-title>
          ,
          <source>arXiv preprint arXiv:2210.01613</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <article-title>Large-scale semantic parsing via schema matching and lexicon extension</article-title>
          ,
          <source>in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>423</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haarmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Röder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Napolitano</surname>
          </string-name>
          ,
          <article-title>7th Open Challenge on Question Answering over Linked Data (QALD-7)</article-title>
          , in: M. Dragoni, M. Solanki, E. Blomqvist (Eds.),
          <source>Semantic Web Challenges</source>
          , volume
          <volume>769</volume>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          . doi:10.1007/978-3-319-69146-6_6.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] W.-t. Yih,
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Suh</surname>
          </string-name>
          ,
          <article-title>The Value of Semantic Parse Labeling for Knowledge Base Question Answering</article-title>
          ,
          <source>in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          , Association for Computational Linguistics, Berlin, Germany,
          <year>2016</year>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>206</lpage>
          . doi:10.18653/v1/P16-2033.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Talmor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <article-title>The Web as a Knowledge-base for Answering Complex Questions</article-title>
          ,
          <year>2018</year>
          . arXiv:1803.06643.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frostig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Semantic Parsing on Freebase from Question-Answer Pairs</article-title>
          ,
          <source>in: Proceedings of the 2013 conference on empirical methods in natural language processing</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1533</fpage>
          -
          <lpage>1544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Building a semantic parser overnight</article-title>
          ,
          <source>in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1332</fpage>
          -
          <lpage>1342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivatsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>On Generating Characteristic-rich Question Sets for QA Evaluation</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Austin, Texas,
          <year>2016</year>
          , pp.
          <fpage>562</fpage>
          -
          <lpage>572</lpage>
          . doi:10.18653/v1/D16-1054.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdelkawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia</article-title>
          , in: C. Ghidini, O. Hartig, M. Maleshkova, V. Svátek, I. Cruz, A. Hogan, J. Song, M. Lefrançois, F. Gandon (Eds.),
          <source>The Semantic Web - ISWC 2019</source>
          , volume
          <volume>11779</volume>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>78</lpage>
          . doi:10.1007/978-3-030-30796-7_5.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases</article-title>
          ,
          <source>in: Proceedings of the Web Conference 2021</source>
          , ACM, Ljubljana, Slovenia,
          <year>2021</year>
          , pp.
          <fpage>3477</fpage>
          -
          <lpage>3488</lpage>
          . doi:10.1145/3442381.3449992.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>KQA Pro: A dataset with explicit compositional programs for complex question answering over knowledge base</article-title>
          ,
          <source>in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>6101</fpage>
          -
          <lpage>6119</lpage>
          . doi:10.18653/v1/2022.acl-long.422.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Keysers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schärli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Scales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Buisman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Furrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kashubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Momchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sinopalnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Stafiniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tihon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Zee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bousquet</surname>
          </string-name>
          ,
          <article-title>Measuring Compositional Generalization: A Comprehensive Method on Realistic Data</article-title>
          ,
          <year>2020</year>
          . arXiv:1912.09713.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ley</surname>
          </string-name>
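          ,
          <article-title>The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Goos</surname>
          </string-name>
          ,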
          <string-name>
            <given-names>J.</given-names>
            <surname>Hartmanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van Leeuwen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H. F.</given-names>
            <surname>Laender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          (Eds.),
          <source>String Processing and Information Retrieval</source>
          , volume
          <volume>2476</volume>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2002</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . doi:10.1007/3-540-45735-6_1.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <article-title>Modern Baselines for SPARQL Semantic Parsing</article-title>
          ,
          in:
          <source>Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2260</fpage>
          -
          <lpage>2265</lpage>
          . doi:10.1145/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>