<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Base Question Answering by Transformer-Based Graph Pattern Scoring</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcel Lamott</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jörn Hees</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Ulges</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)</institution>
          ,
          <addr-line>Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hochschule Bonn-Rhein-Sieg</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>RheinMain University of Applied Sciences</institution>
          ,
          <addr-line>Wiesbaden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Question Answering (QA) has gained significant attention in recent years, with transformer-based models improving natural language processing. However, issues of explainability remain, as it is difficult to determine whether an answer is based on a true fact or a hallucination. Knowledge-based question answering (KBQA) methods can address this problem by retrieving answers from a knowledge graph. This paper proposes a hybrid approach to KBQA called FRED, which combines pattern-based entity retrieval with a transformer-based question encoder. The method uses an evolutionary approach to learn SPARQL patterns, which retrieve candidate entities from a knowledge base. A transformer-based regressor is then trained to estimate each pattern's expected F1 score for answering the question, resulting in a ranking of candidate entities. Unlike other approaches, FRED can attribute results to learned SPARQL patterns, making them more interpretable. The method is evaluated on two datasets and yields MAP scores of up to 73 percent, with the transformer-based interpretation falling only 4 pp short of an oracle run. Additionally, the learned patterns successfully complement manually generated ones and generalize well to novel questions.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge graphs</kwd>
        <kwd>question answering</kwd>
        <kwd>transfer learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of question answering (QA) has received significant attention, particularly with the rise
of Deep Learning approaches that often focus on large language models. However, while these
models have shown promising results, they suffer from explainability issues and often focus
on knowledge from large text corpora instead of previously extracted and curated knowledge.
Knowledge graphs offer an alternative approach for ML models to conduct transparent reasoning,
leading to the emergence of the field of knowledge base question answering (KBQA), where the
system retrieves the answer to a question not from a text corpus but from a knowledge graph.
As an example, consider the question</p>
      <p>
        “What language did the ancient Babylonians speak?”
In this case, Babylonians is the entity on which a certain fact is to be retrieved. We refer to this
entity as the source entity in the following. To retrieve the question’s answer, a query in the
well-known query language SPARQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] could be used, like this one:
      </p>
      <p>SELECT DISTINCT ?source ?target
WHERE {
  ?source ns:location.country.languages_spoken ?target .
}</p>
      <p>Replacing the ?source variable with the source entity from the question and executing the
query against a knowledge base could deliver the answer Babylonian, which we refer to as
the target entity. We refer to the above type of SPARQL query as a graph pattern in our work.</p>
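      <p>For illustration, such a graph pattern can be executed against a SPARQL endpoint along the lines of the following minimal sketch; the endpoint URL, the prefix declaration and the example entity identifier are placeholders rather than part of our setup:</p>
      <p>from SPARQLWrapper import SPARQLWrapper, JSON

FREEBASE_PREFIX = "PREFIX ns: &lt;http://rdf.freebase.com/ns/&gt;"

PATTERN = """SELECT DISTINCT ?target WHERE {
  %(source)s ns:location.country.languages_spoken ?target .
}"""

def apply_pattern(endpoint_url, source_entity):
    """Ground the ?source variable with a concrete entity and run the query."""
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(FREEBASE_PREFIX + "\n" + PATTERN % {"source": source_entity})
    sparql.setReturnFormat(JSON)
    result = sparql.query().convert()
    return {b["target"]["value"] for b in result["results"]["bindings"]}

# e.g. apply_pattern("http://localhost:8890/sparql", "ns:m.09c7w0")  # placeholder endpoint and ID</p>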
      <p>In this paper, we present a hybrid approach towards KBQA. We leverage graph patterns in
combination with neural (transformer-based) question interpretation to address the limitations
of large text models and the lack of use of knowledge bases. Given a training set of questions,
each associated with a source entity and answer entities, we use an evolutionary graph pattern
learning algorithm to define a set of graph patterns covering the space of questions represented
in the training set.</p>
      <p>
        Given a new question, we match it to the graph patterns by fusing a neural question
representation with the graph pattern set. To do so, a transformer-based regressor is trained to
predict graph patterns’ F1 scores when retrieving the input question’s answer, and the resulting
predictions are used to fuse the graph pattern’s result candidate sets. Our main contributions in
this paper are:
1. A novel and explainable hybrid KBQA system, combining two main components: a graph
pattern learner (GPL) and a transformer-based scoring approach. The GPL component
learns graph patterns to query a SPARQL endpoint. The patterns are learned with an
evolutionary algorithm, but can also be complemented with manually created patterns
by experts. Additionally, we propose a novel scoring component that fuses the pattern-generated
candidates using a transformer architecture to interpret the natural
language input question.
2. We conduct extensive experiments on versions of the well-known WebQuestionsSP [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and SimpleQuestions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] datasets. Our experimental results show that our approach
achieves MAP scores of up to 73 %, with the transformer-based interpretation falling only
4 percentage points short of an oracle run.
3. Furthermore, we compare learned and hand-crafted expert patterns, as well as their
combination, and show that the learned patterns successfully complement manually
generated patterns and generalize well to novel questions.
4. To support future research in the area of KBQA, we make our datasets and models publicly
available1.
      </p>
      <p>In the following, we provide an overview of related work in Section 2 before providing more
details on our proposed hybrid approach towards KBQA in Section 3. We describe the datasets
used in the experiments in Section 4 and present our experimental results in Section 5. Finally,
we discuss our results in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Information retrieval (IR) based QA relies on corpora of documents to retrieve answers from.
Examples are search engines, many of which provide interfaces to IR-based QA by indexing
large amounts of web pages, ranking them based on the search query, and retrieving answer
spans from passages. Recently, a spike in research and public interest in large language models
(LLMs) has not only provided an addition to search engines, but to the field of QA in general.
Commercial LLMs, such as ChatGPT 2 and Bard 3, or open source alternatives such as LLaMA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
Alpaca [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Dolly 4 have received much attention due to their ability to produce humanlike
responses to open-domain inputs. They rely on large text corpora and sophisticated training
mechanisms incorporating methods from Deep Learning, Reinforcement Learning and expert
annotations. Their fundamental mechanism is to predict the next token which is most likely to
follow the previous tokens of an output and also incorporate the context from a conversation,
i.e., the dialog preceding the current prompt. While these models have demonstrated their
power in text generation, their usage for QA remains disputed: Not only are LLMs opaque
and not easily explainable with regard to how the model arrived at a certain conclusion, they
are also known to suffer from hallucinations, i.e., cases when an LLM outputs made-up but
seemingly plausible facts not grounded in any training data. Despite recent efforts in faithful
question answering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], this poses a dangerous drawback for QA applications.
      </p>
      <p>Footnotes: 2 https://chat.openai.com/, 3 https://bard.google.com/, 4 https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html</p>
      <p>An alternative to this approach is presented by KBQA, which is not only fact grounded, but
also more transparent and explainable, since every answer can be traced back directly to the
underlying KB. KBQA can be divided into two categories: First, methods that directly yield
a ranking of target entities, typically by relying on Deep Learning and/or embeddings of the
knowledge graph. Second (and more similar to our approach), methods that explicitly generate
queries to retrieve information from the underlying KB. Representatives from the latter category
provide advantages with regard to transparency and explainability, as they not only provide
an answer, but also the exact query which retrieved it from the KB. Two examples for each
category are given in the following:</p>
      <p>
        Regarding approaches from the first category, one method that employs Deep Learning on
the knowledge graph is KEQA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which leverages KG embeddings and a manually designed
distance metric to perform KBQA. To this end, the underlying KG is embedded using a pre-trained
KG embedding algorithm (e.g., TransE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or TransR [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). A predicate learning model is trained,
which maps a question to a vector in the relation embedding space as the predicted relation
presentation. Similarly, a head entity learning model is trained, which maps a question to a
vector in the entity embedding space as the predicted head entity representation. Furthermore, a
head entity detection model is trained to identify several tokens in the question as the predicted
head entity name, which is used to reduce the number of head entity candidates in the KG. The
output from the three models is used in a manually designed distance metric, which returns
a fact in the KG that minimizes the distance between the predicted head entity, the predicted relation and
the predicted tail entity, where the latter is predicted using the KG embedding algorithm’s
relation function. KEQA focuses on simple questions referring to a single head entity and a
single relation and uses the SimpleQuestions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] dataset.
      </p>
        <p>
          ReaRev [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is a recent work achieving state-of-the-art (or near state-of-the-art) performance on the three
common KBQA datasets WebQuestions [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], ComplexWebQuestions [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and MetaQA [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
Question- and KG-representations are aligned in a common latent space for reasoning over
the KG. To this end, the question is decoded into an initial set of instructions, which are dense
representations that are matched against relations in the KG. ReaRev then uses two alternating
steps: The reasoning step uses the instructions to perform a KG traversal and results in a set of
node representations, and the revise step uses the reasoning output to refine the instructions.
During the reasoning step, the execution order of the instructions is decided by the model
on the fly by emulating breadth-first search with graph neural networks. Both steps are used
alternately for a fixed number of iterations to refine the resulting node representations. Afterwards, a
binary, GNN-based node classification is performed to select nodes which represent an answer
to the question. Our approach differs from KEQA and ReaRev in that both rely on Deep
Learning and/or KG embeddings rather than explicit graph patterns.
        </p>
        <p>
          Regarding the second category of KBQA systems, one approach based on query construction
is SPBERT [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], which uses an adapted BERT [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] model (named BERT2SPBERT) to construct
a SPARQL query for a given question. To do so, changes in BERT’s self-attention layers are
introduced to allow the model to be used as a decoder, as BERT usually only represents an
encoder. Pre-training of the model employs a massive amount of SPARQL query logs (6.8M) from
the public DBpedia endpoint on two tasks: Masked Language Modeling and Word Structural
Objective. Fine-tuning uses five different datasets in SPARQL query construction: QALD-9 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ],
LC-QuAD [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and three adaptations of Mon [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>
          QUINT [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] learns an ensemble of SPARQL-style query templates from questions and their
answers. During training, questions are mapped to relations and types using lexicons, which
have been created using distant supervision from a corpus of web pages annotated with Freebase
entities. QUINT constructs query templates in three steps: 1. The smallest subgraph containing
all of the question’s and its answer entities is retrieved and converted into a backbone query. 2.
The backbone query is aligned to the question using the lexicons and the question’s dependency
parse tree. 3. A generalization step converts both the question and its corresponding query
into templates. During inference, these templates are matched against a given question
and ranked with a random forest classifier. For evaluation purposes, QUINT uses the datasets
WebQuestions [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and Free917 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      <p>Overall, our approach differs from SPBERT in that it does not generate queries on the fly
but rather learns a repository of graph patterns, and from QUINT since we generate the graph
patterns based on an evolutionary search.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>Our approach is illustrated in Figure 1: We assume a set of training questions to be given, each
question q containing a single source entity s from a knowledge graph. We assume the mention
of the entity in the question to be known, as well as a set of correct answer (or target) entities T.
Given this information, our approach features three key components: A graph pattern learner
(GPL, gray) learns typical constellations in the KG between source entities and target entities,
resulting in a set of graph patterns gp_1, ..., gp_N. Each pattern is a SPARQL query, and acts as
a function that maps a source entity s to a set of answer entities gp_i(s). For example, given
the source entity s=m.06mt91 (Rihanna), the following pattern could retrieve her nationality:
SELECT DISTINCT ?t WHERE { ?s ns:people.person.nationality ?t . } Other,
more complex patterns could retrieve the names of her albums, etc.</p>
      <p>[Figure 1: Overview of FRED. From the training data (e.g. the question “What is the name of the currency used in China?” with source entity China), the GPL (1) learns graph patterns such as ?source fb:location.country.currency_used ?target. For a new question, the patterns are (2) applied via SPARQL, yielding candidate entities per pattern (e.g. Renminbi, BASIC countries, Group of Five, China-CELAC Forum, simplified Chinese, traditional Chinese), (3) FRED predicts the patterns’ fitness (F1) scores, and (4) the candidates are scored and ranked (e.g. 1. Renminbi, 2. simplified Chinese, ...).]</p>
      <p>
        Given an input question, we feed its source entity s to all graph patterns, yielding sets of
candidate entities gp_1(s), ..., gp_N(s). The key concern is to estimate which patterns answer the
question “well”, such that their candidates should be prioritized. To do so, we use a
transformer-based prediction model (yellow) which – given a question q, source entity s, and pattern
gp_i – estimates the F1 score of using gp_i’s candidate set compared to the question’s true answer
set T. We use this estimated score to rank the patterns’ candidates (red).
      </p>
      <sec id="sec-3-1">
        <title>3.1. Graph Pattern Learning</title>
        <p>The foundation of our approach is a set of graph patterns, each a SPARQL query that takes a
source entity (mentioned in a question) and returns a set of target entities. We would like the
overall set of SPARQL patterns to match the true target entities with high recall and precision.
To learn said patterns, we utilize the graph pattern learner (GPL) by Hees et al. [
          <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
          ]: A
population of SPARQL queries is grown by an evolutionary search, which applies various mating
and mutation operations such as inserting and grounding variables, splitting and merging
variables, expanding nodes and simplifying the pattern, among others. As a pattern’s fitness
score, the GPL uses several indicators of the coverage and precision with which the pattern
yields target entities, how well the pattern complements the population, and how complex it is.</p>
        <p>As training data, the GPL utilizes a set of pairs of source and target entities. We derive these
pairs from training questions: Assuming that a question features a single source entity s and a
limited number of target entities t_1, ..., t_n, we derive n pairs (s, t_1), ..., (s, t_n). We collect all
these pairs in a ground truth set GT. Given this ground truth, we apply the GPL with its default
parameters.</p>
        <p>Preprocessing: For performance reasons, the number of source-target pairs should be limited
when executing GPL training5. Therefore, we partition GT into chunks GT_1, ..., GT_K and run
a separate GPL training per chunk. To ease the discovery of patterns, we group semantically
similar questions into the same chunk using their relation IDs (as later explained in Section 4).
Alternatively, questions could be clustered based on their text embeddings, for example (which
we leave for future work).</p>
        <p>Footnote 5: This is because the GPL – to evaluate patterns’ fitness during the training process – executes a SPARQL query
against the knowledge graph for each pattern and for all source-target pairs in batch fashion, which can lead to
timeouts.</p>
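        <p>A minimal sketch of this chunking step could look as follows; the field names and the per-chunk limit of 20 relation IDs (cf. Section 5.1) are illustrative assumptions:</p>
        <p>from collections import defaultdict

def build_chunks(questions, max_relations_per_chunk=20):
    """questions: dicts with 'source', 'targets' and 'relation_id' (assumed layout)."""
    by_relation = defaultdict(list)
    for q in questions:
        for target in q["targets"]:
            by_relation[q["relation_id"]].append((q["source"], target))

    chunks, current = [], []
    for i, relation_id in enumerate(sorted(by_relation)):
        current.extend(by_relation[relation_id])
        # start a new chunk after every max_relations_per_chunk relation IDs
        if (i + 1) % max_relations_per_chunk == 0:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks</p>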
        <p>Post-processing: Note that the sets of patterns resulting from different chunks may contain
redundant and/or similar patterns. Therefore, we apply clustering with different models and
metrics, and then select the best patterns (with minimal loss in precision) from each cluster as a
representative (for details, refer to [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]). This reduces the overall set of graph patterns by ≈ 91
percent.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. F1 pREDictor (FRED)</title>
        <p>Given a question q with source entity s and a set of graph patterns, we estimate which patterns
match the question “well”. We measure this goodness-of-fit by the F1 score F1(gp(s), T) between
the pattern’s retrieved candidates gp(s) and the true answer entities T, i.e. a graph pattern gp
is defined to match a question q “well” if it retrieves the desired answer entities T with high
precision and recall.</p>
        <p>Since the true answer set T is unknown in practice, we estimate the F1 score using a regressor.
This regressor combines a representation of the question q (derived by a transformer encoder)
with a representation of the pattern (derived from its F1 scores on training questions).</p>
        <p>Pattern Representation: Given a pattern gp and M training questions with
source entities s_1, ..., s_M and answer sets T_1, ..., T_M, we collect the pattern’s F1 scores
F1(gp(s_1), T_1), ..., F1(gp(s_M), T_M) in an M-dimensional vector p. This vector carries the
information on which questions the pattern tends to work well, and is used as an embedding for the
pattern. Note that p is usually sparse, since an individual pattern answers a specific type of
question.</p>
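        <p>As an illustration, the pattern representation can be computed as in the following sketch, assuming candidate and answer sets are available as plain Python sets (function names are ours):</p>
        <p>import numpy as np

def f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted &amp; gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def pattern_representation(pattern_results, answer_sets):
    """pattern_results[i]: candidate set gp(s_i); answer_sets[i]: gold answers T_i."""
    return np.array([f1(c, t) for c, t in zip(pattern_results, answer_sets)],
                    dtype=np.float32)  # shape (M,), mostly zeros in practice</p>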
        <p>
          Question Representation: We feed the question q into a BERT encoder [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Following
common practice, we define the question representation q by prepending a special classifier
token at position zero and using this token’s output embedding:
        </p>
        <p>q = BERT(q)_0</p>
        <p>Also, to obtain a textual representation s of the source entity, we average those token embeddings
BERT(q)_i, ..., BERT(q)_j that cover the entity’s mention in the question:</p>
        <p>s = 1 / (j − i + 1) · Σ_{k=i..j} BERT(q)_k</p>
        <p>Both embeddings q and s feature d = 768 dimensions.</p>
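        <p>A sketch of how both embeddings could be obtained with the Hugging Face transformers library is given below; the character-offset handling of the mention span is an illustrative assumption rather than our exact implementation:</p>
        <p>import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_question(question, mention_start, mention_end):
    """Return the classifier-token embedding q and the averaged mention embedding s."""
    enc = tokenizer(question, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc["offset_mapping"][0].tolist()
    inputs = {k: v for k, v in enc.items() if k != "offset_mapping"}
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state[0]   # (num_tokens, 768)
    q = hidden[0]                                      # output of the [CLS] token
    # indices of tokens whose character span overlaps the entity mention
    idx = [i for i, (a, b) in enumerate(offsets)
           if a != b and max(0, min(b, mention_end) - max(a, mention_start))]
    s = hidden[idx].mean(dim=0) if idx else q          # fall back to q if no overlap
    return q, s</p>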
        <p>Regressor: The regression model combines a question’s text representations q, s with N
graph pattern representations p_1, ..., p_N: We first densify each pattern’s representation p_i by
mapping it to the same dimensionality as the text embeddings:</p>
        <p>p̃_i = tanh(W · p_i)</p>
        <p>where W ∈ R^(2d×M) is a learned matrix. We then collect all patterns’ dense embeddings in a
matrix P ∈ R^(2d×N) and predict all patterns’ F1 scores as:</p>
        <p>F̂1(q, s, P) = σ( ((q ∘ s) · P) · W′ )</p>
        <p>where ∘ denotes vector concatenation, σ is the sigmoid function, and W′ ∈ R^(N×N) is another
learned matrix. This results in N scores that approximate the F1 predictions for the N graph
patterns.</p>
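        <p>The following PyTorch sketch illustrates this forward pass; class and argument names are ours, and the tanh densification follows the formula above:</p>
        <p>import torch
import torch.nn as nn

class FredHead(nn.Module):
    """Regression head on top of the two 768-dimensional text embeddings."""
    def __init__(self, num_patterns, num_train_questions, dim=768):
        super().__init__()
        # W: densifies an M-dimensional sparse pattern vector to 2d dimensions
        self.W = nn.Linear(num_train_questions, 2 * dim, bias=False)
        # W': maps the N per-pattern activations to N predicted F1 scores
        self.W_prime = nn.Linear(num_patterns, num_patterns, bias=False)

    def forward(self, q, s, pattern_reps):
        """q, s: (dim,) embeddings; pattern_reps: (N, M) sparse F1 vectors."""
        P = torch.tanh(self.W(pattern_reps))   # (N, 2d) dense pattern embeddings
        text = torch.cat([q, s], dim=-1)       # (2d,) concatenation of q and s
        return torch.sigmoid(self.W_prime(P @ text))   # (N,) estimated F1 scores</p>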
        <p>Training: We train the regressor (including all BERT parameters, W and W′) in a supervised
fashion on a set of target questions: Given a set of Q questions and N graph patterns, we obtain
Q × N training samples, since the answer sets T of training questions are known, and we can
compute patterns’ true F1 scores as ground truth. We start training from a pre-trained BERT
encoder, and minimize a weighted mean squared error loss, where we assign a 10 times higher
weight to nonzero F1 scores (see details on hyperparameters in Section 5.1).</p>
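        <p>A minimal sketch of this weighted loss is shown below; the factor of 10 follows Section 5.1, and the tensor shapes are illustrative:</p>
        <p>import torch

def weighted_mse(predicted_f1, true_f1, nonzero_weight=10.0):
    """predicted_f1, true_f1: tensors of shape (N,) for one training question."""
    weights = torch.ones_like(true_f1)
    weights[true_f1.ne(0)] = nonzero_weight   # 10x weight on nonzero F1 targets
    return (weights * (predicted_f1 - true_f1) ** 2).mean()</p>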
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Scoring</title>
        <p>Scoring is given a question q with source entity s and a set of graph patterns whose F1 scores
have been estimated by FRED as described in the last section. We denote these estimates with
F̂1_1, ..., F̂1_N. We apply each graph pattern to the source entity, obtaining sets of candidate entities
gp_1(s), ..., gp_N(s). The overall set of candidate entities is defined as C := gp_1(s) ∪ ... ∪ gp_N(s). The
score of a candidate c ∈ C is then defined as:</p>
        <p>score(c) = Σ_{i=1..N} F̂1_i · F1({c}, gp_i(s))    (1)</p>
        <p>i.e. a candidate will get a high score if it appears as one of few results in patterns that supposedly
fit the question well. We rank candidates by their descending score.</p>
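        <p>For illustration, Equation 1 can be implemented as in the following sketch, which uses the fact that F1({c}, gp_i(s)) reduces to 2 / (|gp_i(s)| + 1) whenever c is contained in the pattern’s result set (names are ours):</p>
        <p>from collections import defaultdict

def score_candidates(candidate_sets, estimated_f1):
    """candidate_sets: list of sets gp_i(s); estimated_f1: FRED's per-pattern estimates."""
    scores = defaultdict(float)
    for candidates, f1_hat in zip(candidate_sets, estimated_f1):
        for c in candidates:
            # F1({c}, gp_i(s)) = 2 / (|gp_i(s)| + 1) for candidates in the result set
            scores[c] += f1_hat * 2.0 / (len(candidates) + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)</p>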
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Datasets</title>
      <p>
        We use the datasets WebQuestionsSP [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and SimpleQuestions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for our experiments. Both
are based on the popular knowledge graph Freebase6 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. WebQuestionsSP is a collection of
4 737 English questions where each question includes a precise SPARQL query, Freebase IDs for
source- and answer entities and the source entity mention in the question. SimpleQuestions is
a collection of 108 442 simple, English questions, each with Freebase IDs for the single source
entity, the predicate and one answer entity (for questions with more than one plausible answer, one has
been selected randomly).
      </p>
      <p>For both datasets, we build custom splits by shuffling the entire dataset and running it
through a filtering pipeline to ensure data integrity and to ensure compatibility with our
approach. Further, some measures are taken to limit compute load with the graph pattern
learning and subsequent processing of generated patterns.</p>
      <p>• Data Quality: We remove erroneous instances (such as duplicates or questions with
missing source entities), and drop questions with value answers, since these cannot be
covered by the GPL. For WebQuestionsSP, we only keep questions with a maximum
number of answers of 1 and 5, respectively, resulting in two different versions of the
dataset. This simultaneously eliminates outlier questions with a large number of answers
(up to 3 688) and also tests the approach’s applicability to answer questions with multiple
answers.
• Data Balance: As discussed in Section 3.1, we partition the training set of source-target
pairs into semantically coherent chunks. To do so, the relation ID is introduced: For
WebQuestionsSP this refers to the inferential chain, a set of predicates used in a question’s
SPARQL query. For SimpleQuestions it refers to the given predicate. To preserve dataset
balance, we only keep relation IDs with at least 10/5 and at most 200/200 questions for
SimpleQuestions/WebQuestionsSP, and randomly select at most 30 source-target pairs
per source entity.
• Downscaling: To limit the GPL’s execution time, we also downsample the unique source
entities to a total number of 2000.</p>
      <p>For each setting, we use the remaining data to randomly sample 4 versions of each dataset,
over which we report averaged results. We split into training data (90%), on which we train the
GPL and FRED, validation data (5%) which we use in GPL training and for early stopping, and
test data (5%) on which we report results. During splitting, we ensure that questions with the
same source entity are assigned to the same split.</p>
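      <p>A sketch of this group-aware splitting is given below; field names and the seed handling are illustrative assumptions:</p>
      <p>import random
from collections import defaultdict

def split_by_source_entity(questions, seed=0):
    """Split 90/5/5 such that questions sharing a source entity stay together."""
    groups = defaultdict(list)
    for q in questions:
        groups[q["source"]].append(q)
    entities = sorted(groups)
    random.Random(seed).shuffle(entities)
    n = len(entities)
    cut_train, cut_val = int(0.90 * n), int(0.95 * n)
    def collect(ents):
        return [q for e in ents for q in groups[e]]
    return (collect(entities[:cut_train]),          # training data (90%)
            collect(entities[cut_train:cut_val]),   # validation data (5%)
            collect(entities[cut_val:]))            # test data (5%)</p>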
      <p>Finally, for the WebQuestionsSP based datasets we utilize the expert-generated patterns that
come with the dataset (we remove the FILTER statements to generalize the patterns, as we
found these statements to usually refer to question specific additional entities or values). Since
we will use these expert patterns as drop-in replacements for the learned GPL patterns, we will
use only those from the training splits. The metrics of the grouped datasets and the number of
expert patterns for WebQuestionsSP based datasets are shown in Table 1.</p>
      <sec id="sec-4-1">
        <title>6We use a dump from 8th September 2015.</title>
        <p>Dataset group
WQSP1
WQSP5</p>
        <p>SQ</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>This section presents our setup for training and evaluating our model (Section 5.1)
and discusses quantitative results: Control runs give an upper bound on performance based
on the generated GPL graph patterns (Section 5.2), a comparison to question-agnostic scoring
approaches evaluates the added value of FRED’s incorporation of text semantics into the answer
candidate ranking (Section 5.3), and an ablation study evaluates to what extent FRED relies
on the source entity mention during inference (Section 5.4). A comparison between learned
and expert patterns evaluates which of both sets of patterns generalizes better on test questions,
and also evaluates the overlap in both sets (Section 5.5). Finally, the performance improvement
achieved when combining both learned and expert patterns is evaluated and compared to the
performance of each set of patterns alone (Section 5.6).</p>
      <sec id="sec-5-1">
        <title>5.1. Setup</title>
        <p>To evaluate how our FRED approach performs overall, we use the well-known quality measures
MAP (when comparing with a reference model that does yield ranked results) and F1 with a
fixed cut-off rank (for reference models that do not).</p>
        <p>
          GPL trainings are performed using the GPL’s default parameters7. We set the number of
chunks such that each contains at most 20 different relation IDs, and subsampled each chunk
to a maximum of 800 samples. FRED trainings are performed using Adamax [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] with an
initial learning rate of 5 · 10^-5, batch size 4, and apply a dropout [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] rate of 10% before
multiplying with W′. We trained for a maximum of 200 epochs, where early stopping is
applied using FRED’s MAP on the validation split. We start our training from the pre-trained
BERT checkpoint bert-base-uncased.</p>
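        <p>For illustration, this setup roughly corresponds to the following training-loop sketch; the model, data loaders, MAP computation and the early-stopping patience are placeholders rather than exact parts of our pipeline:</p>
        <p>import torch

def train_fred(model, train_loader, val_loader, compute_map,
               max_epochs=200, patience=10):
    optimizer = torch.optim.Adamax(model.parameters(), lr=5e-5)
    best_map, best_state, bad_epochs = 0.0, None, 0
    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:               # batch size 4 in our setup
            optimizer.zero_grad()
            loss = model.training_loss(batch)    # weighted MSE, see Section 3.2
            loss.backward()
            optimizer.step()
        val_map = compute_map(model, val_loader)
        if val_map &gt; best_map:                   # early stopping on validation MAP
            best_map, best_state, bad_epochs = val_map, model.state_dict(), 0
        else:
            bad_epochs += 1
            if bad_epochs == patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model</p>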
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Control Runs</title>
        <p>A first interesting question is what maximum performance could be achieved with our learned
graph patterns, assuming a perfect matching between question and patterns. To estimate this
upper bound, the results of FRED scoring at different cutoff values are compared against a
simulated oracle classifier, which selects the best fitting pattern for a question, i.e. the one
achieving the highest F1 score, and yields this pattern’s result set. The results in Table 2 show
that the highest scored entity by FRED comes close to the results of the best fitting GPL pattern.
FRED attains near best possible performance on WQSP1 and SQ, whereas the difference to the
upper bound placed by the GPL is greater for WQSP5.</p>
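        <p>The GPL Oracle itself is straightforward to simulate once the patterns’ true F1 scores on the test questions are known; the array layout below is illustrative:</p>
        <p>import numpy as np

def gpl_oracle_f1(true_f1):
    """true_f1: (num_questions, num_patterns) array of true per-pattern F1 scores.
    For each question the oracle picks the best fitting pattern's result set."""
    return float(true_f1.max(axis=1).mean())</p>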
        <p>[Table 2: Results of the control runs evaluation for WQSP5, WQSP1 and SQ, reported as the mean and standard deviation of the F1 score
achieved on the test split over 4 runs. GPL Oracle shows an optimal matching between question and
graph patterns; the remaining columns show results for the top-k results of FRED scoring.]</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Question Agnostic Scoring</title>
        <p>The graph pattern learner itself comes with several approaches for ranking its target
candidates [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. These approaches prioritize graph patterns based on their precision. Obviously,
these approaches offer an alternative to our FRED scoring. However, while FRED incorporates
information from the input question, the GPL’s own scoring approaches only utilize information
from the GPL training and the resulting graph patterns to score a target candidate.
      </p>
      <p>To evaluate the added value of our model compared to such question-agnostic scoring, FRED
(see Equation 1) is compared against three GPL-internal scoring mechanisms Precisions, GP
Precisions and Logistic Regression. The results are shown in Table 3. We can see that FRED
provides a significant improvement over these question-agnostic scoring approaches.</p>
        <p>[Table 3: Comparison of the question-agnostic GPL scoring approaches and FRED for WQSP5, WQSP1 and SQ, evaluated
on the test split and reported as the mean and standard deviation over all 4 runs on the respective
dataset version. FRED’s incorporation of the question provides a performance gain compared to the
other, question-agnostic scoring approaches.]</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Ablation Study</title>
        <p>To evaluate to which extent the FRED model can be simplified without impacting performance,
an ablation study is performed, where the model omits the source mention embedding s and
instead relies solely on the question embedding q (see Section 3.2). The forward pass becomes</p>
        <p>F̂1(q, P) = σ( (q · P) · W′ )</p>
        <p>where the size of W is adjusted to d × M dimensions, and consequently P ∈ R^(d×N), i.e. the
dense pattern embeddings now lie in a d-dimensional space instead of a 2d-dimensional one.</p>
      <p>The results of the evaluation are presented in Table 4 and show that the source mention
embedding can be omitted without a significant impact on the model’s performance. Our
interpretation is that the question embedding already captures enough semantics of a question,
such that the model is not required to interpret a question’s source mention embedding.</p>
      <p>[Table 4: Ablation study results for WQSP5, WQSP1 and SQ.]</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Comparison between Learned and Expert Patterns</title>
        <p>An example for three questions (Q) from WQSP1, their corresponding expert patterns (EP) and
the best matching GPL pattern (GPL) is given in the following. Δ denotes the cosine distance
between expert- and GPL pattern in the embedding space spanned by the graph patterns’
representations p (see Section 3.2).</p>
      <p>• Q: who was gerald ford vp?
EP: SELECT DISTINCT ?source ?target WHERE {
?source ns:government.us_president.vice_president ?target .}
GPL: SELECT ?source ?target ?vcb0 WHERE {
?target key:wikipedia.lv ?vcb0 .
?target ns:government.us_vice_president.to_president ?source .}
Δ: ≈ 0.0</p>
      <p>• Q: what language did the ancient babylonians speak?
EP: SELECT DISTINCT ?source ?target WHERE {
?source ns:location.country.languages_spoken ?target .}
GPL: SELECT ?source ?target ?vcb0 WHERE { ?target
ns:language.human_language.countries_spoken_in ?source .
?target ns:language.human_language.writing_system ?vcb0 .}
Δ: 0.0065</p>
      <p>• Q: what capital city of brazil?
EP: SELECT DISTINCT ?source ?target WHERE {
?source ns:location.country.capital ?target .}
GPL: SELECT ?source ?target ?vcb0 WHERE {
?source ns:location.country.capital ?target .
?target ns:travel.travel_destination.tourist_attractions ?vcb0 . }
Δ: 0.2929</p>
      <p>We see that in the first two cases, the learned graph patterns and the expert patterns are
similar both in terms of the pattern’s graph structure, and in terms of the entities the patterns
retrieve (as expressed by Δ). In the third case, while the GPL found the predicate from the
expert pattern, it added another restrictive predicate which led to a greater difference in the
result set. Overall, the examples above show that the GPL is able to learn patterns similar but
not necessarily identical to the expert patterns delivered with WebQuestionsSP.</p>
      <p>We investigate this question further quantitatively: For each expert pattern in the training
set, we find the closest GPL-learned pattern in terms of the distance Δ between the pattern
representations. We collect the distribution of this nearest-neighbor distance in Figure 2 in
a histogram, where 0 (left) corresponds to a low distance, i.e. expert patterns for which very
similar GPL patterns can be found, and 100 (right) to expert patterns that are dissimilar from any
learned graph pattern. The plot indicates that for about 18% of all expert patterns a very similar
GPL pattern was learned (Δ = 0), while at the same time for 18% no similar GPL pattern is
found at all (Δ = 100).</p>
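      <p>A sketch of this nearest-neighbor analysis over the pattern representations (array shapes and scaling are illustrative):</p>
      <p>import numpy as np

def nearest_gpl_distances(expert_reps, gpl_reps, eps=1e-9):
    """expert_reps: (E, M) array; gpl_reps: (G, M) array of F1 vectors p."""
    def normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    sims = normalize(expert_reps) @ normalize(gpl_reps).T   # cosine similarities
    return 1.0 - sims.max(axis=1)   # distance to the closest GPL pattern

# e.g. np.histogram(nearest_gpl_distances(E, G), bins=20) for the plot in Figure 2</p>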
      <p>Finally, we investigate the question how well both expert and learned patterns cover the
targeted questions. We utilize the respective pattern populations drawn from / learned on the
training set, and count for each pattern the number of training/test questions that are “answered”
by the pattern (indicated by an F1 score &gt; 0). This gives an indicator about the generality of
both sets of patterns. The results for the WebQuestionsSP datasets are shown in Table 5: For
example, the first row indicates that a GPL-learned pattern covers 45.41 questions from the
training set on average. We observe that the counts on the training set are much higher than
on the test sets (which is mostly because the training set is 19 times larger). More importantly,
however, we observe that GPL-learned patterns cover significantly more (≈ 4 times as many)
questions than the expert-defined patterns, both on the training and test split. Correspondingly,
the GPL patterns are more “general” than the expert patterns.</p>
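      <p>The coverage statistic itself is simple to compute once the true F1 scores of all patterns on all questions are available; the array layout is illustrative:</p>
      <p>import numpy as np

def average_coverage(f1_matrix):
    """f1_matrix: (num_patterns, num_questions) array of true F1 scores."""
    answered = (f1_matrix != 0).sum(axis=1)   # questions covered per pattern
    return float(answered.mean())</p>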
      <p>[Table 5: Average number of questions covered per pattern for WQSP1 and WQSP5 on the train and test splits.]</p>
      </sec>
      <sec id="sec-5-6">
        <title>5.6. Complementing Expert Patterns</title>
        <p>
We compare the accuracy obtained by FRED when using a population of (a) expert-defined
patterns, (b) GPL-learned patterns, and (c) the union of both sets. Results on the test splits
are illustrated in Table 6: The expert patterns have been derived from the precise SPARQL
queries delivered with WebQuestionsSP and – as expected – are more accurate than the GPL
generated patterns. While they do in fact achieve a higher average F1, the combination of both
GPL- and expert patterns performs better than each set of patterns alone. This indicates that
the GPL-learned patterns can ofer a high-quality, low-cost alternative to manually defining
patterns.</p>
      <p>[Table 6: Test-split results for WQSP1 and WQSP5 when using GPL patterns, expert patterns, and the combination GPL+Expert.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This paper has presented FRED, an approach towards KBQA which combines an evolutionary
learning of graph patterns with a transformer-based matching of questions to patterns. Our
results indicate the general feasibility of the model, and demonstrate that our matching yields
an accuracy close to optimal pattern-based control runs. We have also compared the patterns
yielded by our approach to expert-generated patterns, and have demonstrated that both types
of patterns complement each other well.</p>
      <p>
This work bears two relations to case-based reasoning (CBR): First, our model can – in a
wider sense – be seen as an instantiation of the classical CBR R4-cycle [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] (Retrieve, Reuse,
Revise, Retain): When viewing a “case” as a question paired with a path through the KG to
retrieve the correct answer, our work utilizes the graph pattern learner to form a curated base of
questions and learned graph patterns. For a new question, similar cases (represented by graph
patterns) are retrieved, and patterns are reused in a scoring process. During the retain step,
the question can be added to the set of questions corresponding to the graph pattern which
retrieved the answers. Second, CBR may form a particularly interesting use case for our model
in situations where “cases” are stored in a structured knowledge graph form (e.g. in a medical
scenario, these could be patients with certain symptoms, diseases, treatment plans, history, etc.).
In these situations, graph pattern learning may reveal typical patterns of cases in the graph, and
our model could support CBR users (say, medical experts) by answering typical questions about
cases and exploring their relevant characteristics. Alternatively, our model could support CBR
systems (say, treatment recommenders) by utilizing our pattern-based embeddings as features,
for example for retrieval.
      </p>
      <p>
        Some limitations of our approach are of a rather technical nature: The current implementation
of the graph pattern learner assumes a single source entity per question to be given, and does
not cope with value answers (such as dates). Second, our current strategy of partitioning the
GPL training set is based on relation IDs (which are unlikely to be available in practice). An
interesting direction of future work will be to investigate different clustering strategies of
questions, likely based on text embeddings. Also, note that similar to other KBQA approaches,
our current model requires the source entity and its mention in the question to be known
(though we achieved comparable results without the latter in an ablation study). Investigating a
more extended pipeline that includes entity localization and linking might be of interest here.
Another future direction is a comparison against other query generating KBQA approaches,
like SPBERT and QUINT. These were omitted due to the focus on a different KB (DBpedia [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ])
in the first case and the absence of an available reference implementation in the latter.
      </p>
      <p>Most significant, however, is the fact that our approach considers graph patterns to
be atomic, i.e. there are no combinations or generalizations of patterns at inference time. For
example, a pattern cannot generalize from finding an actor’s spouse to a singer’s spouse unless
the GPL has learned other individuals as part of its population. Finding strategies for more
adaptive patterns is certainly an interesting direction for future work.</p>
      <p>Finally, another interesting direction for future work is the incorporation into an LLM based
QA system, as LLMs have demonstrated high quality text generation capabilities, but often lack
a factual grounding of their output. By linking a generated answer to the question via graph
patterns, its grounding in a given knowledge base can be verified.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was supported by the German Federal Ministry of Education and Research, project
SCENT (project ID=13FH003KX0).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <article-title>SPARQL Query Language for RDF</article-title>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] W.-t. Yih,
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meek</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Suh,</surname>
          </string-name>
          <article-title>The Value of Semantic Parse Labeling for Knowledge Base Question Answering, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</article-title>
          (Volume
          <volume>2</volume>
          :
          <string-name>
            <surname>Short</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <year>2016</year>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Large-scale Simple Question Answering with Memory Networks (</article-title>
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>A. Lachaux</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Rozière</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Hambro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample, LLaMA: Open and
          <article-title>Efficient Foundation Language Models, arXiv preprint</article-title>
          ,
          <source>arXiv:2302.13971</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taori</surname>
          </string-name>
          , I. Gulrajani,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Hashimoto</surname>
          </string-name>
          , Stanford Alpaca:
          <article-title>An Instruction-following LLaMA model</article-title>
          , https://github.com/tatsu-lab/ stanford_alpaca,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Creswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shanahan</surname>
          </string-name>
          ,
          <source>Faithful Reasoning Using Large Language Models</source>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2208</volume>
          .
          <fpage>14271</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph Embedding Based Question Answering</article-title>
          ,
          <source>in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining</source>
          ,
          <year>2019</year>
          , p.
          <fpage>105</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating Embeddings for Modeling Multi-relational Data</article-title>
          , in: C.
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Weinberger (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>26</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Learning Entity and Relation Embeddings for Knowledge Graph Completion</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          , p.
          <fpage>2181</fpage>
          -
          <lpage>2187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Mavromatis</surname>
          </string-name>
          , G. Karypis,
          <article-title>ReaRev: Adaptive Reasoning for Question Answering over Knowledge Graphs</article-title>
          ,
          <source>arXiv preprint arXiv:2210.13650</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frostig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Semantic Parsing on Freebase from Question-Answer Pairs</article-title>
          ,
          <source>in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1533</fpage>
          -
          <lpage>1544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Talmor</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Berant,</surname>
          </string-name>
          <article-title>The Web as a Knowledge-Base for Answering Complex Questions, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          , Volume
          <volume>1</volume>
          (
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>641</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kozareva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Smola</surname>
          </string-name>
          , L. Song,
          <article-title>Variational Reasoning for Question Answering with Knowledge Graph</article-title>
          ,
          <source>in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence</source>
          , AAAI'18/IAAI'18/EAAI'18,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Phan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Anibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , T.-S. Nguyen,
          <string-name>
            <surname>SPBERT:</surname>
          </string-name>
          <article-title>An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Gusmita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <article-title>9th Challenge on Question Answering over Linked Data (QALD-9) (invited paper)</article-title>
          ,
          <source>in: 17th International Semantic Web Conference (ISWC 2018), Monterey, California, United States of America, October 8th - 9th, 2018</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Maheshwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>LC-QuAD: A corpus for complex question answering over knowledge graphs</article-title>
          ,
          <source>in: International Semantic Web Conference, Springer</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>210</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Soru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moussallem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Publio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valdestilhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Esteves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Neto</surname>
          </string-name>
          ,
          <article-title>SPARQL as a Foreign Language</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Semantic Systems - SEMANTiCS2017 Posters and Demos</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abujabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yahya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedewald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>Automated Template Generation for Question Answering over Knowledge Graphs</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on World Wide Web</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1191</fpage>
          -
          <lpage>1200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <article-title>Large-scale Semantic Parsing via Schema Matching and Lexicon Extension</article-title>
          ,
          <source>in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hees</surname>
          </string-name>
          ,
          <article-title>Simulating Human Associations with Linked Data</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Technical University Kaiserslautern,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Folz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dengel</surname>
          </string-name>
          ,
          <article-title>An Evolutionary Algorithm to Learn SPARQL Queries for Source-Target-Pairs</article-title>
          , in:
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciancarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vitali</surname>
          </string-name>
          (Eds.),
          <source>Knowledge Engineering and Knowledge Management</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>352</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>N.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <year>2014</year>
          )
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aamodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <article-title>Case-based reasoning: Foundational issues, methodological variations, and system approaches</article-title>
          ,
          <source>AI Communications</source>
          <volume>7</volume>
          (
          <year>1994</year>
          )
          <fpage>39</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Kleef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia</article-title>
          ,
          <source>Semantic Web Journal</source>
          <volume>6</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>