<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Simple Question Answering Over a Domain-Specific Knowledge Graph using BERT by Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mani Vegupatti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Nickles</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi Raja Chakravarthi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We build and evaluate a baseline for simple question answering over a domain-specific knowledge graph by using the pretrained open-domain language model BERT. Training a neural network from scratch needs a large annotated dataset, whereas transfer learning adapts a pretrained language model and allows task-specific fine-tuning with limited data. However, building a domain-specific language model needs a large amount of domain-specific text, resources, and time for pretraining, while open-domain language models such as BERT are readily available for use. Hence, we evaluate the open-domain pretrained BERT for creating a domain-specific question answering baseline model that requires less training data. In this work, we built a BioMed domain simple question answering system by fine-tuning the open-domain BERT with a manually curated dataset of ~600 questions from the Drugbank knowledge graph published by Bio2RDF.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Question Answering</kwd>
        <kwd>BERT</kwd>
        <kwd>Transfer Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Question Answering (QA) is one of the earliest research interests in artificial
intelligence. It started with answering questions posed in natural language from
an underlying database [
        <xref ref-type="bibr" rid="ref26 ref7">7, 26</xref>
        ], moved on to extracting answers from a given text
passage [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and has recently focused on QA over a Knowledge Graph (KG). A KG
represents facts about entities as a graph: the nodes represent the entities,
which can be real-world persons, places, objects, concepts, events, and many
others, and the edges link the entities and serve as predicates [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. QA over a KG aims
to answer a natural language question using the facts in the
KG, and falls into two categories: Simple/Factoid QA and Complex QA [
        <xref ref-type="bibr" rid="ref2 ref22">2, 22</xref>
        ].
Simple QA is called simple not because the QA task is easy, but because the answering
process requires only simple reasoning over a single triple from
a KG [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since the question can be answered using a single fact of a KG, it is
also known as single-factoid QA [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Complex QA requires a complex reasoning
process with hops over multiple triples of a KG to retrieve the answer [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        Training a neural network from scratch for QA needs a large volume of
annotated data, and creating such a dataset requires time and resources
that are scarce. Transfer learning instead pretrains a Language Model (LM) to
learn task-independent, context-based language representations from large
unannotated text and allows fine-tuning for a downstream task with limited training
data. Transfer learning-based approaches use pretrained language models
such as ELMo, ULMFiT, GPT and BERT to achieve better performance in
GLUE tasks with less data and fewer epochs [
        <xref ref-type="bibr" rid="ref10 ref15 ref18 ref3">15, 10, 18, 3</xref>
        ].
      </p>
      <p>
        KGs can be classified into open-domain and domain-specific. An open-domain
KG is a very large collection of coarse-grained facts without restriction to any
specific domain, whereas a domain-specific KG is relatively small, with
fine-grained facts dedicated to a single domain such as life sciences, academia or tourism
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The increasing adoption of KGs in industry and multiple domains [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] fuels the
need for QA over a domain-specific KG. Our focus in this work is to adapt an
open-domain pretrained language model like BERT for domain-specific simple
QA over a KG in transfer learning settings and to build a baseline model
architecture.
      </p>
      <p>
        We choose the biomedical domain of life sciences because of its complexity
and importance. Biomedical terminology is complex and significantly
different from open-domain vocabulary, which helps to evaluate how effectively
open-domain BERT can be adapted to a domain-specific QA task. Biomedical QA is
essential in improving health care, and its growing importance has attracted multiple
QA challenges, but no dataset exists for simple QA over a biomedical KG
(see Section 3). Hence, we created the Drugbank simple question answering dataset
using the facts from the Drugbank KG released by Bio2RDF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The main contributions
of our work are:
        <list list-type="bullet">
          <list-item>
            <p>Creation of the Drugbank Simple Question Answering dataset (Drugbank SQA) for
the task of simple QA over a domain-specific KG, with annotations for the subtasks
Named Entity Recognition (NER) and classification.</p>
          </list-item>
          <list-item>
            <p>Building and evaluating a baseline model architecture for answering simple
questions from the facts of a domain-specific KG using the pretrained
open-domain language model BERT in transfer learning settings.</p>
          </list-item>
          <list-item>
            <p>Presenting an evaluation of various techniques for adapting the open-domain
pretrained BERT LM to simple QA over a domain-specific KG.</p>
          </list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The methods used for simple QA over a KG can be broadly classified into
end-to-end neural networks, baseline models and transformer-based models. End-to-end
neural network models employ a single complex RNN-based deep neural network
for the whole task [
        <xref ref-type="bibr" rid="ref1 ref2 ref20">1, 20, 2</xref>
        ] and often transform inputs using word- or character-level
embeddings [
        <xref ref-type="bibr" rid="ref13 ref8">8, 13</xref>
        ]. Baseline models divide the QA task into subtasks and
use simple models to solve the individual subtasks [
        <xref ref-type="bibr" rid="ref14 ref17 ref23">23, 17, 14</xref>
        ]. However, both
approaches need a large amount of labeled data and depend on sequence
modeling, which increases the training time.
      </p>
      <p>
        Transformer-based models use only the attention mechanism and remove the RNN
from the architecture. These models capture global long-term dependencies and
generalise by learning task-independent features [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The transformer-based GPT
and BERT have been shown to outperform end-to-end neural networks in simple
QA over text passages by using a pretrained language model and fine-tuning with
task-specific data [
        <xref ref-type="bibr" rid="ref18 ref3">18, 3</xref>
        ]. BERT performs better than GPT since it uses
bidirectional information along with attention [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for learning representations.
      </p>
      <p>
        BERT[
        <xref ref-type="bibr" rid="ref3">3</xref>
          ] is designed to pre-train deep bidirectional representations from
unlabeled text by jointly conditioning on both left and right context in all layers.
The pre-trained BERT model is fine-tuned with just a single additional output
layer to create models for a wide range of tasks, such as question
answering. Various transfer learning approaches for using BERT in downstream
tasks like NER and classification, and their effectiveness, are studied in [
        <xref ref-type="bibr" rid="ref16 ref3">3, 16</xref>
        ].
Simple QA over an open-domain KG using BERT was carried out with results
outperforming earlier Bi-LSTM models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>DrugBank Simple Question Answering Dataset</title>
      <p>
        In biomedical question answering, TREC Genomics Track<sup>3</sup> and QA4MRE<sup>4</sup> are
datasets for QA over a passage of text without KGs, QALD<sup>5</sup> aims at QA
over interlinked datasets (SIDER, Diseasome and DrugBank), and BioASQ<sup>6</sup>
targets answering questions by combining various heterogeneous sources like texts,
databases, and triple stores [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. However, no dataset exists for simple QA
over a KG, hence we created a new dataset<sup>7</sup> of 566 questions out of ~3 million
facts available in the DrugBank KG.
      </p>
      <p>We created the dataset with a three-step process. First, we examined the
patterns of the 3670K triples in the KG. Second, we eliminated the triples created for structuring
the KG, such as types and properties. Finally, to ensure enough coverage, distinct
relations were selected from the available relations and questions were generated with
English variations of the selected relations. Table 1 shows a few sample questions.</p>
      <p><sup>3</sup> https://trec.nist.gov/data/genomics.html</p>
      <p><sup>4</sup> http://nlp.uned.es/clef-qa/repository/qa4mre.php</p>
      <p><sup>5</sup> http://qald.aksw.org/4/documents/qald-4.pdf</p>
      <p><sup>6</sup> http://bioasq.org/</p>
      <p><sup>7</sup> https://github.com/mani-vegupatti/SQA Over DrugBank KG/tree/master/dataset</p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>
        In our work, we use a domain-specific knowledge graph and conduct experiments
using architectures inspired by the baseline models [
        <xref ref-type="bibr" rid="ref14 ref23">14, 23</xref>
        ] for single-factoid
question answering in transfer learning settings with the open-domain pretrained
BERT language model.
      </p>
      <sec id="sec-4-1">
        <title>Architecture</title>
        <p>We use the baseline-model architecture, which divides the simple QA
task into the subtasks subject identification, relation classification and subject
linking. This approach helps with data reduction, understandability and choosing the
best architecture for each subtask. Because a separate model is built for each
subtask, we can choose a task-specific architecture. A simple architecture
needs fewer deep layers and, in turn, less training data. Finally, it makes it
easy to understand the performance of the models in the individual subtasks and parts
of the system. The architecture of the system is shown in Figure 1.</p>
        <list list-type="bullet">
          <list-item>
            <p>Question: The question is asked in natural language to learn a fact about
an entity in the KG.</p>
            <p>Example: q = 'who produces penicillin V'</p>
          </list-item>
          <list-item>
            <p>Subject Identification: Subject identification is the process of predicting the
substring of the question that matches the subject/entity of the
question. This is a named entity recognition problem for the given text
and can be solved as a sequence labelling or token classification task.</p>
            <p>Example: s = 'penicillin V' | q = 'who produces penicillin V'</p>
          </list-item>
          <list-item>
            <p>Relation Classification: Relation classification predicts the correct
relation for a given question from the available relations. This is formulated
and solved as a classification problem.</p>
            <p>Example: r = 'manufacturer' | q = 'who produces penicillin V'</p>
          </list-item>
          <list-item>
            <p>Graph Datastore: The KG is stored as a collection of RDF triples in the graph
datastore. We use the N-Triples format DrugBank KG released by Bio2RDF.</p>
          </list-item>
          <list-item>
            <p>Inverted Index: The inverted index is created using n-grams of the entity labels
as keys and the entity/entity-URI as values.</p>
          </list-item>
          <list-item>
            <p>Subject Ranking and Linking: The string identified as the subject by the
subject identification module is looked up in the inverted index, and the top K
subjects are selected based on fuzzy search and scoring. Fuzzy search
identifies terms by partial-string matching and scores them using
techniques like Levenshtein (edit) distance.</p>
          </list-item>
          <list-item>
            <p>Answer Generation: Answer generation is carried out by sending a SPARQL
query to the SPARQL endpoint of the given KG. The SPARQL query is formed
using the subject/entity received from the subject ranking and linking
module and the relation/predicate obtained from the relation classification module.</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-4-2">
        <title>Adapting BERT in Transfer Learning</title>
        <p>The subtasks of simple QA over a KG are solved by formulating them as NER
and classification problems. BERT can be adapted for downstream tasks in
transfer learning settings using one of two approaches: feature extraction
or fine-tuning. We try both approaches to find the best one for each
subtask (see Sections 4.3 and 4.4).</p>
        <list list-type="bullet">
          <list-item>
            <p>
              Feature Extraction: In the feature extraction approach, we extract the
pretrained representations from the BERT model and use them as features
for the downstream tasks. Alternatively, pretrained layers with their weights
can be used as-is without fine-tuning. The advantages [
              <xref ref-type="bibr" rid="ref3">3</xref>
              ] of this approach
are:
            </p>
            <list list-type="bullet">
              <list-item>
                <p>For tasks that cannot be solved by the transformer architecture alone,
performance can be improved by using task-specific architectures with
contextualised BERT embeddings.</p>
              </list-item>
              <list-item>
                <p>It is computationally less expensive, since the pretrained input
representations are not changed further during fine-tuning.</p>
              </list-item>
            </list>
          </list-item>
          <list-item>
            <p>Fine-tuning: In the fine-tuning approach, a task-specific classification layer
is added on top of the pretrained model. The parameters of the classification
layer are learned together with adjustments to the pretrained parameters of
the underlying layers while training on the objective of the given
downstream task.</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-4-3">
        <title>Subject Identification</title>
        <p>
          We built the subject identification module on the concept of named entity
recognition, which predicts the span of a given text that identifies an entity and
its type [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This module uses the Begin-Inside-Outside (BIO) system for tagging
the tokens of the given sentence during learning and prediction. We want to
identify the chunk of words representing the subject/entity and do not want to
classify the type of subject, hence the final tags used are B-E (Beginning of an
entity), I-E (Inside the entity) and O-E (Outside the entity).
        </p>
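        <p>The B-E/I-E/O-E tagging scheme described above can be sketched as follows. This is a minimal illustration only: the function name bio_tags is hypothetical, and whitespace tokenisation is an assumption made for readability (the models in this work operate on word pieces).</p>
        <preformat><![CDATA[
```python
def bio_tags(question: str, subject: str) -> list:
    """Tag each whitespace token of `question` with B-E, I-E or O-E.

    Assumption: the subject appears in the question as a contiguous run of
    whitespace-separated tokens; only the first occurrence is tagged.
    """
    tokens = question.split()
    target = subject.split()
    tags = ["O-E"] * len(tokens)
    if not target:
        return list(zip(tokens, tags))
    # Slide a window over the question and look for the subject tokens.
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] == target:
            tags[start] = "B-E"
            for offset in range(1, len(target)):
                tags[start + offset] = "I-E"
            break
    return list(zip(tokens, tags))


if __name__ == "__main__":
    for token, tag in bio_tags("who produces penicillin V", "penicillin V"):
        print(token, tag)
```
]]></preformat>
        <p>For the running example, 'penicillin' receives B-E, 'V' receives I-E, and the remaining tokens receive O-E.</p>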
        <p>We built three models, two based on feature extraction (S1, S3) and
one using the fine-tuning approach (S2), and selected the best model for building
the final pipeline.</p>
        <p>
          S1 - Sequence Labelling with BiLSTM + CRF using BERT word embeddings:
Previous works [
          <xref ref-type="bibr" rid="ref14 ref17">17, 14</xref>
          ] on building baseline models for simple QA over an
open-domain KG achieved the top score by using BiLSTM+CRF in the NER
task. Hence, for comparison, we built a BiLSTM+CRF sequence
labelling model using BERT word embeddings. BERT provides context-based
word embeddings that depend on the local context in which the word appears and
help to overcome problems like polysemy. This model employs a feature
extraction approach in which the word embeddings from the pretrained BERT
LM are used as input to the LSTM encoder and the final tagging is obtained from
the CRF decoder.
        </p>
        <p>S2 - Token Classifier with fully fine-tuned BERT layers:</p>
        <p>We formulated the problem as token classification and built the model using
the fine-tuning approach. In this design, the pretrained BERT model is used
as a base that provides input to the classifier layer, which is a linear classifier
with softmax activation. While fine-tuning the model on the downstream
NER task, all parameters of the pretrained BERT model are modified
along with the parameters of the linear classifier under the token classification
objective function. The cross-entropy loss function is used for training, and
the probability of a token tag is calculated as below,</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>P(t \mid h_i) = \mathrm{softmax}(w_i h_i + b)</tex-math>
          </disp-formula>
        </p>
        <p>Relation classification is the module for predicting the right relation/predicate
for the given question from the list of relations obtained from the KG that
connect the subject with the object. In a simple QA task, it is assumed that
each question has at most one relation. The task is
solved as multiclass sequence/text classification, i.e. for the given sequence of
words, predict the class (relation) that best represents the question from
the available classes. It is formulated as below,</p>
        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math>P(r_i \mid [x_1\; x_2\; x_3\; \ldots\; x_n])</tex-math>
          </disp-formula>
        </p>
        <p>We selected the best model for the final pipeline from the four models we built,
of which three (R1, R2 and R4) are based on feature extraction and one (R3) is
based on the fine-tuning approach.</p>
        <p>and next sentence prediction. The output at the [CLS] token is passed as
input to a fully connected dense layer with softmax activation,
which serves as the classification layer. The weights of the fully connected
layer and the base layers are jointly adjusted while fine-tuning the model
on the multiclass classification task with task-specific data and a categorical
cross-entropy loss function.</p>
        <p>R4 - Relation Classifier with fully frozen BERT layers:</p>
        <p>In this model, the base layers adapted from the pretrained BERT LM
are fully frozen. During training, the weights of the pretrained layers are not
adjusted and only the weights of the top classifier layer are updated.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Subject Ranking and Linking</title>
        <p>The substring of words returned from the question by the NER task can be
an exact or partial match of the actual entity. We use this module to find the
actual entity. In this module, we created an inverted index that has n-grams of
entities as dictionary terms with entities as the posting list, and used the
FuzzyWuzzy package<sup>8</sup> to search for the actual entity in the inverted index based on
a substring match of the predicted entity string against the dictionary terms.</p>
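        <p>The inverted index and fuzzy lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the standard-library difflib.SequenceMatcher stands in for the FuzzyWuzzy scorer, word-level n-grams are assumed, and the entity URIs (ex:drug1 and so on) and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical entity URIs and labels, for illustration only.
ENTITIES = {
    "ex:drug1": "Penicillin V",
    "ex:drug2": "Penicillin G",
    "ex:drug3": "Aspirin",
}


def word_ngrams(label):
    """All contiguous word n-grams of a lower-cased label."""
    words = label.lower().split()
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }


def build_index(entities):
    """Map each n-gram (dictionary term) to the set of entity URIs (posting list)."""
    index = defaultdict(set)
    for uri, label in entities.items():
        for gram in word_ngrams(label):
            index[gram].add(uri)
    return index


def top_k_subjects(mention, index, k=3):
    """Score every dictionary term against the mention; keep the best score per URI."""
    scores = {}
    for gram, uris in index.items():
        score = SequenceMatcher(None, mention.lower(), gram).ratio()
        for uri in uris:
            scores[uri] = max(score, scores.get(uri, 0.0))
    # Highest score first; ties broken by URI for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    index = build_index(ENTITIES)
    for uri, score in top_k_subjects("penicilin V", index, k=2):
        print(uri, round(score, 3))
```
]]></preformat>
        <p>Scanning every dictionary term is acceptable for a toy index; a larger KG would need candidate filtering before scoring, which this sketch omits.</p>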
      </sec>
      <sec id="sec-4-5">
        <title>Answer Generation</title>
        <p>We find the answer to a given question from the KG by formulating a SPARQL
query using the subject (s) and relation (r) returned by the previous modules,
subject linking and relation classification respectively. The answer (object) is
retrieved using the query "SELECT ?object WHERE { s p ?object }", with the relation r as the predicate p.</p>
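        <p>A minimal sketch of the query formulation step follows. It only builds the query string and does not contact an endpoint; the function name build_sparql and the example.org URIs are hypothetical placeholders, not identifiers from the DrugBank KG.</p>
        <preformat><![CDATA[
```python
def build_sparql(subject_uri: str, predicate_uri: str) -> str:
    """Return a SELECT query for the object of one (subject, predicate) pair."""
    for uri in (subject_uri, predicate_uri):
        # Reject characters that may not appear inside a SPARQL IRI reference.
        if any(ch in uri for ch in '<>"{}|^`\\ '):
            raise ValueError("not a valid IRI: %r" % uri)
    return "SELECT ?object WHERE { <%s> <%s> ?object . }" % (subject_uri, predicate_uri)


# Hypothetical URIs, for illustration only.
query = build_sparql(
    "http://example.org/drug/penicillin-v",
    "http://example.org/vocab/manufacturer",
)
print(query)
```
]]></preformat>
        <p>Validating the URIs before interpolation keeps a malformed subject or relation from producing a syntactically broken query.</p>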
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental settings and Evaluations</title>
      <sec id="sec-5-1">
        <title>Model Settings</title>
        <p>
          We used the pretrained BERT language model 'bert-base-uncased'<sup>9</sup> from
Huggingface Transformers to build all seven models. The details are as follows:
        </p>
        <list list-type="bullet">
          <list-item>
            <p>
              Input data split and preprocessing: We retained 20 per cent of the
Drugbank SQA dataset as test data. The remaining 80 per cent was further
split into training and validation sets with an 80:20 ratio. We used stratification
to retain the class balance across the datasets. The numbers of examples in the
training, validation, and test datasets are 406, 46, and 116 respectively.
We tokenised the questions with the WordPiece model [
              <xref ref-type="bibr" rid="ref3">3</xref>
              ] and added the special tokens
[CLS] at the start and [SEP] at the end.
            </p>
          </list-item>
          <list-item>
            <p>Feature extraction approach: The feature extraction based models used word
embeddings, sentence embeddings or frozen pretrained layers. The word
vector is constructed by summing the vectors of its word pieces, and the word
embedding is obtained by adding the last four layers. The sentence embedding is
obtained from the [CLS] token position of the last hidden state. When pretrained
layers are used to feed the input features, their weights are not updated
during the fine-tuning process.</p>
          </list-item>
          <list-item>
            <p>Fine-tuning approach: The pretrained layers of BERT were fine-tuned
along with the top task-specific layer using the Drugbank SQA training data
with a task-specific objective function.</p>
          </list-item>
        </list>
        <p><sup>8</sup> https://github.com/seatgeek/fuzzywuzzy</p>
        <p><sup>9</sup> https://huggingface.co/transformers/v2.4.0/model_doc/bert.html</p>
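        <p>The construction of word vectors in the feature extraction approach (sum the last four layers per word piece, then sum the word pieces of each word) can be sketched in plain Python. The input layout is an assumption made for this sketch: hidden_states[layer][token] is a list of floats, and word_ids[token] gives the index of the word a piece belongs to, or None for special tokens such as [CLS] and [SEP]. The function name is hypothetical.</p>
        <preformat><![CDATA[
```python
def word_vectors(hidden_states, word_ids):
    """Combine per-layer word-piece vectors into one vector per word.

    hidden_states: list over layers of lists over tokens of float vectors.
    word_ids: for each token, the word index it belongs to, or None for
              special tokens (assumed layout, see the lead-in).
    """
    last_four = hidden_states[-4:]
    dim = len(last_four[0][0])
    # Word-piece embedding: element-wise sum over the last four layers.
    piece_vecs = [
        [sum(layer[t][d] for layer in last_four) for d in range(dim)]
        for t in range(len(word_ids))
    ]
    # Word vector: element-wise sum over the pieces of each word.
    words = {}
    for t, word in enumerate(word_ids):
        if word is None:
            continue
        acc = words.setdefault(word, [0.0] * dim)
        for d in range(dim):
            acc[d] += piece_vecs[t][d]
    return [words[w] for w in sorted(words)]


if __name__ == "__main__":
    # Toy input: 5 layers, 4 tokens ([CLS], two pieces of one word, [SEP]), 2 dimensions.
    toy = [[[float(layer), 1.0] for _ in range(4)] for layer in range(5)]
    print(word_vectors(toy, [None, 0, 0, None]))
```
]]></preformat>
        <p>In the toy input, the last four layers contribute 1 + 2 + 3 + 4 in the first dimension and 1 + 1 + 1 + 1 in the second, and the single word has two pieces, so the sums are doubled.</p>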
      </sec>
      <sec id="sec-5-2">
        <title>Results</title>
        <p>
          For the subtasks subject identification and relation classification, we built
models with BERT using both transfer learning approaches, feature extraction
and fine-tuning. We used F-Score for entity-level evaluation of the NER task [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and
accuracy for relation classification and the final entity-relation pair predictions [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In
both tasks, the fully fine-tuned models outperform the feature extraction based
models, as shown in Tables 2 and 3.
        </p>
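        <p>Entity-level F-Score can be illustrated with the sketch below. It assumes exact-match scoring, where a predicted entity counts as correct only if its span coincides with a gold span; this matching rule and the function names are assumptions made for illustration and may differ in detail from the evaluation protocol used in this work.</p>
        <preformat><![CDATA[
```python
def entity_spans(tags):
    """Return the set of (start, end) spans encoded by B-E/I-E/O-E tags."""
    spans, start = set(), None
    # A trailing sentinel closes a span that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O-E"]):
        if tag != "I-E" and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B-E":
            start = i
    return spans


def entity_f1(gold_sequences, predicted_sequences):
    """Micro-averaged F1 over exactly matching entity spans."""
    true_pos = n_gold = n_pred = 0
    for gold, pred in zip(gold_sequences, predicted_sequences):
        gold_spans, pred_spans = entity_spans(gold), entity_spans(pred)
        true_pos += len(gold_spans.intersection(pred_spans))
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["O-E", "O-E", "B-E", "I-E"], ["B-E", "O-E"]]
    pred = [["O-E", "O-E", "B-E", "I-E"], ["O-E", "B-E"]]
    print(entity_f1(gold, pred))
```
]]></preformat>
        <p>In the example, one of the two predicted spans matches a gold span, so precision and recall are both one half.</p>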
        <p>
          The entity linking module, which uses fuzzy search, improves the entity-level
accuracy of the NER and in turn increases the final accuracy of the (entity,
relation) pair. The final answer (entity-relation pair) prediction accuracy of our
model for simple QA over a domain-specific KG, along with the results of various
approaches to open-domain QA, is shown in Table 4.
In the NER task, earlier work on open-domain QA with a large training dataset
(75.9K training examples) reported 91% and 89.8% F-Score using BiLSTM and
CRF models respectively [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In the domain-specific task with a training dataset
of ~400 examples, this architecture with BERT word embeddings achieved
only 37.1%, because the training data is not sufficient to learn the ~50000
parameters and open-domain word embeddings have difficulty recognising
domain-specific entities. Another feature extraction based model that uses frozen
pretrained layers reaches 81.0%, but this is still lower than the 95.5% of the fully
fine-tuned model, because the frozen layers trained on open-domain text still do not fully
recognise domain-specific entities.
        </p>
        <p>In relation classification, the feature extraction based BiLSTM with BERT word
embeddings (68.4%) outperforms the other feature extraction based
models, SVM with BERT sentence embeddings (64.9%) and frozen BERT layers
(43.0%). However, the top score of 96.5% is achieved by the fully fine-tuned BERT
model. This again indicates the inability of open-domain BERT models to
understand domain-specific terms without further fine-tuning on
domain- and task-specific data.</p>
        <p>
          The open-domain QA reference models use a large dataset of ~100K
questions from Freebase [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], whereas our models use the DrugBank SQA dataset of
~600 questions. Since the datasets differ, the results are not
directly comparable and serve only to indicate current performance levels. Our
research aim is to build a baseline model for domain-specific simple QA by
transfer learning from the open-domain trained BERT LM for environments with a
scarcity of data, time and resources. With this aim, our baseline
architecture achieves state-of-the-art (SOTA) results (95.6% accuracy for
entity-relation pair prediction) with fully fine-tuned BERT models for both
subtasks and entity linking with fuzzy search.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>To the best of our knowledge, this work is the first baseline model pipeline for
answering simple questions over a domain-specific KG by using the open-domain
trained LM BERT in transfer learning settings. We have contributed further by
creating the Drugbank SQA dataset from facts in the DrugBank KG,
annotated with the required BIO tagging and target classes for the subtasks NER
and classification. We have presented an architecture for the baseline
domain-specific simple QA model pipeline that contains the subtasks subject identification,
relation classification, subject-relation linking and answer generation, which
produces a SOTA result of 95% accuracy on the DrugBank SQA dataset. We have
also presented multiple methods for adapting open-domain BERT to
domain-specific QA tasks and evaluated their effectiveness.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Question answering with subgraph embeddings</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>615</fpage>
          -
          <lpage>620</lpage>
          . ACL, Doha, Qatar (Oct
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Large-scale simple question answering with memory networks</article-title>
          .
          <source>arXiv preprint arXiv:1506.02075</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the ACL</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . ACL, Minneapolis, Minnesota (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dimitrakis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sgontzos</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzitzikas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A survey on question answering systems over linked data and documents</article-title>
          .
          <source>Journal of Intelligent Information Systems</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callahan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz-Toledo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ansell</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emonet</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belleau</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Droit</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Bio2RDF release 3: A larger connected network of linked data for the life sciences</article-title>
          .
          <source>In: Proceedings of the 2014 International Conference on Posters &amp; Demonstrations Track</source>
          - Volume
          <volume>1272</volume>
          . pp.
          <fpage>401</fpage>
          -
          <lpage>404</lpage>
          . ISWC-PD'14, CEUR-WS.org, Aachen, DEU (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Esuli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Evaluating information extraction</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <fpage>100</fpage>
          -
          <lpage>111</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Green Jr</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chomsky</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laughery</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Baseball: an automatic question-answerer</article-title>
          .
          <source>In: Papers presented at the May 9-11, 1961, western joint IRE-AIEE-ACM computer conference</source>
          . pp.
          <fpage>219</fpage>
          –
          <lpage>224</lpage>
          (
          <year>1961</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golub</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Character-level question answering with attention</article-title>
          .
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1598</fpage>
          –
          <lpage>1607</lpage>
          .
          <publisher-name>ACL</publisher-name>
          , Austin, Texas (Nov
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cochez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Melo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>J.E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Knowledge graphs</article-title>
          . arXiv preprint arXiv:2003.02320
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Universal language model fine-tuning for text classification</article-title>
          .
          <source>In: Proceedings of the 56th Annual Meeting of the ACL (Volume 1: Long Papers)</source>
          . pp.
          <fpage>328</fpage>
          –
          <lpage>339</lpage>
          . ACL, Melbourne, Australia (Jul
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A survey on deep learning for named entity recognition</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lukovnikov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Pretrained transformers for simple question answering over knowledge graphs</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>470</fpage>
          –
          <lpage>486</lpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lukovnikov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Neural network-based question answering over knowledge graphs on word and character level</article-title>
          .
          <source>In: Proceedings of the 26th international conference on World Wide Web</source>
          . pp.
          <fpage>1211</fpage>
          –
          <lpage>1220</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Strong baselines for simple question answering over knowledge graphs with and without neural networks</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (Short Papers). pp.
          <fpage>291</fpage>
          –
          <lpage>296</lpage>
          .
          <publisher-name>ACL</publisher-name>
          , New Orleans, Louisiana (Jun
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers). pp.
          <fpage>2227</fpage>
          –
          <lpage>2237</lpage>
          .
          <publisher-name>ACL</publisher-name>
          , New Orleans, Louisiana (Jun
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>To tune or not to tune? Adapting pretrained representations to diverse tasks</article-title>
          .
          <source>In: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)</source>
          . pp.
          <fpage>7</fpage>
          –
          <lpage>14</lpage>
          . ACL, Florence, Italy (Aug
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Petrochuk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>SimpleQuestions nearly solved: A new upperbound and baseline approach</article-title>
          .
          <source>In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>554</fpage>
          –
          <lpage>558</lpage>
          . ACL, Brussels, Belgium (Oct-Nov
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narasimhan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimans</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Improving language understanding by generative pre-training</article-title>
          .
          URL
          <uri>https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf</uri>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Simperl</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norton</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acosta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maleshkova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikroyannidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mulholland</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using linked data effectively</article-title>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sukhbaatar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szlam</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>End-to-end memory networks</article-title>
          .
          <source>In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume</source>
          <volume>2</volume>
          . pp.
          <fpage>2440</fpage>
          –
          <lpage>2448</lpage>
          . NIPS'15, MIT Press, Cambridge, MA, USA (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Tjong Kim Sang</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meulder</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003</source>
          . pp.
          <fpage>142</fpage>
          –
          <lpage>147</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Trivedi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maheshwari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LC-QuAD: A corpus for complex question answering over knowledge graphs</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>210</fpage>
          –
          <lpage>218</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Ture</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jojic</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>No need to pay attention: Simple recurrent neural networks work!</article-title>
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>2866</fpage>
          –
          <lpage>2872</lpage>
          . ACL, Copenhagen, Denmark (Sep
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tice</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          :
          <article-title>Building a question answering test collection</article-title>
          .
          <source>In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>200</fpage>
          –
          <lpage>207</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Wasim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahmood</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>U.G.</given-names>
          </string-name>
          :
          <article-title>A survey of datasets for biomedical question answering systems</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>8</volume>
          (
          <issue>7</issue>
          ),
          <fpage>484</fpage>
          –
          <lpage>488</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Woods</surname>
            ,
            <given-names>W.A.</given-names>
          </string-name>
          :
          <article-title>Lunar rocks in natural English: Explorations in natural language question answering</article-title>
          (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>