<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Question Embeddings for Semantic Answer Type Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eleanor Bill</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernesto Jiménez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City, University of London</institution>
          ,
          <addr-line>London, EC1V 0HB</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper considers an answer type and category prediction challenge for a set of natural language questions, and proposes a question answering classification system based on word and DBpedia knowledge graph embeddings. The questions are parsed for keywords, nouns and noun phrases before word and knowledge graph embeddings are applied to the parts of the question. The vectors produced are used to train multiple multi-layer perceptron models, one for each answer type, in a multiclass one-vs-all classification system for both answer category prediction and answer type prediction. Different combinations of vectors, and the effect of creating additional positive and negative training samples, are evaluated in order to find the best classification system. The classifiers that predict the answer category with the highest accuracy are those trained on knowledge graph embedded noun phrase vectors from the original training data, with an accuracy of 0.793. The vector combination that produces the highest NDCG values for answer type prediction is the word embeddings of the question keyword and nouns parsed from the original training data, with NDCG@5 and NDCG@10 values of 0.471 and 0.440 respectively for the top five and ten predicted answer types.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>knowledge graph embedding</kwd>
        <kwd>answer type prediction</kwd>
        <kwd>question answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1.1</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <sec id="sec-2-1">
        <title>The SMART challenge</title>
        <p>
          The challenge [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] provided a training set of natural language questions alongside a
single given answer category (Boolean, literal or resource) and 1-6 given answer types.
The provided datasets contain 21,964 questions (17,571 train, 4,393 test). The
challenge was to achieve the highest accuracy for answer category prediction and the highest
NDCG values for answer type prediction. The methodology described in this paper
attempts to achieve this using knowledge graph and word embeddings alongside
question parsing and multi-layer perceptron models.
        </p>
        <p>The target ontologies and pre-computed knowledge graph embeddings used in this
project are built on the free and open knowledge graph DBpedia. DBpedia is a
knowledge graph consisting of grouped information from various Wikimedia projects.</p>
        <p>
          It can be queried with SQL-like query languages such as SPARQL. The English
language version classifies 4.2 million items with a variety of semantic types such as
people and places [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The pre-trained word embeddings from fastText were also used in
the question answering system [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Question answering</title>
        <p>
          Question answering is a computer science field concerned with answering questions
posed to a system in natural language. The goal of a question answering system is to
retrieve answers to natural language questions, rather than full documents or relevant
passages as in most information retrieval systems. Open domain QA systems are
commercially available in products such as Apple’s Siri and Amazon’s Alexa [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] while
closed domain QA systems are used in specialized fields, such as medical diagnosis.
        </p>
        <p>Question answering systems rely on very large datasets of collated information, such
as Wikidata – a knowledge base which is parsed from Wikipedia. These knowledge
bases can be structured in different ways: relational databases, or in Wikidata’s case, as
a knowledge graph, to ensure maximum efficiency and to allow for more meaningful
links within the objects in the knowledge base.</p>
        <p>
          Many modern QA algorithms learn to embed both question and answer into a
lowdimensional space and select the answer by finding the similarity of their features [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          A question answering process ultimately needs to fulfil three steps: to parse the
natural language question to allow for meaningful understanding of what the user is asking,
to retrieve the relevant facts from the knowledge base, and to deliver the facts in an
appropriate answer [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Knowledge graphs</title>
        <p>
          Knowledge graphs are a type of knowledge base: information is represented in a data
structure. A knowledge graph is an abstract representation of information in the form
of a directed labeled multi-graph, where nodes represent entities of interest and edges
represent relations between these entities. Facts are stored in the graph in the form
of subject-predicate-object triples (or head, relation, tail), e.g., (London, capital of, United
Kingdom) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], where the fact represented by this link in the knowledge graph is that
London is the capital of the United Kingdom.
        </p>
        <p>The meaning, or semantics, of the data in the knowledge graph is encoded alongside
the data, in the form of the ontology. An ontology is a set of logical rules that allow
inference about the data in the knowledge graph that govern the object types and
relationships between entities. This inference can allow implicit information to be derived
from the explicit information in the graph. If knowledge graphs are provided with new
information or a change in ontology, they are able to apply the rules to the new
information, and this can in turn increase accuracy in terms of e.g., question answering
classification.</p>
        <p>
          Knowledge graph embeddings differ from each other based on three criteria: the
choice of the representations of entities and relationships, the scoring function and the
loss function. Representations of entities and relationships are most commonly
expressed using vector representations of real numbers, but common alternatives are
matrices and complex vectors. Each subject-predicate-object triple can be represented as an
entry in a Boolean tensor that is either True or False. The subject and object entities
and the predicate are mapped to their latent representations and then to a probability P ∈
[0, 1]. The scoring function estimates the likelihood of the subject-predicate-object
triple by aggregating the information coming from it. The loss function defines the
quantity that is minimised during knowledge graph embedding
model training in order to provide the best result on test data [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
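        <p>As a concrete illustration (a sketch only; the pre-computed DBpedia embeddings used in this project do not necessarily use this model), the widely used TransE scoring function treats a relation as a translation in the embedding space:</p>
        <preformat>
import numpy as np

def transe_score(head, relation, tail):
    """TransE-style scoring function: a triple is plausible when the
    head embedding, translated by the relation embedding, lands near
    the tail embedding (higher score = more plausible)."""
    return -np.linalg.norm(head + relation - tail)
</preformat>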
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>
        Answer type prediction systems take many forms and draw on many parts of the
question. Bogatyy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] applies a transition-based neural network to parse the syntactic
structure of a question and uses this to infer the answer type. To do this, Bogatyy’s model
annotates parts of the query using a type inventory (specifically Freebase, the precursor
to Wikidata) to represent the types they may be associated with. For example, ‘who’ is
annotated with ‘people’. This is done both at a coarser level - as with ‘who’ - and at a
finer level, as with a phrase such as ‘horror movie’. The syntactic structure of the query
is then annotated using a globally normalized transition-based neural network parser.
This model outperforms a logistic regression baseline that only uses the type inventory
and does not take into account the syntactic structure of the queries.
      </p>
      <p>
        Answer type prediction can be used to improve semantic parsing, as in Yavuz et al.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Their model uses a bidirectional LSTM model to infer answer types in conjunction
with semantic parsing, which maps a natural language question into its semantic
representation – logical form that relates to meaning stored structurally in knowledge bases.
It does this by recursively computing vector representations for the entities present in
the question. Syntactic features are also used here in the form of dependency trees,
before constructing a bidirectional LSTM neural network over this final representation
of the question and predicting the answer type. Again, a combination of different
methods, models and parsers improves the accuracy of the question answering system.
      </p>
      <p>
        Knowledge bases may suffer from incompleteness and potentially outdated or
missing information. To counteract this, Sun et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] use knowledge bases alongside
information mined directly from the Web to successfully improve question answering
F1 score. Another source of inaccuracy from lack of information may be due to
infrequency of answers and answer types within test data. Murdock et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] addressed the
challenge of finding specialist answers to questions that featured on the
quiz show Jeopardy. They predicted fine-grained answer types using DeepQA to analyse
the question and produce candidate answers independent of answer type, before using
type coercion to choose the candidate answer most likely to be correct.
      </p>
    </sec>
    <sec id="sec-4">
      <title>The answer type prediction system</title>
      <p>
        The steps within the development of this system involve the parsing of the natural
language questions, the construction of vectors from the parsed parts of the questions
using pre-computed knowledge graph embeddings and/or word
embeddings, the training of MLP classifiers using
these vectors, and the evaluation of the classifiers. The system is implemented in
Python, using pre-existing libraries and knowledge bases: fastText [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], DBpedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
a SPARQL endpoint [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        The system uses word and knowledge graph embeddings on the parsed questions:
the word embeddings of the key 'wh' part of the question (what/when/why etc.), word
embeddings of the parsed nouns, knowledge graph embeddings of
the parsed noun phrases, and word embeddings of the types of the knowledge graph
entities. A knowledge graph access interface [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is used to query DBpedia via
SPARQL, the standard query language for Linked Open Data and semantic graph
databases. Multi-layer perceptron classifiers, one per type/category, are trained on the
concatenated vector, and the similarly vectorised test data is given a probability
estimate for each type by the related classifier. The top ten types in terms of probability are
the type results. The categories are found in a similar way, with only the top type in
terms of probability provided for evaluation.
      </p>
      <p>
        Additional heuristics are also applied to the system to increase performance, some
of which are provided by the SMART challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
- If the category is "resource", answer types are ontology classes from the
DBpedia ontology.
- If the category is "literal", answer types are either the "number", "date", "string"
or "Boolean" answer type.
- If the category is "Boolean", the answer type is always "Boolean".
      </p>
      <sec id="sec-4-1">
        <title>Question parsing</title>
        <p>
          The question is first parsed for nouns using the Natural Language Toolkit (NLTK) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
word_tokenize() and pos_tag() functions, which split the question up into words and
tag them with part-of-speech tags. The NN (common noun), NNP (proper noun),
NNS (common noun plural) and NNPS (proper noun plural) tags are all
collectively treated as nouns and added to a noun list for each question. Noun phrases
(names etc.) are then parsed using the noun phrases function in TextBlob, a Python
library for processing textual data.
        </p>
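        <p>As an illustrative sketch of this step (the variable names are illustrative, not the system's actual code), the noun and noun phrase extraction might look as follows:</p>
        <preformat>
import nltk
from textblob import TextBlob

# One-time downloads of the tokenizer and tagger models.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

question = "Where is the Empire State Building?"

# Split the question into words and tag each with a part of speech.
tagged = nltk.pos_tag(nltk.word_tokenize(question))

# Collect NN, NNP, NNS and NNPS tokens as the noun list.
nouns = [word for word, tag in tagged
         if tag in ('NN', 'NNP', 'NNS', 'NNPS')]

# Extract noun phrases with TextBlob.
noun_phrases = list(TextBlob(question).noun_phrases)
</preformat>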
        <p>Finally, the ‘wh’ part of the question ('why', 'where', 'when', 'how',
'which' etc.) is extracted using a search function over a prioritised keyword list,
returning the keyword most likely to be important to the question, e.g., ‘where’ is
prioritised above ‘when’, and ‘when’ above ‘how’. The list covers the question data, so the
majority of questions return a question keyword, while accounting for how some of the
questions were worded. For example, in the question “Where is the Empire State Building?” the
‘where’ is most important, whereas in “Is the Empire State Building in New York?” the
‘is the’ is most important, although both questions contain ‘is the’. The list was
built iteratively by looking through the questions in the question data to check that the
existing list covered them; if it did not, a new keyword from that question was added
to the list. The ordering of the list was subjective.</p>
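        <p>A minimal sketch of this prioritised search, assuming an illustrative ordering rather than the system's actual (subjectively ordered) list:</p>
        <preformat>
# Question keywords in priority order; earlier entries win when a
# question contains several. The ordering here is illustrative.
QUESTION_KEYWORDS = ['where', 'when', 'who', 'what', 'which',
                     'why', 'how', 'is the']

def extract_keyword(question):
    """Return the highest-priority keyword found in the question."""
    lowered = question.lower()
    for keyword in QUESTION_KEYWORDS:
        if keyword in lowered:
            return keyword
    return None

extract_keyword("Where is the Empire State Building?")        # 'where'
extract_keyword("Is the Empire State Building in New York?")  # 'is the'
</preformat>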
      </sec>
      <sec id="sec-4-2">
        <title>Building positive and negative samples</title>
        <p>Additional positive and negative training samples are created in order to optimize
the performance of the classifiers. Negative training samples are created in two ways. The
first is to shuffle the answer categories and types independently of the questions to get a
random answer category/type associated with each question. This is a coarse way of
generating negative samples, and in all likelihood leads to a less precise and accurate
classifier than swapping out answer categories and types close to the original according to
the class hierarchy.</p>
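        <p>A minimal sketch of this first, coarse strategy (the function and variable names are illustrative):</p>
        <preformat>
import random

def shuffled_negatives(questions, labels, seed=0):
    """Pair each question with an answer category/type drawn at random
    from the label pool, giving coarse negative training samples."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    # Drop the rare pairs where the shuffle returns the true label.
    return [(q, neg) for q, pos, neg in zip(questions, labels, shuffled)
            if neg != pos]
</preformat>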
        <p>The second method of creating negative samples involves using sibling types of the
types associated with the parsed question, accessed through the ontology. For the
example type ‘gymnast’, the ancestor type ‘athlete’ is accessed and then a disjoint type
descended from that ancestor, such as ‘basketball player’, is returned as a sibling type.
These are more fine-grained negative samples, which are useful for
fine-tuning the classifiers.</p>
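        <p>One way such sibling types can be retrieved from DBpedia is with a SPARQL query over the class hierarchy; the query below is a sketch under that assumption, not necessarily the query issued by the KG access interface:</p>
        <preformat>
from SPARQLWrapper import SPARQLWrapper, JSON

# Sketch: find sibling classes of dbo:Gymnast through their shared
# superclass. The dbo: and rdfs: prefixes are predefined at the
# public DBpedia endpoint.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT DISTINCT ?sibling WHERE {
        dbo:Gymnast rdfs:subClassOf ?parent .
        ?sibling rdfs:subClassOf ?parent .
        FILTER (?sibling != dbo:Gymnast)
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
siblings = [b["sibling"]["value"]
            for b in results["results"]["bindings"]]
</preformat>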
        <p>Both methods of creating negative samples are used in the system, future work
could use these individually to compare which has a greater effect on the results of the
classification accuracy.</p>
        <p>
          Additional positive training data is created by getting alternative but similar entities
to those associated with the parsed question, using the SPARQL endpoint. For example:
- The original entity: http://dbpedia.org/resource/Serena_Williams
- The associated type: http://dbpedia.org/ontology/TennisPlayer
- The similar entity: http://dbpedia.org/resource/Roger_Federer
This is implemented using the endpoint class in the KG access framework [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], which
uses an API to access related entities and classes in the knowledge graph. Here, the
getEntitiesForDBPediaClass() function is used.
        </p>
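        <p>A hypothetical usage sketch follows; only getEntitiesForDBPediaClass() is named by the source, so the import path, class name and argument list shown here are assumptions:</p>
        <preformat>
# Hypothetical sketch of positive-sample expansion via the KG access
# framework [16]; the module and class names are assumptions.
from kg.endpoints import DBpediaEndpoint

endpoint = DBpediaEndpoint()
# Entities sharing a type with the original entity become additional
# positive samples, e.g. other instances of dbo:TennisPlayer.
similar_entities = endpoint.getEntitiesForDBPediaClass(
    "http://dbpedia.org/ontology/TennisPlayer", 100)
</preformat>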
        <p>The vectors used to train the classifiers consist of several concatenated vectors based
on the above parsed parts of the question and word and knowledge graph embedding
applications. The structure of the final concatenated vector is as follows:
- First position: the word embedding of the ‘wh’ question part.
- Second position: the word embedding of the set of nouns.
- Third position: the knowledge graph embedding of the noun phrases.
- Fourth position: the word embedding of the types (e.g.
'http://dbpedia.org/ontology/OfficeHolder') of the found KG entities.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Training and using classifiers</title>
        <p>A separate binary classifier is built for each semantic type and category in the data, the
type in this case being the most fine-grained type associated with the question. The
system then uses a one-vs-all strategy, in which binary classification for each task is
used collectively as multi-class classification. Given a classification problem with N
possible classes, N binary classifiers are trained - one for each possible outcome. These
classifiers are trained on the previously described coarse and fine-grained positive and
negative samples, built using pre-computed word and knowledge graph embeddings on
the parsed questions, in the form of a stack of word vectors.</p>
        <p>The model used in this project to train the classifiers is a multi-layer perceptron, a
feed-forward, supervised neural network composed of several layers of neurons. An
MLP model has a minimum of three layers: input, output and at least one hidden layer
of neurons which extract appropriate features and weight components of the input layer.
An MLP can be made more complex by increasing the number of hidden layers, should
that be required.</p>
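        <p>A sketch of the per-type training loop with scikit-learn follows (the training_data mapping is illustrative); as noted under System parameters below, the classifier keeps scikit-learn's defaults apart from the iteration cap:</p>
        <preformat>
from sklearn.neural_network import MLPClassifier

# One binary classifier per answer type/category (one-vs-all). The
# defaults give a single hidden layer of 100 neurons; max_iter is
# raised to 300 in case convergence takes longer.
classifiers = {}
for answer_type, (X, y) in training_data.items():  # y: 1 = positive
    clf = MLPClassifier(max_iter=300)
    clf.fit(X, y)
    classifiers[answer_type] = clf
</preformat>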
        <p>Once all classifiers are trained, test data can be passed through the classifiers to
generate the probability of each test data point belonging to each category/type. The higher
probability categories/types are considered those most likely to be associated with that test
data point.</p>
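        <p>Continuing the sketch above, ranking the types for a single vectorised test question (test_vec, illustrative) might look as follows:</p>
        <preformat>
# Score one test vector against every type classifier and keep the
# ten types with the highest positive-class probability.
scores = {
    answer_type: clf.predict_proba(test_vec.reshape(1, -1))[0, 1]
    for answer_type, clf in classifiers.items()
}
top_ten_types = sorted(scores, key=scores.get, reverse=True)[:10]
</preformat>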
        <p>
          <bold>Heuristics.</bold> In addition to the classifiers, two sets of heuristics are applied to the system:
the first applying rules based on the question keyword part of the question, and the
second applying the SMART challenge rules listed above [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. These heuristics
helped improve the performance of the classifiers.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Evaluation</title>
        <p>The evaluation of the system uses the evaluation script provided by the challenge.
Answer category prediction is treated as a multi-class classification problem, with
accuracy as the key performance metric, as there are only three answer categories
and little ambiguity between them. Accuracy is calculated via a direct comparison
between the predicted category as returned by the system and the actual category.</p>
        <p>
          The pre-written evaluation code uses hierarchical target type identification for
assessing accuracy, in which the system provides a list of the ten most likely target types and
these are evaluated by judging the semantic distance between them and the actual types
in the test data, applying decay as the ranking increases. The specific metric used is
lenient NDCG@k with a linear decay [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This allows several ranked potential
types to be evaluated as a whole, giving a tailored overview of the accuracy of the
classifier.
        </p>
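        <p>For reference, NDCG@k normalises the discounted cumulative gain of the ranked type list by the gain of an ideal ranking; in the lenient variant, the gain of each predicted type decays linearly with its distance from the gold types in the hierarchy. A sketch of the standard definition (the challenge's script may differ in detail):</p>
        <disp-formula>
          <tex-math>\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}, \qquad \mathrm{DCG@}k = \sum_{i=1}^{k} \frac{g(t_i)}{\log_2(i+1)}</tex-math>
        </disp-formula>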
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>System parameters</title>
        <p>System parameters to consider included the length of the vectors, the amount of
training data per classifier and the ratio of positive to negative samples per classifier. The
word embedding vectors were 300 floats long and the knowledge graph embedding
vectors 200 floats long, with a concatenated vector's length being the sum of the
lengths of its component vectors. The amount of training data depended on how much was
available for each type/category but was always at least 22 vectors, due
to the creation of additional training data. Ideally this would have been greater, but the
time taken to create more positive vectors constrained this. The ratio of positive to
negative samples per classifier was always 1:1.</p>
        <p>The main hyperparameters of an MLP classifier are the number of hidden layers, the
momentum and the learning rate. The number of hidden layers was set to the default of
the scikit-learn MLP classifier, which is a single hidden layer with 100 neurons. The
momentum and learning rate were also left at their defaults. The maximum number of
iterations (in case convergence took longer) was set to 300 for each classifier.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Original vs. additional training data</title>
        <p>Comparing the results from using only the original training data versus using the original
training data with additional positive samples and new negative samples allowed
evaluation of the quality of the manufactured training data, and discussion of whether the
original training data was sufficient. The original training data was fairly limited in
terms of spanning the distribution of types - with 17,528 rows and 310 types,
sometimes a type had only one or two samples with which to train a classifier. The
creation of new positive training data allowed this to be expanded somewhat, although
due to time limitations not as much additional training data was created as originally
desired: 158,264 new positive samples were created and 28,557 new negative samples.
While the overall data pool was still unbalanced, this increase in training data allowed
each classifier to be trained on a balanced sample of positive and negative vectors that
was larger than could be used with just the original training data. The more relevant
negative samples also allowed the classifiers to be more fine-tuned.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Vector structure comparisons</title>
        <p>Tests were carried out in which different parts of the vectors were used to train the
classifiers. This allowed for a better understanding of the quality of different parts of
the vectors. For example, the word embedding of the noun phrases was relatively
unsuccessful compared to the knowledge graph embedding of the noun phrases or the word
embedding of the nouns, so this vector was not used in the final classifier training
experiments and was removed from the final concatenated vector. The results of these tests are
in the table below.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>Table 1. Original training data results for the vector combinations: WE ‘wh’ + WE nouns + KGE noun phrases; WE ‘wh’ + KGE noun phrases + WE types of KGE entities; and all vectors combined. (The table's numeric values are not preserved in this version.)</p>
    </sec>
    <sec id="sec-7">
      <title>Discussion and conclusions</title>
      <sec id="sec-7-1">
        <title>Discussion of results</title>
        <p>The better the answer category accuracy results are, the worse the answer type NDCG values, with a
negative correlation of -0.89. This is unexpected, as one would expect a correct answer
category prediction to facilitate a more accurate answer type prediction. It also makes
it difficult to know which vector combination performs best overall. Also unexpected
is that the additional created training data seems to produce no benefit in terms of
classifier accuracy gain. The answer category prediction accuracy from the classifiers
trained on the additional training data is a little higher than those trained on only the
original training data, but the answer type prediction results are better with the original
training data.</p>
      <p>Individually, the classifiers trained on knowledge graph embedded noun phrases
vectors from the original training data produced the highest accuracy for answer
category prediction, while the classifiers trained on the word embeddings from the parsed
question keyword and nouns parsed from the original training data produced the highest
NDCG values for answer type prediction. Across all combinations, the NDCG@5
values are always higher than the NDCG@10 values - this makes sense considering the
more types predicted by the system, the less relevant they become as their ranking gets
lower.</p>
        <p>The final test set results submitted to the task returned an answer category prediction
accuracy of 0.79 and answer type NDCG values of 0.31 and 0.30 respectively. These
results were gathered from the challenge's own test set, so they are not the same as those in the tables
included here. The category prediction accuracies for the other participants in the
challenge ranged from 0.74 to 0.98, while the NDCG values ranged from 0.54 to 0.79. For
category accuracy, this places this system within a good range of results, while the type
prediction under-performed compared to other submissions.
</p>
      </sec>
      <sec id="sec-7-2">
        <title>Further conclusions and future work</title>
        <p>Given longer to work on the project, I would experiment with optimising the
hyperparameters of the models and with different model architectures to see whether
performance improved.</p>
        <p>Due to the generalised question data, the wide ontology and the large knowledge
base it uses, I believe this system could be similarly applied to other natural language
questions and achieve similar results. The results are reproducible across different test
data sets.</p>
        <p>There are several unanswered questions pertaining to the system results I would like
to investigate further, for example the ineffectiveness of additional training data in
terms of performance, and why an increase in answer category prediction accuracy is
correlated with a decrease in NDCG values for the answer type prediction. A good
starting point could be to compare the strategies that were used to generate more
samples in terms of their effects on the accuracy of the classification system.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Mihindukulasooriya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gliozzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga</surname>
            <given-names>Ngomo</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            ,
            <surname>Ricardo</surname>
          </string-name>
          ,
          <string-name>
            <surname>U.</surname>
          </string-name>
          ,
          <article-title>SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge</article-title>
          . CoRR/arXiv/abs/
          <year>2012</year>
          .00555 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. DBpedia, https://wiki.dbpedia.org/about, last accessed
          <year>2020</year>
          /10/20.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jégou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
          </string-name>
          , T.:
          <article-title>FastText.zip: Compressing text classification models</article-title>
          .
          <source>arXiv:1612</source>
          .03651 [cs] (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
          </string-name>
          , V.:
          <article-title>A Survey of Text Question Answering Techniques</article-title>
          .
          <source>In: International Journal of Computer Applications</source>
          <volume>53</volume>
          , no.
          <issue>4</issue>
          :
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woloszyn</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barone</surname>
            ,
            <given-names>D. A. C.</given-names>
          </string-name>
          :
          <string-name>
            <surname>When</surname>
          </string-name>
          , Where, Who,
          <article-title>What or Why? A Hybrid Model to Question Answering Systems</article-title>
          .
          <source>In: Computational Processing of the Portuguese Language</source>
          ,
          <fpage>136</fpage>
          -
          <lpage>46</lpage>
          . Lecture Notes in Computer Science. Cham: Springer International Publishing (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deep Learning in Question Answering</article-title>
          .
          <source>In: Deep Learning in Natural Language Processing</source>
          , p.
          <fpage>185</fpage>
          -
          <lpage>217</lpage>
          . Singapore: Springer, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bianchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossiello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costabello</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minervini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          '
          <article-title>Knowledge Graph Embeddings and Explainable AI'</article-title>
          . ArXiv:
          <year>2004</year>
          .14843 [Cs]
          <article-title>(</article-title>
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bogatyy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <article-title>Predicting answer types for question-answering, (</article-title>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Yavuz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivatsa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <article-title>Improving Semantic Parsing via Answer Type Inference</article-title>
          ,
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <fpage>149</fpage>
          -
          <lpage>159</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Ma, H.,
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Open Domain Question Answering via Semantic Enrichment,
          <source>In: WWW '15: Proceedings of the 24th International Conference on World Wide WebMay</source>
          ,
          <fpage>1045</fpage>
          -
          <lpage>1055</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Murdock</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welty</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrucci</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gondek</surname>
            ,
            <given-names>D. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kanayama</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>Typing candidate answers using type coercion</article-title>
          ,
          <source>In: IBM J. Res &amp; Dev</source>
          . Vol.
          <volume>56</volume>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          , arXiv:
          <fpage>1607</fpage>
          .
          <fpage>04606</fpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. W3.org, SparqlEndpoints, https://www.w3.org/wiki/SparqlEndpoints, last accessed
          <year>2020</year>
          /11/30.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Natural Language Processing with Python</surname>
          </string-name>
          ,
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumayer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Hierarchical target type identification for entity-oriented queries</article-title>
          .
          <source>In: Proceedings of the 21st ACM International Conference on Information and Knowledge</source>
          Management - CIKM '
          <fpage>12</fpage>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ruiz</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <source>Tabular Data Semantics for Python</source>
          , https://github.com/ernestojimenezruiz/tabular
          <article-title>-data-semantics-py</article-title>
          ,
          <source>last accessed</source>
          <year>2020</year>
          /10/20.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>