<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an Approach based on Knowledge Graph Refinement for Relation Linking and Entity Linking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Azanzi Jiomekong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brice Foko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadel Tsague</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uriel Melie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaoussou Camara</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Yaounde I</institution>
          ,
          <addr-line>Yaounde</addr-line>
          ,
          <country country="CM">Cameroon</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Unité de Formation et de Recherche en Sciences Appliquées et des TIC, Université Alioune Diop de Bambey</institution>
          ,
          <addr-line>Bambey</addr-line>
          ,
          <country country="SN">Sénégal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
<p>This paper presents our contribution to the SMART 3.0 Relation Linking and Entity Linking research problems. This contribution consists of an approach based on knowledge graph refinement techniques for completing the graph with missing entities and relations. The model defined is inspired by the TransE algorithm and uses a scoring function that defines the distance between two nodes in the graph. The scoring function for Relation Linking is defined using Naive Bayes. Concerning Entity Linking, we identify and extract candidate terms from questions. Thereafter, these terms are used to search for relevant entities in the knowledge graph using the LookUp and Wikibase APIs.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>Relation Linking</kwd>
        <kwd>Entity Linking</kwd>
        <kwd>Knowledge Graphs Refinement</kwd>
        <kwd>Wikidata</kwd>
        <kwd>DBpedia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Consider the following question from the dataset and its annotated relations:</p>
      <p>{
  "Question": "Where is University of Yaounde I located?",
  "relations": [ "dbo:country", "dbo:state", "dbo:city" ]
}</p>
      <p>SPARQL query with Relation Linking:
SELECT DISTINCT * WHERE {
  dbr:University_of_Yaoundé_I dbo:country ?o1 .
  dbr:University_of_Yaoundé_I dbo:state ?o2 .
  dbr:University_of_Yaoundé_I dbo:city ?o3 .
}
Result: 3 good results (:Cameroon, :Centre_Region_(Cameroon), :Yaoundé)</p>
      <p>SPARQL query without Relation Linking:
SELECT DISTINCT * WHERE {
  dbr:University_of_Yaoundé_I ?p ?o .
}
Result: 40 results</p>
      <p>The predicted relations are used to translate the user question into its corresponding SPARQL query. The illustration is given in table 1. This example shows that Relation
Linking is a crucial component to enable QA over Knowledge Graphs.</p>
      <p>
        SMART 2021 was devoted to the Answer Type prediction task and the Relation prediction task.
However, only 2 papers (25%) were proposed for the relation prediction task. A comparison of
related work results is presented in table 2. This is understandable because relation linking
for question answering is known to be a hard task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Indeed, some questions have multiple
relations, some relations are semantically far apart, and sometimes the tokens deciding the relations are
spread across the question. On the other hand, some relations are implicit in the text, and there
are lexical gaps between the relation surface forms and the KG property labels.
      </p>
      <p>
        [Table 2: comparison of related work. Khaoula and Nadine [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] use BERT with fastai and data augmentation on Wikidata and on DBpedia; Thang et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] use BERT on DBpedia and on Wikidata. Reported Precision and Recall include 0.75094 and 0.8163.]
      </p>
      <p>
        On the other hand, Entity Linking [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] consists of identifying entities in a question expressed in natural language and linking them to
the KG, so that they can be used for retrieving the correct answer. Indeed, knowing the entities
hidden in a user question allows us to significantly narrow down the search space for an answer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For instance, consider the following question and its annotated entity:
      </p>
      <p>{
  "Question": "Which is the profession of Ousmane Sonko?",
  "entities": [ "http://dbpedia.org/resource/Ousmane_Sonko" ]
}</p>
      <p>SELECT DISTINCT * WHERE {
  ?s dbp:occupation ?o .
}
Result: more than 10,000 results</p>
      <p>Without the entity, querying the "dbp:occupation" property returns more than 10,000 results. By predicting the entity http://dbpedia.org/resource/Ousmane_Sonko, the search space is reduced to the 1-hop entities of the "Ousmane_Sonko" resource in DBpedia ("dbr:Ousmane_Sonko dbp:occupation"), yielding one good result. Thereafter, this entity is used to translate the user
question into its corresponding SPARQL query. The illustration is given in table 3. This
example shows that EL is a crucial component in KGQA systems.</p>
      <p>
        To solve the EL and RL research problems, SMART 3.0 provides us with two
large-scale datasets: a DBpedia dataset (composed of 760 classes) and a Wikidata
dataset (composed of 50K classes). In addition to these datasets, a restricted
vocabulary of all the relations used in the datasets for the RL task is provided. Using
these datasets, we propose a solution to the EL and RL problems. This solution is based on
Knowledge Graph refinement techniques [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Whatever the method used to construct KGs,
it is known that they will never be perfect. Thus, to increase their utility, various refinement
methods have been proposed to add missing knowledge such as missing entities, missing types of
entities, and/or missing relations between entities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To complete a knowledge
graph, internal methods use information hidden in the graph and external methods use external
knowledge sources such as text corpora, existing vocabularies, ontologies and/or knowledge
graphs. In this research, we use external KG refinement techniques for link prediction
and entity discovery. Thus, to solve the RL task, we consider the restricted vocabulary
provided as our external knowledge. To solve the EL task, we consider the DBpedia knowledge
graph as our external knowledge, and our goal is to identify terms in the question and map
these terms to the graph in order to determine the most relevant entities.
      </p>
      <p>To implement our solutions, we needed to set up a development environment. Thus, we used:
• Operating system: Ubuntu,
• Programming languages: JavaScript,
• JavaScript library: Natural4.</p>
      <p>The source code is provided on GitHub5.</p>
      <p>In the rest of the paper, we present the Relation Linking task in Section 2, the Entity Linking
task in Section 3 and the conclusion in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Relation Linking</title>
      <p>To solve the Relation Linking task, we proceeded as follows: analysis of the dataset (Section 2.1),
model definition (Section 2.2) and model training and use (Section 2.3).</p>
      <sec id="sec-2-1">
        <title>2.1. Analysis of the dataset</title>
        <p>The aim of this step was to gather, portray and provide a deeper understanding of the structure
of the dataset so as to provide efficient solutions. As presented by the organizers, SMART 3.0
datasets are designed to provide one or many relations given a question in natural language.</p>
        <p>To understand the RL task, we thought it necessary to investigate a subset of this dataset. Thus,
fifty questions from each dataset were randomly selected and their annotations automatically
removed. Once the annotations were removed, a manual annotation of this subset of the dataset
took place.</p>
        <p>The main findings from these datasets:
• Data are not equally distributed amongst the different relations;
• Question types can be divided into Wh-, How many, Of which, Thus, Did, etc. The
taxonomy presented in Fig. 1 gives the details;
• Question types can be used to identify a type of relation in the vocabulary.
Given the findings of the analysis of the dataset, we built the question taxonomy presented in
Fig. 1. This taxonomy will be used to build the model.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model definition</title>
        <p>The aim of the model definition step was to define a model that can be applied to any dataset
to solve the Relation Linking research problem. The first thing we did was to define a graph
structure that can be used to represent questions and the relations to predict (see Fig. 2). This
consists of matching the taxonomy of Fig. 1 to the corresponding relations. Thereafter,
each question is matched to the corresponding node in the question taxonomy. The idea
is that, if a question is linked to a node in the taxonomy and this node is linked to a
relation in the vocabulary, then this question is linked to this relation. Thus, we defined the
"hasEmbedding" relation between a question and its embedding in the taxonomy. To define</p>
        <sec id="sec-2-2-1">
          <title>4https://github.com/NaturalNode/natural 5https://github.com/jiofidelus/tsotsa/tree/SMART_22</title>
          <p>
            this relation, knowledge should be extracted from the question. Knowledge extraction is the
creation of knowledge from structured (relational databases, XML), semi-structured (source
code) and unstructured (text, documents, images) sources [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. The current case consists of
extracting terms from unstructured sources which are questions.
          </p>
          <p>We define the Relation Linking dataset as multi-relational data using a knowledge graph (see
Fig. 2) given by the quadruple G = (V, E, T, R):
• V = {v} is the set of nodes. These nodes are the questions, the embedding vectors and the
relations that are hidden in questions;
• E = {(h, r, t)} is the set of edges. These are triples, with node h connected to node t
via the relation r. These are the relations between two questions, or between a question and the set
of relations hidden in this question;
• T = {τ(v)} denotes the node types. The type of a relation node is its label and the
type of a question node is its id and the title of the question;
• R = {r} denotes the relation types. These are the labels of the relations.
In addition to this graph, we suppose that we have an external knowledge source (the vocabulary
provided by the organizers) that can be used to complete the graph with missing knowledge.</p>
          <p>The nodes in the graph of Fig. 2 are connected by the following relations:
• "hasEmbedding": this relation defines how close a question is to an embedding
vector defined in the question taxonomy. This is very useful because when two
questions have the same embedding in the taxonomy, they will have the same relation(s).
• "hasRelation": represents the relation between a node and a term in the reference
vocabulary.</p>
          <p>Given this new configuration of the training dataset, we reduced the problem of Relation
Linking to the problem of KG completion. To solve this problem, our goal is to provide an
efficient tool to complete the graph by adding new facts from the external vocabulary furnished
by the SMART organizers. Thus, our task is to predict the link between a question and a set of
relations.</p>
          <p>
            Inspired by the TransE algorithm [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] for learning low-dimensional embeddings of entities, we
defined the model presented by equation 1. We consider that relationships are represented
as translations. The model is defined as a function which takes a node and a relation and
predicts all the nodes related to it through this relation. This prediction is based on the proximity
between two nodes.
          </p>
          <p>f : V × R × V → ℝ
f(h, r, t) = ||h + r − t||    (1)
h + r ≈ t if f(h, r, t) ≈ θ
The function f is used to verify whether the triples in the knowledge graph hold. The triple (h, r, t)
holds if f(h, r, t) ≈ θ - in this case, we can say that h + r ≈ t, θ being the model
parameter. If θ reaches a certain value (or belongs to a certain interval of values), then we can
say that h and t are linked by the relation r.</p>
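          <p>The scoring function above can be sketched as follows. This is a minimal illustration: the embedding vectors and the threshold θ are toy values chosen for the example, not learned model parameters.</p>
```javascript
// Sketch of the TransE-style score f(h, r, t) = ||h + r - t||.
// The embeddings below are toy vectors, not learned parameters.
function score(h, r, t) {
  let sum = 0;
  h.forEach((hi, i) => {
    const d = hi + r[i] - t[i];
    sum += d * d;
  });
  return Math.sqrt(sum);
}

// A triple (h, r, t) is considered to hold when the score is at most
// the model parameter theta (a toy threshold here).
function holds(h, r, t, theta) {
  return !(score(h, r, t) > theta);
}

const h = [1, 2];  // head node embedding (toy values)
const r = [3, 1];  // relation embedding (toy values)
const t = [4, 3];  // tail node embedding (toy values)
console.log(score(h, r, t));       // 0, since h + r equals t exactly
console.log(holds(h, r, t, 0.1));  // true
```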
          <p>The model that we just defined is based on node embedding, the nodes being the questions
and the list of relations of this question. Thus, we should find a way to model these nodes in
such a way that they can be used to train our models to predict relations. This is done using the
"question vector (question2vect)" function.</p>
          <p>Before the models were trained, the first thing we did was to encode the questions so as to
simplify the definition of (Question, hasEmbedding, Vector) triples. We extracted knowledge
from the question title and used this knowledge to define the embedding vector of each question.
To this end, we suppose that a question can be well represented by some terms in its title, and we
define the question2vect function (see equation 2). This function takes a question and returns
a vector containing the terms used as the embedding of this question.</p>
          <p>question2vect(q) = (t1, t2, ..., tn)    (2)</p>
          <p>In equation 2, ti (i &lt;= n) denotes the different terms (made up of words or groups of words)
in the question title that can be representative of the question. The question2vect algorithm is
given in listing 1.</p>
          <p>Listing 1: Question2vect algorithm to transform a question into a vector</p>
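          <p>A minimal JavaScript sketch of such a question2vect function, assuming a simple tokenizer and a small illustrative stop-word list (the paper's actual term selection relies on TF-IDF, described in Section 2.3):</p>
```javascript
// Simplified sketch of question2vect: turn a question title into a vector of
// representative terms. The stop-word list below is a small stand-in; the
// actual selection of representative terms uses TF-IDF.
const STOP_WORDS = new Set(["where", "is", "the", "of", "a", "in", "which", "was"]);

function question2vect(question) {
  return question
    .toLowerCase()
    .replace(/[?.,!:%#\/\\"]/g, " ")  // drop special characters
    .split(/\s+/)                     // tokenize on whitespace
    .filter(w => w.length > 0)
    .filter(w => !STOP_WORDS.has(w));
}

console.log(question2vect("Where is University of Yaounde I located?"));
// [ "university", "yaounde", "i", "located" ]
```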
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Models training and use</title>
        <p>We used equation 1 to define specific models for each relation.</p>
        <p>• Learning the "hasEmbedding" relation: the function f_hasEmbedding is given by
equation 3.</p>
        <p>f_hasEmbedding(q, hasEmbedding, v) = |q − v| = α    (3)
To evaluate the function f_hasEmbedding, we use the Levenshtein distance. The
overall algorithm is given in listing 2.</p>
        <sec id="sec-2-3-1">
          <title>Listing 2: f_hasEmbedding algorithm</title>
          <p>Input:
Q: Array of String // All the words contained in the question title
V: Vector of terms // The terms contained in the embedding vector
Output:
alpha: Int // The distance between Q (the question) and V (the embedding vector)
Steps:
1) n = number of strings contained in Q
2) m = number of terms contained in V
// Recursive implementation of the Levenshtein distance
3) Calculate the distance between Q and V
3.1) if m = 0, return n
3.2) if n = 0, return m
3.3) if n = m, do HasEmbedded(Queue(Q), Queue(V))
3.4) else return 1 + min(HasEmbedded(Queue(Q), V), HasEmbedded(Q, Queue(V)), HasEmbedded(Queue(Q), Queue(V)))
// The function min(x_1, ..., x_n) of step (3.4) returns the minimum value of its parameters
// The function Queue(S) takes the question title and returns an array of strings
// containing the words composing the question except the first word</p>
          <p>To extract knowledge from questions, we used the Term Frequency-Inverse Document
Frequency (TF-IDF) technique. We identified and removed the less frequent words and the stop
words in questions to obtain the information that can best represent the
question. This information was used to define the f_hasEmbedding model.
Globally, two vectors are close if α ≈ θ, θ being the model parameter defined during
the training process. During the experiments, we varied θ to obtain different performances.
• Learning "hasRelation": To predict the "hasRelation" relation, we consider that each
embedding node has a probability of belonging to a set of relations. The scoring function defining
this probability can be obtained using statistical learning, Naive Bayes, etc. In this
research, we used Naive Bayes. Listing 3 describes how we determine the relations
between a node embedding and a set of relations.</p>
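          <p>Listing 2 can be sketched in JavaScript as follows; this is our reading of the recursive word-level Levenshtein distance it describes (the function name and the word-equality test are our assumptions), not the submitted implementation:</p>
```javascript
// Recursive word-level Levenshtein distance between the question words Q and
// the embedding vector terms V, following the shape of listing 2.
function hasEmbedded(q, v) {
  if (v.length === 0) return q.length;
  if (q.length === 0) return v.length;
  if (q[0] === v[0]) {
    // matching words cost nothing; compare the remaining words (Queue in listing 2)
    return hasEmbedded(q.slice(1), v.slice(1));
  }
  // otherwise: deletion, insertion or substitution, whichever is cheapest
  return 1 + Math.min(
    hasEmbedded(q.slice(1), v),
    hasEmbedded(q, v.slice(1)),
    hasEmbedded(q.slice(1), v.slice(1))
  );
}

const alpha = hasEmbedded(
  ["where", "is", "university", "located"],
  ["where", "is", "located"]
);
console.log(alpha); // 1: only "university" has to be deleted
```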
        </sec>
        <sec id="sec-2-3-2">
          <title>Listing 3: f_hasRelation algorithm</title>
          <p>Input:
V: a vector of embedded vectors
The candidate relations to be predicted, given in the external vocabulary</p>
          <p>To train the f_hasRelation model, we used the DBpedia and Wikidata datasets, with the goal of
determining the triples (V, hasRelation, R). This consists of filling the matrix presented in Fig.
3 with values, so that each value represents the probability that a term in the embedding vector
has as relation an element of the vocabulary.</p>
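          <p>A minimal sketch of this Naive Bayes scoring idea, with invented toy training pairs in place of the SMART data:</p>
```javascript
// Toy Naive Bayes scorer for (embedding terms -> vocabulary relation).
// The training examples below are invented for illustration only.
function trainNB(examples) {
  const relCount = {};
  const termCount = {};
  const vocab = new Set();
  for (const ex of examples) {
    relCount[ex.relation] = (relCount[ex.relation] || 0) + 1;
    if (!termCount[ex.relation]) termCount[ex.relation] = {};
    for (const t of ex.terms) {
      termCount[ex.relation][t] = (termCount[ex.relation][t] || 0) + 1;
      vocab.add(t);
    }
  }
  return { relCount, termCount, vocabSize: vocab.size, total: examples.length };
}

// P(relation | terms) is proportional to P(relation) times the product of
// P(term | relation), computed in log space with add-one (Laplace) smoothing.
function predict(model, terms) {
  let best = null;
  let bestScore = -Infinity;
  for (const rel of Object.keys(model.relCount)) {
    let nRel = 0;
    for (const t of Object.keys(model.termCount[rel])) nRel += model.termCount[rel][t];
    let score = Math.log(model.relCount[rel] / model.total);
    for (const t of terms) {
      const c = model.termCount[rel][t] || 0;
      score += Math.log((c + 1) / (nRel + model.vocabSize));
    }
    if (score > bestScore) { bestScore = score; best = rel; }
  }
  return best;
}

const model = trainNB([
  { terms: ["where", "located"], relation: "dbo:country" },
  { terms: ["who", "wrote"], relation: "dbo:author" }
]);
console.log(predict(model, ["where", "located"])); // "dbo:country"
```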
          <p>[Fig. 3: the matrix Z, with one question per row.]</p>
          <p>Model refinement consists of extending the question taxonomy to obtain a larger number
of nodes. This is done by varying the parameter θ given in equation 1. To finalize this work,
we applied our approach (with embedding vectors of different sizes) to the test data and
had our results analyzed by the SMART 3.0 organizers. The results obtained with different
embedding vectors are presented in table 4 for DBpedia and table 5 for Wikidata. In these
tables:</p>
          <p>Method
Naive Bayes
Naive Bayes + WH-taxonomy (θ = 1)
Naive Bayes + taxonomy (θ &gt; 4)
Naive Bayes + taxonomy (θ ∈ [1-4])</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Entity Linking</title>
      <p>To solve the EL task, we used a simple software engineering technique consisting of extracting terms
from questions and using these terms to search for entities in the KG (see Fig. 4).</p>
      <p>To well understand the EL task, we selected ten questions from the training dataset,
identified the entities in the question titles and compared them to the provided answers.
Thereafter, we made simple queries on DBpedia online using the Wikibase6 and DBpedia
LookUp7 APIs (listing 4 presents an example) to search for the same entities in the DBpedia
KG.</p>
      <p>Listing 4: Example of the use of the LookUp and Wikibase APIs
// Querying using the LookUp API
https://lookup.dbpedia.org/api/search?query=Ousmane%20Sonko
// Querying using the Wikibase API
https://en.wikipedia.org/w/api.php?action=query&amp;generator=allpages&amp;prop=pageprops|info&amp;inprop=url&amp;ppprop=wikibase_item&amp;gaplimit=5&amp;gapfrom=Ousmane%20Sonko</p>
      <p>Globally, we reduced the EL task to two sub-problems: entity retrieval in the question and
entity mapping to the KG (illustrated by Fig. 4). Thus, for each question:
• Search for all candidate terms that might be entities in the question. In our case, a term
can be a word or a group of words. Examples of terms are "Tchad" and "Success Masra".
• Use the identified terms to search for candidate entities in the KG.</p>
      <p>If we take the example of the question "Where was Maurice Kamto born?", the candidate terms from this
question are "Maurice", "Kamto", "was Maurice", "Maurice Kamto", "Kamto born", "was Maurice Kamto" and "was Maurice Kamto born".</p>
      <sec id="sec-3-1">
        <title>6https://www.mediawiki.org/wiki/API:Main_page 7https://lookup.dbpedia.org/</title>
        <p>From these terms, we realized
that the verbs are not pertinent words for finding entities, so we put in place a filter process to
retrieve and remove all the verbs. Applied to our example, we obtain the following candidates:
{"Maurice", "Kamto", "Maurice Kamto"}. These three terms are used to query the KG online to
search for entities. Once the entities are retrieved, we count the occurrences of each result (see
Table 6). The entity with the greatest number of occurrences is selected as the entity we are
searching for. In our example, the entity with 3 occurrences against 1 for the
others is Maurice_Kamto.</p>
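        <p>The occurrence-counting selection step can be sketched as follows; the result lists are hard-coded stand-ins for real LookUp/Wikibase responses:</p>
```javascript
// Pick the entity returned most often across the candidate-term queries.
// The result lists below are hard-coded stand-ins for API responses.
function selectEntity(resultLists) {
  const counts = {};
  for (const list of resultLists) {
    for (const entity of list) counts[entity] = (counts[entity] || 0) + 1;
  }
  let best = null;
  let max = 0;
  for (const entity of Object.keys(counts)) {
    if (counts[entity] > max) { max = counts[entity]; best = entity; }
  }
  return best;
}

const results = [
  ["dbr:Maurice_(emperor)", "dbr:Maurice_Kamto"], // results for "Maurice"
  ["dbr:Joseph_Fotso", "dbr:Maurice_Kamto"],      // results for "Kamto"
  ["dbr:Maurice_Kamto"]                           // results for "Maurice Kamto"
];
console.log(selectEntity(results)); // "dbr:Maurice_Kamto"
```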
        <p>Entity retrieved / Number of occurrences:
dbr:Maurice_(emperor) / 1
dbr:Mauritius / 1
dbr:Joseph_Fotso / 1
dbr:Maurice_Kamto / 3</p>
        <p>In the example we just presented, the entities were very easy to retrieve. However, the
analysis of the dataset showed that the situation is not always so easy. The analysis of the
different types of questions showed that the terms used to seek entities can be:
1. A proper name: "Success Masra", "Cameroon", "Ousmane Sonko";
2. A common name: "Marketing";
3. A group of words between two stop words.</p>
        <p>Cases one and two can be easily solved. However, the third case is the most difficult
because one has to determine which stop words delimit the entities. To identify the
entities defined by the third case, we built two tables, one composed of the stop words and
the second composed of the stop words that are generally used to link terms and form entities.
Thus, for each question, we match the question with the stop words and search for the entity in
the KG.</p>
        <p>The stop words are used to delimit the names of entities to be searched for in the question. The
stop word table is created using the set of English language stop words. We added the
term "END_EL" at the end of all the questions. This term is considered a stop word and is
used to complete the stop word table. The overall stop words are used to build a set of candidate
terms. This approach allowed us to define the different ways an entity can be presented
in a question. If we take the example "where is the University of Yaounde I?": the stop word
"of" and "END_EL" are used to define the term "Yaounde I", and the stop words "the" and "of" are
used to define the term "University" as a candidate term to search for an entity in the KG. Given
that the word "of" is contained in the second table (words generally used to link words) and that "of" is
between the two candidate terms "University" and "Yaounde I", a new term "University of
Yaounde I" is added to the list of candidate terms. This approach is illustrated by Fig. 5, and
listing 5 presents the algorithm.</p>
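        <p>Our reading of this delimitation step, as a JavaScript sketch (the stop-word and linking-word lists are abbreviated stand-ins for the two tables, and the joining rule is our interpretation of the example above):</p>
```javascript
// Split a question on stop words to get candidate terms, then join two adjacent
// candidates separated by a linking word such as "of". Lists are abbreviated.
const STOP_WORDS = new Set(["where", "is", "the", "of", "in", "was", "end_el"]);
const LINK_WORDS = new Set(["of"]);

function candidateTerms(question) {
  // tokenize and append the END_EL sentinel described in the text
  const tokens = question.replace(/[?.,]/g, " ").split(/\s+/).filter(t => t.length > 0);
  tokens.push("END_EL");
  const candidates = [];
  let current = [];
  let lastSep = null;
  for (const tok of tokens) {
    if (STOP_WORDS.has(tok.toLowerCase())) {
      if (current.length > 0) {
        const term = current.join(" ");
        // a linking word between two candidates yields a joined term,
        // e.g. "University" + "of" + "Yaounde I"
        if (LINK_WORDS.has(lastSep)) {
          if (candidates.length > 0) {
            candidates.push(candidates[candidates.length - 1] + " " + lastSep + " " + term);
          }
        }
        candidates.push(term);
        current = [];
      }
      lastSep = tok.toLowerCase();
    } else {
      current.push(tok);
    }
  }
  return candidates;
}

console.log(candidateTerms("where is the University of Yaounde I ?"));
// [ "University", "University of Yaounde I", "Yaounde I" ]
```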
      </sec>
      <sec id="sec-3-2">
        <title>Listing 5: The algorithm executed to solve the EL task</title>
        <p>Input:</p>
        <p>Q: Array of question // All the questions from the test data
stop_words: Array of String // The English stop word list. Examples: of, the, in, or, etc.
filter_words: Array of String // The filter word list. Examples: ne, of, the
Steps:</p>
        <p>For each question in Q:
1) Remove the special characters: \ , / " # : %
2) Put all the strings in lowercase
3) Tokenize the result of step 2
4) Match the tokens obtained in step (3) with {stop_words} to get candidate entities
// Step 5 allows one to obtain the embedding vector V associated with the question
5) Match the candidate entities with {filter_words} to produce a clean candidate list
7) Use the Wikibase API and LookUp API to search for entities
8) Save the retrieved entities
We applied this approach to the test data. Table 7 presents the results obtained.</p>
        <p>Precision: 0.41999, Recall: 0.54481, F1-score: 0.45729.</p>
        <p>We approached the problem of entity linking as software developers. However, this has the
following limits: setting up the stop word arrays for the questions is a difficult task. On the
other hand, the number of combinations to determine the entity is very high, which makes the time
complexity very high.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>To solve the RL and EL tasks, this paper proposes an approach based on KG refinement
techniques. This consists of using external knowledge to complete the graph with missing entities
and missing relations.</p>
      <p>Concerning Relation Linking, we used the vocabulary provided by the SMART 3.0 organizers as
our external knowledge. Thereafter, we used Naive Bayes to build the embedding matrix
used to predict links between a question and its relation(s). The evaluation on the test
data gave a maximum Precision of 0.53426, Recall of 0.47036 and F-measure of 0.48661 for DBpedia,
and a Precision of 0.27901, Recall of 0.31036 and F1-score of 0.28541 for Wikidata.</p>
      <p>Concerning Entity Linking, we used the KG itself as external knowledge. Thus, our goal was
to identify the most relevant terms (one word or a group of words) that can be matched to the
KG and, from these terms, determine the relevant entities. The EL task was run on the DBpedia dataset
only, and the evaluation by the SMART 3.0 organizers gave a Precision of 0.41999, a Recall
of 0.54481 and an F1-score of 0.45729.</p>
      <p>
        We started this challenge with no prior knowledge of the Entity Linking and Relation Linking
tasks, and we proposed a solution. Future work consists of exploring different machine learning,
deep learning and natural language processing models that can help us improve
the quality of our results. We particularly found good papers on relation linking [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] and on
entity linking [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ]. We are planning to explore and test these methods to solve the Entity
Linking and Relation Linking tasks.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to thank the ISWC conference and the SMART challenge organizers and all the
reviewers. This work was partially supported by Neuralearn.ai (Start-up Cameroon).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rossiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          , I. Abdelaziz,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bornea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gliozzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kapanipathi</surname>
          </string-name>
          ,
          <article-title>Generative relation linking for question answering over knowledge bases, 2021</article-title>
          . URL: https://arxiv.org/abs/2108.07337. doi:10.48550/ARXIV.2108.07337.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>[2] Reaching out for the Answer: Relation Prediction</article-title>
          , volume
          <volume>3119</volume>
          of
          <article-title>SeMantic Answer Type and Relation Prediction Task at ISWC 2021 Semantic Web Challenge (SMART2021)</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] The Combination of BERT and Data Oversampling for Relation Set Prediction</article-title>
          , volume
          <volume>3119</volume>
          of
          <article-title>SeMantic Answer Type and Relation Prediction Task at ISWC 2021 Semantic Web Challenge (SMART2021)</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Hua,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stratos</surname>
          </string-name>
          ,
          <article-title>EntQA: Entity linking as question answering</article-title>
          ,
          <source>CoRR abs/2110</source>
          .02369 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2110.02369.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          ,
          <source>Semant. Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>489</fpage>
          -
          <lpage>508</lpage>
          . URL: https://doi.org/10.3233/SW-160218. doi:10.3233/SW-160218.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jiomekong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Camara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tchuente</surname>
          </string-name>
          ,
          <article-title>Extracting ontological knowledge from Java source code using Hidden Markov Models</article-title>
          ,
          <source>Open Computer Science</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <fpage>181</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Durán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13</source>
          , Curran Associates Inc., Red Hook, NY, USA,
          <year>2013</year>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rossiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Abdelaziz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bornea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gliozzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kapanipathi</surname>
          </string-name>
          ,
          <article-title>Generative relation linking for question answering over knowledge bases</article-title>
          ,
          <source>in: The Semantic Web - ISWC 2021: 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24-28, 2021, Proceedings</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2021</year>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>337</lpage>
          . URL: https://doi.org/10.1007/978-3-030-88361-4_19. doi:10.1007/978-3-030-88361-4_19.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>de Mello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ströele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <article-title>A study of approaches to answering complex questions over knowledge bases</article-title>
          ,
          <source>Knowl. Inf. Syst.</source>
          <volume>64</volume>
          (
          <year>2022</year>
          )
          <fpage>2849</fpage>
          -
          <lpage>2881</lpage>
          . URL: https://doi.org/10.1007/s10115-022-01737-x. doi:10.1007/s10115-022-01737-x.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <article-title>Learning entity linking features for emerging entities</article-title>
          ,
          <year>2022</year>
          . doi:10.48550/ARXIV.2208.03877.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stratos</surname>
          </string-name>
          ,
          <article-title>EntQA: Entity linking as question answering</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2110.02369. doi:10.48550/ARXIV.2110.02369.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>