Keyword Search Over RDF Graphs Using WordNet
                Mohamad Rihany                             Zoubida Kedad                           Stéphane Lopes
                   DAVID Lab                               DAVID Lab                                 DAVID Lab
              Université de Versailles                Université de Versailles                 Université de Versailles
             Saint-Quentin-en-Yvelines               Saint-Quentin-en-Yvelines                Saint-Quentin-en-Yvelines
                Versailles, France                      Versailles, France                        Versailles, France
             mohamad.rihany@uvsq.fr                   zoubida.kedad@uvsq.fr                    stephane.lopes@uvsq.fr


   Abstract—An increasing amount of interlinked RDF datasets         resource. The user should also be familiar with the SPARQL
are published on the Web. These datasets can be queried using        query language.
languages such as Sparql, in which graph patterns are evaluated         An alternative way of querying RDF dataset is keyword
against the data. In such languages, some knowledge about
the dataset is required in order to formulate a query, such as       search, which consists in formulating a query as a set of key-
the resources, types or properties existing in the dataset. An       words and extracting the subgraphs corresponding to the input
alternative way of querying RDF data is keyword search, which        keywords. Keyword search over RDF datasets raises several
could be very useful when the content of the dataset is not known.   challenges. One of them is finding the relevant elements by
One of the problems we are faced with is the gap between the         matching the keyword query with the elements of the datasets,
keywords of the query and the terms used in the dataset. In this
paper, we introduce a keyword search approach for RDF data           taking into account differences of terminologies which may
which aims at solving this terminological gap. Our approach          exist between them. Another challenge is to aggregate the
makes use of external knowledge provided by online linguistic        relevant elements and to built the subgraphs representing
resources and proposes a ranking method if several results are       possible answers to the initial query.
returned for a given query. We have performed some experiments          Let’s consider the RDF data graph about movies given in
on the DBpedia and the AIFB datasets to illustrate the scalability
and efficiency of our approach.                                      figure 1.

                    I. INTRODUCTION
   There is an increasing amount of RDF [1] datasets published
on the Web, enabling knowledge extraction for numerous
applications. An RDF dataset could be viewed as a graph
where nodes are resources or litterals and where labeled
edges represent properties. In RDF, the building block is a
triple, which is of the form (subject, predicate, object). The
RDF Schema (RDFS) language is used to introduce useful
semantics to RDF triples. It provides a built-in vocabulary
for asserting user defined schemas within the RDF model.                               Fig. 1: RDF graph dataset
This vocabulary can be used to specify URIs as being of
a specific type (classes, properties and instances), to denote          Let Q={2008, performing, film-maker} be a keyword query.
special relationships between URIs. The flexibility of the RDF       If a user issues this keyword query on the data graph of figure
data model allows the representation of both schema and              1, then the answer will be empty. However, knowing that
instance information in the form of RDF triples.                     "performing" is similar to "starring" and that "film-maker" is
   RDF datasets can be queried using languages such as the           similar to "director", we can see that there is in the graph some
SPARQL language where queries are specified as graph pat-            data that is relevant to the query. This result could be retrieved
terns evaluated against the dataset. A SPARQL query consists         if we could bridge the terminological gap between the terms
of a set of triples where the subject, predicate and/or object       of the query and the ones used in the dataset.
can consist of variables. The idea is to match the triples in           In this paper, we introduce a keyword search approach
the SPARQL query with the existing RDF triples and find              which takes keywords as input and returns the best matching
solutions to the variables. In order to formulate these queries,     subgraphs as an answer to this query. The key contribution is
some knowledge about the dataset is therefore required. This         to use an external source of knowledge providing semantic
could be related to the subject, which could either be a specific    relations in order to find the elements in the dataset that
resource, or a class existing in the dataset, or to a predicate,     match the query keywords, and to rank the set of possible
which represents a property describing either a class or a           answers using a ranking method based on the semantic
                                                                     relations which have been used during the matching process.
  *PhD is funded by CNRS-L and ANR (project CAIR)


                                                                                                                                          75
   The rest of this paper is organized as follows. The approach     •   Aggregating Graph Elements
overview is provided in section 2. We detail our solution               Once the matching elements in the RDF graph are
for matching keywords with graph elements using external                identified for each keyword, the problem is to built the
knowledge in Section 3. Section 4 presents the process of               final result from these elements, and to aggregate them
building the end results from the matching elements.The                 into a connected subgraph representing an answer to the
ranking method is discussed in section 5. Section 6 presents            query. Each keyword can be associated to more than
our experiments and section 7 reviews the related works.                one element in the RDF graph; we consider that each
Finally, we conclude the paper and present some future works            combination of matching elements where there is one
in Section 8.                                                           element for each keyword is a possible answer to the
                                                                        query. The problem is to built the subgraph containing
                 II. APPROACH OVERVIEW
                                                                        the elements of the considered combination.
  In this section we provide an overview of our keyword
search approach presented in Figure 2.                              •   Result Ranking
                                                                        As each keyword may have more than one matching
                                                                        element in the dataset, there may be several possible
                                                                        results to the query. The problem is to rank the different
                                                                        results and to find a ranking method capable of identying
                                                                        the results that are closer to the initial query than others.


                                                                     III. M ATCHING K EYWORDS WITH GRAPH ELEMENTS

                                                                     In this section, we present our approach for matching the
                                                                  keywords with the data graph using external knowledge. Let
                 Fig. 2: Approach overview                        the keyword query Q={k1 , k2 , k3 , ...kn } be the input. For each
                                                                  keyword ki we check if there exists an exact match in the
   Our goal is to provide a keyword search approach on            elements of the data graph or not. If such element does not
RDF datasets, which is an alternative way to query these          exist, we search the considered external knowledge source in
datasets. In our approach, the user will enter the keywords       order to find some semantic relations between a keyword and
composing the query as an input and get a set of RDF graphs       a concept such that this latter concept has a matching element
corresponding to the keywords as an output; but the query         in the data graph. At the end, we obtain for each ki a set of
is not always expressed using the same terms as the ones          matching elements from the data graph.
used in the dataset. Our goal is to bridge the gap between           For example, let us consider the keyword query Q={2008,
the keywords and the dataset terminology. We propose the use      performing, film-maker}. We can observe from figure 1 that
of an external knowledge source providing semantic relations      the keywords "performing" and "film-maker" have no match-
between concepts. In our work, we have used the Wordnet           ing elements, but if some external knowledge source is avail-
online linguistic dictionary.                                     able and provides us with the information that "performing" is
   Figure 2 shows our framework for keyword search. It            similar to "starring" and "film-maker" is similar to "director",
comprises three components: matching, aggregating and             then some matching elements for the keywords of Q can be
ranking.                                                          found, and they are presented in figure 3.

  •   Matching keywords
      The matching component takes as input the keyword
      query and searches for each keyword the matching
      elements in the dataset. This is done by comparing
      the keyword to each graph element (resource, class or
      property) and returning the matching ones. In some
      cases, the user may enter a keyword for which an exact
      match can not be found in the dataset, but some graph
      element could be close to the keyword, it could for
      example be a synonym, or a close concept. The problem
      is to identify the equivalent concepts and the close
      concepts to some keyword in the dataset. To do so, we
      suggest the use of external knowledge sources such as
                                                                  Fig. 3: Matching Elements for Each Keyword in the Query Q
      online linguistic dictionnaries.


                                                                                                                                        76
A. Matching using external knowledge                                element in the dataset, we query the WordNet database to
                                                                    search for one of the previous semantic relations.
   In our work, we have used WordNet as an external
knowledge source. WordNet is a large lexical database of
English. Nouns, verbs, adjectives and adverbs are grouped into
sets of cognitive synonyms (synsets), each one expressing
a distinct concept. Synsets are interlinked by means
of conceptual-semantic and lexical relations. WordNet’s
structure makes it a useful tool for computational linguistics
and natural language processing. This lexical database can
help to find the best matching between the keyword queries
and the dataset elements using the provided semantic relations
(Synonyms, Antonyms...). We have decomposed the matching
keywords process into two phases. The first one is the search
in the dataset for exact matches for a given keyword. The
second phase is the search for close matching elements in the              Fig. 5: Searching for Close Matching Elements
dataset.
                                                                       These relations do not express equivalence, but express
                                                                    some sort of closeness between two concepts. For example,
                                                                    if a meronymy relation is found between ki and a concept c,
                                                                    and if c has an exact match in the dataset, i.e. a graph element
                                                                    labeled c, then this latter is a close concept to ki . Indeed, if c
                                                                    does not represent an equivalent concept, it still represents a
                                                                    close concept as c is part-of ki because the two are linked by
                                                                    a meronymy relation according to Wordnet.
                                                                       Close matching elements are searched for each keyword
                                                                    without exact matching element. If some keyword in the
                                                                    query has neither exact matching elements nor close matching
                                                                    elements, this means that there is no answer to the query.

                                                                       There are many semantic relations in WordNet, some of
       Fig. 4: Searching for Exact Matching Elements
                                                                    them are very useful to help us find the best matching elements
                                                                    between the query and the dataset. In our work, we have
   In figure 4 we have represented the first phase, which is the    considered the following relations to search for matching
search for exact matches int the dataset. Consider a keyword        elements.
query Q={k1 , k2 , k3 , ...kn }. For each keyword ki we perform        • Synonyms: a concept that means exactly the same as
two searching tasks in parallel; the first one is searching the          another.
dataset for properties, resources or classes that have the same        • Antonym: a concept opposite in meaning to another.
name as the considered keyword, i.e., nodes and edges in the           • Hyponym: a concept whose meaning denotes a subordi-
data graph having ki as label. The second task is to search the          nate.
knowledge base for synonyms and antonyms of the considered             • Hypernym: a concept whose meaning denotes a superor-
keyword. This consists in querying WordNet to extract the                dinate.
semantic relations (synonyms antonyms and hypernyms) with              • Substance meronym: a concept that is a substance of
ki and find the exact matching of those semantic relations in            another concept.
the data graph.                                                        • Part meronym: a concept that is part of another concept.
   If the search for exact matching elements fails, then a search      • Member meronym: a concept that is part of another
for close elements is performed. If some matching element is             concept.
found in the dataset for every keyword, then the search process        • Substance of holonym: a concept that has another concept
ends.                                                                    as a substance.
In figure 5 we have represented the search for close matching          • Part of holonym: a concept that has another concept as a
elements. This process will be executed for a given keyword              part.
only if no exact matching element has been found. It is                • Member of holonym: a concept that has another concept
similar to the search for exact matches and differs only on              as a member.
the considered semantic relations in Wordnet. The semantic             • Cause to: a verb that is the cause of a result.
relations we are interested in are hyponymy, holonymy and              • Troponym: a verb that is particular way to do another.
meronymy. For each keyword ki which has no exact matching


                                                                                                                                          77
B. Matching Algorithm
   In this section, we present the matching algorithm             Algorithm Matching The Keywords Of The Query With The
underlying our approach, which matches the keywords of            Graph
the query with the dataset. Let us start by presenting the         1: keywordQuery = {k1, k2, k3, ...ki}
notations used in the algorithm. Let K={k1 , k2 , k3 ...kn } be    2: procedure M ATCHING (keywordQuery)
the keyword query, ME(ki ) a function to extract the matching      3:    hashmap(keyword, relatedElements)
elements for ki in the dataset (these elements can be literals,    4:    for each keyword ki in keywordQuery do
instances, classes, properties, etc.).                             5:         ME(ki) . extract the matching elements for the
The semantic relations between the keyword query ki and               keyword query ki
WordNet are extracted by using some functions, for example         6:         S ← synonym(ki) . S is set of synonyms to the
Synonym(ki ) is used to extract the set of synonyms for the           keyword ki
keyword query ki .                                                 7:         for each sj in S do
For each keyword ki in the query, the matching elements            8:             M E(ki) ← M E(ki) + M E(sj)           . extract
ME(i ) are extracted (line4-5), and then WordNet is queried to        the matching elements for the synonym sj and add them
extract the set SR of Synonyms, Antonyms and Hyponyms.                as matching element to the keyword query ki
For each element of SR, the dataset is accessed to check           9:         end for
if there is a matching element (line6-16). For example, the       10:         a ← Antonym(ki)              . function Antonym
search for matching elements using the antonymy relation is           take ki as input and return keyword similar to it as output
done by issuing the following query to Wordnet.                       after executing the following query {ki Antonym to ?x,
                                                                      ?x antonym to ?y.} and ki different from ?y
                  SELECT ?y                                       11:         M E(ki) ← M E(ki) + M E(a)
                  WHERE                                           12:         Hyper ← Hypernym(ki)           . Hyper is set of
                  {ki Antonym_to ?x.                                  Hypernyms to the keyword ki
                  ?x Antonym_to ?y.}                              13:         for each hj in Hyper do
                  And ki dif f erent f rom ?y)                    14:             M E(ki) ← M E(ki) + M E(hj) . extract the
                                                                      matching elements for the Hypernym hj and add them as
   If no matching element has been found, then a search               matching element to the keyword query ki
for close matching elements is performed (line 17). In this       15:         end for
phase, WordNet is queried to find close matching elements         16:         hashmap.add(ki, M E(ki))
by searching for the hypernymy, holonymy and meronymy             17:         if ME(ki) is empty then
semantic relations for ki . For each concept c related to ki by   18:             Hypo ← Hyponym(ki)          . Hypo is set of
one of these relations, we search for elements in the dataset         Hyponyms to the keyword ki
labeled c; these elements are added to the set of matching        19:             for each hj in Hypo do
elements for the keyword ki (line 17-31).                         20:                 M E(ki) ← M E(ki) + M E(hj)
                                                                  21:             end for
           IV. AGGREGATING GRAPH ELEMENTS
                                                                  22:             M ero ← M eronym(ki)        . M ero is set of
   After obtaining the set of matching graph elements for each        Myronyms to the keyword ki
keyword, the subgraphs representing the possible answers to       23:             for each mj in M ero do
the query are built. Each answer is a connected minimal sub-      24:                 M E(ki) ← M E(ki) + M E(mj)               .
graph which contains for each keyword ki one corresponding            extract the matching elements for the meronym mj and
matching element.                                                     add them as matching element to the keyword query ki
Considering that there is a set of matching elements for          25:             end for
each keyword, we first derive the possible combinations by        26:             Holo ← Holonym(ki)           . Holo is set of
computing the cartesian product of the different sets of match-       Holonym to the keyword ki
ing elements. Then, for each combination, we determine the        27:             for each hj in M do
minimal connected subgraph containing the matching elements       28:                 M E(ki) ← M E(ki) + M E(hj) . extract
of the considered combination. Each subgraph is a possible            the matching elements for the Holonym hj and add them
answer to the initial query. Let us consider the example of the       as matching element to the keyword query ki
keyword query given in Figure 3, with the set of matching         29:             end for
elements corresponding to each keyword. In figure 6, we can       30:             hashmap.add(ki, M E(ki))
see all the possible combinations, obtained by performing the     31:         end if
cartesian product of the sets of matching elements shown in       32:     end for
figure 3.                                                         33:     Return hashmap
   From each combination, a connected subgraph will be            34: end procedure
extracted. This subgraph represents a possible result to the
query; it is built by introducing the minimum number of nodes


                                                                                                                                    78
                                                                  matching elements. We calculate the ranking score as follows:

                                                                                  Score = 1 − [wa ∗A+(1−w
                                                                                                     N
                                                                                                          a )∗L]


                                                                     Where A is the number of approximate matching elements,
                                                                  L is the number of linking elements, N is the total number
                                                                  of nodes and edges in the subgraph and wa is the weight for A.

                                                                     Intuitively, the above score expresses that the less linking
                                                                  elements in a subgraph, the better the solution. It also ex-
                                                                  presses that the more exact matching elements in a subgraph,
        Fig. 6: Combinations of Matching Elements                 the better the solution.


which do not correspond to matching elements. To do so, we
use the shortest path algorithms (Dijkstra’s Algorithm).
   A matching element can a be node or an edge. If we
compute the shortest path between two matching elements
representing both a node, then the shortest path between them
is the shortest path between the corresponding two nodes in                                 Solution A
the data graph. If one of the matching elements is an edge
between nodes n1 and n2 and the other matching element is
a node n, then the shortest path is determined between n and
one of the two nodes n1 and n2.
   If the two matching elements are both edges, then the short-
est path is determined between one of the nodes connected by
the first matching element to one of the nodes connected by
the second matching element.                                                               Solution B
   For each combination, our algorithm for extracting a sub-
                                                                              Fig. 7: Some Possible Solutions for Q
graph starts by randomly selecting one of the matching ele-
ments and then finds all the shortest paths from this matching
element to all the other matching elements in the combination.       Let us consider, in our running example, the possible
Let us consider two matching elements mi and mj .                 solutions for the query Q={2008, performing, film-maker}
   ni ∈ mi and nj ∈ mj such that @ni2 ∈ mi , @nj2 ∈               given in figure 7. For solution A, the number of exact matching
mj , ni 6= ni2 , nj 6= nj2 and |(ni2 − −nj2 )| < |(ni − −nj )|    elements is equal to one, the number of approximate matching
   where |(x − −y)| is the size of the shortest path between      elements is equal to two, and the number of linking elements
nodes x and y. To construct the result, we combine the shortest   is equal to one; the total number of nodes and edges N is
paths between all pairs of matching elements. When all the        equal to seven. The final score is therefore equal to 0.814. For
combinations are processed, we obtain a set of subgraphs, each    solution B, the number of exact matching elements is equal to
one representing a possible result to the query.                  one, the number of approximate matching elements is equal
                                                                  to two, the number of linking elements is equal to two; the
                   V. R ANKING R ESULTS                           number N of nodes and edges is equal to nine. The final score
                                                                  is therefore equal to 0.778. Solution A is better than solution
   The elicitation of all the combinations of graph elements      B. In A, the result is the movie "Righteous kill" released in
and the aggregation of graph elements for each combination        "2008" directed by "John Avnet" and "Al Pacino" is one of
lead to several subgraphs, each one being a possible answer       the stars of the movie, while solution B describes the movie
for the query. One problem is to rank these answers, and          "Righteous kill" released in "2008" with "Al Pacino" being
to determine if there are better results than others. In our      one of the stars, also starring in the movie "Heat" directed by
approach, we ranked the results according to the matching         "Michael Mann". Therefore solution A focuses on the release
process. Let us first define the notations used in the ranking    date, stars and director of one movie which is "Righteous kill",
method. Exact matching elements are the elements found            while solution B presents the release date of "Righteous kill"
during the first phase of the matching, i.e. the search for       and the director of "Heat".
exact matching elements; approximate matching elements
are the elements found during the search for close matching
elements; we denote by linking elements the nodes that are
in the subgraph result but are neither exact nor approximate


                                                                                                                                     79
              VI. E XPERIMENTAL EVALUATION

   Our approach is implemented in Java, we have used the
Jena API for the manipulation of RDF data. For indexing and
searching the keyword query, we have used the Lucene API.
The Jung API is used for graph manipulation and visualization.
   In the rest of this section, we describe our experiments
to validate the performances of our approach. Our goal is
to observe the performance of using WordNet as an external
knowledge to fill the gap between the keywords and the dataset
terminologies, as well as the ranking model with various
keyword queries. All the experiments have been done on Intel        Fig. 8: Average Execution Time According to the Size of the
Core i7 with 32GB RAM.                                              Query

                                                                       As we can observe from figure 8, the execution time
A. Datasets                                                         increases when the number of the keywords increases for
                                                                    both datasets. We can also see that the execution time for
    We have used two datasets: AIFB and DBpedia. AIFB is
                                                                    AIFB is greater than the execution time of DBpedia because
a dataset containing data taken from the AIFB institute, at
                                                                    the size of data in AIFB is greater than the size in DBpedia.
Karlsruhe University. It is about entities of research communi-
ties such as persons, organizations, publications (bibliographic
metadata) and their relationships. The dataset contains 8281
                                                                       The type and number of keyword elements also affect the
entities and 29 233 triples. DBpedia is a project aiming to
                                                                    execution time; for example Q1 {carol_lombard / 5940 / the-
extract structured content from the information created in the
                                                                    oritical / edwin} in table I takes 4.91 sec (fig. 9) and contains
Wikipedia project. The extracted data is related to movies their
                                                                    4 keywords while Q6 {poor / 1990 / mind / Ellen_Burstyn}
title, stares, director, released data and other properties. This
                                                                    needs 31.57 sec (fig. 9) to be executed: the two queries contain
dataset contains 30 793 triples.
                                                                    the same number of keywords (4), but the difference in the
    The size of the keyword queries was between 2 and 8             execution time is due to the keyword elements. In Q1 the four
keywords. The total number of queries was 20 queries (10            keywords appear 14, 18, 0 and 10 times respectively as we
for each dataset) in order to cover all the different WordNet       can see from table I but the keywords in Q6 appears 20, 258,
semantic relations during the matching stage.                       0 and 2 times respectively.


B. Methodology

   We have tested and compared our keyword search approach
both with and without the use of external knowledge. We will
refer to the approach with external knowledge by semantic
approach since we use semantic relations, and we will refer
to the approach without external knowledge as the basic
approach.


C. Results

   The query size was between 3 and 8 keywords. Table 1
shows some examples of keyword queries, the number of
nodes and edges containing this keyword in the data graph, and         Fig. 9: Execution Time for Each Query over DBpedia
the semantic relations between the keywords that do not appear
in the data graph and WordNet. For example, Query 1 (carole            The differences in terminology between the keyword query
lombard-5940-theoretical-edwin) consists of 4 keywords, these       and the dataset also affect the execution time (the keyword
keywords appear in the data graph (nodes and edges) 14, 18,         query does not match with any element in the dataset).
0 and 10 times respectively, this means that theoretical is not     Consider the queries Q3 and Q4 in table I; Q3 consists of
in the dataset but we can replace it with academic since there      4 keywords, and these keywords appear 2, 1, 4 and 0 times
is a hypernymy relation between academic and theoretical in         respectively in the dataset while Q4 has 3 keywords and these
WordNet.                                                            keywords appear 0, 1 and 0 times respectively in the dataset.


                                                                                                                                        80
                      Query Number      keyword Query       number of appear Nodes/edges   WordNet Semantic Relations
                                        carole_lombard      14
                                        5940                18
                            Q1                                                             hypernym
                                        theoretical         0
                                        edwin               10
                                        swedishfilms        2
                                        lagercrantz         4
                            Q2                                                             hyponym
                                        1997                97
                                        plosion             0
                                        psychiatric         2
                                        hampshire           1
                            Q3                                                             hyponym
                                        ziskin              4
                                        system              0
                                        bring               0                              antonym
                            Q4          cut                 1
                                        mind                0                              hypernym
                                        solar_system        0                              holonym
                            Q5          secondary           0                              synonym
                                        vocabulary          0                              meronym

                                                      TABLE I: Keyword Queries

                                  query                     Q1    Q2    Q3     Q4   Q5     Q6     Q7    Q8    Q9    Q10
                   number of results(semantic approach)     2     173   7      8    50     250    110   132   78    34
                   number of results(basic approach)        1     150   1      0    0      54     25    0     2     2

                               TABLE II: Number of Results for each Keyword Query(DBpedia)

                                    query                    Q1    Q2    Q3    Q4    Q5    Q6     Q7    Q8    Q9    Q10
                     number of results(semantic approach)    9     10    21    25    32    38     20    18    25    5
                     number of results(basic approach)       3     4     10    7     12    22     14    7     13    0

                                 TABLE III: Number of Results for each Keyword Query(AIFB)


But the execution time for Q4 (7.76 sec) is greater than the             check the top-k results for each query and give the number of
execution time of Q3 (5.08 sec); this is because Q4 requires             relevant results to calculate the Top-K precision according to
to access WordNet two times to search for semantic relations             this equation:
involving the keywords, while Q3 requires only one access.                                        N umberOf RelevantResults
   As we can observe from tables II and III, the number of re-               T op − kP recision =                                  (1)
                                                                                                                K
sults increases when WordNet is used during keyword search,
because using WordNet increases the number of matching                     All the results were above 0.92 as shown in table 4; this
elements. This means that the number of combinations and                 means that the results were accurate according to the users.
therefore the number of results both increase.                                             Data         AIFB            DBpedia
                                                                                            K           5    10         5     10
                                                                                      Top-K precision   0.97 0.95       0.98  0.97

                                                                                           TABLE IV: Top-K precision

                                                                                                 VII. R ELATED WORKS
                                                                            Keyword search and the translation of a keyword query
                                                                         into a formal query have been the topic of several research
                                                                         works. The early research works were on keyword query over
                                                                         relational databases [2], [3], then XML data [4] and RDF data
                                                                         [6], [5], [8], [7]. One of the important problems addressed
                                                                         by these works is how to fill the gap between the keywords
                                                                         in the query and the terms used in the dataset. The SPARK
                                                                         approach [5] consists in finding the corresponding ontology
                                                                         for each term in the keyword query and try to map and find
                                                                         a relation between the ontology and the keyword query; other
    Fig. 10: Execution Time for Each Query over AIFB                     approaches use external knowledge or resources such as in
                                                                         the Q2semantic approach [8], where Wikipedia is used to
  To check the effectiveness of the evaluation we have used              extract related keywords; for each keyword in the dataset, a
10 queries from the tables II and III and asked three users to           document is created containing features that are matched to


                                                                                                                                          81
the keyword Query. The approach described in [9] also uses               [6] He, Hao, Haixun Wang, Jun Yang, and Philip S. Yu. "BLINKS: ranked
some external knowledge; it uses the supporting entity pairs                keyword searches on graphs." In Proceedings of the 2007 ACM SIGMOD
                                                                            international conference on Management of data, pp. 305-316. ACM, 2007.
in order to paraphrase dictionary records semantic equivalence           [7] W. Le, F. Li, A. Kementsietsidis, and S. Duan. Scalable keyword search
between relation phrase and dataset; but the supporting entity              on large rdf data. TKDE, 26:2774–2788, 2014
pairs are specific to Wikipedia and the New York Times [10].             [8] Wang, Haofen, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong
                                                                            Yu. "Q2semantic: A lightweight keyword interface to semantic search."
   In order to aggragate the matching elements in a dataset,                In European Semantic Web Conference, pp. 584-598. Springer, Berlin,
SPARK[5] uses an ontology to discover the relations between                 Heidelberg, 2008.
the keywords and the dataset and uses a minimal spanning tree            [9] Zou, Lei, Ruizhe Huang, Haixun Wang, Jeffrey Xu Yu, Wenqiang He,
                                                                            and Dongyan Zhao. "Natural language question answering over RDF: a
algorithm to create a possible query graph. The approaches                  graph data driven approach." In Proceedings of the 2014 ACM SIGMOD
described in [8], [6], [11], [14], [15] transform the data graph            international conference on Management of data, pp. 313-324. ACM, 2014.
into a summarized graph; some of them start from the leaf                [10] Nakashole, Ndapandula, Gerhard Weikum, and Fabian Suchanek.
                                                                            "PATTY: a taxonomy of relational patterns with semantic types." In
nodes containing the keyword query and do a traversal until                 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural
all the paths converge to the same node, and the other works                Language Processing and Computational Natural Language Learning, pp.
use the summarized graph and try to extract a SPARQL query                  1135-1145. Association for Computational Linguistics, 2012.
                                                                         [11] Li, Feifei, Wangchao Le, Songyun Duan, and Anastasios Kementsiet-
by finding relationships between the nodes.[12] classifies the              sidis. "Scalable keyword search on large RDF data." IEEE Transactions
keywords into two sets: the first one contains the vertices and             on Knowledge and Data Engineering 1 (2014): 1.
the second one contains the edges; then the final possible               [12] Han, Shuo, Lei Zou, Jeffery Xu Yu, and Dongyan Zhao. "Keyword
                                                                            Search on RDF Graphs-A Query Graph Assembly Approach." In Proceed-
solutions are computed. In all these works, the keyword in the              ings of the 2017 ACM on Conference on Information and Knowledge
query are matched with the nodes of the considered graph,                   Management, pp. 227-236. ACM, 2017.
unlike our approach which considers semantic relations and               [13] Ouksili, Hanane, Zoubida Kedad, Stéphane Lopes, and Sylvaine Nugier.
                                                                            "Using Patterns for Keyword Search in RDF Graphs." In EDBT/ICDT
searches for matching elements in both the nodes and the edges              Workshops. 2017.
in order to build the final result.                                      [14] Ayvaz, Serkan, and Mehmet Aydar. "Using RDF Summary Graph
                                                                            For Keyword-based Semantic Searches." arXiv preprint arXiv:1707.03602
                                                                            (2017).
          VIII. CONCLUSION AND FUTURE WORKS                              [15] Lin, Xiao-Qing, Zong-Min Ma, and Li Yan. "RDF Keyword Search Us-
                                                                            ing a Type-based Summary." JOURNAL OF INFORMATION SCIENCE
   In this paper, we have proposed an approach for keyword                  AND ENGINEERING 34, no. 2 (2018): 489-504.
search in an RDF dataset, which represents an alternative to the
use of query languages such as Sparql. We have focused on the
problem of handling differences in the terminologies between
the RDF dataset and the keyword query. We have proposed a
novel solution which relies on an external knowledge source.
In our approach, the answer to the keyword query is a set of
subgraphs containing for each keyword one matching graph
element. We have described how we enrich the set of solutions
by using WordNet. We have also provided a ranking model
to rank the resulting subgraphs. This model is based on the
number of semantic relations extracted from WordNet and
the number of nodes and edges for each subgraph. We have
conducted some experiments which have shown that external
knowledge gives more results for some queries and sometimes
returns an answer where other approaches fail to.
   In future works, we will study the possible improvements
to the aggregation of matching elements and try to find other
ways of combining them. We will also study scalability issues
and enable efficient keyword search for massive datasets.

                           R EFERENCES
[1] https://www.w3.org/RDF/.
[2] V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in
   relational databases. In VLDB, pages 670–681, 2002.
[3] L. Qin, J. X. Yu, L. Chang, and Y. Tao. Querying communities in
   relational databases. In ICDE, pages 724–735, 2009.
[4] Guo, Lin, Feng Shao, Chavdar Botev, and Jayavel Shanmugasundaram.
   "XRANK: Ranked keyword search over XML documents." In Proceedings
   of the 2003 ACM SIGMOD international conference on Management of
   data, pp. 16-27. ACM, 2003.
[5] Zhou, Qi, Chong Wang, Miao Xiong, Haofen Wang, and Yong Yu.
   "SPARK: adapting keyword query to semantic search." In The Semantic
   Web, pp. 694-707. Springer, Berlin, Heidelberg, 2007.


                                                                                                                                                       82