Keyword Search Over RDF Graphs Using WordNet Mohamad Rihany Zoubida Kedad Stéphane Lopes DAVID Lab DAVID Lab DAVID Lab Université de Versailles Université de Versailles Université de Versailles Saint-Quentin-en-Yvelines Saint-Quentin-en-Yvelines Saint-Quentin-en-Yvelines Versailles, France Versailles, France Versailles, France mohamad.rihany@uvsq.fr zoubida.kedad@uvsq.fr stephane.lopes@uvsq.fr Abstract—An increasing amount of interlinked RDF datasets resource. The user should also be familiar with the SPARQL are published on the Web. These datasets can be queried using query language. languages such as Sparql, in which graph patterns are evaluated An alternative way of querying RDF dataset is keyword against the data. In such languages, some knowledge about the dataset is required in order to formulate a query, such as search, which consists in formulating a query as a set of key- the resources, types or properties existing in the dataset. An words and extracting the subgraphs corresponding to the input alternative way of querying RDF data is keyword search, which keywords. Keyword search over RDF datasets raises several could be very useful when the content of the dataset is not known. challenges. One of them is finding the relevant elements by One of the problems we are faced with is the gap between the matching the keyword query with the elements of the datasets, keywords of the query and the terms used in the dataset. In this paper, we introduce a keyword search approach for RDF data taking into account differences of terminologies which may which aims at solving this terminological gap. Our approach exist between them. Another challenge is to aggregate the makes use of external knowledge provided by online linguistic relevant elements and to built the subgraphs representing resources and proposes a ranking method if several results are possible answers to the initial query. returned for a given query. We have performed some experiments Let’s consider the RDF data graph about movies given in on the DBpedia and the AIFB datasets to illustrate the scalability and efficiency of our approach. figure 1. I. INTRODUCTION There is an increasing amount of RDF [1] datasets published on the Web, enabling knowledge extraction for numerous applications. An RDF dataset could be viewed as a graph where nodes are resources or litterals and where labeled edges represent properties. In RDF, the building block is a triple, which is of the form (subject, predicate, object). The RDF Schema (RDFS) language is used to introduce useful semantics to RDF triples. It provides a built-in vocabulary for asserting user defined schemas within the RDF model. Fig. 1: RDF graph dataset This vocabulary can be used to specify URIs as being of a specific type (classes, properties and instances), to denote Let Q={2008, performing, film-maker} be a keyword query. special relationships between URIs. The flexibility of the RDF If a user issues this keyword query on the data graph of figure data model allows the representation of both schema and 1, then the answer will be empty. However, knowing that instance information in the form of RDF triples. "performing" is similar to "starring" and that "film-maker" is RDF datasets can be queried using languages such as the similar to "director", we can see that there is in the graph some SPARQL language where queries are specified as graph pat- data that is relevant to the query. This result could be retrieved terns evaluated against the dataset. A SPARQL query consists if we could bridge the terminological gap between the terms of a set of triples where the subject, predicate and/or object of the query and the ones used in the dataset. can consist of variables. The idea is to match the triples in In this paper, we introduce a keyword search approach the SPARQL query with the existing RDF triples and find which takes keywords as input and returns the best matching solutions to the variables. In order to formulate these queries, subgraphs as an answer to this query. The key contribution is some knowledge about the dataset is therefore required. This to use an external source of knowledge providing semantic could be related to the subject, which could either be a specific relations in order to find the elements in the dataset that resource, or a class existing in the dataset, or to a predicate, match the query keywords, and to rank the set of possible which represents a property describing either a class or a answers using a ranking method based on the semantic relations which have been used during the matching process. *PhD is funded by CNRS-L and ANR (project CAIR) 75 The rest of this paper is organized as follows. The approach • Aggregating Graph Elements overview is provided in section 2. We detail our solution Once the matching elements in the RDF graph are for matching keywords with graph elements using external identified for each keyword, the problem is to built the knowledge in Section 3. Section 4 presents the process of final result from these elements, and to aggregate them building the end results from the matching elements.The into a connected subgraph representing an answer to the ranking method is discussed in section 5. Section 6 presents query. Each keyword can be associated to more than our experiments and section 7 reviews the related works. one element in the RDF graph; we consider that each Finally, we conclude the paper and present some future works combination of matching elements where there is one in Section 8. element for each keyword is a possible answer to the query. The problem is to built the subgraph containing II. APPROACH OVERVIEW the elements of the considered combination. In this section we provide an overview of our keyword search approach presented in Figure 2. • Result Ranking As each keyword may have more than one matching element in the dataset, there may be several possible results to the query. The problem is to rank the different results and to find a ranking method capable of identying the results that are closer to the initial query than others. III. M ATCHING K EYWORDS WITH GRAPH ELEMENTS In this section, we present our approach for matching the keywords with the data graph using external knowledge. Let Fig. 2: Approach overview the keyword query Q={k1 , k2 , k3 , ...kn } be the input. For each keyword ki we check if there exists an exact match in the Our goal is to provide a keyword search approach on elements of the data graph or not. If such element does not RDF datasets, which is an alternative way to query these exist, we search the considered external knowledge source in datasets. In our approach, the user will enter the keywords order to find some semantic relations between a keyword and composing the query as an input and get a set of RDF graphs a concept such that this latter concept has a matching element corresponding to the keywords as an output; but the query in the data graph. At the end, we obtain for each ki a set of is not always expressed using the same terms as the ones matching elements from the data graph. used in the dataset. Our goal is to bridge the gap between For example, let us consider the keyword query Q={2008, the keywords and the dataset terminology. We propose the use performing, film-maker}. We can observe from figure 1 that of an external knowledge source providing semantic relations the keywords "performing" and "film-maker" have no match- between concepts. In our work, we have used the Wordnet ing elements, but if some external knowledge source is avail- online linguistic dictionary. able and provides us with the information that "performing" is Figure 2 shows our framework for keyword search. It similar to "starring" and "film-maker" is similar to "director", comprises three components: matching, aggregating and then some matching elements for the keywords of Q can be ranking. found, and they are presented in figure 3. • Matching keywords The matching component takes as input the keyword query and searches for each keyword the matching elements in the dataset. This is done by comparing the keyword to each graph element (resource, class or property) and returning the matching ones. In some cases, the user may enter a keyword for which an exact match can not be found in the dataset, but some graph element could be close to the keyword, it could for example be a synonym, or a close concept. The problem is to identify the equivalent concepts and the close concepts to some keyword in the dataset. To do so, we suggest the use of external knowledge sources such as Fig. 3: Matching Elements for Each Keyword in the Query Q online linguistic dictionnaries. 76 A. Matching using external knowledge element in the dataset, we query the WordNet database to search for one of the previous semantic relations. In our work, we have used WordNet as an external knowledge source. WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each one expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing. This lexical database can help to find the best matching between the keyword queries and the dataset elements using the provided semantic relations (Synonyms, Antonyms...). We have decomposed the matching keywords process into two phases. The first one is the search in the dataset for exact matches for a given keyword. The second phase is the search for close matching elements in the Fig. 5: Searching for Close Matching Elements dataset. These relations do not express equivalence, but express some sort of closeness between two concepts. For example, if a meronymy relation is found between ki and a concept c, and if c has an exact match in the dataset, i.e. a graph element labeled c, then this latter is a close concept to ki . Indeed, if c does not represent an equivalent concept, it still represents a close concept as c is part-of ki because the two are linked by a meronymy relation according to Wordnet. Close matching elements are searched for each keyword without exact matching element. If some keyword in the query has neither exact matching elements nor close matching elements, this means that there is no answer to the query. There are many semantic relations in WordNet, some of Fig. 4: Searching for Exact Matching Elements them are very useful to help us find the best matching elements between the query and the dataset. In our work, we have In figure 4 we have represented the first phase, which is the considered the following relations to search for matching search for exact matches int the dataset. Consider a keyword elements. query Q={k1 , k2 , k3 , ...kn }. For each keyword ki we perform • Synonyms: a concept that means exactly the same as two searching tasks in parallel; the first one is searching the another. dataset for properties, resources or classes that have the same • Antonym: a concept opposite in meaning to another. name as the considered keyword, i.e., nodes and edges in the • Hyponym: a concept whose meaning denotes a subordi- data graph having ki as label. The second task is to search the nate. knowledge base for synonyms and antonyms of the considered • Hypernym: a concept whose meaning denotes a superor- keyword. This consists in querying WordNet to extract the dinate. semantic relations (synonyms antonyms and hypernyms) with • Substance meronym: a concept that is a substance of ki and find the exact matching of those semantic relations in another concept. the data graph. • Part meronym: a concept that is part of another concept. If the search for exact matching elements fails, then a search • Member meronym: a concept that is part of another for close elements is performed. If some matching element is concept. found in the dataset for every keyword, then the search process • Substance of holonym: a concept that has another concept ends. as a substance. In figure 5 we have represented the search for close matching • Part of holonym: a concept that has another concept as a elements. This process will be executed for a given keyword part. only if no exact matching element has been found. It is • Member of holonym: a concept that has another concept similar to the search for exact matches and differs only on as a member. the considered semantic relations in Wordnet. The semantic • Cause to: a verb that is the cause of a result. relations we are interested in are hyponymy, holonymy and • Troponym: a verb that is particular way to do another. meronymy. For each keyword ki which has no exact matching 77 B. Matching Algorithm In this section, we present the matching algorithm Algorithm Matching The Keywords Of The Query With The underlying our approach, which matches the keywords of Graph the query with the dataset. Let us start by presenting the 1: keywordQuery = {k1, k2, k3, ...ki} notations used in the algorithm. Let K={k1 , k2 , k3 ...kn } be 2: procedure M ATCHING (keywordQuery) the keyword query, ME(ki ) a function to extract the matching 3: hashmap(keyword, relatedElements) elements for ki in the dataset (these elements can be literals, 4: for each keyword ki in keywordQuery do instances, classes, properties, etc.). 5: ME(ki) . extract the matching elements for the The semantic relations between the keyword query ki and keyword query ki WordNet are extracted by using some functions, for example 6: S ← synonym(ki) . S is set of synonyms to the Synonym(ki ) is used to extract the set of synonyms for the keyword ki keyword query ki . 7: for each sj in S do For each keyword ki in the query, the matching elements 8: M E(ki) ← M E(ki) + M E(sj) . extract ME(i ) are extracted (line4-5), and then WordNet is queried to the matching elements for the synonym sj and add them extract the set SR of Synonyms, Antonyms and Hyponyms. as matching element to the keyword query ki For each element of SR, the dataset is accessed to check 9: end for if there is a matching element (line6-16). For example, the 10: a ← Antonym(ki) . function Antonym search for matching elements using the antonymy relation is take ki as input and return keyword similar to it as output done by issuing the following query to Wordnet. after executing the following query {ki Antonym to ?x, ?x antonym to ?y.} and ki different from ?y SELECT ?y 11: M E(ki) ← M E(ki) + M E(a) WHERE 12: Hyper ← Hypernym(ki) . Hyper is set of {ki Antonym_to ?x. Hypernyms to the keyword ki ?x Antonym_to ?y.} 13: for each hj in Hyper do And ki dif f erent f rom ?y) 14: M E(ki) ← M E(ki) + M E(hj) . extract the matching elements for the Hypernym hj and add them as If no matching element has been found, then a search matching element to the keyword query ki for close matching elements is performed (line 17). In this 15: end for phase, WordNet is queried to find close matching elements 16: hashmap.add(ki, M E(ki)) by searching for the hypernymy, holonymy and meronymy 17: if ME(ki) is empty then semantic relations for ki . For each concept c related to ki by 18: Hypo ← Hyponym(ki) . Hypo is set of one of these relations, we search for elements in the dataset Hyponyms to the keyword ki labeled c; these elements are added to the set of matching 19: for each hj in Hypo do elements for the keyword ki (line 17-31). 20: M E(ki) ← M E(ki) + M E(hj) 21: end for IV. AGGREGATING GRAPH ELEMENTS 22: M ero ← M eronym(ki) . M ero is set of After obtaining the set of matching graph elements for each Myronyms to the keyword ki keyword, the subgraphs representing the possible answers to 23: for each mj in M ero do the query are built. Each answer is a connected minimal sub- 24: M E(ki) ← M E(ki) + M E(mj) . graph which contains for each keyword ki one corresponding extract the matching elements for the meronym mj and matching element. add them as matching element to the keyword query ki Considering that there is a set of matching elements for 25: end for each keyword, we first derive the possible combinations by 26: Holo ← Holonym(ki) . Holo is set of computing the cartesian product of the different sets of match- Holonym to the keyword ki ing elements. Then, for each combination, we determine the 27: for each hj in M do minimal connected subgraph containing the matching elements 28: M E(ki) ← M E(ki) + M E(hj) . extract of the considered combination. Each subgraph is a possible the matching elements for the Holonym hj and add them answer to the initial query. Let us consider the example of the as matching element to the keyword query ki keyword query given in Figure 3, with the set of matching 29: end for elements corresponding to each keyword. In figure 6, we can 30: hashmap.add(ki, M E(ki)) see all the possible combinations, obtained by performing the 31: end if cartesian product of the sets of matching elements shown in 32: end for figure 3. 33: Return hashmap From each combination, a connected subgraph will be 34: end procedure extracted. This subgraph represents a possible result to the query; it is built by introducing the minimum number of nodes 78 matching elements. We calculate the ranking score as follows: Score = 1 − [wa ∗A+(1−w N a )∗L] Where A is the number of approximate matching elements, L is the number of linking elements, N is the total number of nodes and edges in the subgraph and wa is the weight for A. Intuitively, the above score expresses that the less linking elements in a subgraph, the better the solution. It also ex- presses that the more exact matching elements in a subgraph, Fig. 6: Combinations of Matching Elements the better the solution. which do not correspond to matching elements. To do so, we use the shortest path algorithms (Dijkstra’s Algorithm). A matching element can a be node or an edge. If we compute the shortest path between two matching elements representing both a node, then the shortest path between them is the shortest path between the corresponding two nodes in Solution A the data graph. If one of the matching elements is an edge between nodes n1 and n2 and the other matching element is a node n, then the shortest path is determined between n and one of the two nodes n1 and n2. If the two matching elements are both edges, then the short- est path is determined between one of the nodes connected by the first matching element to one of the nodes connected by the second matching element. Solution B For each combination, our algorithm for extracting a sub- Fig. 7: Some Possible Solutions for Q graph starts by randomly selecting one of the matching ele- ments and then finds all the shortest paths from this matching element to all the other matching elements in the combination. Let us consider, in our running example, the possible Let us consider two matching elements mi and mj . solutions for the query Q={2008, performing, film-maker} ni ∈ mi and nj ∈ mj such that @ni2 ∈ mi , @nj2 ∈ given in figure 7. For solution A, the number of exact matching mj , ni 6= ni2 , nj 6= nj2 and |(ni2 − −nj2 )| < |(ni − −nj )| elements is equal to one, the number of approximate matching where |(x − −y)| is the size of the shortest path between elements is equal to two, and the number of linking elements nodes x and y. To construct the result, we combine the shortest is equal to one; the total number of nodes and edges N is paths between all pairs of matching elements. When all the equal to seven. The final score is therefore equal to 0.814. For combinations are processed, we obtain a set of subgraphs, each solution B, the number of exact matching elements is equal to one representing a possible result to the query. one, the number of approximate matching elements is equal to two, the number of linking elements is equal to two; the V. R ANKING R ESULTS number N of nodes and edges is equal to nine. The final score is therefore equal to 0.778. Solution A is better than solution The elicitation of all the combinations of graph elements B. In A, the result is the movie "Righteous kill" released in and the aggregation of graph elements for each combination "2008" directed by "John Avnet" and "Al Pacino" is one of lead to several subgraphs, each one being a possible answer the stars of the movie, while solution B describes the movie for the query. One problem is to rank these answers, and "Righteous kill" released in "2008" with "Al Pacino" being to determine if there are better results than others. In our one of the stars, also starring in the movie "Heat" directed by approach, we ranked the results according to the matching "Michael Mann". Therefore solution A focuses on the release process. Let us first define the notations used in the ranking date, stars and director of one movie which is "Righteous kill", method. Exact matching elements are the elements found while solution B presents the release date of "Righteous kill" during the first phase of the matching, i.e. the search for and the director of "Heat". exact matching elements; approximate matching elements are the elements found during the search for close matching elements; we denote by linking elements the nodes that are in the subgraph result but are neither exact nor approximate 79 VI. E XPERIMENTAL EVALUATION Our approach is implemented in Java, we have used the Jena API for the manipulation of RDF data. For indexing and searching the keyword query, we have used the Lucene API. The Jung API is used for graph manipulation and visualization. In the rest of this section, we describe our experiments to validate the performances of our approach. Our goal is to observe the performance of using WordNet as an external knowledge to fill the gap between the keywords and the dataset terminologies, as well as the ranking model with various keyword queries. All the experiments have been done on Intel Fig. 8: Average Execution Time According to the Size of the Core i7 with 32GB RAM. Query As we can observe from figure 8, the execution time A. Datasets increases when the number of the keywords increases for both datasets. We can also see that the execution time for We have used two datasets: AIFB and DBpedia. AIFB is AIFB is greater than the execution time of DBpedia because a dataset containing data taken from the AIFB institute, at the size of data in AIFB is greater than the size in DBpedia. Karlsruhe University. It is about entities of research communi- ties such as persons, organizations, publications (bibliographic metadata) and their relationships. The dataset contains 8281 The type and number of keyword elements also affect the entities and 29 233 triples. DBpedia is a project aiming to execution time; for example Q1 {carol_lombard / 5940 / the- extract structured content from the information created in the oritical / edwin} in table I takes 4.91 sec (fig. 9) and contains Wikipedia project. The extracted data is related to movies their 4 keywords while Q6 {poor / 1990 / mind / Ellen_Burstyn} title, stares, director, released data and other properties. This needs 31.57 sec (fig. 9) to be executed: the two queries contain dataset contains 30 793 triples. the same number of keywords (4), but the difference in the The size of the keyword queries was between 2 and 8 execution time is due to the keyword elements. In Q1 the four keywords. The total number of queries was 20 queries (10 keywords appear 14, 18, 0 and 10 times respectively as we for each dataset) in order to cover all the different WordNet can see from table I but the keywords in Q6 appears 20, 258, semantic relations during the matching stage. 0 and 2 times respectively. B. Methodology We have tested and compared our keyword search approach both with and without the use of external knowledge. We will refer to the approach with external knowledge by semantic approach since we use semantic relations, and we will refer to the approach without external knowledge as the basic approach. C. Results The query size was between 3 and 8 keywords. Table 1 shows some examples of keyword queries, the number of nodes and edges containing this keyword in the data graph, and Fig. 9: Execution Time for Each Query over DBpedia the semantic relations between the keywords that do not appear in the data graph and WordNet. For example, Query 1 (carole The differences in terminology between the keyword query lombard-5940-theoretical-edwin) consists of 4 keywords, these and the dataset also affect the execution time (the keyword keywords appear in the data graph (nodes and edges) 14, 18, query does not match with any element in the dataset). 0 and 10 times respectively, this means that theoretical is not Consider the queries Q3 and Q4 in table I; Q3 consists of in the dataset but we can replace it with academic since there 4 keywords, and these keywords appear 2, 1, 4 and 0 times is a hypernymy relation between academic and theoretical in respectively in the dataset while Q4 has 3 keywords and these WordNet. keywords appear 0, 1 and 0 times respectively in the dataset. 80 Query Number keyword Query number of appear Nodes/edges WordNet Semantic Relations carole_lombard 14 5940 18 Q1 hypernym theoretical 0 edwin 10 swedishfilms 2 lagercrantz 4 Q2 hyponym 1997 97 plosion 0 psychiatric 2 hampshire 1 Q3 hyponym ziskin 4 system 0 bring 0 antonym Q4 cut 1 mind 0 hypernym solar_system 0 holonym Q5 secondary 0 synonym vocabulary 0 meronym TABLE I: Keyword Queries query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 number of results(semantic approach) 2 173 7 8 50 250 110 132 78 34 number of results(basic approach) 1 150 1 0 0 54 25 0 2 2 TABLE II: Number of Results for each Keyword Query(DBpedia) query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 number of results(semantic approach) 9 10 21 25 32 38 20 18 25 5 number of results(basic approach) 3 4 10 7 12 22 14 7 13 0 TABLE III: Number of Results for each Keyword Query(AIFB) But the execution time for Q4 (7.76 sec) is greater than the check the top-k results for each query and give the number of execution time of Q3 (5.08 sec); this is because Q4 requires relevant results to calculate the Top-K precision according to to access WordNet two times to search for semantic relations this equation: involving the keywords, while Q3 requires only one access. N umberOf RelevantResults As we can observe from tables II and III, the number of re- T op − kP recision = (1) K sults increases when WordNet is used during keyword search, because using WordNet increases the number of matching All the results were above 0.92 as shown in table 4; this elements. This means that the number of combinations and means that the results were accurate according to the users. therefore the number of results both increase. Data AIFB DBpedia K 5 10 5 10 Top-K precision 0.97 0.95 0.98 0.97 TABLE IV: Top-K precision VII. R ELATED WORKS Keyword search and the translation of a keyword query into a formal query have been the topic of several research works. The early research works were on keyword query over relational databases [2], [3], then XML data [4] and RDF data [6], [5], [8], [7]. One of the important problems addressed by these works is how to fill the gap between the keywords in the query and the terms used in the dataset. The SPARK approach [5] consists in finding the corresponding ontology for each term in the keyword query and try to map and find a relation between the ontology and the keyword query; other Fig. 10: Execution Time for Each Query over AIFB approaches use external knowledge or resources such as in the Q2semantic approach [8], where Wikipedia is used to To check the effectiveness of the evaluation we have used extract related keywords; for each keyword in the dataset, a 10 queries from the tables II and III and asked three users to document is created containing features that are matched to 81 the keyword Query. The approach described in [9] also uses [6] He, Hao, Haixun Wang, Jun Yang, and Philip S. Yu. "BLINKS: ranked some external knowledge; it uses the supporting entity pairs keyword searches on graphs." In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pp. 305-316. ACM, 2007. in order to paraphrase dictionary records semantic equivalence [7] W. Le, F. Li, A. Kementsietsidis, and S. Duan. Scalable keyword search between relation phrase and dataset; but the supporting entity on large rdf data. TKDE, 26:2774–2788, 2014 pairs are specific to Wikipedia and the New York Times [10]. [8] Wang, Haofen, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. "Q2semantic: A lightweight keyword interface to semantic search." In order to aggragate the matching elements in a dataset, In European Semantic Web Conference, pp. 584-598. Springer, Berlin, SPARK[5] uses an ontology to discover the relations between Heidelberg, 2008. the keywords and the dataset and uses a minimal spanning tree [9] Zou, Lei, Ruizhe Huang, Haixun Wang, Jeffrey Xu Yu, Wenqiang He, and Dongyan Zhao. "Natural language question answering over RDF: a algorithm to create a possible query graph. The approaches graph data driven approach." In Proceedings of the 2014 ACM SIGMOD described in [8], [6], [11], [14], [15] transform the data graph international conference on Management of data, pp. 313-324. ACM, 2014. into a summarized graph; some of them start from the leaf [10] Nakashole, Ndapandula, Gerhard Weikum, and Fabian Suchanek. "PATTY: a taxonomy of relational patterns with semantic types." In nodes containing the keyword query and do a traversal until Proceedings of the 2012 Joint Conference on Empirical Methods in Natural all the paths converge to the same node, and the other works Language Processing and Computational Natural Language Learning, pp. use the summarized graph and try to extract a SPARQL query 1135-1145. Association for Computational Linguistics, 2012. [11] Li, Feifei, Wangchao Le, Songyun Duan, and Anastasios Kementsiet- by finding relationships between the nodes.[12] classifies the sidis. "Scalable keyword search on large RDF data." IEEE Transactions keywords into two sets: the first one contains the vertices and on Knowledge and Data Engineering 1 (2014): 1. the second one contains the edges; then the final possible [12] Han, Shuo, Lei Zou, Jeffery Xu Yu, and Dongyan Zhao. "Keyword Search on RDF Graphs-A Query Graph Assembly Approach." In Proceed- solutions are computed. In all these works, the keyword in the ings of the 2017 ACM on Conference on Information and Knowledge query are matched with the nodes of the considered graph, Management, pp. 227-236. ACM, 2017. unlike our approach which considers semantic relations and [13] Ouksili, Hanane, Zoubida Kedad, Stéphane Lopes, and Sylvaine Nugier. "Using Patterns for Keyword Search in RDF Graphs." In EDBT/ICDT searches for matching elements in both the nodes and the edges Workshops. 2017. in order to build the final result. [14] Ayvaz, Serkan, and Mehmet Aydar. "Using RDF Summary Graph For Keyword-based Semantic Searches." arXiv preprint arXiv:1707.03602 (2017). VIII. CONCLUSION AND FUTURE WORKS [15] Lin, Xiao-Qing, Zong-Min Ma, and Li Yan. "RDF Keyword Search Us- ing a Type-based Summary." JOURNAL OF INFORMATION SCIENCE In this paper, we have proposed an approach for keyword AND ENGINEERING 34, no. 2 (2018): 489-504. search in an RDF dataset, which represents an alternative to the use of query languages such as Sparql. We have focused on the problem of handling differences in the terminologies between the RDF dataset and the keyword query. We have proposed a novel solution which relies on an external knowledge source. In our approach, the answer to the keyword query is a set of subgraphs containing for each keyword one matching graph element. We have described how we enrich the set of solutions by using WordNet. We have also provided a ranking model to rank the resulting subgraphs. This model is based on the number of semantic relations extracted from WordNet and the number of nodes and edges for each subgraph. We have conducted some experiments which have shown that external knowledge gives more results for some queries and sometimes returns an answer where other approaches fail to. In future works, we will study the possible improvements to the aggregation of matching elements and try to find other ways of combining them. We will also study scalability issues and enable efficient keyword search for massive datasets. R EFERENCES [1] https://www.w3.org/RDF/. [2] V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. In VLDB, pages 670–681, 2002. [3] L. Qin, J. X. Yu, L. Chang, and Y. Tao. Querying communities in relational databases. In ICDE, pages 724–735, 2009. [4] Guo, Lin, Feng Shao, Chavdar Botev, and Jayavel Shanmugasundaram. "XRANK: Ranked keyword search over XML documents." In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pp. 16-27. ACM, 2003. [5] Zhou, Qi, Chong Wang, Miao Xiong, Haofen Wang, and Yong Yu. "SPARK: adapting keyword query to semantic search." In The Semantic Web, pp. 694-707. Springer, Berlin, Heidelberg, 2007. 82