Keyword-Based Semantic Search Engine Koios++
          Björn Forcher1 , Andreas Giloj1 , and Erich Weichselgartner1
                    Leibniz Institute for Psychology Information,
                    Germany, email: rstname.lastname@zpid.de


      Abstract.   In this paper, we describe the keyword-based semantic search
      engine KOIOS++ which interprets the keywords and computes a set of
      SPARQL queries. The special feature of KOIOS++ is that it leverages
      not only the class hierarchy but also the property hierarchy. The algo-
      rithm and data structures of KOIOS++ are based on a well-established
      approach that we extended by minor adjustments of the data structures
      and a sophisticated weighting strategy.

      Key words:    keyword-based semantic search, RDFS semantics


1 Introduction
The RDF data model is a popular data model of the Semantic Web which can be
easily transferred to a simple directed graph. RDFS extends RDF and provides
representational constructs for describing ontologies, for instance the properties
rdfs:subClassOf and rdfs:subPropertyOf. Both properties are transitive relations
which are used to describe class or property hierarchies of ontologies.
    Finding information in graph-shaped data, keyword search seems to be nat-
ural because it is the de facto standard for current search engines. The eld of
keyword-based search on graph-structured data in general, and in particular over
RDF data, is a prevalent research topic and corresponding eorts can be referred
in [1], [2], and [3] at dierent glances. Tran et al. [4] describe an interesting ap-
proach of keyword-based semantic search for computing the top-k ranked search
results from RFD(S) graphs. They rst compute queries from the keywords, al-
lowing the users to choose one of them, and nally to process the query using
the underlying database engine. Nevertheless, these approaches do not focus on
the hierarchy of properties and thus, they cannot make use of the transitive tree
of the rdfs:subPropertyOf relation. Recent publications of that group focused on
the processing of the approximated top-k ranked results [5] to reduce computa-
tion time but they did not consider further aspects of the RDFS semantics. The
primary motivation of our work is to make the rdfs:subPropertyOf available for
keyword-based semantic search on graph shaped data. We extend the approach
of Tran et al. by using a special graph mapping and a sophisticated weighting
strategy which is implemented in the KOIOS++ search engine. We claim that in
this way we get a semantic benet because we can favor certain graph patterns
during the search. As a result, it is possible, for instance, to make use of the
property hierarchy by leveraging the rdfs:subPropertyOf relation.
  The paper is structured as follows. Sec. 2 introduces the search engine KOI-
OS++ including its search algorithm and data structures. We conclude with a
brief summary and a small outlook (Sec. 3).

2 Koios++
As mentioned in the previous chapter, the approach of KOIOS++ is based on
the work of Tran et al.[4] which is depicted in Figure 1. In the Data Preprocess-
ing two main data structures are built from the RDF(S) data, namely keyword
index and summary graph. The graph represents a summarization of the RDF(S)
data comprising only structural information. Thus, it contains only classes and
properties, but no literals and instances. Literals and instances are integrated at
runtime by means of the keyword index. In general, the keyword index is used to
map keywords to elements of the RDF(S) data. The basic thinking behind the
establishment of both structures is to enable a high performance search along
with scalability. The summary graph (kept in memory) contains only as much
information as necessary whereas the keyword index (native database) represents
an entry point containing additional information.
                  Keywords                                                               Conjunctive Queries


               Keyword-Element-        Augmentation of               Top-K Graph             Element-Query-
                   Mapping               Graph Index                 Exploration                Mapping

                                                                                              Query Computation


                                     Keyword Index                   Summary Graph


                                    Keyword Indexing                  Summarization


                                                                                   Data Preprocessing


                                                       RDF(S) Data


                   Fig. 1.        Data Processing and Query Computation

    One contribution of our work is the adjustment of the summary graph in
order to leverage both, the class hierarchy and the property hierarchy. Figure 2
is intended to point out the main dierence between the two approaches. The
gure contains some example triples and the corresponding summary. In con-
trast to the summarization of Tran et al. a triple with two instances is mapped
to a directed edge. Source and target correspond to the classes of the instances
and the property of the triple is noted as label of the edge. In our approach every
argument of a triple is mapped to a typed node (class or property node). The
node of the subject and the node of the predicate are connected by a directed
edge without label. The same applies for predicate and object. The only excep-
tion to that rule are triples including a 'rdfs:subClassOf' or 'rdfs:subPropertyOf'
predicate. In these cases the nodes are directly linked with an edge.
For computing SPARQL queries, the loose keywords are rst disassembled into
its constituent parts. For simplicity, let the keywords already be separated. Thus,
there is a set of h terms T = {t1 , ..., th }, whereas ti ∈ T is either a single word
or a multi-word expression. In the following step, the terms are mapped to el-
ements of the input RDF(S) data which are called M-Resources. In general, a
term ti ∈ T is mapped to a set of j resources Ri = {ri1 , ..., rij }. As follows,
there are h sets of M-Resources R1 , ..., Rh that are used for further processing.
The resource sets R1 to Rh were distributed to h threads and in each thread Zi
a graph exploration is performed on the prepared (augmented) summary graph
for each M-Resource riq ∈ Ri . Hence, many paths were explored starting from
riq . In case there is a resource rc that is reached by any path in each thread a
connecting subgraph Gs of the knowledge base can be constructed consisting of
h paths. The resource rc is called C-Resource which connects all paths with one
another. The outcome of the graph search algorithm is a set of weighted sub-
graphs G1 , ..., Gq , whose size can be restricted by an upper weight limit and a
general time limit. The weighting strategy can be separated into two parts. The
static part weights all nodes and edges in the preprocessing step (further infor-
mation is presented in Tran et al. [4]). The dynamic part of the weighting takes
place at the runtime of the system and concerns visited paths only. The weight
wpn for a new path pn is based on the path before po , the new edge ea and node
vb , and the resulting path pattern mn : w(pn ) = w(po ) + w(ea ) + w(vb ) + w(mn ).
The last steps of KOIOS++ are straightforward. For each subgraph G1 , ..., Gq
a conjunctive SPARQL query is constructed and presented as semantic network
to the user. Subsequently, the user can select relevant queries to retrieve the
corresponding answer from the triplestore.
Consider the example keywords 'person' and 'worked'. The rst one is mapped
to the class 'zpid:Person' and the second one is mapped to the property
'zpid:WorkedOn'. The exploration may track two paths starting from both nodes
which may end up in the nodes 'zpid:Librarian' or 'zpid:Author'. As follows, two
dierent SPARQL queries can be constructed.
The presented data structure has an inherent disadvantage because the queries
may contain statements that are not included in the origin data, for instance, 'a
librarian wrote an article'. This information is not necessarily incorrect, but if
this relation is not entailed in the triplestore unnecessary graph traveling is done.
As follows, the algorithm gets expensive and time-consuming. This is one reason,
why we integrated the dynamic weighting as described above. If the graph trav-
eling ends in a path pn with an unwanted pattern the additional weight w(mn )
is set to innite (if not unwanted w(mn ) = 0). That means that the new path
is (very likely) not considered for further exploration and query construction.
Thus, the described graph mapping in combination with the dynamic weighting
enables the integration of the property hierarchy without constructing unwanted
SPARQL queries.
However, this kind of weighting strategy could also be used to foster (w(mn ) < 0)
certain graph patterns which is important for explanation scenarios.
                zpid:Author                        zpid:Publication      zpid:Librarian                       zpid:Metadata


                      rdf:type                    rdf:type                       rdf:type                   rdf:type


                                    zpid:wrote                                              zpid:wrote
                zpid:ErichW                        zpid:Article-01       zpid:JuergenW                       zpid:Abstract-02


                                                                      Summarization


                                                      zpid:Author                                         zpid:Article
                     zpid:Person
                                                                              zpid:wrote


                   zpid:workedOn                     zpid:Librarian                                      zpid:Metadata


                                   Fig. 2.       Summarization of some triples


3 Conclusion
In this paper, we presented the semantic search engine KOIOS++ that enables
a keyword-based search on RDF(S) triple stores. It interprets the keywords and
computes a set of SPARQL queries which can be selected by users to search the
triple store. The RDF(S) data is mapped to a graph structure which is explored
for the computation. In contrast to other approaches a predicate is not mapped
to an edge, it is mapped to a node. To avoid unnecessary workload a sophisticated
weighting strategy is applied. In particular, the strategy takes place at runtime
whereas the exploration of new graph elements is based on the previous explored
elements (conditional search). The presented approach enables not only heuristic
reasoning on class hierarchies but also on the property hierarchies.
The next step of our work is to integrate SPARQL operators into our approach.
Thus, it becomes possible to use words, such as "greater" or "smaller".

References
1. Achiezra, H., Golenberg, K., Kimelfeld, B., Sagiv, Y.: Exploratory keyword search on
   data graphs. In: Proceedings of the 2010 ACM SIGMOD International Conference
   on Management of Data. SIGMOD '10, New York, USA, ACM (2010)
2. Cappellari, P., De Virgilio, R., Maccioni, A., Roantree, M.: A path-oriented rdf index
   for keyword search query processing. In: Proceedings of the 22Nd International
   Conference on Database and Expert Systems Applications - Volume Part II. (2011)
3. De Virgilio, R., Maccioni, A., Cappellari, P.: A linear and monotonic strategy to
   keyword search over rdf data. In Daniel, F., Dolog, P., Li, Q., eds.: Web Engineering.
   Springer Berlin Heidelberg (2013)
4. Tran, D.T., Wang, H., Rudolph, S., Cimiano, P.: Top-k exploration of query candi-
   dates for ecient keyword search on graph-shaped (rdf) data. In: Proc. of the 25th
   Intern. Conference on Data Engineering (ICDE'09), Shanghai, China (2009)
5. Wagner, A., Bicer, V., Tran, T.: Pay-as-you-go approximate join top-k processing for
   the web of data. In Presutti, V., d'Amato, C., Gandon, F., d'Aquin, M., Staab, S.,
   Tordai, A., eds.: The Semantic Web: Trends and Challenges. Springer International
   Publishing (2014)