<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Keyword-Based Semantic Search Engine Koios++</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bjrn Forcher</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Giloj</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erich Weichselgartner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leibniz Institute for Psychology Information</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the keyword-based semantic search engine KOIOS++ which interprets the keywords and computes a set of SPARQL queries. The special feature of KOIOS++ is that it leverages not only the class hierarchy but also the property hierarchy. The algorithm and data structures of KOIOS++ are based on a well-established approach that we extended by minor adjustments of the data structures and a sophisticated weighting strategy.</p>
      </abstract>
      <kwd-group>
        <kwd>keyword-based semantic search</kwd>
        <kwd>RDFS semantics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The RDF data model is a popular data model of the Semantic Web which can be
easily transferred to a simple directed graph. RDFS extends RDF and provides
representational constructs for describing ontologies, for instance the properties
rdfs:subClassOf and rdfs:subPropertyOf . Both properties are transitive relations
which are used to describe class or property hierarchies of ontologies.</p>
      <p>
        Finding information in graph-shaped data, keyword search seems to be
natural because it is the de facto standard for current search engines. The eld of
keyword-based search on graph-structured data in general, and in particular over
RDF data, is a prevalent research topic and corresponding eorts can be referred
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] at dierent glances. Tran et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] describe an interesting
approach of keyword-based semantic search for computing the top-k ranked search
results from RFD(S) graphs. They rst compute queries from the keywords,
allowing the users to choose one of them, and nally to process the query using
the underlying database engine. Nevertheless, these approaches do not focus on
the hierarchy of properties and thus, they cannot make use of the transitive tree
of the rdfs:subPropertyOf relation. Recent publications of that group focused on
the processing of the approximated top-k ranked results [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to reduce
computation time but they did not consider further aspects of the RDFS semantics. The
primary motivation of our work is to make the rdfs:subPropertyOf available for
keyword-based semantic search on graph shaped data. We extend the approach
of Tran et al. by using a special graph mapping and a sophisticated weighting
strategy which is implemented in the KOIOS++ search engine. We claim that in
this way we get a semantic benet because we can favor certain graph patterns
during the search. As a result, it is possible, for instance, to make use of the
property hierarchy by leveraging the rdfs:subPropertyOf relation.
      </p>
      <p>The paper is structured as follows. Sec. 2 introduces the search engine
KOIOS++ including its search algorithm and data structures. We conclude with a
brief summary and a small outlook (Sec. 3).
2</p>
    </sec>
    <sec id="sec-2">
      <title>Koios++</title>
      <p>
        As mentioned in the previous chapter, the approach of KOIOS++ is based on
the work of Tran et al.[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] which is depicted in Figure 1. In the Data
Preprocessing two main data structures are built from the RDF(S) data, namely keyword
index and summary graph. The graph represents a summarization of the RDF(S)
data comprising only structural information. Thus, it contains only classes and
properties, but no literals and instances. Literals and instances are integrated at
runtime by means of the keyword index. In general, the keyword index is used to
map keywords to elements of the RDF(S) data. The basic thinking behind the
establishment of both structures is to enable a high performance search along
with scalability. The summary graph (kept in memory) contains only as much
information as necessary whereas the keyword index (native database) represents
an entry point containing additional information.
Keyword-Element
      </p>
      <p>Mapping</p>
      <p>Augmentation of
Graph Index</p>
      <p>Top-K Graph</p>
      <p>Exploration
Keyword Index</p>
      <p>Summary Graph
Keyword Indexing</p>
      <p>Summarization</p>
      <p>Conjunctive Queries</p>
      <p>Element-Query</p>
      <p>Mapping</p>
      <p>Query Computation</p>
      <p>Data Preprocessing</p>
      <p>RDF(S) Data</p>
      <p>
        One contribution of our work is the adjustment of the summary graph in
order to leverage both, the class hierarchy and the property hierarchy. Figure 2
is intended to point out the main dierence between the two approaches. The
gure contains some example triples and the corresponding summary. In
contrast to the summarization of Tran et al. a triple with two instances is mapped
to a directed edge. Source and target correspond to the classes of the instances
and the property of the triple is noted as label of the edge. In our approach every
argument of a triple is mapped to a typed node (class or property node). The
node of the subject and the node of the predicate are connected by a directed
edge without label. The same applies for predicate and object. The only
exception to that rule are triples including a ’rdfs:subClassOf’ or ’rdfs:subPropertyOf’
predicate. In these cases the nodes are directly linked with an edge.
For computing SPARQL queries, the loose keywords are rst disassembled into
its constituent parts. For simplicity, let the keywords already be separated. Thus,
there is a set of h terms T = ft1; :::; thg, whereas ti 2 T is either a single word
or a multi-word expression. In the following step, the terms are mapped to
elements of the input RDF(S) data which are called M-Resources. In general, a
term ti 2 T is mapped to a set of j resources Ri = fri1 ; :::; rij g. As follows,
there are h sets of M-Resources R1; :::; Rh that are used for further processing.
The resource sets R1 to Rh were distributed to h threads and in each thread Zi
a graph exploration is performed on the prepared (augmented) summary graph
for each M-Resource riq 2 Ri. Hence, many paths were explored starting from
riq . In case there is a resource rc that is reached by any path in each thread a
connecting subgraph Gs of the knowledge base can be constructed consisting of
h paths. The resource rc is called C-Resource which connects all paths with one
another. The outcome of the graph search algorithm is a set of weighted
subgraphs G1; :::; Gq, whose size can be restricted by an upper weight limit and a
general time limit. The weighting strategy can be separated into two parts. The
static part weights all nodes and edges in the preprocessing step (further
information is presented in Tran et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). The dynamic part of the weighting takes
place at the runtime of the system and concerns visited paths only. The weight
wpn for a new path pn is based on the path before po, the new edge ea and node
vb, and the resulting path pattern mn: w(pn) = w(po) + w(ea) + w(vb) + w(mn).
The last steps of KOIOS++ are straightforward. For each subgraph G1; :::; Gq
a conjunctive SPARQL query is constructed and presented as semantic network
to the user. Subsequently, the user can select relevant queries to retrieve the
corresponding answer from the triplestore.
      </p>
      <p>Consider the example keywords ’person’ and ’worked’. The rst one is mapped
to the class ’zpid:Person’ and the second one is mapped to the property
’zpid:WorkedOn’. The exploration may track two paths starting from both nodes
which may end up in the nodes ’zpid:Librarian’ or ’zpid:Author’. As follows, two
dierent SPARQL queries can be constructed.</p>
      <p>The presented data structure has an inherent disadvantage because the queries
may contain statements that are not included in the origin data, for instance, ’a
librarian wrote an article’. This information is not necessarily incorrect, but if
this relation is not entailed in the triplestore unnecessary graph traveling is done.
As follows, the algorithm gets expensive and time-consuming. This is one reason,
why we integrated the dynamic weighting as described above. If the graph
traveling ends in a path pn with an unwanted pattern the additional weight w(mn)
is set to innite (if not unwanted w(mn) = 0). That means that the new path
is (very likely) not considered for further exploration and query construction.
Thus, the described graph mapping in combination with the dynamic weighting
enables the integration of the property hierarchy without constructing unwanted
SPARQL queries.</p>
      <p>However, this kind of weighting strategy could also be used to foster ( w(mn) &lt; 0)
certain graph patterns which is important for explanation scenarios.
zpid:Librarian
rdf:type
rdf:type
zpid:ErichW
zpid:Article-01</p>
      <p>zpid:Abstract-02
zpid:Person
zpid:Author
zpid:Article
In this paper, we presented the semantic search engine KOIOS++ that enables
a keyword-based search on RDF(S) triple stores. It interprets the keywords and
computes a set of SPARQL queries which can be selected by users to search the
triple store. The RDF(S) data is mapped to a graph structure which is explored
for the computation. In contrast to other approaches a predicate is not mapped
to an edge, it is mapped to a node. To avoid unnecessary workload a sophisticated
weighting strategy is applied. In particular, the strategy takes place at runtime
whereas the exploration of new graph elements is based on the previous explored
elements (conditional search). The presented approach enables not only heuristic
reasoning on class hierarchies but also on the property hierarchies.
The next step of our work is to integrate SPARQL operators into our approach.
Thus, it becomes possible to use words, such as "greater" or "smaller".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Achiezra</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golenberg</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kimelfeld</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagiv</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Exploratory keyword search on data graphs</article-title>
          .
          <source>In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. SIGMOD '10</source>
          , New York, USA, ACM (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cappellari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De</surname>
            <given-names>Virgilio</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Maccioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Roantree</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.:</surname>
          </string-name>
          <article-title>A path-oriented rdf index for keyword search query processing</article-title>
          .
          <source>In: Proceedings of the 22Nd International Conference on Database and Expert Systems</source>
          Applications - Volume
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>De</surname>
            <given-names>Virgilio</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Maccioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Cappellari</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.:</surname>
          </string-name>
          <article-title>A linear and monotonic strategy to keyword search over rdf data</article-title>
          . In Daniel, F.,
          <string-name>
            <surname>Dolog</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q</given-names>
          </string-name>
          ., eds.: Web Engineering. Springer Berlin Heidelberg (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Top-k exploration of query candidates for ecient keyword search on graph-shaped (rdf) data</article-title>
          .
          <source>In: Proc. of the 25th Intern. Conference on Data Engineering (ICDE'09)</source>
          , Shanghai, China (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bicer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Pay-as-you-go approximate join top-k processing for the web of data</article-title>
          . In
          <string-name>
            <surname>Presutti</surname>
          </string-name>
          , V.,
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gandon</surname>
          </string-name>
          , F.,
          <string-name>
            <surname>d'Aquin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tordai</surname>
          </string-name>
          , A., eds.:
          <source>The Semantic Web: Trends and Challenges</source>
          . Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>