<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Running SPARQL-Fulltext Queries Inside a Relational DBMS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arunav Mishra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sairam Gurajada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Theobald</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Max Planck Institute for Informatics</institution>
          ,
          <addr-line>Saarbru ̈cken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe the indexing, ranking, and query processing techniques we implemented in order to process a new kind of SPARQL-fulltext queries that were provided in the context of the INEX 2012 Jeopardy task. The INEX 2012 Linked Data track provides a new data collection that aims to combine the benefits of both text-oriented and structured retrievals settings in one unified data collection. For the rapid development of a new query engine that could handle this particular combination of XML markup and RDF-style resource/property-pairs, we decided to opt for a relational database system as storage back-end, which allows us to index the collection and to retrieve both the SPARQL- and keyword-related conditions of the Jeopardy queries under one common application layer. To process the 90 queries of the benchmark, we keep two main index structures, one of which is based on a recent dump of DBpedia core triples, and another one, which is based on keywords extracted from the INEX Wikipedia-LOD collection. Additionally, our engine comes with a rewriting layer that translates the SPARQL-based query patterns into SQL queries, thus formulating joins over both the DBpedia triples and the keywords extracted from the XML articles.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>2.1</p>
      <p>Query1: What is a famous Indian Cuisine dish that
mainly contains rice, dhal, vegetables, roti and papad
SELECT ?s WHERE f ?s ?p &lt;http://dbpedia.org/ http://
resource/Category:Indian cuisine&gt; . FILTER dbpedia.org/
FTContains (?s, "rice dhal vegetables roti resource/Thali
papad") . g</p>
      <p>Query2: Which mountain range is bordered by another
mountain range and is a popular sightseeing and sports destination?
SELECT ?p WHERE f ?m &lt;http://dbpedia.org/ http://
ontology/border&gt; ?p . ?p &lt;http://www.w3.org/ dbpedia.org/
1999/02/22-rdf-syntax-ns#type&gt; &lt;http:// resource/Alps
dbpedia.org/ontology/MountainRange&gt; . FILTER
FTContains(?p, "popular sightseeing and sports
destination") . g</p>
      <p>
        Table 1. Example benchmark queries in SPARQL-fulltext format
2. WikiText: This portion of the document includes text from the Wikipedia article in
well-formed XML syntax. The Wikipedia were translated from the common
MediaWiki [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] format. Additionally XML tags for all infobox attributes (and
corresponding values) were included to replace all Wiki markup by proper XML tags. The
inter-Wiki links which point to the other Wikipedia entities are extended to include
the links to both the YAGO and DBpedia resources of the Wikipedia target pages.
Each link has three sub-links: wiki-link, yago-link, and dbpedia-link.
The wiki-link attribute is a regular hyperlink to the target Wikipedia article,
while the yago-link and dbpedia- link attributes contain pointers to the
respective sources in the RDF collections of DBpedia and YAGO2.
3. Properties: Finally, each document at the end includes a list of all the properties
from the DBpedia and YAGO facts about the entity.
      </p>
      <sec id="sec-1-1">
        <title>2.2 Benchmark Queries</title>
        <p>
          The translated queries are syntactically conforming to the SPARQL standard, with an
additional FTContains operator that provides the flexibility to add keyword
constraints to the query. The new FTContains operator takes two parameters, namely
the variable name in the SPARQL component and a set of associated keywords.
Table 1 shows two such natural-language queries and their SPARQL translations
containing FTContains operators for the keyword components. For instance, in the
SPARQL translation of Query 1, we have the original SPARQL component as ?s ?p
&lt;http://dbpedia.org/resource/Category:Indian cuisine&gt; and the
keyword component as frice dhal vegetables roti papadg. In addition,
the subject “?s” is bound by the keyword component as specified in the FTContains
function.
We employed a regular SAX parser to parse the 3.1 Million XML articles whose
general XML structure is still based on that of the original articles. That is, these articles
contain a metadata header with information about the ID, authors, creation date and
others, usually also an infobox with additional semistructured information consisting
of attribute-value pairs that describe the entity, and of course rich text contents
consisting of unstructured information and more XML markup about the entity that is captured
by such an article. Our keyword indexer uses the basic functionality of TopX 2.0 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
which includes Porter stemming, stopword removal and BM25 ranking, but stores the
resulting inverted lists for keywords into a relational database instead of TopX’s
proprietary index structures.
4
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Data Model for the Document Collection</title>
      <p>
        In this section we describe the storage backend for both the structured and unstructured
data parts of our document collection. Our query engine employs two relational tables
to import the core tables of DBpedia and the keywords extracted from the Wikipedia
LOD collection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in one unified database schema. Thus, the SPARQL queries with
fulltext filter conditions of the above form can directly be rewritten to SQL queries with
various join conditions over these two tables.
4.1
      </p>
      <sec id="sec-2-1">
        <title>Storing RDF Data in a Single Relational Table</title>
        <p>An RDF data collection can be defined as a collection of triples (a subject or resource,
a predicate or property, and an object or value). In this paper, we refer to the RDF data
Column Type
N3ID NUMBER
Subject VARCHAR2(1024)
Predicate VARCHAR2(1024)
Object VARCHAR2(1024)
Table 2. Schema of the
DBpediaCore table</p>
        <p>Index Name Attributes
DBpediaIDX Obj (Object,Subject,Predicate)
DBpediaIDX Sub (Subject,Predicate,Object)
DBpediaIDX Prd (Predicate,Object,Subject)</p>
        <p>Table 3. Indices built over the DBpediaCore table
as a collection of SPO triples, each containing exactly one such subject (S), predicate
(P), and object (O).</p>
        <p>As our storage back-end, we use a relational database (in our case Oracle 11g), to
store the RDF data we imported from the core dump of DBpedia v3.7 (see
http://downloads.dbpedia.org/3.7/en/).A SPARQL query then conceptually is
transformed into a sub-graph matching problem over this large RDF graph. The RDF data
model (a collection of triples) can be viewed as a directed, labeled multi-graph (RDF
graph) where every vertex corresponds to a subject or object, and each directed edge
from a subject to an object corresponds to a predicate. There may be multiple edges
directed from a subject towards an object. Thus an RDF graph becomes a multi-graph.</p>
        <p>
          In our query engine, RDF data is treated as a large list of triples and stored in a
relational database. Figure 1(b) shows a fragment of an RDF graph constructed over
Wikipedia, and Figure 1(a) shows a corresponding relational table representation in the
database. Keeping this in mind, and maintaining the simplicity of the model, we use
one relational table to store all SPO triples. In our system, we refer to this table as the
DBpediaCore table, and its schema is described in Table 2. This route is pursued
similarly by many so-called triplet-stores like Jena [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], Sesame [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Oracle [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
and RDF-3X [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Though there are other advanced and more complex approaches, like
vertical partitioning or the exploitation of property tables, our goal was satisfied by this
relatively simple idea.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>4.2 Indices on the RDF Table</title>
        <p>Representing RDF data as a relational table opens up for us all kinds of optimization
techniques used by the database community. One such technique is to build indices
over the DBpediaCore table to improve the data retrieval operations on it. Since
we translate a SPARQL query with fulltext conditions into conjunctive SQL queries
(described in the following section), we employ a large amount of self joins over this
table. These self joins could occur on any of the three columns described in Table 2.
Thus we build three multi-attribute indices, using each of the SPO columns of the table
as key and the remaining two attributes as depending data values, to accelerate the
lookup times over this table for the case when each of the attributes is provided as a
constant by the SPARQL query. Table 3 describes these three indices built over the
DBpediaCore table.</p>
        <p>Column Type
Entity ID VARCHAR2(1024)
Term VARCHAR2(1024)
Score NUMBER
Table 4. Table schema of the
Keywords table</p>
        <p>Index Name Attributes
Keywords Entity IDX (Entity ID, Term, Score)
Keywords Term IDX (Term, Entity ID, Score)</p>
        <p>Table 5. Indices built over the Keywords table
4.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Storing Keywords in a Single Relational Table</title>
        <p>As per traditional IR, a fulltext search allows for identifying documents that satisfy a
keyword query, and optionally sorting the matching documents by their relevance to the
query. The most common approaches use fairly simple TF/IDF counts or Boolean
retrieval to measure the content similarity of a fulltext keyword query to the documents in
the data collection. In our engine, we store the unstructured text contents of all the
documents in the collection as another table in the relational database. This table essentially
contains three columns, relating a keyword (or term) with the documents in which it
occurs and the similarity scores calculated for this particular term and document based
on the underlying scoring model.</p>
        <p>
          We define a term as a sequence of characters that occur grouped together in some
document and thus yield a useful semantic unit for processing. To obtain this set of
terms from the fulltext content of a Wikipedia article, we use a variant of the TopX
indexer [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which applies a standard white-space tokenizer, Porter stemming, and
stopword removal to the unstructured text inputs. For this purpose, we treat all CDATA
sections and attribute values of the Wikipedia-LOD articles as one flat collection of
text; that is we treat the XML collection as an unstructured set of Wikipedia text
documents. We use a default BM25 scoring model (using k = 2:0 and b = 0:75) to
calculate the relevance score of a term in a document. In our approach, we create a
Keywords table to store all the terms occurring in the unstructured Wikipedia fulltext
collection. The schema of this table is shown in Table 4. The Entity ID column
essentially stores the Uniform Resource Identifier (URI) of the DBpedia entities. It may
be noted that every document in our collection also corresponds to a DBpedia entity.
So we prefer to use the RDF prefix defined by DBpedia to represent an entity, which
is hhtttp://dbpedia.org/resource/entityi. To obtain a mapping from a
DBpedia entity to the Entity ID’s from a Wikipedia page, we maintain a hashmap
that maps every Wikipedia page to its corresponding DBpedia entity URI. Thus
every statement of the Keywords table represents a term that is mapped to an entity in
DBpedia together with its content similarity score to the entity’s Wikipedia page.
4.4
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Indices on the Keywords Table</title>
        <p>We can easily imagine that the Keywords table would be very large (containing more
than a billion entires), considering the number of terms extracted from almost the entire
Wikipedia encyclopedia. The Keywords table basically maps every term occurring
in Wikipedia to a DBpedia entity. Taking the size of this table into consideration, data
retrieval operations on the table becomes costly and inefficient. Also processing
conjunctive queries over multiple self joins becomes infeasible unless correct indices are
built over the table. So we build two more multi-attribute indices over this table as
shown in Table 5.
5
5.1</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Scoring</title>
      <sec id="sec-3-1">
        <title>Keywords Scoring and retrieval</title>
        <p>We use the TopX 2.0 indexer to create per-term inverted lists from the plain
(unstructured) text content of the Wikipedia documents, which each corresponds to an entity
in DBpedia. The exact BM25 variant we used for ranking an entity e by a string of
keywords S in an FTContains operator is given by the following formula:
score(e; FTContains(e; S)) =
ti2S
X (k1 + 1) tf (e; ti)</p>
        <p>K + tf (e; ti)
log</p>
        <p>N</p>
        <p>df (ti) + 0:5
df (ti) + 0:5
with K = k1 (1
b) + b</p>
        <p>len(e)
avgflen(e0) j e0 in collectiong
Where:
1) N is the number of XML articles in Wikipedia LOD collection.
2) tf (e; t) is the term frequency of term t in the Wikipedia LOD article associated
with entity e.
3) df (t) is the document frequency of term t in the Wikipedia LOD collection.
4) len(e) is the length (sum of tf values) of the Wikipedia LOD article associated
with entity e.</p>
        <p>
          We used the values of k1 = 2:0 and b = 0:75 as the BM25-specific tuning parameters
(see also [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] for tuning BM25 on earlier INEX settings).
5.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Translating the Keyword-Scores to SPO Triples</title>
        <p>
          Recently there has been a lot of work on identifying appropriate entity scoring and
ranking models. Many techniques use language models [
          <xref ref-type="bibr" rid="ref10 ref13 ref9">13, 9, 10</xref>
          ] while other approaches
try to adopt more complex measures form IR. Ranking for structured queries have been
intensively investigated for XML [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and, to a small extent, also for restricted forms of
SQL queries [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. While some of these approaches carry over to large RDF collections
and expressive SPARQL queries as discussed earlier, the LOD track makes a
significant simplifying assumption: since every DBpedia or YAGO entity is associated (via an
explicit link) with the Wikipedia article (in XML format) that describes this entity, the
entities can directly be referred to and ranked by the keywords that are imposed over
the corresponding Wikipedia article. We thus believe that it is a good idea to translate
the scores to the RDF entities from a keyword-based search on the fulltext part of the
Wikipedia LOD collection. As discussed earlier, every entity in our collection is
essentially a document containing all the DBpedia and YAGO properties about this entity
together with the fulltext content from the corresponding Wikipedia page. Thus we
perform a keyword search on this fulltext content by using the fulltext condition specified
by the FTContains operator (as discussed earlier). From this search, we get a ranked
entity list containing the candidates for the answer to the given query. This is can be
best explained by an example as shown in Figure 2 .
        </p>
        <p>Figure 2(a) shows an example query containing a simple SPARQL query with one
triplet and a fulltext condition that is bound to the subject of the SPARQL query.
Figure 2(b) shows fragments of top-100 results obtained by fulltext search on the
unstructured part of the collection with keywords “Lucky Jim”. As illustrated, we already have
the relevant candidates for the answer from the keyword search due to the satisfactory
performance of our BM25 scoring scheme applied to score the keywords. The scored
entities in the candidate list is again checked for the graph pattern of the SPARQL query
in Figure 2(a) and the final top-k entities are retrieved from the entities which qualify for
the graph pattern. Figure 2(c) shows the retrieved results after searched for the pattern
in the graph.
6
6.1</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Rewriting SPARQL-Fulltext Queries to SQL</title>
      <sec id="sec-4-1">
        <title>Basic Idea of a Conjunctive Query</title>
        <p>Conjunctive queries have high practical relevance because they cover a large part of
queries issued on relational databases and RDF stores. That is, many SQL and SPARQL
queries can be written as conjunctive queries. In our scenario, we are required to
generate an appropriate translation of a given SPARQL query with multiple fulltext
conditions into an SQL query. Our system-generated queries are essentially conjunctive
queries over multiple instances of the DBpediaCore and Keywords tables (as
described earlier).</p>
        <p>To capture the idea behind translating a given SPARQL query with fulltext
conditions into a conjunctive SQL query, let us take an example query Q taken from the
Jeopardy benchmark as shown in Table 6. Q contains three triple patterns T 1, T 2 and
T 3, and two fulltext conditions K1 and K2. Each Ki contains a variable occurring in
a triple pattern and is bound to a string by an FTContains operator. To begin
translation, firstly, every attached string is tokenized and stemmed into the format of terms
stored in the Term column of the Keywords table. For every generated token kj where
j 0 from each Ki , an instance of the Keywords table is taken, where the Terms
column of the instance is restricted to kj . Similarly for every triple pattern Tm , we take
an instance of the DBpediaCore table ti, where i 0. In the example, instance t1
represents triple ?sub ?o ?s, instance t2 represents the triple ?x ?r ?s, and instance t3
represents the triple ?sub rdf:type hhttp://dbpedia.org/ontology/Placei.
These instances ti are further restricted on their Subject or Object by the Entity ID
of the kj instance of the Keywords table as specified by the fulltext condition. Finally,
the instances ti are restricted by each other on Subject, Predicate or Object as
per the logic of the joins in the SPARQL query. The translation of Q into a conjunctive
SQL query, as described above, is shown in Table 7.
7</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Materialization SQL Joins, Temporary Tables, and Join Orders</title>
      <p>In the previous section, we presented a translation method for any given (conjunctive)
SPARQL query with fulltext conditions into a conjunctive SQL query. But, there are
a few problems with such a direct translation. Firstly, this form of translation does
not handle empty results returned from any intermediate join in the desired way. For
example, lets say we encounter with a very unlikely keyword term which is missing in
the Keywords table. Then an empty result will be returned for that instance, and, since
we issue a conjunctive query with all AND conditions, the overall result of the query
will also be empty. Secondly, this type of query with many (self-)joins is inefficient
and difficult to optimize. During query processing, we rely on the Oracle optimizer for
selecting an appropriate query plan. It becomes difficult for the optimizer to choose
the best plan due to the redundancy of data in the DBpediaCore table, i.e., due to
multiple occurrences of an entity as a Subject or Object and an unknown amount of
occurrences of a term in the Keywords table. Due to apparently insufficient statistics
on these large tables (although we had analyzed the tables and enabled this feature in
the optimizer), we found the Oracle optimizer to often choose a bad query plan, which
initially resulted in individual SQL queries that did not even finish after several days.
7.1</p>
      <sec id="sec-5-1">
        <title>SQL Joins</title>
        <p>Most join queries contain at least one join condition, either in the FROM (for outer
joins) or in the WHERE clause. The join condition compares two columns, each from
a different table. To execute a join, the database system combines pairs of rows, each
containing one row from each table, for which the join condition evaluates to true. The
columns in the join conditions need not also appear in the select list.</p>
        <p>To execute a join of three or more tables, Oracle first joins two of the tables based
on the join conditions comparing their columns and then joins the result to another table
based on the join conditions containing columns of the previously joined tables and the
new table. Oracle continues this process until all tables are joined into the result. The
optimizer determines the order in which Oracle joins tables based on the join conditions,
indexes on the tables, and any available statistics for the tables.</p>
        <p>FULL OUTER JOIN A FULL OUTER JOIN does not require each record in the two
joined tables to have a matching record. The joined table retains each record even if no
other matching record exists. Where records in the FULL OUTER JOIN’ed tables do
not match, the result set will have NULL values for every column of the table that lacks
a matching row. For those records that do match, a single row will be produced in the
result set (containing fields populated from both tables). This join can be used to solve
the first problem mentioned above. All instances of the Keywords table representing a
fulltext condition Ki can undergo a FULL OUTER JOIN on the Entity ID attribute.
Thus we will have results from the keywords table even if one of the instance matches
any tuples or has a NULL as a value.
INNER JOIN An INNER JOIN is the most common join operation used in
applications and can be regarded as the default join type. INNER JOIN creates a new result
table by combining column values of two tables (A and B) based upon the join
predicate. The query compares each row of A with each row of B to find all pairs of rows
which satisfy the join predicate. When the join predicate is satisfied, column values for
each matched pair of rows of A and B are combined into a result row. The result of
the join can be defined as the outcome of first taking the Cartesian product (or cross
join) of all records in the tables (combining every record in table A with every record
in table B). It returns all records which satisfy the join predicate. This join returns the
same result as the “equality” join as described in the previous section but gives more
flexibility to the optimizer to select different types of joins like hash joins, merge joins,
or nested loop joins. We can replace all the equality join with Oracle’s INNER JOIN.
In some queries this trick shows considerable improvement in query processing time by
Oracle’s query engine.
7.2</p>
      </sec>
      <sec id="sec-5-2">
        <title>Materializing Temporary Tables</title>
        <p>One big conjunctive query forces the Oracle optimizer to choose from a very large
number of possible query execution plans, and it turns out that—at least in our setting—it
often chooses an inefficient plan. For example, in a big conjunctive query, the
optimizer often chose to join instances DBpediaCore tables before restricting the
relevant entities with the given conditions. These types of query plans proved to be highly
expensive. Thus, to prevent the optimizer from taking such inappropriate decisions, we
materialize temporary tables by separately joining the Keywords table instances and
the DBpediaCore table instances. This acts as a strong support for the optimizer to
select better query plans for smaller intermediate queries and store their results into
temporary tables which are later joined together to retrieve the final result.
7.3</p>
      </sec>
      <sec id="sec-5-3">
        <title>Evaluating the Join Order and Forcing Orders via Optimizer Hints</title>
        <p>There are some simple techniques by which we can determine the join order of the
tables. One such technique is to maintain an IDF index containing the most frequent
terms that occur in the collection. This index has a very simple and intuitive schema
Features(Term, IDF). The first column represents a term and the second column
represents its Inverse Document Frequency (IDF). It can be intuitively be seen that a
frequent term will have lower IDF and a select query will result in bigger intermediate
joins. At the same time, if a term is absent in the feature index, then it can be assumed
to be infrequent and thus have a high IDF value. Every instance of the Keywords table
can now be joined in increasing order of the IDF values of their respective term, thus
ensuring the smaller tables to be joined first. This order of joining can be enforced on
the Oracle optimizer by adding so-called optimizer hints to the queries.</p>
        <p>A hint is a code snippet that is embedded into an SQL statement to suggest to
Oracle how the statement should be executed. There are many hints provided to assist
the optimizer. In our case, we found the Ordered hint to force the optimizer to join
tables in the specified order written in the FROM clause of the query. So our translator
algorithm writes the Keywords table instances in the appropriate order in the FROM
clause of the translated SQL query.
8</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>The Rewriting Algorithm</title>
      <p>We can now develop an overall rewriting algorithm by putting together all the afore
described steps as follows.
1. Load the features index containing frequent terms and their IDF values into main
memory.
2. Tokenize and stem the FTContains fulltext conditions and decide the order of
joins among the keywords from the features index.
3. Create temporary Keysi tables for each fulltext condition: these contain the results
of the outer join over the Keywords table instances constrained by the terms. This
is shown in Figure 3.
4. Create temporary T abi tables for each triplet pattern: these contain the results
of the inner join over the DBpediaCore table instances which are additionally
joined with Keysi temporary tables for each FTContains fulltext condition in
the query. This is shown in Figure 4.
5. Assign a default score of 1 to all triples in absence of a fulltext condition: in absence
of a fulltext condition on any triplet pattern, a default score of 1 is assigned to all
the triples as a constant score for each triplet condition (as discussed earlier).
6. Final query: the main select query combines the T abi temporary tables via an inner
join; the join logic is based on the joins given in the original SPARQL query. This
is shown in Figure 5.
7. Finally, drop the temporary tables Keysi and T abi. This is shown in Figure 6.
Since a detailed evaluation of the run submissions was not available at the time this
paper was submitted, we defer a discussion of the results until to the INEX workshop
at the CLEF conference in September.
10</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>We presented an approach for storing structured RDF data and unstructured data in
relational database. We also presented the necessary indices required to efficiently
process queries over this relational schema. Our approach converts a SPARQL query with
fulltext conditions into unions of conjunctive SQL queries by materializing temporary
tables. These temporary tables store intermediate results from inner or outer joins over
our relations, based on given conditions in the query. We also presented a simple yet
effective way to rank entities by translating scores from keywords. Finally, we showed
the advantages of predeciding the join orders of tables and techniques to enforce them
in the Oracle optimizer</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Media</given-names>
            <surname>Wiki</surname>
          </string-name>
          . http://en.wikipedia.org/wiki/MediaWiki.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>2. Overview of the INEX 2012 Linked Data Track</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Amer-Yahia</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lalmas</surname>
          </string-name>
          .
          <article-title>XML search: languages, INEX and scoring</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>35</volume>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Broekstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kampman</surname>
          </string-name>
          , and
          <string-name>
            <surname>F. van Harmelen. Sesame:</surname>
          </string-name>
          <article-title>A Generic Architecture for Storing and Querying RDF and RDF Schema</article-title>
          .
          <source>In LNCS</source>
          , volume
          <volume>2342</volume>
          ,
          <string-name>
            <surname>Sardinia</surname>
          </string-name>
          , Italy,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. Das</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Hristidis</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Probabilistic information retrieval approach for ranking of database query results</article-title>
          .
          <source>ACM Trans. Database Syst., 31</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>E. I.</given-names>
            <surname>Chong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Eadon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>An efficient SQL-based RDF querying scheme</article-title>
          .
          <source>In VLDB '05</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C. L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          .
          <article-title>Controlling overlap in content-oriented XML retrieval</article-title>
          . In SIGIR,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <surname>G. Weikum.</surname>
          </string-name>
          <article-title>The RDF-3X engine for scalable management of RDF data</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Petkova</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Hierarchical Language Models for Expert Finding in Enterprise Corpora</article-title>
          .
          <source>In ICTAI '06</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>Modeling Documents as Mixtures of Persons for Expert Finding</article-title>
          . In LNCS, volume
          <volume>4956</volume>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>M. Theobald</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Aji</surname>
            , and
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Schenkel</surname>
          </string-name>
          .
          <source>TopX 2</source>
          .
          <article-title>0 at the INEX 2009 Ad-Hoc and Efficiency Tracks</article-title>
          .
          <source>In INEX</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sayers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kuno</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          .
          <article-title>Efficient RDF Storage and Retrieval in Jena2</article-title>
          .
          <source>In Proc. First International Workshop on Semantic Web and Databases</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>C. Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            , and
            <given-names>J.-R.</given-names>
          </string-name>
          <string-name>
            <surname>Wen</surname>
          </string-name>
          .
          <article-title>Closing the Loop in Webpage Understanding</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>22</volume>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. L.
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            , M. T. O¨ zsu, and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>gStore: answering SPARQL queries via subgraph matching</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>8</issue>
          ),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>