<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Driven Query Sharding?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adam Krasuski</string-name>
          <email>krasus@inf.sgsp.edu.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Szczuka</string-name>
          <email>szczuka@mimuw.edu.pl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Computer Science, The Main School of Fire Service</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Mathematics, The University of Warsaw</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present an approach to database query sharding that makes use of knowledge about data structure and purpose. It is based on a case study of a database system that contains information about documents. By making use of knowledge about the data structure and the specific top-k queries to be processed, we demonstrate a method for avoiding costly and unnecessary steps in query answering. We also demonstrate how knowledge of the data structure may be used to perform sharding and how such sharding may improve performance. We propose a generalization of our findings that could lead to self-optimization and self-tuning in RDBMS engines, especially for column-based solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>Text mining</kwd>
        <kwd>document grouping</kwd>
        <kwd>top-k queries</kwd>
        <kwd>query processing</kwd>
        <kwd>database sharding</kwd>
        <kwd>column-oriented</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The SYNAT project (abbreviation of Polish “SYstem NAuki i Techniki”, see
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) is a large, national R&amp;D program of the Polish government aimed at
the establishment of a unified network platform for storing and serving digital information
in widely understood areas of science and technology. The project is composed
of nearly 50 modules developed by research teams at 16 leading research
institutions in Poland. Within the framework of this larger project we want to
design and implement a solution that will make it possible for a user to search
within repositories of scientific information (articles, patents, biographical notes,
etc.) using their semantic content. Our prospective system for doing that is
called SONCA (abbreviation for Search based on ONtologies and Compound
Analytics, see [
        <xref ref-type="bibr" rid="ref6 ref8 ref9">6, 8, 9</xref>
        ]).
      </p>
      <p>Ultimately, SONCA should be capable of answering the user query by listing
and presenting the resources (documents, Web pages, et cetera) that correspond
to it semantically. In other words, the system should have some understanding
of the intention of the query and of the contents of documents stored in the
repository as well as the ability to retrieve relevant information with high
efficacy. The system should be able to use various knowledge sources related to the
investigated areas of science. It should also allow for independent sources of
who may be identified as the authors of the stored articles, such as information about scientists
who may be identified as the stored articles’ authors.</p>
      <p>
        In order to be able to provide semantical relationships between concepts and
documents we employ a method called Explicit Semantic Analysis (ESA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
This method associates elementary data entities with concepts coming from a
knowledge base. In our system the elementary data entities are documents
(scientific articles) collected from the PubMed Central database (PMC, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) and
as the knowledge base we use the Medical Subject Headings (MeSH,
http://www.nlm.nih.gov/pubs/factsheets/mesh.html) – a
comprehensive controlled vocabulary for indexing journal articles and
books in the life sciences. We have field-tested a modified version of the ESA
approach on PMC using MeSH (see [
        <xref ref-type="bibr" rid="ref11 ref5">5, 11</xref>
        ]), and found that while
the method performs well conceptually, the underlying data processing, especially
the part performed inside the RDBMS, requires the introduction of new techniques.
      </p>
      <p>
        From the database viewpoint the problem we want to solve is one of
performing a top-k analytic, agglomerative SQL-query that involves joins on very
large data tables (see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). The characteristic property of our task is the
possibility of decomposing it into smaller subtasks (tasks on sub-tables), given some
knowledge about the nature and structure of our data set. The fact that the
query-answering can be decomposed into largely independent sub-tasks makes
it possible to optimize it by using only top-k instead of all sub-results most of
the time. Since the sub-tasks are largely independent of one another, we
can also create shards and process them concurrently using, e.g., multiple cores
(processors).
      </p>
      <p>While optimizing the execution of queries in our particular case, we have
noticed that the problem we are dealing with is of a general nature. Our way of
solving it, by decomposition into sub-tasks and then scheduling their concurrent
execution, also appears to be generally applicable. This led us to the formulation of
a general scheme for processing a certain kind of queries using the knowledge that
we have about the underlying data and the query type. We have observed that while
some database engines (e.g. PostgreSQL) partially support the kind of operations
we want to perform, others do not. The lack of support for decomposition of
query processing was especially problematic for us in the case of column-oriented
database systems (Infobright, MonetDB), since the column-oriented model is
for various reasons recommended for our application.</p>
      <p>The paper is organized as follows. We begin by introducing the example data
set and query-answering task that has led us to key observations (Section 2).
Then we generalize this particular problem to the one of knowledge-based task
decomposition and sharding (Section 2.2) and show experimental results that
demonstrate possible profits and limitations (Section 3). Finally, we sketch the possible
improvements that could be introduced into database engines (Section 4) and
finish with conclusions and our view on possible further work.</p>
    </sec>
    <sec id="sec-2">
      <title>Description of the Problem</title>
      <p>
        We present the general task that we are dealing with by means of a case study
which is a version of the actual analytic task that we face while constructing the
SONCA system (see [
        <xref ref-type="bibr" rid="ref11 ref6 ref8 ref9">6, 8, 9, 11</xref>
        ]).
      </p>
      <p>For the sake of clarity of the presentation, we will first make some simplifying
assumptions. As mentioned in the Introduction, the analytical part of the SONCA
database stores information about different types of objects, such as authors,
publications, institutions and so on. In this case study we consider only objects of
type publication, i.e., documents. Let us assume that the information about these
objects is stored in a table called word_document. The table contains the following
columns: doc_id – the identifier of the document, word_pos – the ordinal number of the
given word in the document, and word – a word from the document. Thus, to store
a document we need as many rows as there are words in it. In the relational
algebra formulae below we will refer to this table as R.</p>
      <p>The second table involved in our calculations is called word_stem and is
denoted by S. The table contains two columns: word and stem – the stem of the
given word. A stem is the root or roots of a word, together with
any derivational affixes, to which inflectional affixes are added. The table stores
the result of the stemming process, which was performed
earlier using a standard Porter stemmer (http://tartarus.org/~martin/PorterStemmer/).</p>
      <p>The last table needed to present a sample of analytic querying performed
on documents is called stem_concept and is denoted by T. The table contains
three columns: stem, concept – the name of the concept from the MeSH
controlled vocabulary – and weight, which quantitatively associates the concept
with a given stem. The stem_concept table represents an inverted index for the
ESA method.</p>
      <p>Figure 1 outlines the tables introduced. The tables will be used to explain the
ESA method, which aims at determining the semantical relationships between
documents and (MeSH) concepts.</p>
      <p>Fig. 1. The tables used in the case study:
word_document (R): doc_id LONG, word_pos INTEGER, word VARCHAR;
word_stem (S): word VARCHAR, stem VARCHAR;
stem_concept (T): stem VARCHAR, concept VARCHAR, weight REAL.</p>
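      <p>The schema of Fig. 1 can be reproduced in any SQL engine; below is a minimal sketch using Python's sqlite3 module (an in-memory instance for illustration only – the systems actually used in the paper are Infobright, PostgreSQL and MonetDB, and the sample rows are invented):</p>

```python
import sqlite3

# Illustrative in-memory database standing in for the analytic store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_document (doc_id INTEGER, word_pos INTEGER, word TEXT);
    CREATE TABLE word_stem     (word TEXT, stem TEXT);
    CREATE TABLE stem_concept  (stem TEXT, concept TEXT, weight REAL);
""")
# One row per word occurrence: a document takes as many rows as it has words.
conn.executemany("INSERT INTO word_document VALUES (?, ?, ?)",
                 [(1, 1, "fires"), (1, 2, "burning"), (1, 3, "fires")])
n = conn.execute("SELECT COUNT(*) FROM word_document").fetchone()[0]
```

      <p>Storing one row per word occurrence keeps word_document trivially partitionable by doc_id, which is what the sharding scheme of Section 2 exploits.</p>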
      <sec id="sec-2-2">
        <title>Determining Semantical Relationships between Documents</title>
        <p>
          In our (SONCA) system we would like to associate each document with a list of
concepts from an ontology or a knowledge base, such as MeSH [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], Wikipedia
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and so on. This, technically speaking, corresponds to creation of a vector
of ontology concepts associated with the document. The vector is constructed
in such a way, that each position corresponds to an ontology concept and the
numerical value at this position represents the strength of the association. The
strength is derived using the Explicit Semantic Analysis (ESA, see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]). In a
nutshell, the calculation of the association between concept(s) and the document
in ESA comprises three steps: stemming, stem frequency calculation, and the
retrieval of the set of concepts relevant for (strongly associated with) the stems
occurring in the document. We describe this calculation with relational
algebra formulae corresponding to SQL queries.
        </p>
        <p>First, the inflected or derived words are reduced to their stems. We create a
new table called word_doc_stemmed (R2) as the result of the join of tables
word_document (R) and word_stem (S).</p>
        <p>R2 ← π[R.doc_id, S.stem](R ⋈[R.word = S.word] S) (1)</p>
        <p>The next task is the calculation of the stem frequency within the documents.
We perform this using the single table R2. The term (stem) frequency is calculated as:</p>
        <p>U1 ← γ[R2.doc_id, R2.stem; COUNT(*) → cnt](R2)</p>
        <p>U2 ← γ[R2.doc_id; COUNT(*) → cnt_all](R2)</p>
        <p>R3 ← π[U1.doc_id, U1.stem, (U1.cnt / U2.cnt_all) → tf](U1 ⋈[U1.doc_id = U2.doc_id] U2) (2)</p>
        <p>The final step is the calculation of the vector of concepts associated with the
document and the strength of the association:</p>
        <p>R4 ← τ[assoc DESC] π[R3.doc_id, T.concept, assoc](γ[R3.doc_id, T.concept; SUM(R3.tf · T.weight) → assoc](R3 ⋈[R3.stem = T.stem] T)) (3)</p>
        <p>The queries presented above return the complete information, i.e., for each
document they give us the levels of association with each and every concept
in our ontology (knowledge base). This is both unnecessary and unwanted in
practical applications. Empirical experiments show that if we are to present the
results to the user, we shall present no more than the top-k most associated concepts,
with k ≤ 30. Anything above 30 produces perceptual noise. So, as the last step
of the calculation we prune the result, leaving only the top-30 most associated
concepts in each document’s representation.</p>
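        <p>For concreteness, the three steps and the top-k pruning can be expressed as one SQL query. A toy sketch in Python with SQLite (the syntax and sample data are illustrative; the real tables are those of Fig. 1, and the production systems are Infobright, PostgreSQL and MonetDB):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_document (doc_id INTEGER, word_pos INTEGER, word TEXT);
    CREATE TABLE word_stem (word TEXT, stem TEXT);
    CREATE TABLE stem_concept (stem TEXT, concept TEXT, weight REAL);
    INSERT INTO word_document VALUES (1, 1, 'fires'), (1, 2, 'burning'), (1, 3, 'fires');
    INSERT INTO word_stem VALUES ('fires', 'fire'), ('burning', 'burn');
    INSERT INTO stem_concept VALUES ('fire', 'Fires', 0.9), ('fire', 'Combustion', 0.4),
                                    ('burn', 'Burns', 0.8);
""")

K = 30  # empirical top-k cut-off used in the paper
rows = conn.execute("""
    WITH r2 AS (              -- step 1: stemming join (R2)
        SELECT r.doc_id, s.stem
        FROM word_document r JOIN word_stem s ON r.word = s.word),
    u1 AS (SELECT doc_id, stem, COUNT(*) AS cnt FROM r2 GROUP BY doc_id, stem),
    u2 AS (SELECT doc_id, COUNT(*) AS cnt_all FROM r2 GROUP BY doc_id),
    r3 AS (                   -- step 2: term (stem) frequency (R3)
        SELECT u1.doc_id, u1.stem, u1.cnt * 1.0 / u2.cnt_all AS tf
        FROM u1 JOIN u2 ON u1.doc_id = u2.doc_id)
    SELECT r3.doc_id, t.concept,   -- step 3: association vector (R4)
           SUM(r3.tf * t.weight) AS assoc
    FROM r3 JOIN stem_concept t ON r3.stem = t.stem
    GROUP BY r3.doc_id, t.concept
    ORDER BY assoc DESC
    LIMIT ?
""", (K,)).fetchall()
```

        <p>Note that the LIMIT here truncates the global result; limiting to top-k per document within a GROUP BY is exactly the operation that most column stores lack, which motivates the sharding scheme below.</p>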
      </sec>
      <sec id="sec-2-3">
        <title>The Idea behind Query Sharding</title>
        <p>As explained in the previous section, we are not interested in calculating all
possible answers to a query. Hence, we propose to split the table and process it
possible answers to a query. Hence, we propose to split the table and process it
piece-wise. Each piece (shard) would contain information about an object in the
data, such that it can be processed independently from other objects without
distorting the global outcome. Once we have this sharding (task decomposition),
we can distribute calculation among multiple threads on a single (multicore)
processor or among several machines in networked environment.</p>
        <p>The key to success is the knowledge about the composition of and
relationships between the objects. If we know that the
objects are largely independent, they can be processed in parallel, each shard
separately. In our current approach this knowledge is derived from the domain by
hand. However, it is imaginable that in the future an automated data
(structure) analysis tool would make it possible to discover rules (criteria) for detecting
situations we discuss here, and implement these rules using database triggers.</p>
        <p>It is crucial to note that the approach to processing queries using sharding
which we propose does not require a rewrite of the existing query optimizers.
We propose a rewrite of the large query into a collection of smaller ones that
can be executed in parallel. We do not interfere with intra-query parallelization
implemented in most RDBMS. Instead we apply a trick, creating a collection of
virtual clients that send very simple queries to the database, instead of processing
one global query that may be very time-consuming to answer.</p>
        <p>
          By running queries for each of the pieces (documents) separately we gain an
additional benefit: we are able to process queries that require application of the LIMIT
operator within a GROUP BY statement. This functionality was added in the SQL:1999
and SQL:2003 standards [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] by introducing windowing functions and elements
of procedural languages. Unfortunately, these functionalities are not supported
in most column-based systems, such as Infobright (http://www.infobright.org/)
and MonetDB (http://www.monetdb.org/). The ability
to limit processing to the top-k objects (documents) only can make a big difference
in execution time, as demonstrated experimentally in Section 3.
1 N := SELECT DISTINCT doc_id FROM table
2 for doc_id ∈ N do
3     run SELECT ... WHERE doc_id = doc_id in K threads concurrently
4 end
        </p>
        <p>Algorithm 1: Query sharding</p>
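        <p>Algorithm 1 can be sketched with a standard Python thread pool standing in for the K virtual clients (a toy example: the per-shard query here is a simple count purely for illustration, whereas in our setting it is the ESA association query; the file path and worker count are arbitrary):</p>

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# A shared on-disk toy database; each virtual client opens its own
# connection, as a connection pool such as Apache DBCP would provide.
DB = os.path.join(tempfile.mkdtemp(), "shard_demo.db")
conn = sqlite3.connect(DB)
conn.executescript("""
    CREATE TABLE word_document (doc_id INTEGER, word_pos INTEGER, word TEXT);
    INSERT INTO word_document VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c');
""")
conn.commit()

def process_shard(doc_id):
    # One virtual client: a simple per-object query instead of one global one.
    local = sqlite3.connect(DB)
    n = local.execute("SELECT COUNT(*) FROM word_document WHERE doc_id = ?",
                      (doc_id,)).fetchone()[0]
    local.close()
    return doc_id, n

# Line 1 of Algorithm 1: enumerate the objects (shards).
ids = [row[0] for row in conn.execute("SELECT DISTINCT doc_id FROM word_document")]
# Lines 2-3: run the per-object queries in K concurrent threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(process_shard, ids))
```

        <p>Because each sub-result is written independently, the loop can be interrupted and resumed at any shard boundary, which is the checkpointing property discussed in Section 3.</p>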
        <p>In the case of the example presented in Section 2.1, sharding corresponds to the
creation of a separate query for each of the objects, since we know that
there is no interference with other objects during the calculation. Objects
correspond to documents, and the boundary of an object can be easily determined by
detecting the change of value in column doc_id. Now, each of the queries
presented in Section 2.1 can be decomposed into a series of simple ones using
the scheme presented in Algorithm 1.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Verification</title>
      <p>In order to validate the usefulness of the proposed approach we have performed a
series of experiments. In the experiments, we compared the processing of queries
with and without the use of sharding. To get a better overview we included
in the experiment representatives of three major types of database technologies:
a) Infobright, which combines a column-oriented architecture with a Knowledge Grid;
b) PostgreSQL, which represents a row-oriented object-relational architecture
with PL/pgSQL (http://www.postgresql.org/docs/current/static/plpgsql.html) – a
procedural language extension of SQL;
c) MonetDB, which represents a purely column-oriented database architecture.
The results are summarized in Table 1 and Fig. 2.</p>
      <p>Due to the fact that the column-oriented architectures we use cannot
run a query with LIMIT within GROUP BY, the comparison with the
performance of windowing functions and elements of procedural language (LOOP within
PL/pgSQL) was performed only with the PostgreSQL database. The experiments
are based on an external implementation in Java, with Apache DBCP
(http://commons.apache.org/dbcp/) used for
connection pooling, which was required for parallel query processing. In all
experiments we used tables with the following numbers of rows: word_document
– 370 878 730, word_stem – 76 108, stem_concept – 517 729. All experiments were
done on a server with an Intel Xeon CPU X5650 @ 2.67 GHz (24 cores), 64
GB of memory and 600 GB SAS 6 Gb/s 15 000 RPM disks.</p>
      <p>[Fig. 2. Execution times in seconds, without and with sharding: a) stemming;
b) stem frequency; c) vector of concepts calculation, no limit; d) vector of
concepts calculation, limit k = 30.]</p>
      <p>
The experiments clearly demonstrate that combining query sharding with parallel
execution of sub-tasks has potential. In some cases, queries processed
using query sharding executed 3 to 23 times faster (Fig. 2c,d). Also, in
column-oriented databases sharding was the way to get around the problem
of enforcing LIMIT inside GROUP BY. The experiments, however, also
demonstrate that specific conditions must be met in order for query sharding to
be beneficial. Two out of four tasks tested experimentally exhibited a significant
decrease in performance, running 2.6 to 2.9 times slower (Fig. 2a,b). This is due
to an imbalance between the computational overhead created by the parallelization
of the task and the complexity of the task itself. In the case of both stemming and
stem frequency calculation, the cost of creating virtual clients that issue distributed
queries was much higher than the collective profit obtained from processing
simpler queries on smaller data tables.</p>
      <sec id="sec-3-1">
        <title>Discussion</title>
        <p>There is an additional issue that we decided to address by performing an
extra experiment. As mentioned before, we have restricted ourselves to the top 30 results
because of the kind of objects we are dealing with. However, for other types of
data the specific limit of 30 top results may be meaningless. Therefore, we have
also conducted a smaller experiment, using only the Infobright instance (part of
the current SONCA implementation), to check how the query execution time changes
with the increase of k in the top-k limit. With k = 100 the vector of concepts (ESA)
calculation took 29 min. 18.22 s, and for k = 1000 the execution time was 35
min. 25.09 s. We should also mention that the good efficiency achieved by MonetDB
on non-sharded, simple queries could not be extended to the sharded case due to
problems that MonetDB has with handling multiple queries in parallel.</p>
        <p>The conclusion from the experimental verification is a set of guidelines that
have to be followed in order for sharding to be effective. These guidelines are
an expansion of the general ideas stated in Section 2.2. They are:
1. The query to be decomposed must contain a central, complex, agglomerative
task which involves joins on very large data tables. Typically, we would
decide to use sharding if the auxiliary structures used to store GROUP BY
data exceed the RAM allocation capabilities.
2. All arithmetic operations must occur inside the join operation.</p>
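        <p>Guideline 1 amounts to a simple size test. A toy sketch of such a decision rule, assuming the engine can estimate the size of the aggregation state (the group count, per-group cost and RAM budget below are illustrative values, not measurements from our experiments):</p>

```python
def should_shard(n_groups, bytes_per_group, ram_budget_bytes):
    """Guideline 1: shard when the GROUP BY state would not fit in RAM."""
    return n_groups * bytes_per_group > ram_budget_bytes

# 500 million (doc_id, concept) groups at 64 B each against a 16 GiB budget:
print(should_shard(500_000_000, 64, 16 * 2**30))  # prints True
```

        <p>In an engine, such a test would sit in the planner, switching between traditional execution and the sharded rewrite.</p>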
        <p>We strongly believe that these guidelines can be used to formulate a set of
rules for automatic query execution tuning in database engines. That is, if certain
conditions are met, the database engine would transform the query processing
from traditional to sharding model. The key to success is the knowledge about
the data structure and purpose, which makes it possible to avoid unnecessary
calculations.</p>
        <p>
          The proposed approach has one more advantage, which was especially valid
for us in the context of our SONCA system. The set of smaller queries obtained
as the result of sharding may be executed independently and concurrently. Thanks
to this, we can regulate the number of threads (machines, processors, cores)
involved in the calculation at any given point. Since the results of each sub-query
execution are stored and do not need to be accessed by others, the entire
calculation can be interrupted and then picked up without any loss of information. This
ability is usually hard to achieve in database systems that use multi-threading in
query processing (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). In our implementation we have achieved good control
over load-balancing by performing the scheduling outside of the database, using our
own code. However, we strongly believe that a similar result can (and should) be
achieved by implementing sharding inside the database engine. For the moment,
we benefit from query sharding in the SONCA system. It gives us the ability to
plan tasks ahead and perform them with optimal use of computing resources.
This is not so crucial for simpler tasks, such as document processing (stemming,
stem association), which normally take less than an hour, but for finding
semantical relationships between concepts and sections, sentences or snippets it is of
paramount importance, as these calculations may last for several days.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeira</surname>
          </string-name>
          , E.:
          <article-title>PubMed Central (PMC): An archive for literature from life sciences journals</article-title>
          . In: McEntyre,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Ostell</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . (eds.)
          <source>The NCBI Handbook, chap. 9</source>
          . National Center for Biotechnology Information,
          <string-name>
            <surname>Bethesda</surname>
          </string-name>
          (
          <year>2003</year>
          ), http: //www.ncbi.nlm.nih.gov/books/NBK21087/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bembenik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Skonieczny, Ł.,
          <string-name>
            <surname>Rybiński</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niezgódka</surname>
          </string-name>
          , M. (eds.):
          <article-title>Intelligent Tools for Building a Scientific Information Platform</article-title>
          ,
          <source>Studies in Computational Intelligence</source>
          , vol.
          <volume>390</volume>
          . Springer, Berlin / Heidelberg (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Eisenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michels</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemke</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>SQL:2003 has been published</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>33</volume>
          (
          <issue>1</issue>
          ),
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
          (
          <year>Mar 2004</year>
          ), http://doi.acm.org/10.1145/974121.974142
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markovitch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Computing semantic relatedness using Wikipediabased explicit semantic analysis</article-title>
          .
          <source>In: Proceedings of the 20th International Joint Conference on Artificial Intelligence</source>
          . pp.
          <fpage>6</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Janusz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Świeboda</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krasuski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.:</given-names>
          </string-name>
          <article-title>Interactive document indexing method based on explicit semantic analysis</article-title>
          .
          <source>In: Proceedings of the Joint Rough Sets Symposium (JRS</source>
          <year>2012</year>
          ), Chengdu, China,
          <source>August 17-20</source>
          ,
          <year>2012</year>
          . Lecture Notes in Computer Science, Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kowalski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Śle¸zak,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Stencel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Pardel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Grzegorowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Kijowski</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.:</surname>
          </string-name>
          <article-title>RDBMS model for scientific articles analytics</article-title>
          .
          <source>In: Bembenik et al. [2], chap. 4</source>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Triantafillou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          , G.:
          <article-title>KLEE: a framework for distributed topk query algorithms</article-title>
          .
          <source>In: Proceedings of the 31st international conference on Very large data bases</source>
          . pp.
          <fpage>637</fpage>
          -
          <lpage>648</lpage>
          . VLDB '05,
          <string-name>
            <given-names>VLDB</given-names>
            <surname>Endowment</surname>
          </string-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.:</given-names>
          </string-name>
          <article-title>On designing the SONCA system</article-title>
          .
          <source>In: Bembenik et al. [2], chap. 2</source>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>35</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          , Śle¸zak,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Skowron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Bazan</surname>
          </string-name>
          , J.:
          <article-title>Semantic search and analytics over large repository of scientific articles</article-title>
          .
          <source>In: Bembenik et al. [2], chap. 1</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pankratius</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heneka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Parallel SQL query auto-tuning on multicore</article-title>
          .
          <source>Karlsruhe Reports in Informatics 2011-5</source>
          , Karlsruhe Institute of Technology,
          <source>Faculty of Informatics</source>
          (
          <year>2011</year>
          ), http://digbib.ubka.uni-karlsruhe.de/volltexte/ documents/1978109
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Śle¸zak,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Janusz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Świeboda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.S.</given-names>
            ,
            <surname>Bazan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.G.</given-names>
            ,
            <surname>Skowron</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Semantic analytics of PubMed content</article-title>
          . In: Holzinger,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Simonic</surname>
          </string-name>
          , K.M. (eds.)
          <article-title>Information Quality in e-Health - 7th Conference of the Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society</article-title>
          , USAB 2011, Graz, Austria,
          <source>November 25-26</source>
          ,
          <year>2011</year>
          .
          <source>Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>7058</volume>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>74</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Szczuka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janusz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herba</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Clustering of rough set related documents with use of knowledge from DBpedia</article-title>
          .
          <source>In: Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology (RSKT</source>
          <year>2011</year>
          ), Banff, Canada, October 9-
          <issue>12</issue>
          ,
          <year>2011</year>
          . Lecture Notes in Computer Science, vol.
          <volume>6954</volume>
          , pp.
          <fpage>394</fpage>
          -
          <lpage>403</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>