<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparison of Native XML Databases and Experimenting with INEX</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petr Kolář</string-name>
          <email>kolarp@fel.cvut.cz</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Loupal</string-name>
          <email>loupalp@fel.cvut.cz</email>
        </contrib>
        <aff>Dept. of Computer Science and Engineering, FEE, Czech Technical University, Karlovo náměstí, Praha, Czech Republic</aff>
      </contrib-group>
      <pub-date>
        <year>1995</year>
      </pub-date>
      <fpage>116</fpage>
      <lpage>119</lpage>
      <abstract>
        <p>The aim of this article is to summarize and compare approaches to the design and architecture of native XML databases. We discuss the results we obtained by loading and querying the INEX data set in two open source database systems, eXist and Apache Xindice. We also outline a basic performance comparison as a basis for discussing the suitability of each database system and for our subsequent experiments.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Comparison of eXist and Xindice XML Native Databases</p>
      <p>
        Due to limited space we mention only the basic attributes and features of the two database systems in the following table. In our work we consider the Xindice XML database version 1.0 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the eXist XML database version 1.0-dev-20060124 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We also wanted to test the Timber and Sedna databases, but decided against it because both accept only a single XML document per database container.
      </p>
      <p>Experiments, basic performance comparison</p>
      <p>INEX Dataset</p>
      <p>For our experiments we use the INEX XML data set. The INEX data set (we use version 1.4) contains 536 MB of XML data: 12,107 articles from 6 IEEE transactions and 12 journals published from 1995 to 2002. Pictures are not included; the data set consists only of XML-formatted text.</p>
      <p>The data set is organized as a file structure. The root directory contains two subdirectories: dtd (holding the structure information, i.e. the DTD specification of the article element) and xml. Each journal or transaction has its own subdirectory with a two-letter name inside the xml directory, which is further divided into directories by year of publication. Finally, each article is stored in an individual XML file whose name consists of a letter followed by a four-digit number and the xml suffix.</p>
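      <p>The directory layout described above can be sketched as a small path parser. This is only an illustration under stated assumptions: the function name parse_article_path and the example paths are hypothetical, and the letter case of directory and file names is assumed rather than taken from the data set.</p>
      <preformat><![CDATA[
```python
import re
from pathlib import PurePosixPath

# Assumed naming rules, following the description in the text:
# a two-letter journal directory, a four-digit year directory, and a file
# named with one letter, four digits and the .xml suffix.
JOURNAL_DIR = re.compile(r"[A-Za-z]{2}")
YEAR_DIR = re.compile(r"\d{4}")
ARTICLE_FILE = re.compile(r"[A-Za-z]\d{4}\.xml")


def parse_article_path(relative_path):
    """Return (journal, year, file_name), or None if the path does not fit the layout."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4 or parts[0] != "xml":
        return None
    _, journal, year, file_name = parts
    if not (JOURNAL_DIR.fullmatch(journal)
            and YEAR_DIR.fullmatch(year)
            and ARTICLE_FILE.fullmatch(file_name)):
        return None
    return journal, int(year), file_name


# Hypothetical example paths, relative to the data set root.
print(parse_article_path("xml/an/1995/a1004.xml"))
print(parse_article_path("dtd/xmlarticle.dtd"))
```
]]></preformat>
      <p>A parser of this kind could be used to group articles by journal and year before import, which is one way to check that all files were picked up.</p>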
      <p>
        On average each article contains 1,532 XML nodes, and the average node depth is 6.9. See [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for detailed characteristics of the data set.
      </p>
      <p>XPath</p>
      <p>
        XPath [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] is a language for finding information in an XML document by navigating through its elements and attributes. XPath is a major element in the W3C's XSLT standard, and XQuery and XPointer are both built on XPath expressions. An understanding of XPath is therefore fundamental to much advanced XML usage.
      </p>
      <p>We prepared a set of XPath queries in the following categories:</p>
      <p>Selecting nodes. XPath uses path expressions to select nodes in an XML document, e.g. /article or /article/fm/hdr/hdr1/crt/issn. See queries 1 to 3 in Table 1.</p>
      <p>Predicates. Predicates are used to find a specific node or a node that contains a specific value. Predicates are always enclosed in square brackets, e.g. /article/bdy/sec[1] or /article/bdy/sec[position() &lt; 3]. See queries 4 to 11 in Table 1.</p>
      <p>Selecting unknown nodes. XPath wildcards can be used to select unknown XML elements, e.g. /*/*[@*]. See queries 12 to 14 in Table 1.</p>
      <p>Selecting several paths. By using the | operator in an XPath expression we can select several paths, e.g. //article/fm/hdr | //article/bdy/sec. See queries 15 and 16 in Table 1.</p>
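      <p>The four query categories can be tried out on a small document with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not the setup used in the experiments: the sample document and its id attributes are hypothetical, and ElementTree implements only a subset of XPath, so position(), the @* wildcard and the | operator are emulated in plain Python.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Tiny hypothetical stand-in for an INEX article; element names follow the
# paths quoted in the text, attribute names and values are made up.
SAMPLE = """
<article>
  <fm id="f1"><hdr><hdr1><crt><issn>0000-0000</issn></crt></hdr1></hdr></fm>
  <bdy>
    <sec id="s1"><st>One</st></sec>
    <sec><st>Two</st></sec>
    <sec id="s3"><st>Three</st></sec>
  </bdy>
</article>
"""
root = ET.fromstring(SAMPLE)  # root is the <article> element

# Selecting nodes: /article/fm/hdr/hdr1/crt/issn
issn = [e.text for e in root.findall("fm/hdr/hdr1/crt/issn")]

# Predicates: /article/bdy/sec[1]
first_sec = root.findall("bdy/sec[1]")
# /article/bdy/sec[position() < 3] -- ElementTree has no position(),
# so the predicate is emulated by slicing the full result list.
first_two = root.findall("bdy/sec")[:2]

# Selecting unknown nodes: /*/*[@*] -- children of the root element that
# carry at least one attribute (the @* test is emulated in Python).
with_attrs = [e for e in root.findall("*") if e.attrib]

# Selecting several paths: //article/fm/hdr | //article/bdy/sec
# The two result sets are disjoint here, so concatenation stands in for |.
union = root.findall("fm/hdr") + root.findall("bdy/sec")

print(issn)
print([e.get("id") for e in first_sec])
print(len(first_two))
print([e.tag for e in with_attrs])
print([e.tag for e in union])
```
]]></preformat>
      <p>A library with full XPath 1.0 support would accept the quoted expressions directly; the standard module is used here only to keep the sketch self-contained.</p>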
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>We measured the duration of each query five times, discarded the largest and the smallest value, and computed the arithmetic mean of the remaining three.</p>
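      <p>The averaging rule can be written down as a short helper. The following is a sketch under stated assumptions: the names trimmed_mean and measure are hypothetical, run_query stands for any callable that executes one query, and the sample timings are invented for illustration.</p>
      <preformat><![CDATA[
```python
import time


def trimmed_mean(durations):
    """Drop one largest and one smallest value, then average the rest."""
    if len(durations) < 3:
        raise ValueError("need at least three measurements")
    kept = sorted(durations)[1:-1]
    return sum(kept) / len(kept)


def measure(run_query, repeats=5):
    """Time a callable `repeats` times and return the trimmed mean in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return trimmed_mean(durations)


# Hypothetical timings in seconds, not measurements from the experiments.
print(trimmed_mean([1.0, 2.0, 3.0, 4.0, 10.0]))  # 3.0
```
]]></preformat>
      <p>Discarding the extremes makes the reported value less sensitive to a single slow run, for example one affected by a cold cache.</p>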
      <p>Loading the INEX data set into the database took 25 minutes for Xindice and 97 minutes for eXist. The data occupied 600 MB on the file system for Xindice and 1300 MB for eXist. Our hardware configuration was a personal computer with an Intel Celeron 1.7 GHz processor, 512 MB RAM and the Windows XP (SP2) operating system. We used the INEX XML data set in version 2003 (1.4); detailed information about the data set and its structure is given in Section 3.1 (INEX Dataset).</p>
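      <p>The load times and storage sizes above can be related to the 536 MB of raw XML with some simple arithmetic. The sketch below only restates the reported figures; the function names are hypothetical and the derived rates are rough indicators, not additional measurements.</p>
      <preformat><![CDATA[
```python
RAW_MB = 536  # size of the INEX data set as reported in the text
LOAD_MINUTES = {"Xindice": 25, "eXist": 97}
STORED_MB = {"Xindice": 600, "eXist": 1300}


def load_rate(system):
    """Average load throughput in MB of raw XML per minute."""
    return RAW_MB / LOAD_MINUTES[system]


def storage_overhead(system):
    """Size on the file system relative to the raw XML data."""
    return STORED_MB[system] / RAW_MB


for system in ("Xindice", "eXist"):
    print(f"{system}: {load_rate(system):.1f} MB/min, "
          f"{storage_overhead(system):.2f}x raw size")
```
]]></preformat>
      <p>By these figures eXist took roughly 3.9 times longer to load the data and stored it in about 2.4 times the raw size, compared with about 1.1 times for Xindice.</p>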
      <p>Summary</p>
      <p>Our results did not meet our expectations: Xindice failed in our experiments. In light of our results this database system is impractical for larger XML data sets. We tried to create indices for all elements and attributes, but without any significant improvement.</p>
      <p>Most of the XPath queries run against Xindice returned an empty result set; it seems that Xindice does not fully support the XPath 1.0 specification but only a limited subset of it. In contrast, eXist behaved much better. This may be due to its automatically generated structural index, which is very efficient. eXist also has a user-friendly GUI for both database management and ad-hoc query processing.</p>
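      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Query duration time [s]</p>
        </caption>
        <table>
          <thead>
            <tr><th>Records retrieved</th><th>eXist</th><th>Xindice</th></tr>
          </thead>
          <tbody>
            <tr><td>12104</td><td>1.3</td><td>230</td></tr>
            <tr><td>11666</td><td>2.2</td><td>98</td></tr>
            <tr><td>11666</td><td>1.3</td><td>447</td></tr>
            <tr><td>11955</td><td>1.9</td><td>NA</td></tr>
            <tr><td>11955</td><td>5.6</td><td>NA</td></tr>
            <tr><td>11019</td><td>5.8</td><td>NA</td></tr>
            <tr><td>22974</td><td>8.1</td><td>NA</td></tr>
            <tr><td>868</td><td>1.0</td><td>more than 10 min</td></tr>
            <tr><td>108496</td><td>81.3</td><td>NA</td></tr>
            <tr><td>1623</td><td>2.6</td><td>NA</td></tr>
            <tr><td>72</td><td>4.0</td><td /></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The aim of our experiment, to test several native XML databases and perform a basic performance comparison, was in principle not achieved. We were not able to import the INEX data set into all of the proposed native XML databases, so we carried out only basic tests for the eXist and Xindice databases. Our results show that for further experiments we should consider only the eXist database; Xindice can serve just as an example of a basic native XML database.</p>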
      <p>We would like to perform further comparisons with other native XML databases. We also plan to include some non-native (or hybrid) XML databases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Apache Xindice - Native XML database. http://xml.apache.org/xindice.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. eXist Native XML database. http://exist.sourceforge.net/.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Chamberlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berglund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boag</surname>
          </string-name>
          , et al.
          <article-title>XML Path Language (XPath) 2.0</article-title>
          , September
          <year>2005</year>
          . http://www.w3.org/TR/xpath20/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>DeRose</surname>
          </string-name>
          .
          <article-title>XML Path Language (XPath) 1.0</article-title>
          , November
          <year>1999</year>
          . http://www.w3.org/TR/xpath.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fuhr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gövert</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lalmas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Initiative for the Evaluation of XML Retrieval (INEX)</article-title>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>