<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Syllable-based Compression for XML Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Katsiaryna Chernik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Lánský</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leo Galamboš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University, Faculty of Mathematics and Physics</institution>
          <addr-line>Malostranské nám. 25, 118 00 Praha 1, Czech Republic</addr-line>
        </aff>
      </contrib-group>
      <fpage>21</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>Syllable-based compression achieves sufficiently good results on text documents of medium size. Since the majority of XML documents are of that size, we suppose that the syllable-based method can give good results on XML documents, especially on documents that have a simple structure (a small number of elements and attributes) and relatively long character data content. In this paper we propose two syllable-based compression methods for XML documents. The first method, XMLSyl, replaces XML tokens (element tags and attributes) with special codes in the input document and then compresses this document using a syllable-based method. The second method, XMillSyl, incorporates syllable-based compression into XMill, an existing method for XML compression. XMLSyl and XMillSyl are compared with a non-XML syllable-based method and with other existing methods for XML compression.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Syllable-based compression method</title>
      <p>
        Syllable-based compression [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is a method in which compression is performed
at the syllable level. There are two syllable-based compressors: syllable-based
LZW and syllable-based Huffman.
      </p>
      <p>
        The LZW algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a character-based dictionary compression method.
The syllable-based version is called LZWL. In the initialization step, the syllable
dictionary is filled with the empty syllable and with syllables from a database of
frequent syllables. The following steps are similar to the character-based version
of LZW, but LZWL works over an alphabet of syllables.
      </p>
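      <p>The dictionary initialization and the LZW-style loop described above can be sketched as follows. This is only an illustrative sketch, not the LZWL implementation: the input is assumed to be already split into syllables, the function names are hypothetical, and the handling of unseen syllables (emitting the code of the empty syllable followed by the raw syllable) is an assumption of this sketch.</p>
      <preformat><![CDATA[
```python
def lzwl_encode(syllables, frequent=()):
    """LZW over an alphabet of syllables (sketch)."""
    # Initialization: the empty syllable plus the database of frequent syllables.
    dictionary = {(): 0}
    for s in frequent:
        dictionary.setdefault((s,), len(dictionary))
    out = []
    phrase = ()
    for s in syllables:
        if (s,) not in dictionary:
            # Assumption of this sketch: flush the current phrase, then escape
            # the new syllable with code 0 and register it in the dictionary.
            if phrase:
                out.append(dictionary[phrase])
                phrase = ()
            out.extend([0, s])
            dictionary[(s,)] = len(dictionary)
            continue
        candidate = phrase + (s,)
        if candidate in dictionary:
            phrase = candidate          # keep extending the known phrase
        else:
            out.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)
            phrase = (s,)
    if phrase:
        out.append(dictionary[phrase])
    return out


def lzwl_decode(codes, frequent=()):
    """Inverse of lzwl_encode; rebuilds the same dictionary while decoding."""
    entries = [()]
    for s in frequent:
        if (s,) not in entries:
            entries.append((s,))
    out = []
    prev = None
    stream = iter(codes)
    for code in stream:
        if code == 0:                   # escape: the next item is a raw syllable
            s = next(stream)
            entries.append((s,))
            out.append(s)
            prev = None
            continue
        if code < len(entries):
            entry = entries[code]
        else:                           # phrase defined by the code being read
            entry = prev + (prev[0],)
        if prev is not None:
            entries.append(prev + (entry[0],))
        out.extend(entry)
        prev = entry
    return out
```
]]></preformat>
      <p>For example, <monospace>lzwl_encode(["syl", "la", "ble", "syl", "la"], ["syl", "la"])</monospace> emits dictionary indices for the known syllables and one escape for the unseen syllable; <monospace>lzwl_decode</monospace> with the same list of frequent syllables restores the input.</p>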
      <p>The second syllable-based compression method is called HuffSyllable. It is a
statistical compression method based on adaptive Huffman coding. For our
purposes, we use only the LZWL syllable-based compression method, because adapting
HuffSyllable to XML compression gave worse results than LZWL.</p>
    </sec>
    <sec id="sec-3">
      <title>XMLSyl</title>
      <p>Our goal was to modify the syllable-based compression method so that it compresses
XML documents efficiently. We modified an existing syllable-based compressor
so that it treats XML tokens (element tags and attributes) as single syllables
instead of decomposing them into many syllables. There were two ways
to make the syllable-based compressor treat XML tokens as syllables:</p>
      <list list-type="order">
        <list-item><p>Modify the parser used in the syllable-based tool and combine it with an XML
parser, so that it can recognize XML tokens and treat each of them as a single
syllable.</p></list-item>
        <list-item><p>Replace XML tokens with bytes in the input document and then compress
the resulting document with an existing syllable-based tool.</p></list-item>
      </list>
      <p>
        We decided to implement the second way because this implementation allows
us to make some future improvements easily. For example, we may make the
syllable-based compressor assign codes of minimal length to XML tokens
by adding these single bytes to the syllable dictionary [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This improvement
is impossible in the first variant. The encoding of XML tokens is inspired by
existing XML compression methods such as XMLPPM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], XGrind [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], XPress [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and
XMill [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>3.1 Architecture and principles of XMLSyl</p>
      <p>The architecture of XMLSyl is shown in Figure 1. It has four major modules: the
SAX Parser, the Structure Encoder, the Containers and the Syllable Compressor.
First, the XML document is sent to the SAX Parser. Next, the parser decomposes
the document into SAX events (start-tags, end-tags, data items, comments, etc.)
and forwards them to the Structure Encoder.</p>
      <p>The Structure Encoder encodes the SAX events and routes them to the
different Containers. There are three containers in our implementation:</p>
      <list list-type="order">
        <list-item><p>Element Container: The Element Container stores the names of all
elements that occur in an XML document. The Structure Encoder also uses
the Element Container as the dictionary for encoding the XML structure.</p></list-item>
        <list-item><p>Attribute Container: The Attribute Container stores the names of all
attributes that occur in an XML document. The Structure Encoder also
uses the Attribute Container as the dictionary for encoding the XML structure.</p></list-item>
        <list-item><p>Data and Structure Container: The Data and Structure Container stores
the XML document with all meta-data replaced by special codes.
The encoding process is presented in Section 3.2.</p></list-item>
      </list>
      <p>[Figure 1: architecture of XMLSyl. Boxes: XML Document, SAX Parser, Structure Encoder, Element Container, Attribute Container, Data and Structure Container, one Syllable Compressor per container, Compressed XML document.]</p>
      <sec id="sec-3-2">
        <title>Structure Encoder</title>
        <p>When a document is parsed and separated into the containers completely,
the contents of the containers are sent to the Syllable Compressor. It compresses
the content of each container separately using syllable-based compression and
sends the result to the output.</p>
        <p>
          We did not write the SAX parser ourselves; instead we used the
Expat parser [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which is an open-source SAX parser written in C.
        </p>
        <p>3.2 Encoding the structure of an XML document</p>
        <p>The structure of an XML document is encoded in XMLSyl as follows. Whenever
an element or attribute is encountered, its name is looked up in the dictionary
(and added if it is new), and its index is sent to the Data and Structure Container.
Two different dictionaries are used for attributes and elements: the Element Dictionary and
the Attribute Dictionary. The Attribute Container operates as the Attribute
Dictionary and the Element Container as the Element Dictionary. Whenever
an end tag is encountered, the token END_TAG is sent to the Data and Structure
Container. Whenever a character sequence is encountered, it is sent to the Data
and Structure Container without changes. The start and end of each character sequence
are indicated by special tokens. We distinguish four kinds of character sequences:
the value of an attribute, the value of an element, a comment, and white space between tags, if
white space is preserved.</p>
        <p>To illustrate the encoding process, consider the encoding of the following
small XML document:</p>
        <preformat>&lt;book&gt;
  &lt;title lang="en"&gt;XML&lt;/title&gt;
  &lt;author&gt;Brown&lt;/author&gt;
  &lt;author&gt;Smith&lt;/author&gt;
  &lt;price currency="EURO"&gt;49&lt;/price&gt;
&lt;/book&gt;
&lt;!--Comment--&gt;</preformat>
        <p>First, the XML document is converted into the corresponding stream of SAX
events:</p>
        <preformat>startElement("book")
startElement("title",("lang","en"))
characters("XML")
endElement("title")
startElement("author")
characters("Brown")
endElement("author")
startElement("author")
characters("Smith")
endElement("author")
startElement("price",("currency","EURO"))
characters("49")
endElement("price")
endElement("book")
comment("Comment")</preformat>
        <p>The tokens in the SAX event stream are sent to the Structure Encoder,
which encodes them and sends them to their corresponding containers. When the
book start element token is encountered, the string book is sent to the Element
Container, since this element name has not been encountered before. The index E0 is
assigned to this entry and sent to the Data and Structure Container.
The same operation is executed for the title start element: the string title is sent to the
Element Container, the index E1 is assigned to it, and E1 is sent to
the Data and Structure Container. The element title has the attribute lang. The
attribute name is sent to the Attribute Container and the index A0 is assigned
to it. The index A0 is sent to the Data and Structure Container. Then the attribute
value "en" is sent without modification to the Data and Structure Container,
followed by the token END_ATT, which signals the end of the
attribute value. When an element value such as "XML" is encountered, the token
CHAR, signaling the beginning of a character sequence, the data value, and then the
token END_CHAR are all sent to the Data and Structure Container. All
end tags are replaced by the token END_TAG. When a comment event is
encountered, the code CMNT is put into the Data and Structure Container,
followed by the comment text and the closing code END_CMNT. The
final state of all containers is shown in Figure 2.</p>
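        <p>The encoding walk-through above can be reproduced with a short sketch built on Python's binding of the Expat parser. It is an illustration under stated assumptions rather than the XMLSyl implementation: the function name <monospace>encode_structure</monospace> is hypothetical, the codes are kept as readable strings (E0, A0, END_TAG, ...) instead of bytes, and white space between tags is ignored.</p>
        <preformat><![CDATA[
```python
import xml.parsers.expat

SAMPLE = (
    '<book><title lang="en">XML</title>'
    '<author>Brown</author><author>Smith</author>'
    '<price currency="EURO">49</price></book><!--Comment-->'
)


def encode_structure(xml_text):
    """Return (element container, attribute container, data and structure stream)."""
    elements, attributes, stream = [], [], []
    text = []                                   # buffered character data

    def code_for(table, name, prefix):
        # Add the name on first use; the code is its position in the container.
        if name not in table:
            table.append(name)
        return prefix + str(table.index(name))

    def flush_text():
        data = "".join(text)
        text.clear()
        if data.strip():                        # white space between tags is dropped
            stream.extend(["CHAR", data, "END_CHAR"])

    def start(name, attrs):
        flush_text()
        stream.append(code_for(elements, name, "E"))
        for key, value in attrs.items():
            stream.extend([code_for(attributes, key, "A"), value, "END_ATT"])

    def end(name):
        flush_text()
        stream.append("END_TAG")

    def comment(data):
        flush_text()
        stream.extend(["CMNT", data, "END_CMNT"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text.append
    parser.CommentHandler = comment
    parser.Parse(xml_text, True)
    return elements, attributes, stream


elements, attributes, stream = encode_structure(SAMPLE)
print(elements, attributes)
print(" ".join(stream))
```
]]></preformat>
        <p>Run on the sample document, the sketch fills the element and attribute lists in order of first occurrence and produces a token stream that starts with E0 E1 A0 en END_ATT CHAR XML END_CHAR END_TAG.</p>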
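        <sec id="sec-3-2-1">
          <title>Element Container</title>
          <table-wrap>
            <table>
              <thead>
                <tr><th>element</th><th>index</th></tr>
              </thead>
              <tbody>
                <tr><td>book</td><td>E0</td></tr>
                <tr><td>title</td><td>E1</td></tr>
                <tr><td>author</td><td>E2</td></tr>
                <tr><td>price</td><td>E3</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-3-2-2">
          <title>Attribute Container</title>
          <table-wrap>
            <table>
              <thead>
                <tr><th>attribute</th><th>index</th></tr>
              </thead>
              <tbody>
                <tr><td>lang</td><td>A0</td></tr>
                <tr><td>currency</td><td>A1</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-3-2-3">
          <title>Data and Structure Container</title>
          <table-wrap>
            <table>
              <thead>
                <tr><th>XML token</th><th>Encoded as</th></tr>
              </thead>
              <tbody>
                <tr><td>&lt;book&gt;</td><td>E0</td></tr>
                <tr><td>&lt;title lang="en"&gt;</td><td>E1 A0 en END_ATT</td></tr>
                <tr><td>XML</td><td>CHAR XML END_CHAR</td></tr>
                <tr><td>&lt;/title&gt;</td><td>END_TAG</td></tr>
                <tr><td>&lt;author&gt;</td><td>E2</td></tr>
                <tr><td>Brown</td><td>CHAR Brown END_CHAR</td></tr>
                <tr><td>&lt;/author&gt;</td><td>END_TAG</td></tr>
                <tr><td>&lt;author&gt;</td><td>E2</td></tr>
                <tr><td>Smith</td><td>CHAR Smith END_CHAR</td></tr>
                <tr><td>&lt;/author&gt;</td><td>END_TAG</td></tr>
                <tr><td>&lt;price currency="EURO"&gt;</td><td>E3 A1 EURO END_ATT</td></tr>
                <tr><td>49</td><td>CHAR 49 END_CHAR</td></tr>
                <tr><td>&lt;/price&gt;</td><td>END_TAG</td></tr>
                <tr><td>&lt;/book&gt;</td><td>END_TAG</td></tr>
                <tr><td>&lt;!--Comment--&gt;</td><td>CMNT Comment END_CMNT</td></tr>
              </tbody>
            </table>
          </table-wrap>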
          <p>In this example we have ignored white space between tags, e.g. between &lt;book&gt; and
&lt;title&gt;, so the decompressor produces a standard indentation. Optionally,
XMLSyl can preserve white space. In that case, it stores the white space as
a sequence of characters in the Data and Structure Container between the tokens
WS and END_WS.</p>
          <p>3.3 Containers</p>
          <p>The containers are the basic units for grouping XML data. The Attribute
Container holds attribute names and the Element Container holds element names.
Since the number of element and attribute names in an XML
document is usually not high, these two containers are kept in main memory. During parsing,
the size of a container increases as it is filled with entries. Each entry
in the Element Container is assigned a byte in the range 00-A9. These bytes
are used for encoding the element names. Each entry in the Attribute Container
is assigned a byte in the range AA-F9. These bytes are used for encoding the
attribute names. The remaining 6 bytes are reserved for special codes such as CHAR and
END_TAG. In most cases, 170 (or 80) bytes are enough to encode element (or
attribute) names. If the number of elements (or attributes) is greater than 170
(or 80), entries are encoded with two bytes, then three, and so on.</p>
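          <p>The byte ranges above can be checked with a small sketch. It covers only the single-byte case; the multi-byte scheme used for more than 170 elements or 80 attributes is not specified in enough detail here, so it is left out. The range FA-FF for the special codes is inferred from "the remaining 6 bytes", and all names in the code are hypothetical.</p>
          <preformat><![CDATA[
```python
ELEMENT_RANGE = range(0x00, 0xAA)     # bytes 00-A9: element names
ATTRIBUTE_RANGE = range(0xAA, 0xFA)   # bytes AA-F9: attribute names
SPECIAL_RANGE = range(0xFA, 0x100)    # remaining bytes: CHAR, END_TAG, ...


def element_code(index):
    """Single-byte code of the index-th entry of the Element Container."""
    if not 0 <= index < len(ELEMENT_RANGE):
        raise ValueError("needs the multi-byte encoding, not sketched here")
    return bytes([ELEMENT_RANGE[index]])


def attribute_code(index):
    """Single-byte code of the index-th entry of the Attribute Container."""
    if not 0 <= index < len(ATTRIBUTE_RANGE):
        raise ValueError("needs the multi-byte encoding, not sketched here")
    return bytes([ATTRIBUTE_RANGE[index]])


def classify(byte):
    """Map a byte of the encoded stream back to (kind, index within its range)."""
    if byte in ELEMENT_RANGE:
        return "element", byte - ELEMENT_RANGE.start
    if byte in ATTRIBUTE_RANGE:
        return "attribute", byte - ATTRIBUTE_RANGE.start
    return "special", byte - SPECIAL_RANGE.start


print(len(ELEMENT_RANGE), len(ATTRIBUTE_RANGE), len(SPECIAL_RANGE))
print(element_code(3), attribute_code(1), classify(0xAB))
```
]]></preformat>
          <p>The three ranges together cover all 256 byte values, which matches the counts of 170, 80 and 6 given in the text.</p>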
          <p>The situation is different for the Data and Structure Container. We do not
know the size of the input XML document in advance. The document can be
so big that it does not fit into memory, and it is not possible to increase
the size of the container endlessly. Therefore, the container consists of two memory
blocks of constant size. The content of the first memory block is compressed as
soon as the container is filled. We do not compress both blocks at once, because
the context of the second memory block is used for the compression of the first one.
After the compression, the compressed content of the first block is sent to the
output and the first block swaps its purpose with the second one. Now the first
block is filled with data again. When it is full, the second block is compressed, and so
on.</p>
          <p>3.4 The Syllable Compressor</p>
          <p>The Syllable Compressor compresses the Data and Structure Container first
and sends the result to the output file. Then the Attribute Container is
compressed and sent to the output file, and finally the same happens with the
Element Container. LZWL is used for the compression of data. HuffSyllable could
also be chosen, but its performance is worse, so we decided to use only LZWL.</p>
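          <p>The two-block buffering of the Data and Structure Container described in Section 3.3 can be sketched as follows. This is one possible reading of the description, not the XMLSyl code: the class name and the <monospace>compress_block(block, lookahead)</monospace> callback are hypothetical, and the sketch allocates a fresh buffer after each swap where a real implementation would reuse the two fixed memory blocks.</p>
          <preformat><![CDATA[
```python
class TwoBlockContainer:
    """Container made of two fixed-size blocks that take turns being compressed."""

    def __init__(self, block_size, compress_block):
        self.block_size = block_size
        # Called as compress_block(block, lookahead); the lookahead is the
        # content of the other block, available as context for the compressor.
        self.compress_block = compress_block
        self.current = bytearray()      # block that will be compressed next
        self.following = bytearray()    # block that serves as its context

    def write(self, data):
        for byte in data:
            if len(self.current) < self.block_size:
                self.current.append(byte)
                continue
            self.following.append(byte)
            if len(self.following) == self.block_size:
                self._emit()            # both blocks are full

    def _emit(self):
        self.compress_block(bytes(self.current), bytes(self.following))
        # The blocks swap their purpose: the former context is compressed next.
        self.current, self.following = self.following, bytearray()

    def close(self):
        # Flush whatever is left; the final block has no lookahead.
        while self.current:
            self._emit()


emitted = []
container = TwoBlockContainer(4, lambda block, lookahead: emitted.append((block, lookahead)))
container.write(b"abcdefghij")
container.close()
print(emitted)
```
]]></preformat>
          <p>Each block is handed to the compressor exactly once, always together with the block that follows it, so memory use stays bounded by two blocks regardless of the document size.</p>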
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>XMillSyl</title>
      <p>
        This section introduces our second syllable-based XML compressor, XMillSyl.
This method incorporates syllable-based compression into XMill [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an existing method for XML compression. XMill relies on two main principles
to optimize XML compression: separating structure from data content, and
grouping data values with related semantics in the same "container".
Each data container is then compressed individually with gzip [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In XMillSyl,
containers are compressed with LZWL.
      </p>
      <p>We do not expect the XMillSyl method to give better results than XMill,
because gzip compression performs better than LZWL. We implemented
XMillSyl in order to compare the power of XMLSyl with the power of the two main
principles of XMill.</p>
      <p>4.1 Implementation</p>
      <p>We did not write the XMill compressor ourselves; we used the existing XMill sources.</p>
      <p>XMill operates as follows: a SAX parser parses the XML file and the SAX
events are sent to the core module of XMill, called the path processor. It
determines how to map tokens to containers: element tag names and attribute
names are encoded and sent to the structure container, while the data values
are sent to various data containers according to their semantics. Finally, the
containers are gzipped independently and stored on disk.</p>
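      <p>The separation of structure from data described above can be illustrated with a short sketch. It is not the XMill path processor: as a simplifying assumption, data values are grouped by the name of the enclosing element or attribute, the structure container holds readable tokens instead of encoded ones, and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import gzip
import xml.parsers.expat
from collections import defaultdict

SAMPLE = (
    '<book><title lang="en">XML</title>'
    '<author>Brown</author><author>Smith</author>'
    '<price currency="EURO">49</price></book>'
)


def split_into_containers(xml_text):
    """Return (structure tokens, {container name: list of data values})."""
    structure = []
    data = defaultdict(list)
    path = []                                   # stack of open element names

    def start(name, attrs):
        path.append(name)
        structure.append(name)
        for key, value in attrs.items():
            structure.append("@" + key)
            data["@" + key].append(value)       # one container per attribute name

    def end(name):
        path.pop()
        structure.append("/")

    def chars(text):
        if text.strip():
            structure.append("#")               # marks the position of a data value
            data[path[-1]].append(text)         # one container per element name

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True                   # deliver each text node in one call
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return structure, dict(data)


def compress_containers(containers):
    """Compress every data container independently with gzip."""
    return {name: gzip.compress("\n".join(values).encode("utf-8"))
            for name, values in containers.items()}


structure, containers = split_into_containers(SAMPLE)
print(structure)
print(containers)
```
]]></preformat>
      <p>Both author names end up in the same container, so similar values are compressed together, which is the effect the grouping principle aims for.</p>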
      <p>We have modified the compression and decompression functions (operating on
containers) so that they compress and decompress the data containers with
the syllable-based method (see Figure 3). Moreover, we have modified the
syllable-based method so that it can work with the containers of the XMill implementation
instead of a file stream.</p>
      <p>[Figure 3: architecture of XMillSyl. Boxes: Input XML file, SAX Parser, Path Processor, data containers up to Large Data Container k, GZip (two boxes), LZWL (two boxes), Compressed XML file.]</p>
      <sec id="sec-4-6">
        <title>LZWL</title>
        <p>XMillSyl distinguishes between small and large containers. Since
LZWL is not suitable for extremely small data, the small containers are
compressed with gzip. The structure container is also gzipped in XMillSyl. The large
containers are compressed with LZWL.</p>
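        <p>The choice of compressor per container can be sketched as a simple dispatch. The size threshold is not given in the text, so <monospace>SMALL_CONTAINER_LIMIT</monospace> is a purely hypothetical value, and <monospace>lzwl_compress</monospace> stands for any callable that implements the syllable-based compressor.</p>
        <preformat><![CDATA[
```python
import gzip

SMALL_CONTAINER_LIMIT = 1024    # hypothetical threshold in bytes


def compress_container(data, lzwl_compress, is_structure=False,
                       limit=SMALL_CONTAINER_LIMIT):
    """Return (method name, compressed bytes) for one container."""
    # The structure container and small data containers are gzipped;
    # only large data containers go to the syllable-based compressor.
    if is_structure or len(data) < limit:
        return "gzip", gzip.compress(data)
    return "lzwl", lzwl_compress(data)


def fake_lzwl(data):
    """Placeholder for the real LZWL compressor, used only for illustration."""
    return data[::-1]


print(compress_container(b"short value", fake_lzwl)[0])
print(compress_container(bytes(4096), fake_lzwl)[0])
```
]]></preformat>
        <p>Recording the method name next to the compressed bytes lets a decompressor pick the matching routine for each container.</p>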
        <p>In the second data set, markup accounts for less than 50% and the data is textual in character, which gives good results.</p>
        <p>5.1 XML data sources</p>
        <p>XMLSyl and XMillSyl were tested on two data sets that cover a wide range
of XML data formats and structures. The first data set is shown in Table 1.
It contains English XML documents with different inner structure. It includes
regular data, which has regular markup and short character data content (elts,
stats, weblog, tpc). It also includes irregular data, which has irregular markup
(pcc, tal).</p>
        <p>The second data set is shown in Table 2. It contains textual XML documents
of simple structure with long character data content. It contains five stage plays
marked up as XML, four in English and one in Czech. It also contains data in
DocBook format in Czech and in English.</p>
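        <p>
          Some data was distributed with the XMLPPM [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and the Exalt [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] compressors, while other documents were found on the Internet [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. All Czech documents use the Windows-1250 encoding.
        </p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>The first data set.</p></caption>
          <table>
            <thead>
              <tr><th>File</th><th>Size</th><th>Lang</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>elts</td><td>103919</td><td>English</td><td>Periodic table of the elements in XML</td></tr>
              <tr><td>pcc</td><td>2600257</td><td>English</td><td>Formal proofs transformed to XML</td></tr>
              <tr><td>stats</td><td>869059</td><td>English</td><td>One year statistics of baseball players</td></tr>
              <tr><td>tal</td><td>1364576</td><td>English</td><td>Safe-annotated assembly language converted to XML</td></tr>
              <tr><td>tpc</td><td>313193</td><td>English</td><td>The XML representation of the TPC_D benchmark database</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>[Table 2: The second data set.]</p>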
        <p>The compression ratio factor of XMillSyl or XMLSyl with respect to XMill is
defined as follows, where XSyl stands for XMillSyl or XMLSyl:</p>
        <disp-formula><tex-math><![CDATA[CRF_{XSyl} = \frac{CR_{XSyl}}{CR_{XMill}}]]></tex-math></disp-formula>
        <p>5.3 Experimental Results</p>
        <p>The compression ratio statistics of the two sets of XML documents are shown in
Table 3 and Table 4.</p>
        <table-wrap>
          <label>Table 3</label>
          <caption><p>The second data set.</p></caption>
          <table>
            <thead>
              <tr><th>File</th><th>CR<sub>LZWL</sub></th><th>CR<sub>XMill</sub></th><th>CR<sub>XMillSyl</sub></th><th>CRF<sub>XMillSyl</sub></th><th>CR<sub>XMLSyl</sub></th><th>CRF<sub>XMLSyl</sub></th></tr>
            </thead>
            <tbody>
              <tr><td>errors</td><td>1,98</td><td>1,83</td><td>2,00</td><td>1,09</td><td>1,83</td><td>1,00</td></tr>
              <tr><td>hamlet</td><td>1,96</td><td>1,91</td><td>2,00</td><td>1,05</td><td>1,85</td><td>0,97</td></tr>
              <tr><td>antony</td><td>1,84</td><td>1,79</td><td>1,88</td><td>1,05</td><td>1,69</td><td>0,94</td></tr>
              <tr><td>much_ado</td><td>1,88</td><td>1,80</td><td>1,89</td><td>1,05</td><td>1,77</td><td>0,98</td></tr>
              <tr><td>ch00</td><td>3,28</td><td>2,69</td><td>3,00</td><td>1,12</td><td>2,88</td><td>1,07</td></tr>
              <tr><td>ch01</td><td>2,69</td><td>2,20</td><td>2,43</td><td>1,10</td><td>2,46</td><td>1,12</td></tr>
              <tr><td>ch02</td><td>1,76</td><td>1,43</td><td>1,70</td><td>1,19</td><td>1,57</td><td>1,10</td></tr>
              <tr><td>ch03</td><td>2,90</td><td>1,87</td><td>2,70</td><td>1,44</td><td>2,08</td><td>1,11</td></tr>
              <tr><td>ch04</td><td>2,09</td><td>1,66</td><td>1,78</td><td>1,07</td><td>1,83</td><td>1,10</td></tr>
              <tr><td>ch05</td><td>2,28</td><td>1,81</td><td>2,03</td><td>1,12</td><td>2,04</td><td>1,13</td></tr>
              <tr><td>glossary</td><td>2,07</td><td>1,64</td><td>1,84</td><td>1,12</td><td>1,89</td><td>1,15</td></tr>
              <tr><td>howto</td><td>6,69</td><td>2,30</td><td>2,50</td><td>1,09</td><td>2,59</td><td>1,13</td></tr>
              <tr><td>hledani</td><td>3,79</td><td>3,13</td><td>3,62</td><td>1,16</td><td>3,40</td><td>1,09</td></tr>
              <tr><td>komunikace</td><td>3,25</td><td>2,65</td><td>2,93</td><td>1,11</td><td>3,01</td><td>1,14</td></tr>
              <tr><td>navihace</td><td>3,79</td><td>3,14</td><td>3,68</td><td>1,17</td><td>3,44</td><td>1,10</td></tr>
              <tr><td>robot</td><td>3,43</td><td>2,86</td><td>3,22</td><td>1,13</td><td>3,04</td><td>1,06</td></tr>
              <tr><td>xml</td><td>3,74</td><td>3,23</td><td>3,69</td><td>1,14</td><td>3,30</td><td>1,02</td></tr>
              <tr><td>rur1</td><td>2,33</td><td>2,07</td><td>2,37</td><td>1,14</td><td>2,15</td><td>1,04</td></tr>
              <tr><td>Average</td><td>2,88</td><td>2,22</td><td>2,51</td><td>1,13</td><td>2,38</td><td>1,07</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The syllable-based method performed worse on documents from the first data
set. On the other hand, both XMLSyl and XMillSyl show great improvement
compared to LZWL. They compressed the input to 50-60% of the size of the
file compressed with LZWL.</p>
        <p>On XML documents of the second data set, LZWL provides a reasonably
good compression ratio, on average about two-thirds that of XMill. This
confirms our prediction that syllable-based compression is effective for textual
XML documents. Moreover, our compression methods show even greater
improvement.</p>
        <p>On the documents of the second data set, XMillSyl achieves about 15% and
XMLSyl about 20% better compression ratio than LZWL. Compared to XMill,
both methods perform slightly worse: XMillSyl compresses about 13% and XMLSyl
about 7% worse than XMill.</p>
        <p>Figure 4 shows the variation of the compression ratio as a function of XML
data size for "DocBook: The Definitive Guide". The compression was run on
several subsets. On small files XMillSyl performs better than XMLSyl. The
explanation is that the data are split into many small containers in XMillSyl,
which are compressed with gzip (gzip outperforms LZWL, especially on small
data). On middle-sized and large files XMLSyl outperforms XMillSyl. We can
observe that a bigger size also implies better compression.</p>
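        <p>The compression ratio factor columns can be recomputed from the compression ratios. The sketch below assumes, consistently with the tabulated values, that the factor is the compression ratio of the method divided by that of XMill, rounded to two decimals; the function name is hypothetical and the numbers are copied from the errors and xml rows.</p>
        <preformat><![CDATA[
```python
def compression_ratio_factor(cr_method, cr_xmill):
    """CRF of a method relative to XMill, rounded to two decimals."""
    return round(cr_method / cr_xmill, 2)


# Compression ratios taken from the "errors" and "xml" rows of the results table.
rows = {
    "errors": {"XMill": 1.83, "XMillSyl": 2.00, "XMLSyl": 1.83},
    "xml": {"XMill": 3.23, "XMillSyl": 3.69, "XMLSyl": 3.30},
}

for name, cr in rows.items():
    print(name,
          compression_ratio_factor(cr["XMillSyl"], cr["XMill"]),
          compression_ratio_factor(cr["XMLSyl"], cr["XMill"]))
```
]]></preformat>
        <p>A factor above 1 means that the method's compression ratio value is higher than XMill's, which the text reads as compressing worse than XMill.</p>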
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <table-wrap>
        <label>Table 4</label>
        <caption><p>The first data set.</p></caption>
        <table>
          <thead>
            <tr><th>File</th><th>CR<sub>LZWL</sub></th><th>CR<sub>XMill</sub></th><th>CR<sub>XMillSyl</sub></th><th>CRF<sub>XMillSyl</sub></th><th>CR<sub>XMLSyl</sub></th></tr>
          </thead>
          <tbody>
            <tr><td>elts</td><td>1,04</td><td>0,47</td><td>0,54</td><td>1,15</td><td>0,72</td></tr>
            <tr><td>pcc</td><td>0,22</td><td>0,02</td><td>0,03</td><td>1,50</td><td>0,04</td></tr>
            <tr><td>stats</td><td>0,67</td><td>0,33</td><td>0,40</td><td>1,21</td><td>0,39</td></tr>
            <tr><td>tal</td><td>0,36</td><td>0,09</td><td>0,12</td><td>1,33</td><td>0,15</td></tr>
            <tr><td>tpc</td><td>1,82</td><td>1,05</td><td>1,54</td><td>1,47</td><td>1,60</td></tr>
            <tr><td>Average</td><td>0,82</td><td>0,39</td><td>0,53</td><td>1,33</td><td>0,58</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Fig. 4. Compression ratio under different sizes.</p>
      <p>In this work we introduced syllable-based compression tools for XML documents
called XMLSyl and XMillSyl. We presented the architecture and implementation
of our tools and tested their performance on a variety of XML documents. In our
experiments, XMLSyl and XMillSyl were compared with LZWL and XMill. Both
methods are more suitable for textual XML documents. XMill outperformed
our methods only marginally. XMLSyl performs better than XMillSyl. This implies
that in our case encoding the XML structure is more efficient than separating
structure from data and grouping data values with related meaning. XMillSyl
and XMLSyl show better results for the Czech language.</p>
      <p>In the future, we want to implement some modifications to improve the
compression ratio. For example, the information in the DTD section can be extracted
and used to create a special syllable dictionary for elements and attributes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Wilfred</given-names>
            <surname>Ng</surname>
          </string-name>
          , Wai-Yeung Lam, James Cheng.
          <article-title>Comparative Analysis of XML Compression Technologies</article-title>
          .
          <source>World Wide Web Journal</source>
          ,
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Smitha S.</given-names>
            <surname>Nair</surname>
          </string-name>
          .
          <article-title>XML Compression Techniques: A Survey</article-title>
          . www.cs.uiowa.edu/~rlawrenc/research/Students/SN_04_XMLCompress.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Cheney</surname>
          </string-name>
          .
          <article-title>Compressing XML with Multiplexed Hierarchical PPM Models</article-title>
          .
          <source>In Proc. Data Compression Conference</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>V.</given-names>
            <surname>Toman</surname>
          </string-name>
          .
          <article-title>Compression of XML Data</article-title>
          .
          <source>MFF UK</source>
          ,
          <year>2003</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. World Wide Web Consortium.
          <article-title>Extensible Markup Language (XML) 1.0</article-title>
          . http://www.w3.org/XML/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Tolani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Haritsa</surname>
          </string-name>
          .
          <article-title>XGrind: A Query-friendly XML Compressor</article-title>
          .
          <source>In Proc. IEEE International Conference on Data Engineering</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. SAX:
          <article-title>A Simple API for XML</article-title>
          . http://www.saxproject.org
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>H.</given-names>
            <surname>Liefke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>XMill: an Efficient Compressor for XML Data</article-title>
          .
          <source>In Proc. ACM SIGMOD Conference</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Jun-Ki</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Myung-Jae</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chin-Wan</given-names>
            <surname>Chung</surname>
          </string-name>
          .
          <article-title>XPRESS: A Queriable Compression for XML Data</article-title>
          .
          <source>SIGMOD 2003</source>
          , June 9-12,
          <year>2003</year>
          , San Diego, CA.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          Expat XML Parser. http://expat.sourceforge.net
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Welch</surname>
          </string-name>
          .
          <article-title>A technique for high performance data compression</article-title>
          .
          <source>IEEE Computer</source>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zemlicka</surname>
          </string-name>
          .
          <article-title>Text Compression: Syllables</article-title>
          .
          <source>DATESO</source>
          ,
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lansky</surname>
          </string-name>
          .
          <article-title>Slabiková komprese</article-title>
          .
          <source>MFF UK</source>
          ,
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>V.</given-names>
            <surname>Toman</surname>
          </string-name>
          .
          <article-title>Komprese XML dat</article-title>
          . http://kocour.ms.mff.cuni.cz/~mlynkova/prg036/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J.</given-names>
            <surname>Kosek</surname>
          </string-name>
          .
          <article-title>Inteligentní podpora navigace na WWW s využitím XML</article-title>
          . http://www.kosek.cz/diplomka/,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. DocBook http://www.docbook.org/</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. A Quick Introduction to XML. http://www.cellml.org/tutorial/xml_guide</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>M.</given-names>
            <surname>Pilgrim</surname>
          </string-name>
          . What Is RSS. http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          XML Processing. http://diveintopython.org/xml_processing/
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <article-title>SAX And DOM Overview</article-title>
          . http://www.jezuk.co.uk/cgi-bin/view/arabica/SAXandDOMIntro
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <article-title>The gzip home page</article-title>
          . http://www.gzip.org/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>