<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classifying Elements for XML Query Transformation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>c Ke Geng</string-name>
          <email>ke@cs.auckland.ac.nz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems</institution>
          ,
          <addr-line>Moscow, Russia, 2006</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Auckland</institution>
          ,
          <country country="NZ">New Zealand</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Research into XML query transformation has become important with the increased use of XML. Currently the research into XML query transformation concentrates on transformation based on structure of the XML document. In this paper, we will introduce our research which transforms queries using the classification of elements of XML documents. The experiments have shown that our method improves query execution dramatically.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Query transformation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is an important branch
of semantic query optimization [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Some research
into XML query transformation has been carried out
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but this research concentrates on transforming
XML queries based on the structure of the XML
document and there is little research into transforming XML
queries based on the content of the data. This is because
XML documents are too complex and the complexity
limits the implementation of existing analysis
technologies to glean information from the content of the XML
document.
      </p>
      <p>
        In this paper, we introduce our research into
transforming XML queries based on the data in XML documents.
The aim of our research is to improve query execution
by transforming the queries based on the elements’
classification. Elements in the queried XML document will
be analyzed and their value distribution characteristics
will be abstracted. Then classifications will be generated
based on the elements’ value distribution charactestics.
With the information of the elements’ classification, the
input XML queries will be evaluated and transformed to
a query, which will return the same results as the
original query but can be executed faster. A series of
experiments have been carried out and the results show that
query execution can be improved dramatically with our
method. Our early contributions are: First, we introduce
a method to transform XML queries based on the content
of the XML document [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Second, we introduce
possible query transformations in our method [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Third, we
present experimental results to examine the practicality
of transforming XML queries based on element
classification and study the influence of query transformation on
query execution [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Based on these contributions, the
novel contributions in this papaer are: First, we discuss
the challenges of XML data analysis and the possible
solution. Second, we propose a possible XML data
analysis method. Previously, we assumed a technique existed
that analyses the XML data and classifies the elements.
In this paper , we propose that XMeans can be used to do
the classification.
      </p>
      <p>The remainder of this paper is structured as follows.
Background information is introduced in section 2. Our
query transformation method is introduced in section 3.
Section 4 introduces the query transformations in our
method. In section 5, we introduce the experiments and
analyze the experimental results. In section 6, we discuss
related work. Section 7 draws conclusions and discusses
the problems that we are working on.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>In this section, we discuss the existing semantic query
optimization methods in XML and the challenges that
arise in XML analysis.</p>
      <p>
        XML semantic query optimization can be classified into
two main groups: optimization based on the structure of
the document, and optimization based on the content of
the document. Optimization based on the structure of
the document uses information that is found in the
document’s schema to optimize XML query processing. In
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], an ordered graph schema is introduced. Input queries
are evaluated against the schema and query conditions
that will not influence the final result are eliminated.
Another implementation of optimization based on
structure is outlined in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which checks target XML
documents against the schema and skips unnecessary
computations. The authors in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduce methods to
transform XPath queries based on unique location path
constraints abstracted from the XML Schema.
      </p>
      <p>
        Optimizations based on the content use the
characteristics of values of elements to improve XML query
execution. Currently most of this kind of research
concentrates on statistical information. An example of this kind
of optimization is implemented in Lore [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which uses
statistical information of elements to improve query
execution by rearranging the sub query execution order.
Pattern trees [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are used in another important
implementation of query optimization based on statistics.
How to analyse XML documents and abstract element
characteristics is an obstacle to transforming queries
based on the content of XML. This obstacle arises
because XML is very flexible and complex, limiting the
existing XML analysis:
1. the structure of XML documents is unpredictable
and users can compose their XML documents
using any arbitrary structure. Also, the same
content can be expressed in various ways, such as
“&lt; name &gt; J ohnGoldberg &lt; /name &gt;”
and “&lt; name &gt;&lt; f irstN ame &gt; J ohn &lt;
/f irstN ame &gt;&lt; lastN ame &gt; Goldberg &lt;
/lastN ame &gt;&lt; /name &gt;”.
2. the relationship between elements can be complex.
      </p>
      <p>The relationship can be siblings, parent-child or
ancestor-descendant.
3. same name elements may appear within different
parent nodes and be distributed at different levels
in an XML document.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>Our method for transforming XML queries is to
classify elements based on distribution characteristics of
the values of elements and transform the queries based
on the classifications. As we discussed in the previous
section, the biggest obstacle is the flexibility of XML
documents. However if we concentrate only on specific
elements instead of the whole document, it is possible
to abstract their value distribution characteristics, and
related elements may be classified based on the
distribution characteristics. Then query transformation may be
carried out based on the classification.</p>
      <p>Consider the XML file shown in Fig. 1. It may be
difficult to classify the product elements based on the
values of all their subelements. This problem can be
overcome by choosing the most important elements for
the classification. For example we may analyze two
elements ID and material, and find that all products
that are made of cotton have IDs smaller than 200000
and the IDs of products that are made of leather are
greater than 200000. So we can classify the elements
product to two groups: product of cotton, which have
IDs smaller than 200000, and product of leather, which
have IDs bigger than 200000. With this classification,
a query “document( store.xml )//product[ID &gt;
200000 and material = ‘leather ]” can be transformed
to “document( store.xml )//product[material =
‘leather ]” and the transformed query will return same
results as the original when executed against the data in
Fig. 1.</p>
      <p>The element classification results and the related
information of elements, including both element value
characteristics and the number of the elements of each
group, are all useful in query transformation. With the
classification information, queries will be evaluated and
possible query transformations may be generated. If
there are multiple transformations, the number of
elements in each group may be used to choose the most
efficient transformation.</p>
      <p>XML
query
with element
4</p>
    </sec>
    <sec id="sec-4">
      <title>Transform classification</title>
      <p>Element classification based XML query
transformations can be classified into three categories:
Elimination, which removes mutually exclusive query
conditions; Reduction, which removes the semantically
redundant query conditions; and Introduction, which
introduces extra conditions to the original query
conditions. Below, we will discuss these transformations in
detail. In order to explain the transformations clearly,
we will use “ConditionM (e)” to represent the result
of evaluating query condition “M” on element “e” and
σCon(M)(E) to represent all the elements that satisfy
query condition “M” when applied to the elements in E.
4.1</p>
      <sec id="sec-4-1">
        <title>Elimination Transformation</title>
        <p>Elimination is the operation that eliminates queries
with mutually exclusive query conditions. Consider an
XML document in which “all the managers are paid
more than $5000”, then the condition of query
“document(’record.xml’)//person [position = ’manager’ and
getP aid &lt; 4500] ”1 can be transformed to F alse.
Theorem 3.1 If two conditions in a query are mutually
exclusive and the operator between them is “∧”, the two
conditions can be transformed to F alse, which can be
illustrated as</p>
        <p>ConditionM∧N (e) = False
if ConditionM (e) ∧ ConditionN (e) = False2
1We represent all queries in this paper in XPath.</p>
        <p>
          2For more discussion and a proof about each transformation please
refer to [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
The elimination transformation is a branch of the
reduction transformation. We discuss it separately because this
transformation may help us detect and block queries that
return no result due to mutually exclusive conditions, as
shown in the example above. These queries do not return
any result and executing these queries wastes time.
4.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Reduction Transformation</title>
        <p>Reduction is the operation that eliminates
redundant query conditions so this kind of transformation
can only be applied to multiple-condition queries. An
example of this transformation is that the original
query “document(’record.xml’) // person [position =
manager and salary &gt; 5000]” can be simplified to
“document( record.xml )//
person[position = manager ]” if we know that all the
managers are paid more than 5000.</p>
        <p>Theorem 3.2 If all elements “e” that satisfy query
condition “M” always satisfy condition “N” and the
operator between the two query conditions is “∧”, then the
two query conditions can be simplified to condition “M”.
This is written below:</p>
        <sec id="sec-4-2-1">
          <title>ConditionM∧N (e) ⇒ ConditionM (e)</title>
          <p>if ∀e(e ∈ σCon(M)(E) ⇒ e ∈ σCon(N)(E))
Theorem 3.3 If all elements “e” that satisfy query
condition “M” always satisfy condition “N” and the operator
between the two query conditions is “∨”, then the two
query conditions can be simplified to condition “N”. This
can be represented as:</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>ConditionM∨N (e) ⇒ ConditionN (e)</title>
          <p>if ∀e(e ∈ σCon(M)(E) ⇒ e ∈ σCon(N)(E))
Because the redundant query conditions are removed by
the reduction transformation, both the amount of work of
“computation” and “join” will be reduced and the
performance of the query execution will be improved.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Introduction Transformation</title>
        <p>Introduction is the transformation that adds a new
condition, with the aim of reducing the query search
space, compared with the original query condition.
Consider the scenario where only people, whose rank
is lower that 4, get paid less than 3000, the query
“ document(’record.xml’)// person [salary &lt; 3000]
” can be transformed to “ document(’record.xml’ )//
person[rank &lt; 4 and salary &lt; 3000] ”. The
necessary conditions for this kind of transformation are :
• the domain of the original query condition must be
a subset of the domain of the additional condition.
• an index is built on the additional condition-related
element.
• the selectivity of the additional condition-related
element is higher than that of the original
conditionrelated element.</p>
        <p>Theorem 3.4 If all elements “e” that satisfy query
condition “M” always satisfy condition “N”, then the query
condition “M” can be transformed to condition “M ∧N ”.
The transformation can be illustrated as</p>
        <sec id="sec-4-3-1">
          <title>ConditionM (e) ⇒ ConditionM∧N (e)</title>
          <p>if ∀e(e ∈ σCon(M)(E) ⇒ e ∈ σCon(N)(E))
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimentation</title>
      <p>
        In this section, we present the experiments designed
for our research. All the experiments are run on an HP
workstation XW4200, which is equipped with an Intel
Pentium 4 CPU 3.40GHZ and 2 GB memory.
The experimental data sets are built using an XML
generator [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] based on the XMark benchmark [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. We use
the generator to build five XML documents with sizes
of 20M, 40M, 60M, 80M and 100M. Then we insert four
kinds of elements as child nodes to the “person” element.
The inserted elements and values are listed in Fig. 2.
From the figure, you can see that “person” elements can
be classified into different groups based on the values
of the inserted elements, such as a grouping of all the
persons that are in their thirties and with an importance
value of 1.
      </p>
      <p>Three groups of queries are designed for the experiments
and the formats are shown in Fig. 3. Because the
condition of the original query will be transformed to F alse
using the Elimination Transformation, there is no query
generated. For “reduction” and “introduction”, we can
get a series of queries with different selectivity by
changing the values of V al1 and V al2. The introduced
condition for “introduction” will be generated automatically
according the descriptions of element classifications and
for different situations the generated conditions are
different.</p>
      <p>A transforming engine was designed for the experiments,
which includes the following modules:
1. Information management module, which is
designed to manage the elements’ classification
descriptions used in query transformation;</p>
      <sec id="sec-5-1">
        <title>2. Information extraction module, which is designed</title>
        <p>to extract information for each kind of element from
the XML document;</p>
      </sec>
      <sec id="sec-5-2">
        <title>3. Blocking unsatisfied query module, which deals</title>
        <p>with the Elimination transformation;</p>
      </sec>
      <sec id="sec-5-3">
        <title>4. Reducing condition module, which deals with the</title>
        <p>Reduction transformation;</p>
      </sec>
      <sec id="sec-5-4">
        <title>5. Introducing condition module, which deals with</title>
        <p>the Introduction transformation.</p>
        <p>
          The structure of the engine is shown in Fig. 4 3.
We use the eXist XML database [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] in our experiments.
Because eXist provides a query caching function to
improve the speed of query execution and an index on
elements can be built both automatically and manually, both
the original queries and the generated queries are
executed in four situations:
1. no query caching plus an automatically built index,
2. no query caching plus a manually built index,
3. with query caching plus an automatically built index
4. with query caching plus a manually built index.
Because of the space limitations, we only list the results
in two situations: no query caching plus an
automatically made index and with query caching plus a manually
made index, and with two selectivities: 20% and 80%.
        </p>
      </sec>
      <sec id="sec-5-5">
        <title>Experiment for Elimination</title>
        <p>
          The results of these experiments are shown in Fig. 5. It
3For more discussion about the transforming engine please refer to
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
can be seen that time spent on the original query
execution is much longer than the time spent on the queries,
which are generated by the transformation engine. This
is because the conditions of the original queries are
transformed to “F alse”, and no queries are sent to the server
and all the computations are finished on the client side. If
we contrast this with the time spent on the original query
execution, the time spent on query transformation can
almost be ignored especially when transformation works
together with query caching.
        </p>
      </sec>
      <sec id="sec-5-6">
        <title>Experiment for Reduction</title>
        <p>Results of “reduction” are shown in Fig. 6. From the
Fig. it can be seen that because the workload on both
computation and join are reduced, the performance of
query execution can be improved by about 30% when
the query is executed with the automatically built index.
When the queries are executed with a manually built
index, the performance can be improved around 20%. This
phenomena becomes more obvious when the selectivity
or the size of the XML document is increased.</p>
      </sec>
      <sec id="sec-5-7">
        <title>Experiment for Introduction</title>
        <p>From Fig. 7, it can be seen that the results of
“introduction” are different from our prediction and the query
execution time has increased instead of reduced. In fact,
the bigger the selectivity or the size of the document,
the more the execution time increases. After analyzing
the query execution, we assume transformation
“introduction” may only work on some query engines which
execute queries by choosing a sub-query execution
strategy. And for query engines, which execute
multiplecondition queries by choosing the intersection of the
results of each sub-query, the transformation
“introduction” will increase the workload and make the query
execution worse.</p>
        <p>From the results of our experiments we can draw the
conclusion that element classification based query
transformation can improve the query execution in some
situations.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Related work</title>
      <p>
        Currently XML query optimization based on the content
of the data mainly concentrates on element statistics. In
the optimization mechanism in “Lore” [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the database
system abstracts all the information of nodes and edges,
and stores them as several indexes in the database
management system. All input queries are broken into a
series of sub-operations and each sub-operation can be
evaluated as part of the query. By creating evaluations
for all the sub-operations and joining all the result
aggregations together, the system can get the most effective
execution order for the sub-operations, resulting in faster
execution of the query.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], an element classification method is
introduced, which is based on the information of
structure of XML documents. By analyzing the DTD [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
of an XML document, elements are classified based
on their child nodes. For example: description “&lt;
!ELEM EN T person(name, email∗) &gt;” will lead to
a kind of classification of “person with email” and
“person without email”. With the classification, the query
searching area will be reduced.
      </p>
      <p>
        Authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduce a heuristic-based algebraic
XML query transformation method. The method is based
on a series of equivalences that are represented in PAT
algebra expressions [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. With information of the
structure of an XML document, a set of deterministic
algebraic transformation rules can be derived based on the
PAT equivalences. Then an input query can be
transformed to a more effective one with the transformation
rules.
      </p>
      <p>The research reported above improves the performance
of XML query execution. However, none of this work is
based on the distribution of the values of elements.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion and problems to be solved</title>
      <p>In this paper, We discuss the possibility of transforming
XML queries based on the content of data. We also
discuss the possible transformations, which are carried out
based on the content of data. A series of experiments are
presented and the results are discussed. The
experimental results show that transforming queries based on data
can improve query execution time in some situations.
Currently the element classifications are edited manually.
Now we are working on an algorithm, which analyzes the
values of elements and abstracts distribution
characteristics. As we discussed previously, XML documents are
very flexible making analysis difficult. We have
identified the following issues:</p>
      <sec id="sec-7-1">
        <title>1. the number of classifications When an XML doc</title>
        <p>
          ument is analyzed, it is very difficult for a user
or database administrator to specify an accurate
or suitable number of classifications based on the
data distribution. Because of this, we chose the
XMeans [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] algorithm to do the clustering, which
is an adapted KMeans algorithm. The advantage
of XMeans is that users input maximum and
minimum number of classifications instead of
specifying the exact number of classifications. XMeans
will explore the best classification number
automatically. Some experiments have shown that XMeans
can satisfy our demands.
2. missing element or missing values For this
problem, we propose to find an element which has
similar characteristics to the element that has a missing
subelement or missing value and attach the
subelement or value found in the similar element.
Consider the example in Fig. 8, with the people
elements. One of the people has a missing value for
“ID”. The similar element belongs to department
“computer” and has “ID” 01030, so the two
elements will be classified into the same group.
3. multiple same name elements under one
parent element For this situation, as a first step we
will replicate and divide the element. For
example: &lt; people &gt;&lt; /hobby &gt;&lt; /hobby &gt;&lt;
/name &gt;&lt; /people &gt; will be divided into two
elements that has structure &lt; people &gt;&lt; /hobby &gt;&lt;
/name &gt;&lt; /people &gt;. This solution will enable
us to classify people based on individual hobbies
but not on multiple hobbies.
4. datatype handling Values of elements/attributes in
XML documents may belong to various datatypes.
The data can be number, string or other types. The
data can be very long, such as a paragraph of a book,
or very short, such as an English character. Also the
data can be classified as continuous or discrete data.
Usually the data needs preprocessing before
analysis. Currently our system can analyse number, short
string or character. More functionality of datatype
handling will be added to the whole system in the
future.
5. value distribution When the classification is
derived, the value distribution of each group will be
used to optimize the input queries. The
distributions can be classified into three types: isolated,
overlapping and including. In our paper, we only
demonstrate one situation where the value of each
group is isolated. In our research, we found that the
value distribution is very complex and the
distributions influence the possibility of query optimization
greatly. An example for overlapping is all
lecturers have salary from 4000 to 6000 and all
professors have salary from 5500 to 8000. An example
for including is that the carparks for professors are
distributed on level 2 and level 3 while the carparks
for lectures are distributed on level 1, 2 and 3. How
to handle the complex situation of value
distribution and how to carry out query optimization with
the complex data distribution will be studied in the
future.
6. complex elements Even when we can specify
several elements to analyze, some elements with
complex structure may still be involved in the
analysis.How to deal with complex objects is still an open
question for us.
        </p>
        <p>Our contributions to date have been a data generator to
build data sets with specific characteristics to carry out
accurate and targeted experiments, the classification of
possible transformations and experiments that prove how
well the transformations work. From here, we plan to
build a tool that classifies the data. In order to do so, we
will have to address the issues described above.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Witkowski</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. Das</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Zait</surname>
          </string-name>
          , and Cruanes T.
          <article-title>Cost-based query tranformation in oracle</article-title>
          . Seoul, Korea,
          <year>2006</year>
          . VLDB.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Consens</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          .
          <article-title>Xml query optimization using path indexes</article-title>
          . Paris, France,
          <year>2004</year>
          . XIME-P.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Chaudhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rashid</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Zicari. XML Data</surname>
          </string-name>
          <article-title>Management: Native XML and XML-Enabled Database Systems</article-title>
          .
          <source>Addison Wesley</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Che</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aberer</surname>
          </string-name>
          , and
          <string-name>
            <surname>T.</surname>
          </string-name>
          <article-title>O¨zsu. Query optimization in xml structured-document databases</article-title>
          .
          <source>the International Journal on Very Large Data Base</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferna</surname>
          </string-name>
          <article-title>´ndez and</article-title>
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <article-title>Optimizing regular path expressions using graph schemas</article-title>
          . Orlando, Florida,
          <year>1998</year>
          . ICDE.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Geng</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Dobbie</surname>
          </string-name>
          .
          <article-title>Element classification based transformation of xml queries</article-title>
          . Jakarta, Indonesia,
          <year>2007</year>
          . IIWAS.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Geng</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Dobbie</surname>
          </string-name>
          .
          <article-title>Using data in the transformation of xml queries</article-title>
          .
          <source>Christchurch</source>
          , New Zealand,
          <year>2008</year>
          . NZCSRSC.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Geng</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gillian</given-names>
            <surname>Dobbie</surname>
          </string-name>
          .
          <article-title>An xml document generator for semantic query optimization experimentation</article-title>
          . Yogyakarta, Indonesia,
          <year>2006</year>
          . IIWAS.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Groppe</surname>
          </string-name>
          and
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Bo¨ttcher. Schema-based query optimization for xquery queries</article-title>
          . Tallinn, Estonia,
          <year>2005</year>
          . ADBIS.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cagle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kovack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pinnock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Rafter</surname>
          </string-name>
          . Beginning XML. Wrox Press, Indianapolis, Indiana, second edition,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. X. T.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bressan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Taniar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Rahayu</surname>
          </string-name>
          .
          <article-title>Semantic xpath query transformation: Opportunities and performance</article-title>
          . Bangkok Thailand,
          <year>2007</year>
          . DASFAA.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>McHugh</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Query optimization for xml</article-title>
          . Edinburgh, Scotland,
          <year>1999</year>
          . VLDB.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Ooi</surname>
          </string-name>
          .
          <article-title>An efficient semantic query optimization algorithm</article-title>
          . Kobe, Japan,
          <year>1991</year>
          . ICDE.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pelleg</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Moore</surname>
          </string-name>
          .
          <article-title>X-means: Extending kmeans with efficient estimation of number of clusters</article-title>
          . San Francisco,
          <year>2000</year>
          . ICML.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salminen</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. W.</given-names>
            <surname>Tompa</surname>
          </string-name>
          .
          <article-title>Pat expressions: an algebra for text search</article-title>
          .
          <source>Acta Linguistica Hungarica</source>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Waas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kersten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Manolescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Busse</surname>
          </string-name>
          .
          <article-title>Xmark: A benchmark for xml data management</article-title>
          .
          <source>Hong Kong</source>
          , China,
          <year>2002</year>
          . VLDB.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. Zhang.</surname>
          </string-name>
          <article-title>An efficient method for xml queries optimization based dtd abstraction and classification</article-title>
          . Hangzhou, China,
          <year>2004</year>
          . WCICA.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>