<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Axiomatic System for Order Dependencies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaroslaw Szlichta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Parke Godfrey</string-name>
          <email>godfrey@cse.yorku.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jarek Gryz</string-name>
          <email>jarek@cse.yorku.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Calisto Zuzarte</string-name>
          <email>calisto@ca.ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Centre for Advanced Studies</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>York University</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Dependencies play an important role in database theory. We study order dependencies (ODs) and unidirectional order dependencies (UODs), which describe the relationships among lexicographical orderings of sets of tuples. We investigate the axiomatization problem for order dependencies. Understanding the semantics of data is important, both for data quality analysis and knowledge discovery. While the relational data model is set based and does not concede the concept of order, ordered streams nonetheless play important roles in relational systems. SQL allows one to specify by its order-by clause that the answer “set” be returned in the specified order. Ordered streams are prevalent in query plans between operators to provide efficient evaluation. A query optimizer must reason extensively over interesting orders while building efficient query plans [5, 7]. We are interested in lexicographical ordering, or nested sort, as is provided by SQL's order-by directive. Let X = [A | T] be a list of attributes, the attribute A is the head of the list, and the list T the tail. For two tuples s and t, s X t iff sA &lt; tA or (sA = tA and (T = [ ] or s T t)). Given two lists of attributes X and Y, X → Y denotes a unidirectional order dependency (UOD) [5], read as X orders Y. Table r satisfies X → Y iff, for all s, t ∈ r, s X t implies s Y t. That is, given table r, any list of the table's tuples that satisfies order by X also satisfies order by Y. Also, X ∼ Y denotes that X ↔ Y. The default direction of the order for SQL's order-by is ascending. We also consider order-by's that mix asc and desc directives; e.g., order by A asc, B desc. This generalization we simply call order dependency (OD) [4, 7].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Fundamentals</title>
      <p>Example 1. Let r be a table with attributes A, B, C, D, E, F (Table 1). Note
that [A, B, C] → [F, E, D] is satisfied by r but [A, B, C] → [F, D, E] is falsified by
r. Also note r |= [−→C , ←A−] → [−→B , ←D−, ←E−], but r |= [−→C , ←A−] → [←E−, −→B, ←D−].</p>
      <p>
        Functional dependencies (FDs) are to group-by as ODs are to order-by. Given
X → Y, if one has an SQL query with order by Y, one can rewrite the query
with order by X instead, and meet the intent of the original query. Assume we
have a tree index for X. This index can help in a query plan with the semantic
information that X orders Y. In [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5–7</xref>
        ], we showed the details how current query
optimizers could be extended with ODs to simplify queries with order by in
similar ways to how FDs have been shown to be useful in simplifying queries
with group by.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Axiomatization and Challenges</title>
      <p>
        A key concern in dependency theory is developing the algorithms for testing
logical implication. Developing inference rules (axioms) is an approach to show
logical implication between dependencies. We investigate the axiomatization for
ODs. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we studied UODs. We provided a sound and complete axiomatization
for UODs. The inference rules of the axiomatization are shown in Figure 1. The
inference rules remain sound over ODs. To prove the axiomatization is sound,
we show that each of the axiom is sound, which is simple. Proving completeness
is much more involved. To prove the axiomatization is complete for UODs, we
demonstrate in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that for any set of UODs M, a table t can be constructed
that satisfies M and that every OD that is not derivable over M with axioms
presented in Figure 1 is falsified by table t.
      </p>
      <p>2. Prefix.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we additionally demonstrate that Armstrongs axiomatization for
functional dependencies (FDs) is subsumed within our axiomatization for ODs.
Working with ODs is more involved than with FDs because the order of the
attributes matters. Thus, we must work with lists of attributes instead of with
sets. This necessarily complicates our axioms compared with Armstrongs
axioms for FDs. The axioms provide insight into how ODs behave and patterns
for how ODs logically follow from others that are not easily evident reasoning
from first principles. The work about ODs, we feel, opens exciting venues for
future work to develop a powerful new family of query optimization techniques
in database systems.
      </p>
      <p>
        There is more that can be done, and that we plan to do. We plan to work on
extending our work axiomatization for UODs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] into an complete
axiomatization for ODs, which allow the mix of ascending and descending orders. Such an
axiomatization might provide insight into how ODs behave, and provide input
for useful query rewrites.
      </p>
      <p>If ABC → D holds but not AB → D, is ordering by AB useful if we need a
stream sorted by D? If the stream is sorted by AB, it may be nearly sorted on
D. If it were known that every partition of AB is small, each AB-partition could
be sorted on-the-fly in main memory, removing the need for an external sort
operator. We would like to formalize the concept of nearly sorted.</p>
      <p>
        We are working on introducing a framework for discovering conditional
order dependencies. (Conditional sequential dependencies were proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].) A
conditional order dependency can be represented as a pair (X → Y, Tr), where
X → Y, referred to as the embedded OD, and Tr is a range pattern tableau
defining over which rows the dependency applies. It would provide a novel
integrity constraint allowing one to express, that an OD date → salary holds for a
given employee id.
      </p>
      <p>
        Furthermore, in the process of merging data from various sources, it is often
the case that small variations occur. For example, one movie site might report
the movie Gone with the Wind as having a running time of 222, while site two
reports 238 minutes for it. The FD that movie → length would be violated. In
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], they define a metric over FDs to allow for such small variations. Likewise,
we would like to define metric ODs to generalize both ODs as in this paper and
metric FDs. We would like to devise algorithms for determining whether a given
metric OD holds for a given relation, and to investigate the use of these as data
cleaning techniques as in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for matching dependencies.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kolahi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          .
          <article-title>Data cleaning and query answering with matching dependencies and matching functions</article-title>
          .
          <source>In ICDT</source>
          ,
          <fpage>268</fpage>
          -
          <lpage>279</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>L.</given-names>
            <surname>Golab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Karloff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Korn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <source>Sequential Dependencies. PVLDB</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>574</fpage>
          -
          <lpage>585</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Koudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkatasubramanian</surname>
          </string-name>
          .
          <article-title>Metric Functional Dependencies</article-title>
          . In ICDE,
          <fpage>1291</fpage>
          -
          <lpage>1294</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Szlichta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          .
          <article-title>Chasing Polarized Order Dependencies</article-title>
          .
          <source>In AMW</source>
          ,
          <fpage>168</fpage>
          -
          <lpage>179</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Szlichta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          .
          <source>Fundamentals of Order Dependencies. PVLDB</source>
          ,
          <volume>5</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1220</fpage>
          -
          <lpage>1231</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Szlichta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          , W. Ma, P. Pawluk, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuzarte</surname>
          </string-name>
          . Queries on Dates:
          <article-title>Fast Yet not Blind</article-title>
          . In EDBT,
          <fpage>497</fpage>
          -
          <lpage>502</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Szlichta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuzarte</surname>
          </string-name>
          .
          <article-title>Expressiveness and Complexity of Order Dependencies</article-title>
          . PVLDB,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>