<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Self-Indexing XML</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Universidad de Chile gnavarro@dcc.uchile.cl</string-name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Self-indexing is a technology that integrates text compression and text indexing,
such that a text collection can be simultaneously compressed and indexed. The
resulting representation, called a self-index of the text, takes space close to that
of the compressed text, is able of reproducing any text substring, and o ers
indexed searching of the collection. This has been a major breakthrough in the
last decade, allowing one to handle huge text collections within main memory
and representing them succinctly.</p>
      <p>In this talk I will, besides presenting the basics of this technique, discuss how
it can be extended to index XML collections, where, on one hand, the text has
structure and, on the other hand, we wish to support much more complex query
languages, XPath at least. I will rst describe two plug-and-play solutions, where
a text self-index is coupled with compressed data structures that handle trees
and sequences. Then I will introduce two more innovative solutions, where the
compressed data structures are designed speci cally for this problem. This area
is full of open challenges and I will conclude by enumerating the most relevant
ones.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>