<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discovering Data Transformations in Web Resources (Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ziawasch Abedjan</string-name>
          <email>abedjan@tu-berlin.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Morcos</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ihab F. Ilyas</string-name>
          <email>ilyasg@uwaterloo.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mourad Ouzzani</string-name>
          <email>mouzzani@qf.org.qa</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Papotti</string-name>
          <email>ppapotti@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Stonebraker</string-name>
          <email>stonebraker@csail.mit.edu</email>
        </contrib>
        <aff>
          <institution>TU Berlin</institution>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Arizona State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Qatar Computing Research Institute</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Waterloo</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In data integration, data curation, and other data analysis tasks, users spend a considerable amount of time converting data from one representation to another, for example US dates to European dates or airport codes to city names. In practice, data scientists have to code most transformation tasks manually, search for the appropriate dictionaries, and involve domain experts. In a previous vision paper, we presented the initial design of DataXFormer, a system that uses web resources to assist in transformation discovery [1]. Specifically, DataXFormer discovers possible transformations from web tables and web forms and involves human feedback where appropriate. We demonstrated the system at SIGMOD 2015 and deployed an open version of the system, which helped us increase our initial workload from 50 to 120 transformations [3]. At the same time, we extended DataXFormer with new algorithms to find (1) transformations that entail multiple columns of input data, (2) indirect transformations that are compositions of other transformations, (3) transformations that are not functions but rather relationships, and (4) transformations from a knowledge base of public data. We report on experiments with the collection of 120 transformation tasks and show that our enhanced system automatically covers 101 of them using openly available web resources [2].</p>
      </abstract>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list />
  </back>
</article>