<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Theoretical View on Reverse Engineering Problems for Database Query Languages?</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>DCC, University of Chile &amp; IMFD</institution>
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A typical reverse engineering problem for a query language L is as follows: Given a database D and two sets P and N of tuples over D labeled as positive and negative examples, respectively, is there a query q in L that explains P and N , i.e., the evaluation of q on D contains all positive examples in P and none of the negative examples in N ? Applications of reverse engineering problems include database explanations, data exploration, data security, relational classi er engineering, and the study of the expressiveness of query languages. In this talk I will present a family of tests that solve the reverse engineering problem described above for several query languages of interest, e.g., FO, CQ, UCQs, RPQs, CRPQs, etc. We will see that in many cases such tests directly provide optimal bounds for the problem, as well as for the size of the smallest query that explains the given labeled examples. I will also present restrictions that alleviate the complexity of the problem when it is too high. Finally, I will develop the relationship between reverse engineering and a separability problem recently introduced in the database theory literature to assist the task of relational classi er engineering with data management tools.</p>
      </abstract>
      <kwd-group>
        <kwd>reverse engineering de nability query by example separability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Good introductions to the applications of reverse engineering problems in data
exploratory analysis can be found in [
        <xref ref-type="bibr" rid="ref1 ref12 ref13">1, 13, 12</xref>
        ]. The study of theoretical issues
related to reverse engineering problems has received quite some attention in
di erent contexts; e.g., for rst-order logic and the class of conjunctive queries
over relational databases [
        <xref ref-type="bibr" rid="ref10 ref11 ref15 ref16 ref17 ref18 ref19 ref3 ref7">19, 16, 11, 7, 3, 18, 15, 17, 10</xref>
        ]; for regular path queries
over graph databases [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ]; for SPARQL queries over RDF [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; for tree patterns
over XML [
        <xref ref-type="bibr" rid="ref14 ref8">8, 14</xref>
        ]; and for description logics in the setting of ontology-based data
access [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The application of reverse engineering methods to relational classi er
engineering can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ziawasch</given-names>
            <surname>Abedjan</surname>
          </string-name>
          , Lukasz Golab, and
          <string-name>
            <given-names>Felix</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <article-title>Data pro ling: A tutorial</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>1747</volume>
          {
          <fpage>1751</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Timos</given-names>
            <surname>Antonopoulos</surname>
          </string-name>
          , Frank Neven, and Frederic Servais.
          <article-title>De nability problems for graph query languages</article-title>
          .
          <source>In ICDT</source>
          , pages
          <volume>141</volume>
          {
          <fpage>152</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          and
          <string-name>
            <surname>Gonzalo I.</surname>
          </string-name>
          <article-title>D az. The exact complexity of the rst-order logic de nability problem</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>41</volume>
          (
          <issue>2</issue>
          ), to
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo I. D az</surname>
          </string-name>
          , and
          <string-name>
            <surname>Egor</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Kostylev</surname>
          </string-name>
          .
          <article-title>Reverse engineering sparql queries</article-title>
          .
          <source>In WWW</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Barcelo</surname>
          </string-name>
          , Alex Baumgartner, Victor Dalmau, and
          <string-name>
            <given-names>Benny</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          .
          <article-title>Regularizing conjunctive features for classi cation</article-title>
          .
          <source>In PODS</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Angela</given-names>
            <surname>Bonifati</surname>
          </string-name>
          , Radu Ciucanu, and
          <string-name>
            <given-names>Aurelien</given-names>
            <surname>Lemay</surname>
          </string-name>
          .
          <article-title>Learning path queries on graph databases</article-title>
          .
          <source>In EDBT</source>
          , pages
          <volume>109</volume>
          {
          <fpage>120</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Angela</given-names>
            <surname>Bonifati</surname>
          </string-name>
          , Radu Ciucanu, and
          <string-name>
            <given-names>Slawek</given-names>
            <surname>Staworko</surname>
          </string-name>
          .
          <article-title>Learning join queries from user examples</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>40</volume>
          (
          <issue>4</issue>
          ):
          <fpage>24</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Sara</given-names>
            <surname>Cohen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yaacov Y.</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>Learning tree patterns from example graphs</article-title>
          .
          <source>In ICDT</source>
          , pages
          <volume>127</volume>
          {
          <fpage>143</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. V ctor
          <string-name>
            <surname>Gutierrez-Basulto</surname>
            , Jean Christoph Jung, and
            <given-names>Leif</given-names>
          </string-name>
          <string-name>
            <surname>Sabellek</surname>
          </string-name>
          .
          <article-title>Reverse engineering queries in ontology-enriched systems: The case of expressive horn description logic ontologies</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <year>1847</year>
          {
          <year>1853</year>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dmitri</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Kalashnikov</surname>
          </string-name>
          ,
          <string-name>
            <surname>Laks</surname>
            <given-names>V. S.</given-names>
          </string-name>
          <string-name>
            <surname>Lakshmanan</surname>
            , and
            <given-names>Divesh</given-names>
          </string-name>
          <string-name>
            <surname>Srivastava</surname>
          </string-name>
          . Fastqre:
          <article-title>Fast query reverse engineering</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>337</volume>
          {
          <fpage>350</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hao</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chee-Yong Chan</surname>
            ,
            <given-names>and David</given-names>
          </string-name>
          <string-name>
            <surname>Maier</surname>
          </string-name>
          .
          <article-title>Query from examples: An iterative, data-driven approach to query construction</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>8</volume>
          (
          <issue>13</issue>
          ):
          <volume>2158</volume>
          {
          <fpage>2169</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Denis Mayr Lima Martins.
          <article-title>Reverse engineering database queries from examples: State-of-the-art, challenges, and research opportunities</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>83</volume>
          :
          <fpage>89</fpage>
          {
          <fpage>100</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Davide</surname>
            <given-names>Mottin</given-names>
          </string-name>
          , Matteo Lissandrini, Yannis Velegrakis, and
          <string-name>
            <given-names>Themis</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <article-title>New trends on exploratory methods for data analytics</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>10</volume>
          (
          <issue>12</issue>
          ):
          <year>1977</year>
          {
          <year>1980</year>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Slawek</given-names>
            <surname>Staworko</surname>
          </string-name>
          and
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Wieczorek</surname>
          </string-name>
          .
          <article-title>Characterizing XML twig queries with examples</article-title>
          .
          <source>In ICDT</source>
          , pages
          <volume>144</volume>
          {
          <fpage>160</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <article-title>Balder ten Cate and V ctor Dalmau</article-title>
          .
          <article-title>The product homomorphism problem and applications</article-title>
          .
          <source>In ICDT</source>
          , pages
          <volume>161</volume>
          {
          <fpage>176</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Quoc Trung Tran, Chee Yong Chan, and
          <string-name>
            <given-names>Srinivasan</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          .
          <article-title>Query reverse engineering</article-title>
          . VLDB J.,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <volume>721</volume>
          {
          <fpage>746</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Yaacov</surname>
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Weiss</surname>
          </string-name>
          and Sara Cohen.
          <article-title>Reverse engineering spj-queries from examples</article-title>
          .
          <source>In PODS</source>
          , pages
          <volume>151</volume>
          {
          <fpage>166</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Ross</given-names>
            <surname>Willard</surname>
          </string-name>
          .
          <article-title>Testing expressibility is hard</article-title>
          .
          <source>In CP</source>
          , pages
          <volume>9</volume>
          {
          <fpage>23</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Meihui</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Hazem Elmeleegy,
          <string-name>
            <surname>Cecilia M. Procopiuc</surname>
            , and
            <given-names>Divesh</given-names>
          </string-name>
          <string-name>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>Reverse engineering complex join queries</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>809</volume>
          {
          <fpage>820</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>