<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fitting Ontologies and Constraints to Relational Structures (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Hosemann</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean Christoph Jung</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Lutz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Rudolph</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Scalable Data Analytics and Artificial Intelligence</institution>
          ,
          <addr-line>ScaDS.AI</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leipzig University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Dortmund University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We study the problem of fitting ontologies and constraints to positive and negative examples that take the form of a finite relational structure. As ontology and constraint languages, we consider the description logics ℰℒ and ℰℒℐ as well as several classes of tuple-generating dependencies (TGDs): full, guarded, frontier-guarded, frontier-one, and unrestricted TGDs as well as inclusion dependencies. We pinpoint the exact computational complexity, design algorithms, and analyze the size of fitting ontologies and TGDs. We also investigate the related problem of constructing a finite basis of concept inclusions / TGDs for a given set of finite structures.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Fitting Problems</kwd>
        <kwd>Database Constraints</kwd>
        <kwd>Ontologies</kwd>
        <kwd>Tuple-Generating Dependencies</kwd>
        <kwd>Description Logics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>using unary and binary relation symbols. Let ℒ be one of the TGD classes mentioned above (including
ℰℒ(ℐ)-CIs). Further let (P, N) be a pair of finite sets of instances, henceforth called a fitting instance .
We say that an ℒ-ontology  fits (P, N) if  |=  for all  ∈ P and  ̸|=  for all  ∈ N. For a single
ℒ-TGD  , fitting (P, N) is defined in exactly the same way. The induced decision problems of fitting
ℒ-ontology existence and fitting ℒ-TGD existence ask whether a given (P, N) admits a fitting ℒ-ontology
or a fitting ℒ-TGD. We also consider the corresponding construction problems, where the goal is to
construct a fitting ℒ-ontology or a fitting ℒ-TGD for (P, N), if one exists.</p>
      <p>Example 1. Consider the instances  = {(, ), (, )},  = {(, ), (, ), (, )}. Then
({ }, { }) has no fitting ℰℒℐ-CI, but it has fitting GTGDs such as</p>
      <p>(, ) → (, ).</p>
      <p>Now let  ′ =  ∪ {(, ), (, ), (, )}. Then ({ }, { ′}) has no fitting GTGD. But it has fitting
FGTGDs such as</p>
      <p>(, ) ∧ (, ) ∧ (, ) → (, ).</p>
      <p>Example 2. Having ⊥ or not makes a diference. Let  = {(, )},  = {(, )}. Then ∃.∃.⊤ ⊑
⊥ fits ({ }, { }), but ({ }, { }) has no fitting ℰℒℐ-ontology.</p>
      <p>
        All negative claims in Examples 1 and 2 are a consequence of the semantic characterizations for
iftting ℒ-TGD existence established in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The characterization for ℰℒ⊥ and ℰℒℐ⊥ is explicitly stated
in Theorem 1 below.
      </p>
      <p>How are fitting ontologies and fitting TGDs related? The following is an immediate consequence of
the definition of fitting and the semantics of ontologies and TGDs.</p>
      <p>Lemma 1. Let (P, N) be a fitting instance. Then there is an ℒ-ontology that fits (P, N) if and only if for
every  ∈ N, there is an ℒ-TGD that fits (P, { }).</p>
      <p>Hence, if (P, N) admits a fitting ℒ-ontology, it admits one with at most |N| TGDs.</p>
      <p>
        The problem of fitting an ontology to a given set of examples turns out to be closely related to a
problem that has been studied in the area of description logic and is known as finite basis construction
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. There, one fixes an ontology language ℒ and is given as input a finite instance  and the task
is to produce an ℒ-ontology  such that  |=  if and only if  |=  , for all ℒ-TGDs  . We generalize
this problem to a finite set H of input instances. The following lemma connects finite basis construction
with fitting ℒ-ontology existence. Informally, it states that a finite basis of the positive examples is a
canonical candidate for a fitting ℒ-ontology.
      </p>
      <p>Lemma 2. Let (P, N) be a fitting instance and let P be a finite ℒ-basis of P. Then P fits (P, N) if and
only if (P, N) has a fitting ℒ-ontology.</p>
      <p>If finite ℒ-bases always exists, we can thus solve the ℒ-ontology fitting problem for any (P, N) by
constructing P and checking whether it fits the input examples. This approach in fact often yields
decidability and tight upper complexity bounds.</p>
      <p>
        We first consider the DLs ℰℒ and ℰℒℐ as well as their extensions with the ⊥ concept. We reprove the
existence of finite bases for ℰℒ, already known from [
        <xref ref-type="bibr" rid="ref10 ref13">13, 10</xref>
        ], and simultaneously prove that finite bases
exist also for ℰℒℐ which to the best of our knowledge is a new result. In contrast to the proofs from
[
        <xref ref-type="bibr" rid="ref10 ref13">13, 10</xref>
        ], our proofs are direct in that they do not rely on the machinery of formal concept analysis. The
constructed bases are of double exponential size, but can be succinctly represented in single exponential
size by structure sharing. We also show that these size bounds are tight, both for ℰℒ and for ℰℒℐ. We
obtain from this an ExpTime upper bound for the fitting existence problem for ℰℒ- and ℰℒℐ-ontologies.
      </p>
      <p>In order to obtain lower complexity bounds, we provide a semantic characterization of fitting ℰℒ- and
ℰℒℐ-CI existence in terms of simulations and direct products. Let ℒ ∈ {ℰℒ, ℰℒℐ}. For unary pointed
instances (, ) and (, ) we write (, ) ⪯ ℒ (, ) if there exists an ℒ-simulation from  to  that
contains the pair (, ). Recall that an ℰℒ-simulation preserves concept names and the existence of
role-successors, whereas an ℰℒℐ-simulation must in addition preserve role-predecessors, reflecting
inverse roles. For a non-empty finite set H of instances with pairwise disjoint active domains, we use
⨄︀H to denote the instance ⋃︀ H. When the domains of the instances in H are not pairwise disjoint,
we assume that renaming is used to achieve disjointness before forming ⨄︀H. We next present the
characterization for fitting ℰℒ⊥- and ℰℒℐ⊥-CI existence.</p>
      <p>Theorem 1. Let ℒ ∈ {ℰℒ, ℰℒℐ}. Let (P, N) be a fitting instance where N = {1, . . . , } and let
 = ⨄︀P. Then no ℒ⊥-concept inclusion fits (P, N) if and only if for all ¯= ( 1, . . . , ) ∈ ∆ ∏︀ N, the
following condition is satisfied:
¯ = {(, ) | (∏︁ N, ¯) ⪯ ℒ (, )} is non-empty and
∏︁ ¯ ⪯ ℒ (, ) for some  ∈ [].</p>
      <p>
        An extended version of Theorem 1, also covering the cases of ℰℒ and ℰℒℐ without ⊥ is provided
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The semantic characterization gives rise to an algorithm for fitting ℰℒ(ℐ)-CI existence and opens
up an alternative path to algorithms for fitting ℰℒ(ℐ)-ontology existence. It also enables us to prove
lower complexity bounds and we in fact show that all four problems are ExpTime-complete. The proof
of the theorem is constructive in the sense that it also yields an algorithm for fitting CI and fitting
ontology construction. Regarding fitting ontology existence and construction, Lemma 1 yields a simple
reduction to the CI fitting case that gives the desired results. We also prove tight bounds on the sizes of
iftting CIs and fitting ontologies, which are identical to the size bounds on finite bases described above.
      </p>
      <p>We next turn to TGDs. For guarded TGDs, we implement exactly the same program described above
for ℰℒ(ℐ), but obtain diferent complexities. We show that finite GTGD-bases always exist and establish
a tight single exponential bound on their size. Succinct representation does not help to reduce the size.
We give a characterization of fitting GTGD existence and fitting GTGD-ontology existence in terms of
products and homomorphisms, show that fitting GTGD existence and fitting GTGD-ontology existence
is coNExpTime-complete, and give a tight single exponential bound on the size of fitting GTGDs and
GTGD-ontologies. The coNExpTime upper bound may be obtained either via finite bases or via the
semantic characterization.</p>
      <p>For the remaining classes of TGDs, the approach via finite bases fails: for the frontier-guarded,
frontier-one, and full case, we prove that finite bases need not exist. For inclusion dependencies, finite
bases trivially exist but approaching fitting via this route does not result in an optimal upper complexity
bound. For unrestricted TGDs, the existence of finite bases is left open.</p>
      <p>Theorem 2. For ℒ ∈ {FGTGD, F1TGD, FullTGD}, there exist instances that have no finite ℒ-basis.
Example 3. Consider the instance  = {(, ), (, )}. It has no finite FGTGD- and no finite
F1TGDbasis. For every  ≥ 1, consider the frontier-one TGD
  =</p>
      <p>⋀︁
∈[− 1]</p>
      <p>(, +1) ∧ (, 1) → (1, 1).</p>
      <p>The TGD   expresses that if 1 lies on a cycle of length , then 1 has a reflexive loop. We have  |=  
for all odd  because (i) a cycle homomorphically maps to  if and only if it is of even length and (ii) 
contains no reflexive loops. Note that the rule bodies of the TGDs   with  odd get larger with increasing
. Intuitively, this means that also the rule bodies of any finite FGTGD-basis of  must be of unbounded
size, which means that there is no finite FGTGD-basis.</p>
      <p>We may, however, still approach fitting existence in a direct way or via a semantic characterization.
For inclusion dependencies (IND), we use direct arguments to show that fitting IND existence and fitting
IND-ontology existence is NP-complete, and that the size of fitting IND-ontologies is polynomial. For
all remaining cases, we establish semantic characterizations in terms of products and homomorphisms
and then use them to approach fitting existence. In this way, we prove the following. Fitting ontology
existence and fitting TGD existence are coNExpTime-complete for TGDs that are frontier-guarded
or frontier-one. For full TGDs, fitting TGD existence is coNExpTime-complete and fitting ontology
existence is in Σ 2 and DP-hard. In the case of unrestricted TGDs, both problems are coNExpTime-hard
and we prove a co2NExpTime upper bound for fitting ontology existence and a co3NExpTime upper
bound for fitting TGD existence. We also show tight single exponential size bounds for fitting TGDs
and ontologies in the case of frontier-guarded and frontier-one TGDs. We do the same for fitting full
TGDs while if there is a fitting FullTGD-ontology, then there is always one of polynomial size. For
unrestricted TGD and TGD-ontology fittings, we give a single exponential lower bound and a triple (for
TGDs) and double (for ontologies) exponential upper bound on the size.</p>
    </sec>
    <sec id="sec-2">
      <title>Acknowledgments</title>
      <p>This work is partly supported by BMFTR (Federal Ministry of Research, Technology and Space) in
DAAD project 57616814 (SECAI, School of Embedded Composite AI) as part of the program Konrad
Zuse Schools of Excellence in Artificial Intelligence.</p>
      <p>The second and third author were supported by DFG project JU 3197/1-1.</p>
    </sec>
    <sec id="sec-3">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maier</surname>
          </string-name>
          ,
          <article-title>Query from examples: An iterative, data-driven approach to query construction</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>8</volume>
          (
          <year>2015</year>
          )
          <fpage>2158</fpage>
          -
          <lpage>2169</lpage>
          . doi:
          <volume>10</volume>
          .14778/2831360.2831369.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Barceló</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <article-title>The complexity of reverse engineering problems for conjunctive queries</article-title>
          ,
          <source>Proc. of ICDT 68</source>
          (
          <year>2017</year>
          ) 7:
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          :
          <fpage>17</fpage>
          . doi:
          <volume>10</volume>
          .4230/LIPICS.ICDT.
          <year>2017</year>
          .
          <volume>7</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>ten Cate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dalmau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <article-title>Extremal fitting problems for conjunctive queries</article-title>
          ,
          <source>Proc. of PODS</source>
          (
          <year>2023</year>
          )
          <fpage>89</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Concept learning in description logics using refinement operators</article-title>
          ,
          <source>Mach. Learn</source>
          .
          <volume>78</volume>
          (
          <year>2010</year>
          )
          <fpage>203</fpage>
          -
          <lpage>250</lpage>
          . doi:
          <volume>10</volume>
          .1007/S10994-009-5146-2.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pulcini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          ,
          <article-title>Separating data examples by description logic concepts with restricted signatures</article-title>
          ,
          <source>Proc. of KR</source>
          (
          <year>2021</year>
          )
          <fpage>390</fpage>
          -
          <lpage>399</lpage>
          . doi:
          <volume>10</volume>
          .24963/KR.
          <year>2021</year>
          /37.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>B. ten Cate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Lutz, SAT-based PAC learning of description logic concepts</article-title>
          ,
          <source>Proc. of IJCAI</source>
          (
          <year>2023</year>
          )
          <fpage>3347</fpage>
          -
          <lpage>3355</lpage>
          . doi:
          <volume>10</volume>
          .24963/IJCAI.
          <year>2023</year>
          /373.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Petrova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherkhonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Horrocks</surname>
          </string-name>
          ,
          <article-title>Entity comparison in RDF graphs</article-title>
          ,
          <source>Proc. of ISWC 10587</source>
          (
          <year>2017</year>
          )
          <fpage>526</fpage>
          -
          <lpage>541</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -68288-4\_
          <fpage>31</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Petrova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. V.</given-names>
            <surname>Kostylev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Horrocks</surname>
          </string-name>
          ,
          <article-title>Query-based entity comparison in knowledge graphs revisited</article-title>
          ,
          <source>Proc. of ISWC 11778</source>
          (
          <year>2019</year>
          )
          <fpage>558</fpage>
          -
          <lpage>575</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -30793-6\_
          <fpage>32</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hosemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <article-title>Fitting ontologies and constraints to relational structures</article-title>
          ,
          <source>in: Proc. of KR</source>
          ,
          <year>2025</year>
          . Accepted; to appear. Preprint: https://arxiv.org/abs/2508.13176.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Distel</surname>
          </string-name>
          ,
          <article-title>Learning description logic knowledge bases from data using methods from formal concept analysis</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Dresden University of Technology,
          <year>2011</year>
          . URL: https://nbn-resolving.org/urn: nbn:de:bsz:
          <fpage>14</fpage>
          -qucosa-70199.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guimarães</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ozaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Persia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sertkaya</surname>
          </string-name>
          , Mining E L ⊥
          <article-title>bases with adaptable role depth</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>76</volume>
          (
          <year>2023</year>
          )
          <fpage>883</fpage>
          -
          <lpage>924</lpage>
          . doi:
          <volume>10</volume>
          .1613/JAIR.1.13777.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <article-title>Eficient axiomatization of OWL 2 EL ontologies from data by means of formal concept analysis</article-title>
          ,
          <source>Proc. of AAAI 38</source>
          (
          <year>2024</year>
          )
          <fpage>10597</fpage>
          -
          <lpage>10606</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Distel</surname>
          </string-name>
          ,
          <article-title>A finite basis for the set of EL-implications holding in a finite model</article-title>
          ,
          <source>Proc. of ICFCA 4933</source>
          (
          <year>2008</year>
          )
          <fpage>46</fpage>
          -
          <lpage>61</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>540</fpage>
          -78137-0\_4.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>