Separating Positive and Negative Data
     Examples by Concepts and Formulas:
  The Case of Restricted Signatures (Abstract)

 Jean Christoph Jung1 , Carsten Lutz1 , Hadrien Pulcini2 , and Frank Wolter2
        1                                      2
            University of Bremen, Germany          University of Liverpool, UK


    There are several applications that fall under the broad term of supervised
learning and seek to compute a logical expression that separates positive from
negative examples given in the form of labeled data items in a knowledge base.
A prominent example is concept learning for description logics (DLs) where the
aim is to automatically construct a concept description that can then be used,
for instance, in ontology engineering [4, 21, 20, 30, 8, 9, 27]. A further example is
reverse engineering of database queries (also called query by example, QBE),
which has a long history in database research [28, 29, 32, 31, 17, 1, 6, 18, 23] and
which has also been studied in the presence of a DL ontology [12, 24]. Note
that a closed world semantics is adopted for QBE in databases while an open
world semantics is required when the data is assumed to be incomplete as in the
presence of ontologies, but also, for example, in reverse engineering of SPARQL
queries [2]. Another example is entity comparison in RDF graphs, where one
aims to find meaningful descriptions that separate one entity from another [26,
25] and a final example is generating referring expressions (GRE) where the aim
is to describe a single data item by a logical expression such as a DL concept,
separating it from all other data items. GRE has originated in linguistics [19],
but has recently received interest in DL-based ontology-mediated querying [7].
    A fundamental problem common to all these applications is to decide whether
a separating expression exists at all. There are several degrees of freedom in defin-
ing this problem. One concerns the negative examples: is it enough that they
do not entail the separating formula (weak separability) or are they required
to entail its negation (strong separability)? Another one concerns the question
whether additional helper symbols are admitted in the separating formula (pro-
jective separability) or not (non-projective separability). The emerging family of
problems has recently been investigated in [10, 14], concentrating on the case
where the separating expression is a DL concept or formulated in a decidable
fragment of first-order logic (FO) such as the guarded fragment (GF) and the
guarded negation fragment (GNF). In the work reported about in this abstract
[15], we add a signature Σ that is given as an additional input and require
separating expressions to be formulated in Σ. This makes it possible to ‘direct’
separation towards expressions based on desired features and to exclude features
that are not supposed to be used for separation such as gender and skin color. In
  Copyright © 2020 for this paper by its authors. Use permitted under Creative
  Commons License Attribution 4.0 International (CC BY 4.0).
the projective case, helper symbols from outside Σ are also admitted, but must
be ‘fresh’ in that they cannot occur in the given knowledge base. The signa-
ture Σ brings the separation problem closer to the problem of deciding whether
an ontology is a conservative extension of another ontology [13], also a form of
separation, and to deciding the existence of uniform interpolants [22]. It turns
out, in fact, that lower bounds for these problems can often be adapted to weak
separability with signature. In constrast, for strong separability we observe a
close connection to Craig interpolation.
    We consider both weak and strong separability, generally assuming that the
ontology is formulated in the same logic that is used for separation. We con-
centrate on combined complexity, that is, the input to the decision problems
consists of the knowledge base that comprises an ABox and an ontology, the
positive and negative examples in the form of lists of individuals (for DLs) or
lists of tuples of individuals (for FO fragments that support more than one free
variable), and the signature. In the following, we summarize our main results.
    We start with weak projective separability in ALCI, present a characteriza-
tion in terms of Σ-homomorphisms that generalizes characterizations from [10,
14], and then give a decision procedure based on tree automata. This yields a
2ExpTime upper bound, and a matching lower bound is obtained by reduction
from conservative extensions. In contrast, weak projective (and non-projective)
separability in ALCI without a signature is only NExpTime-complete [10]. The
non-projective case with signature remains open. We then show that weak sepa-
rability is undecidable in any fragment of FO that extends GF (such as GNF) or
ALCF IO (such as the two-variable fragment of FO with counting quantifiers).
In both cases, the proof is by adaptation of undecidability proofs for conserva-
tive extensions [13, 11] and applies to both the projective and the non-projective
case. This should be contrasted with the fact that weak separability is decid-
able and 2ExpTime-complete for GF and for GNF without a signature, both
in the projective and in the non-projective case [14]. The decidability status
of (any version of) separability in ALCF IO without a signature is open. It is
known, however, that projective and non-projective weak separability without a
signature are undecidable in the two-variable fragment of FO [14].
    We then turn to strong separability. Here, the projective and the non-pro-
jective case coincide and will thus not be distinguished in what follows. We
again start with ALCI for which we show 2ExpTime-completeness, and thus
the increase in complexity that results from adding a signature is even more
pronounced. In fact, strong separability without a signature is only ExpTime-
complete in ALCI [14]. The proofs are rather different from those used in the
weak case. The upper bound proof uses a characterization of non-separability in
terms of the existence of a set of types that can be realized in a model of the
ontology at elements that are all ALCI(Σ)-bisimilar. A matching lower bound
is proved by a reduction from the word problem of exponentially space bounded
ATMs. Alternatively, the 2ExpTime upper bound can be proved by a polynomial
time reduction to Craig interpolant existence in ALCIO, the extension of ALCI
with nominals. In fact, it has recently be shown that Craig interpolant existence
in ALCIO is decidable in 2ExpTime [3].
    For the guarded fragments GF and GNF, we show decidability in 3ExpTime
and 2ExpTime-completeness, respectively. The upper bounds are proved by pro-
viding a polynomial time reduction to the respective Craig interpolant existence
problems. Since GNF has the Craig interpolation property (CIP) [5], interpolant
existence reduces to validity and is thus 2ExpTime-complete. GF does not en-
joy the CIP and the 3ExpTime upper bound for interpolant existence has only
been established recently [16].

Acknowledgments. Supported by the DFG Collaborative Research Center
1320 EASE - Everyday Activity Science and Engineering.


References
 1. Arenas, M., Diaz, G.I.: The exact complexity of the first-order logic definability
    problem. ACM Trans. Database Syst. 41(2), 13:1–13:14 (2016)
 2. Arenas, M., Diaz, G.I., Kostylev, E.V.: Reverse engineering SPARQL queries. In:
    Proc. of WWW. pp. 239–249 (2016)
 3. Artale, A., Jung, J.C., Mazzullo, A., Ozaki, A., Wolter, F.: Living without Beth
    and Craig: Explicit definitions and interpolants in description logics with nominals
    (2020), available at https://arxiv.org/abs/2007.02736
 4. Badea, L., Nienhuys-Cheng, S.: A refinement operator for description logics. In:
    Proc. of ILP. pp. 40–59 (2000)
 5. Bárány, V., Benedikt, M., ten Cate, B.: Some model theory of guarded negation.
    J. Symb. Log. 83(4), 1307–1344 (2018)
 6. Barceló, P., Romero, M.: The complexity of reverse engineering problems for con-
    junctive queries. In: Proc. of ICDT. pp. 7:1–7:17 (2017)
 7. Borgida, A., Toman, D., Weddell, G.E.: On referring expressions in query answering
    over first order knowledge bases. In: Proc. of KR. pp. 319–328 (2016)
 8. Bühmann, L., Lehmann, J., Westphal, P., Bin, S.: DL-learner - structured machine
    learning on semantic web data. In: Proc. of WWW. pp. 467–471 (2018)
 9. Fanizzi, N., Rizzo, G., d’Amato, C., Esposito, F.: DLFoil: Class expression learning
    revisited. In: Proc. of EKAW. pp. 98–113 (2018)
10. Funk, M., Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Learning description logic
    concepts: When can positive and negative examples be separated? In: Proc. of
    IJCAI. pp. 1682–1688 (2019)
11. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conser-
    vative extensions in description logics. In: Proc. of KR. pp. 187–197. AAAI Press
    (2006)
12. Gutiérrez-Basulto, V., Jung, J.C., Sabellek, L.: Reverse engineering queries in
    ontology-enriched systems: The case of expressive Horn description logic ontologies.
    In: Proc. of IJCAI-ECAI (2018)
13. Jung, J., Lutz, C., Martel, M., Schneider, T., Wolter, F.: Conservative extensions in
    guarded and two-variable fragments. In: Proc. of ICALP. pp. 108:1–108:14. Schloss
    Dagstuhl – LZI (2017)
14. Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Logical separability of incomplete
    data under ontologies. In: Proc. of KR (2020)
15. Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Separating positive and negative
    data examples by concepts and formulas: The case of restricted signature. In:
    https://arxiv.org/abs/2007.02736 (2020)
16. Jung, J.C., Wolter, F.: Living without Beth and Craig: Explicit def-
    initions and interpolants in the guarded fragment (2020), available at
    http://arxiv.org/abs/2007.01597
17. Kalashnikov, D.V., Lakshmanan, L.V., Srivastava, D.: Fastqre: Fast query reverse
    engineering. In: Proc. of SIGMOD. pp. 337–350 (2018)
18. Kimelfeld, B., Ré, C.: A relational framework for classifier engineering. ACM Trans.
    Database Syst. 43(3), 11:1–11:36 (2018), https://doi.org/10.1145/3268931
19. Krahmer, E., van Deemter, K.: Computational generation of referring expressions:
    A survey. Computational Linguistics 38(1), 173–218 (2012)
20. Lehmann, J., Fanizzi, N., Bühmann, L., d’Amato, C.: Concept learning. In: Per-
    spectives on Ontology Learning, pp. 71–91. AKA / IOS Press (2014)
21. Lehmann, J., Hitzler, P.: Concept learning in description logics using refinement
    operators. Machine Learning 78, 203–250 (2010)
22. Lutz, C., Wolter, F.: Foundations for uniform interpolation and forgetting in ex-
    pressive description logics. In: Proc. of IJCAI. pp. 989–995. IJCAI/AAAI (2011)
23. Martins, D.M.L.: Reverse engineering database queries from examples: State-of-
    the-art, challenges, and research opportunities. Information Systems (2019)
24. Ortiz, M.: Ontology-mediated queries from examples: a glimpse at the DL-Lite
    case. In: Proc. of GCAI. pp. 1–14 (2019)
25. Petrova, A., Kostylev, E.V., Grau, B.C., Horrocks, I.: Query-based entity com-
    parison in knowledge graphs revisited. In: Proc. of ISWC. pp. 558–575. Springer
    (2019)
26. Petrova, A., Sherkhonov, E., Grau, B.C., Horrocks, I.: Entity comparison in RDF
    graphs. In: Proc. of ISWC. pp. 526–541 (2017)
27. Sarker, M.K., Hitzler, P.: Efficient concept induction for description logics. In:
    Proc. of AAAI. pp. 3036–3043 (2019)
28. Tran, Q.T., Chan, C., Parthasarathy, S.: Query by output. In: Proc. of PODS. pp.
    535–548. ACM (2009)
29. Tran, Q.T., Chan, C.Y., Parthasarathy, S.: Query reverse engineering. VLDB J.
    23(5), 721–746 (2014)
30. Tran, T., Ha, Q., Hoang, T., Nguyen, L.A., Nguyen, H.S.: Bisimulation-based con-
    cept learning in description logics. Fundam. Inform. 133(2-3), 287–303 (2014)
31. Weiss, Y.Y., Cohen, S.: Reverse engineering spj-queries from examples. In: Proc.
    of PODS. pp. 151–166. ACM (2017)
32. Zhang, M., Elmeleegy, H., Procopiuc, C.M., Srivastava, D.: Reverse engineering
    complex join queries. In: Proc. of SIGMOD. pp. 809–820. ACM (2013)