=Paper= {{Paper |id=Vol-2738/paper7 |storemode=property |title=Modeling Interdependent Preferences over Incomplete Knowledge Graph Query Answers |pdfUrl=https://ceur-ws.org/Vol-2738/LWDA2020_paper_7.pdf |volume=Vol-2738 |authors=Till Affeldt,Stephan Mennicke,Wolf-Tilo Balke |dblpUrl=https://dblp.org/rec/conf/lwa/AffeldtMB20 }} ==Modeling Interdependent Preferences over Incomplete Knowledge Graph Query Answers== https://ceur-ws.org/Vol-2738/LWDA2020_paper_7.pdf

Modeling Interdependent Preferences over
Incomplete Knowledge Graph Query Answers

Till Affeldt1[0000−0001−6440−5654] , Stephan Mennicke2[0000−0002−3293−2940] , and
Wolf-Tilo Balke1[0000−0002−5443−1215]
1
Institute for Information Systems, TU Braunschweig, Braunschweig, Germany
2
Knowledge-Based Systems Group, TU Dresden, Dresden, Germany

Abstract. We study the properties of optimal patterns, our novel exten-
sion of Sparql, that allows for a fine-grained control over incompleteness
of query answers in knowledge graphs. Optimal patterns encode pref-
erences over the completeness of query answers. In this paper, we add
language constructs for expressing dependencies between conflicting pref-
erences. Furthermore, we provide discussions and proofs of fundamental
results concerning Sparql with optimal patterns.

Keywords: knowledge graphs · structural preferences · Sparql.

1 Introduction

One of the major difficulties in working with knowledge graphs is that entities
may vastly differ in their structural representation, even if they are of the same
kind. Indeed, due to limitations of the underlying sources and knowledge extrac-
tion processes it is quite common that in practical instances two entities (repre-
sented as nodes or resources) share a type, but may be characterized in totally
different ways regarding their properties. The consequences are unexpectedly
small or even empty result sets when querying basic graph patterns (conjunctive
queries), usually followed by a lot of manual query refinements in the retrieval
process. Thus, operators for handling such heterogeneity are mandatory when
designing a robust query language for knowledge graphs. Sparql [11] already of-
fers optional patterns, which handle incompleteness in RDF graphs by returning
complete matches (i. e., mandatory plus optional parts) and incomplete matches.
Optional patterns and well-behaving subclasses of them have been heavily dis-
cussed over the last decade [12,13,2,6,4]. However, the diversity of result types
remains unchanged. Suppose we query for three patterns A, B, and C:

A OPTIONAL B OPTIONAL C.

Then we may get results satisfying A alone, results that additionally satisfy B
or C, and results that satisfy all three patterns. In fact, the number of result
Copyright c 2020 by the paper’s authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
schemas can be exponential in the number of OPTIONAL operators in a query.
We recently proposed optimal patterns [1], which come with a built-in preference
semantics [8], so that only maximal results are returned. In the example above,
we would not return results satisfying A alone when there are results that satisfy
more patterns.
The new OPTIMAL operator for Sparql enables the expression of structural
as well as some value preferences. These preferences can be used to satisfy a user’s
information need if no perfect results can be found. For instance, a user may ask
for a car with sunroof and air conditioner that costs less than 10,000e. If such
configurations are not available in the data, cars with fewer features or higher
prices are still relevant. Evaluating queries that use our OPTIMAL operator
automatically retrieves the best possible answers. Thus, a user can receive useful
information instead of having to manually tweak her search settings until a result
can be found.
OPTIMAL positions itself between the rather restrictive query conjunction
and the loose optional patterns of Sparql. In our previous work [1] we have
shown that the operator can be used in real-world scenarios to achieve fine-
grained control over inherent incompleteness of RDF knowledge bases. It can
also be used to model multiple preferences simultaneously, both of similar as well
as prioritized relevance. Moreover, using optimal patterns results in readable
and maintainable query representations. After we introduce some notational
conventions in Sect. 2, we briefly recap syntax and semantics of optimal patterns
in Sect. 3. In addition to what has been shown in our previous work, we discuss
fundamental design decisions and properties of optimal patterns (Sections 3.1
and 3.2). In particular, we discuss how optimal patterns behave in open world
knowledge bases like RDF.
Optimal patterns alone lack the ability to describe complex dependencies
between preference patterns. As we will argue with concrete query examples in
Sect. 4.1, we cannot express what we call positive and negative dependencies. For
instance, the preferred color of a car may only be relevant for a certain type of
car but irrelevant for others. Similarly, optimal patterns do not allow for model-
ing the next-best preference if one preference cannot be satisfied. Therefore, we
extend our earlier work introducing structural preferences into Sparql with a
focus on retrieving reasonably sized result sets over incomplete data sets. We an-
alyze how multiple preferences can affect each other leading to a characterization
of different types of dependencies (Sect. 4). In order to cope with these different
types, we then introduce a suitable set of operators to be used in conjunction
with OPTIMAL.
Modeling preferences with Sparql has been the goal of several language ex-
tensions [5,10,14,15]. However, all of these works focus on a preference model
based on preferred data values, i. e., they allow for expressing certain prefer-
ences over the attribute values such as the lowest price for a car or a specific set
of brands. But, while these extensions work quite well for value-based ranking
and selection operations, users cannot express preferences with respect to the
completeness of query results. Indeed, expressing a simple query such as I am
looking for cars, preferably including information on prices and/or brands will
result in rather cumbersome expressions. And even if a user’s preferences are not
completely satisfiable in some given knowledge graph, he/she is still looking for
cars and expects that cars are returned whenever possible. Another difference
is that value preferences do usually not consider structural preference depen-
dencies. This is because no structural dependencies can exist if the returned
results are guaranteed to be structurally complete. In this case also the need
for non-structural dependencies is greatly reduced: The same results can often
be achieved by defining a priority order for preferences, in particular by eval-
uating independent preferences first. For a detailed discussion of other related
approaches see [1].

2 Preliminaries
Since Sparql is the recommended query language for the Resource Description
Framework (RDF) [11], we base our notions on that of RDF. Therefore, assume
infinite and disjoint universes of IRIs I and literals L. The core construct of RDF
is that of an RDF triple (s, p, o) ∈ I × I × IL3 In such a triple, predicates (p)
connect resources (s) to other resources or actual data values (o). Sets of RDF
triples form RDF graphs G.
Sparql offers expressions for querying information from RDF graphs. We
restrict our presentation to pattern matching capabilities, e. g., leaving out path
queries. Abstract syntax and semantics of Sparql follow the standard nota-
tion [11,12]. Let V denote the universe of variables, e. g., x, y, z ∈ V. A triple
pattern is a triple (u, a, v) ∈ VI × VI × VIL, i. e., variables x ∈ V may occur
in every component of triple patterns. Sets of triple patterns G also relate to
graph-like structures, then called basic graph patterns (BGPs).
Let t = (u, a, v) be a triple pattern and G an RDF graph. A partial function
µ : VIL → IL is called a match of t in G iff (a) µ(c) = c for all c ∈ IL,
(b) {u, a, v} ⊆ dom(µ)4 , and (c) (µ(u), µ(a), µ(v)) ∈ G. By ∅IL we denote the
empty match, being the partial function that respects (a) but is otherwise unde-
fined. The notion of a match naturally extends to BGPs G by requiring that µ
is a match of all triple patterns t ∈ G. We denote the set of all matches of t (G,
resp.) in G by JtKG (JGKG , resp.).
Sparql further supports complex queries in terms of unions, query conjunc-
tions, optional patterns, and filter conditions. The semantics of such queries Q
(w. r. t. RDF graphs G) can be found in [12] and is denoted by JQKG . The core
notion of compatibility for query conjunctions and optional patterns is also re-
quired for a formal account of OPTIMAL. In general, two partial functions µ1 , µ2 :
VIL → IL are compatible, denoted µ1 µ2 , iff for all x ∈ dom(µ1 ) ∩ dom(µ2 ),
µ1 (x) = µ2 (x), i. e., µ1 and µ2 agree on the variables they share. In conjunctions
and optional patterns, only compatible matches to subpatterns are joined in the
result sets.
3
We use IL as shorthand for I ∪ L.
4
dom(µ) refers to the set of all elements of VIL for which µ is defined.
3 SPARQL with OPTIMAL
Throughout this section, we present our extension of Sparql by optimal pat-
terns. Additional examples, an encoding by standard Sparql, as well as an
initial evaluation may be found in [1]. Here, we give a brief account of the syn-
tax and semantics. Furthermore, we discuss the properties of optimal patterns
formally (cf. Sections 3.1 and 3.2).
As with the other Sparql structures, OPTIMAL combines queries Q1 and
Q2 to an optimal pattern Q1 OPTIMAL Q2 . The intuition of optimal patterns
is that we state a preference for matches that are complete w. r. t. Q1 and Q2 ,
i. e., in the optimal case, we get the matches of Q1 AND Q2 . Only if this goal is
impossible to reach, i. e., there could be matches of Q1 but no compatible ones of
Q2 , we expect the matches of Q1 . The semantics of optimal patterns will adhere
to the following property:

JQ1 AND Q2 KG ⊆ JQ1 OPTIMAL Q2 KG ⊆ JQ1 OPTIONAL Q2 KG (1)

More concretely, if there are matches of Q1 AND Q2 in G, then JQ1 OPTIMAL
Q2 KG = JQ1 ANDQ2 KG . Unlike the respective optional pattern Q1 OPTIONALQ2 ,
we do not include a single match of only Q1 in this case. This matching behavior
allows us to express preferences over the completeness of the matches w. r. t. a
given RDF graph. Only if JQ1 ANDQ2 KG = ∅, the respective semantics of optimal
patterns and optional patterns align.
As long as Q1 and Q2 are free of optimal patterns, the previous paragraph
tells us the whole story of the new operator. The combination of (several) opti-
mal patterns with filter constraints allows for formulating meaningful preference
patterns on data values and structure.
Example 1. OPTIMAL can be used to model preferences over attribute values.
A simple price preference could be: I prefer cheap cars under 15,000e over other
cars under 20,000e over more expensive cars. It can be adequately modeled in
the form of: ?car rdf:typeof Car
OPTIMAL { ?car ex:price ?price FILTER( ?price < 15000 ) }
OPTIMAL { ?car ex:price ?price FILTER( ?price < 20000 ) }
The query will return all cars that are cheaper than 15,000e if any exist. If not,
then it will return all cars that are cheaper than 20,000e instead. If no such car
exists either, then the query will return all cars in the data set that are more
expensive or have no available price information.

Example 2. A user can pose structural preferences as well. Such a query could
be I prefer to see cars with available price and brand information. Such a query
could be adequately modeled as: ?car rdf:typeof Car
OPTIMAL { ?car ex:price ?price }
OPTIMAL { ?car ex:brand ?brand }
The query will return all cars with information for price and brand. If no such
cars can be retrieved, it will instead return all cars with price information. If
that also results in an empty set, then the query will return all cars with brand
information instead. Only if all of these attempts yield empty results, then the
query will return all cars instead.

The last example shows how optimal patterns may be used to impose an or-
dering on the preferred structure. To allow for the formulation of preferences of
equal importance, we extended the syntax of optimal patterns Q1 OPTIMAL Q2 ,
allowing Q2 to be a finite list of queries Q12 , Q22 , . . . , Qn2 , i. e., the syntax of these
optimal patterns is
Q1 OPTIMAL (Q12 , Q22 , . . . , Qn2 ).
The intuition is still that Q1 is necessary to be matched. However, a simple cor-
respondence as in Equation (1) cannot be derived. In the optimal case, we find
matches of all of the queries in the optional pattern, i. e.,
JQ1 AND Q12 AND Q22 AND . . . AND Qn2 KG ⊆ JQ1 OPTIMAL (Q12 , Q22 , . . . , Qn2 )KG .
Whenever we cannot reach the optimal case, we will obtain dominant matches
as answers.
Example 3. Given the query from Example 2, a possible answer could be a car for
a price of 17,000e but of no known brand. This result contains more information
than one with completely unknown attributes. Thus, the first match dominates
the second one. A match concerning a third car with a price of 18,000e and
a known brand contains even more information on the other hand. Thus, the
first match is dominated by the third one. If we consider price and brand to
be of equal importance, then the first match would not dominate a fourth one
that only has brand information. The two matches would be incomparable. If
we consider the price to be more important instead, then the fourth match does
not appear in the result set. The just sketched process is like asking for a skyline
over the price (which does not include the fourth match) and then construct a
skyline over the brand for the previous results.
Hence, the new construct requires a Pareto-style semantic, which entails a skyline
of matches. Subsequently, we order candidate matches by Pareto dominance.
Only the maximal candidates (w. r. t. Pareto dominance) are considered matches
of optimal patterns.
Definition 1 (Candidate Matches). Let Q be an optimal pattern, i. e.,
Q = Q0 OPTIMAL (Q1 , Q2 , . . . , Qk )
for some integer k > 0. A partial function µ : VIL → IL is called a candidate
match of Q in G iff there are µ0 , µ1 , µ2 , . . . , µk , such that 1. µ0 ∈ JQ0 KG , 2. µi ∈
JQi KG ∪ {∅IL } (0 < i ≤ k), 3. µi µj for all i, j ∈ {0, 1, 2, . . . , k}, and 4. µ =
µ0 ∪ µ1 ∪ µ2 ∪ . . . ∪ µk .
The first requirement accounts for the fact that Q0 is the necessary pattern to
be matched. Note, every other part of µ may be the empty match ∅IL . Using
the separation of candidate matches µ of Q (in G) into µ0 , µ1 , µ2 , . . . , µk , we
say that µ covers Qi (0 ≤ i ≤ k) iff µi 6= ∅IL . Note, Q0 must be covered
by all candidate matches. We denote by coverQ (µ) the set of all sub-queries
Qj ∈ {Q0 , Q1 , Q2 , . . . , Qk } covered by µ. Then the semantics of optimal patterns
boils down to the maximal matches w. r. t. inclusion of the sets of covered sub-
queries.

Definition 2 (Semantics of optimal patterns). Let Q be an optimal pat-
tern, G an RDF graph, and µ, µ0 two candidate matches of Q in G. µ0 dominates
µ w. r. t. Q, denoted by µ ≺Q µ0 , iff coverQ (µ) ( coverQ (µ0 ).
Candidate match µ of Q in G is called a match of Q in G iff there is no
candidate match µ0 of Q in G with µ ≺Q µ0 . The set of all matches of Q in G
is denoted by JQKG .

This formalization allows us to derive correctness of (1) for the special case of
optimal patterns with a single query as the right-hand side.

Proposition 1. For all Sparql queries Q1 and Q2 , (1) holds.

Proof. Let µ be a match of Q1 AND Q2 . Then there are matches µi ∈ JQi KG
(i = 1, 2) with µ = µ1 ∪ µ2 and µ1 µ2 . Thus, µ is a candidate match of
Q0 = Q1 OPTIMAL Q2 . Furthermore, every match of Q1 AND Q2 covers Q1 and
Q2 in Q0 . Hence, µ cannot be dominated by any other match of Q0 in G.
Let µ0 be a match of Q0 = Q1 OPTIMAL Q2 . We need to show that µ0 is a
match of Q1 OPTIONAL Q2 . Towards a contradiction assume, µ0 is not a match
of Q1 OPTIONAL Q2 . Then µ0 ∈ JQ1 KG but there is a match µ00 ∈ JQ2 KG , such
that µ0 µ00 . But then µ0 ∪ µ00 is a candidate match of Q0 and
coverQ0 (µ0 ∪ µ00 ) = {Q1 , Q2 } ) {Q1 } = coverQ0 (µ0 ),
which contradicts the assumption that µ0 is a match of Q0 , i. e., µ0 is not domi-
nated by any other candidate match. t
u

Subsequently, we justify the syntax style of query lists in optional patterns and
we consider RDF’s open world assumption in the context of optimal patterns.

3.1 Justifying Optimal Patterns

Our divergence from the standard binary operator structure of Sparql may
seem peculiar at first. However, this was a necessary step to allow for express-
ing preferences of equal importance. Suppose, we want to express a preference
of query Q over Q1 and Q2 , but Q1 and Q2 are equally important. Without
the query list notation, we have several candidates using optimal patterns and
standard Sparql operators:
– Q OPTIMAL(Q1 UNIONQ2 ): This query returns all the matches of Q, prefer-
ably joined with matches of Q1 or Q2 . However, as soon as Q1 and Q2 have a
shared variable, we will hardly see any matches fulfilling all three sub-queries.
– Q OPTIMAL (Q1 AND Q2 ): From this query we may expect only matches
that fulfill all three sub-queries or only Q. Hence, if Q1 cannot be matched,
we will also not see matches that fulfill Q and Q2 .
– (Q OPTIMAL Q1 ) AND (Q OPTIMAL Q2 ): This query is quite close to the
desired Pareto query. Here, Q1 and Q2 are seen as independent through the
use of query conjunction. For queries of that shape we may actually observe
matches fulfilling all three sub-queries or only a subset of these. However,
this query also accommodates some weird matching behavior. Suppose, Q1
and Q2 share a variable x, that does not occur in Q, but Q1 and Q2 cannot
simultaneously be matched in some RDF graph G. Then the query above
may easily yield no results at all, although Q may be matched together with
Q1 or Q2 separately.
Besides the possibilities above, it is always possible to unfold the Pareto query ex-
ponentially. However, such queries are hardly readable and maintainable. There-
fore, we decided to expand Sparql’s operator set in this respect.

3.2 Optimal Patterns and Certain Answers

How well do optimal patterns cope with RDF’s open world assumption (OWA).
The OWA influences the way the query semantics is actually executed. We often
see query semantics following the certain answers perspective [2]. A match is a
certain answer of a query Q in some OWA database iff the match is a match
in every possible interpretation of the OWA database. Regarding RDF, every
superset RDF graph H of G is a possible interpretation of G. According to
Arenas and Pèrez, the certain answers perspective for Sparql is manifested by
T
CertainAnswers(Q, G) := H⊇G JQKH .
Iterating over all infinitely many extensions H of G is infeasible in practice.
Therefore, handy characterizations can be proven. One such characterization is
concerned with the monotonicity of the given query Q. A query Q is mono-
tone iff for all RDF graph G ⊆ H, JQKG ⊆ JQKH , i. e., adding more informa-
tion may only lead to more query results. Since more information may lead to
different matches that cover more sub-queries, Arenas and Pèrez introduced
the notion of weak monotonicity. A query Q is weakly monotone iff for all
RDF graphs G ⊆ H. µ ∈ JQKG implies the existence of a µ0 ∈ JQKH with
µ ⊆ µ0 . It can easily be shown that in the case of (weakly) monotone queries Q,
CertainAnswers(Q, G) coincides with JQKG .
Unfortunately, not all queries containing optimal patterns are weakly mono-
tone. For instance, the third query shape in Sect. 3.1 is not monotone. Consider
Q1 and Q2 to share a variable x, that is not shared with Q. Now construct G
in such a way that it fulfills Q and Q1 but not Q2 . Disjointly add a minimal
structure to G that fulfills Q and Q2 but not Q1 . Then we simultaneously match
Q and Q1 as well as Q and Q2 , but the respective matches are necessarily in
conflict in x. Therefore, the result set will easily be empty, where it used to be
non-empty when Q2 was violated.
Hence, optimal patterns are neither better nor worse than optional patterns,
at least w. r. t. certain answers. It is simply two different styles of querying and
we have to let the user decide which style is appropriate in which scenarios.
4 Adding Dependencies over Preferences

In this section we propose a solution for modeling dependencies by introducing an
additional set of operators that complement OPTIMAL. Preferences may exist in
various different types and combinations. Being able to model such dependencies
is necessary to adequately model complex user requests.
Sometimes a subquery preference simply cannot return useful results for a
given preference unless another part of the query also returns valid matches. This
is the case when a query asks for a possibly missing node that is only indirectly
related to an enforced match. For example, a query may ask for a person’s car as
well as their car’s model. The person’s car can be trivially retrieved. However,
if that person does not own a car, it is impossible to give an adequate answer
regarding the car’s model. We call this kind of dependency a structural depen-
dency because these directly depend on the (in-)completeness in the knowledge
graph structure. In contrast, we call all dependencies that relate to data values
within the modeled preference as non-structural dependencies, e. g., a user may
prefer the color red for sports cars but not necessarily for other cars as well.
If one preference cannot be answered unless the found answers satisfy a spec-
ified condition, i. e., also cover a more relevant preference, we speak of a positive
dependency. The previously given examples fall into this category. We speak of
negative dependencies if one preference should only be answered if a specified
condition remains unsatisfied, i. e., a specific and more relevant preference is not
covered. For example, a user may prefer the color blue for any car that is not
a sports car with either no color preference, or a different one for sports cars.
Negative dependencies also model different alternatives of unequal importance,
e. g., a user interested in buying a car needs to know about the price of the
object. Most likely, that user will prefer offers made in his own currency. If no
suitable matches can be found, offers in foreign currencies may still be relevant.
Dependencies are not guaranteed to be 1:1 relations. Especially structurally
dependent preferences often rely on the same condition, implying a 1:m relation.
For example, asking for a person’s car and its properties will result in a structural
dependency of all properties towards the retrieval of a matching car. When
modeling multiple alternatives of unequal importance, the opposite behavior
is true. For example, a car manual should only be displayed in an unknown
language if it is not available in any language the user is fluent in. This yields
an n:1 relationship between preferences. Mixed types of questions can also lead
to general n:m relations between preferences which need to be accounted for.

4.1 Limitations of OPTIMAL

Expressing interdependent preferences is already possible using existing Sparql
operators which requires significant effort, even more so than individual pref-
erences. By the new OPTIMAL operator we have developed a tool to ease this
process. We have also shown how it can be used for individual preferences. How-
ever, it is still not ideal for modeling dependencies.
Let us assume a user is looking for a car, preferably a sports car. If it happens
to be a sports car, then the user prefers it to be red – otherwise any color is
acceptable. Simultaneously, the user also prefers cars with available price infor-
mation. Using multiple optimal patterns in a naive way (as shown in Example 4)
does not represent these requirements.
Example 4. The following query will look for all cars, preferably sports cars.
Among those, it will look for red ones. This works well if the database contains
any sports car.
?car rdf:typeof Car
OPTIMAL { ?car rdf:typeof SportsCar, ?car ex:hasPrice ?price }
OPTIMAL { ?car ex:hasColor "red" }
Additionally, the query also describes a dominance of red cars over non-red cars.
A more accurate model can be achieved by nesting one optimal pattern into an-
other one, being a technique also commonly used when modeling interdependent
optional patterns.

Example 5. The following query looks for sports cars with red color first and
then combines it with other preferences. As a result, red sports cars will be
preferred over other sports cars, and sports cars in general will be preferred over
other cars.
?car rdf:typeof Car OPTIMAL {
?car rdf:typeof SportsCar OPTIMAL { ?car ex:hasColor "red" },
?car ex:hasPrice ?price }
This time, red cars that are no sports cars will no longer be preferred over other
cars. So we managed to fix the underlying problem in our model. However, the
model is still not correct. Because the color preference is evaluated first, red
sports cars will always be ranked better than non-red sports cars. This also
means that a red sports car without price will be ranked higher than a non-red
sports car with a price. In our use case, these two preferences are meant to be
incomparable.

Modeling preferences with optimal patterns alone causes side effects by intro-
ducing unwanted ranking of actually incomparable query answers. For sure, it is
always possible to model them correctly using complex filter conditions (cf. en-
codings in [1]). However, our goal is to ease the modeling process. The required
use of unintuitive filters is very similar to the original motivation. Thus, we
would like to have a simple set of operators that lets a user define dependencies
between preferences directly.

4.2 Syntax and Semantics of THEN and OTHERWISE

In order to enable an easy way of modeling dependencies, we introduce two new
operators called THEN and OTHERWISE. Both operators can be used on the
right-hand side of an OPTIMAL operator in order to fine-tune requirements of
a preference’s matching behavior. A THEN B defines a positive dependency of
B towards A, meaning that preference B should only be answered if preference
A also yields a non-empty set of answers. A OTHERWISE B, on the other hand,
defines a negative dependency of B towards A, meaning that preference B is only
answered if preference A does return an empty set of answers.
For our OPTIMAL operator we have already introduced lists of preferences
for simultaneous evaluation (cf. Sect. 3). We apply the same concept to both
sides of the new operators. For a preference A and a set of dependent pref-
erences B towards A, A THEN (b1 , b2 , ..., bm ) applies a matching restriction for
all dependent preferences bj ∈ B. They will only be answered if preference A
is covered as well. Similarly, A OTHERWISE (b1 , b2 , ..., bm ) applies a negative de-
pendency restriction to all bj ∈ B. For a set of independent preferences A and
dependent preference B towards A, (a1 , a2 , ..., an ) THEN B applies a matching
restriction on B. It will only be answered if at least one ai ∈ A is covered. When
modeling a restriction towards all preferences in A (instead of any member) a
simple conjunction can be used in order to turn the conditions into a single one.
This method is best suited for handling alternative routes that lead to similar
objects. Likewise,(a1 , a2 , ..., an ) OTHERWISE B applies a matching restriction on
B. It will only be answered if none of the preferences ai ∈ A are covered. If both
A and B are a set of preferences then both the respective restrictions apply for
all bj ∈ B towards all ai ∈ A.
Just like the OPTIMAL operator, we define THEN and OTHERWISE as left-
associative. That way, the syntax for all following preferences Bi ∈ {B1 , ..., Bn }
of the form A 1 B1 2 ... n Bn (with i ∈ {THEN, OTHERWISE}) will always
imply a relation towards A, making the query easier to read. Query groups and
preference list separators can be used to model the order of execution explicitly.
Example 6. The following query adequately models the query we were looking
for in Sect. 4.1 and thereby resolves our issues with optimal patterns.
?car rdf:typeof Car OPTIMAL
({ ?car rdf:typeof SportsCar } THEN { ?car ex:color "red" },
?car ex:price ?price )

Example 7. In the following query (with negative dependencies), we consider the
price in US-Dollar as a fallback if no offers in Euros can be found. Finding both
currencies is not better than only Euros, though.
?car rdf:typeof Car OPTIMAL
({ ?car ex:price EUR ?p eur } OTHERWISE
{ ?car ex:price USD ?p usd })

4.3 Encoding Dependencies
In our previous work [1], we have demonstrated that optimal patterns can be en-
coded using existing query operators in order to work under standard Sparql 1.0
semantics. The same holds for the new set of operators. We have found two dif-
ferent styles of encodings – one using UNION and one using OPTIONAL. Since
the use of optional patterns follows more or less standard procedure, we restrict
our presentation to the union-style encoding.
The encoding using UNION has proven to be more stable regarding perfor-
mance. For this style we first construct a superset of desired results by combining
all possible preference combinations. For every such combination we bind a fresh
variable in the respective subquery to mark which combination an answer candi-
date is generated from. As a next step, we retrieve the results for every subquery a
second time (but with changed variable names) inside an EXISTS-operator. That
way, we can determine which combinations yield at least one candidate match.
Lastly, we use FILTER-conditions to remove any answers that are dominated by
at least one combination with a non-empty set of matches.
This method is easily changed to adapt for the new operators. When con-
structing the superset, we have to redefine what a possible combination is. Every
THEN or OTHERWISE operation results in additional constraints on this set. A
preference A that is dependent on B is not supposed to be matched unless a map-
ping for B also appears in the answer. Thus, any combination including mappings
for A but not B has to be removed from the superset. Likewise, a preference A
that has a negative dependency on B is only supposed to match if no mapping
for B appears in the answer. Thus, any combination including mappings for both
A and B has to be removed from the superset. Our prototypical implementation5
can easily be extended.
In terms of performance, we leave out an extensive analysis here. Usage of the
new operators only results in removal of answer candidates. Thus, any encoded
query using THEN or OTHERWISE will be shorter and most likely slightly faster
than an independent preference.

5 Conclusion
We have presented a new set of operators for expressing structural preferences
over the completeness of knowledge graph query results. The primary extension
consists of so-called optimal patterns, which we introduced in [1]. Here, we added
the possibility of stating semantic dependencies between conflicting preferences.
Furthermore, we discussed fundamental design decisions.
Although our discussion regarding certain answers (cf. Sect. 3.2) attests opti-
mal patterns to have a similar behavior as optional patterns, a detailed discussion
about the expressive power/complexity may still reveal important differences.
For future work, we plan to overcome the limitations we observed in [1] with
our encoding approach of evaluating optimal patterns by implementing custom
evaluation strategies that may even use Pareto-specific optimizations [3,9,7].

Acknowledgements. This work is partly supported by Deutsche Forschungsge-
meinschaft (DFG, German Research Foundation) in project number 389792660
(TRR 248, Center for Perspicuous Systems) and Emmy Noether grant KR
4381/1-1 (DIAMOND).
5
available at Github: https://github.com/ifis-tu-bs/optisparql
References
1. Affeldt, T., Mennicke, S., Balke, W.T.: Preference-driven Control over Incomplete-
ness of Knowledge Graph Query Answers. In: 12th ACM Web Science Conference
2020. WebSci’20, Southampton, United Kingdom, Association for Computing Ma-
chinery, New York, NY, USA (July 2020)
2. Arenas, M., Pérez, J.: Querying Semantic Web Data with SPARQL. In: Proceed-
ings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems. pp. 305–316. PODS ’11, ACM, New York, NY, USA (2011),
event-place: Athens, Greece
3. Balke, W.T., Güntzer, U., Zheng, J.X.: Efficient Distributed Skylining for
Web Information Systems. In: Bertino, E., Christodoulakis, S., Plexousakis, D.,
Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) Advances in
Database Technology - EDBT 2004. pp. 256–273. Lecture Notes in Computer Sci-
ence, Springer, Berlin, Heidelberg (2004)
4. Cheng, S., Hartig, O.: OPT+: A Monotonic Alternative to OPTIONAL in
SPARQL. Journal of Web Engineering 18(1), 169–206 (2019)
5. Gueroussova, M., Polleres, A., McIlraith, S.A.: Sparql with qualitative and quan-
titative preferences. In: OrdRing@ ISWC. pp. 2–8 (October 2013)
6. Kaminski, M., Kostylev, E.V.: Beyond Well-designed SPARQL. In: Martens, W.,
Zeume, T. (eds.) 19th International Conference on Database Theory (ICDT 2016).
Leibniz International Proceedings in Informatics (LIPIcs), vol. 48, pp. 5:1–5:18.
Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2016)
7. Keles, I., Hose, K.: Skyline Queries over Knowledge Graphs. In: Ghidini, C., Hartig,
O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M.,
Gandon, F. (eds.) The Semantic Web – ISWC 2019. pp. 293–310. Lecture Notes
in Computer Science, Springer International Publishing, Cham (2019)
8. Kießling, W.: Foundations of preferences in database systems. In: Proceedings of
the 28th international conference on Very Large Data Bases. pp. 311–322. VLDB
’02, VLDB Endowment, Hong Kong, China (2002)
9. Morse, M., Patel, J.M., Jagadish, H.V.: Efficient skyline computation over low-
cardinality domains. In: Proceedings of the 33rd international conference on Very
large data bases. pp. 267–278. VLDB ’07, VLDB Endowment, Vienna, Austria
(Sep 2007)
10. Pivert, O., Slama, O., Thion, V.: Sparql extensions with preferences: A survey.
In: Proceedings of the 31st Annual ACM Symposium on Applied Computing. p.
1015–1020. SAC ’16, Association for Computing Machinery, New York, NY, USA
(2016). https://doi.org/10.1145/2851613.2851690
11. Prud’hommeaux, E., Seaborne, Andy: SPARQL Query Language for RDF. Tech.
rep., W3C (2008), https://www.w3.org/TR/rdf-sparql-query/
12. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ACM
Trans. Database Syst. 34(3), 16:1–16:45 (Sep 2009)
13. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL Query Optimization.
In: Proceedings of the 13th International Conference on Database Theory. pp. 4–33.
ICDT ’10, ACM, New York, NY, USA (2010), event-place: Lausanne, Switzerland
14. Siberski, W., Pan, J.Z., Thaden, U.: Querying the semantic web with preferences.
In: International Semantic Web Conference. pp. 612–624. Springer (2006)
15. Troumpoukis, A., Konstantopoulos, S., Charalambidis, A.: An extension of sparql
for expressing qualitative preferences. In: International Semantic Web Conference.
pp. 711–727. Springer (July 2017)