<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>German Conference on Artificial Intelligence, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Explainability of Approximate Lifted Model Construction: A Geometric Perspective</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Speller</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malte Luttermann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcel Gehrke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tanya Braun</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>German Research Center for Artificial Intelligence (DFKI)</institution>
          ,
          <addr-line>Ratzeburger Allee 160, 23562 Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hamburg, Institute for Humanities-Centered AI (CHAI)</institution>
          ,
          <addr-line>Warburgstraße 28, 20354 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Münster, Computer Science Department</institution>
          ,
          <addr-line>Einsteinstraße 62, 48149 Münster</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>16</volume>
      <issue>2025</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
<p>Advanced colour passing (ACP) is the state-of-the-art algorithm for lifting a propositional probabilistic model to a first-order level by combining exchangeable factors, enabling the use of lifted inference algorithms to allow for tractable probabilistic inference with respect to domain sizes. More recently, an approximate version of ACP, called ε-ACP, ensures the practical applicability of ACP by accounting for inaccurate estimates of underlying distributions. ε-ACP permits underlying distributions, encoded as potential-based factorisations, to slightly deviate depending on a hyperparameter ε while maintaining a bounded approximation error. To navigate through different levels of compression versus accuracy, a hierarchical version of ε-ACP has emerged that builds a hierarchy of ε values. In a drive towards interpretability of results, this paper looks at geometric properties of ε-equivalence, a central notion employed by ε-ACP and its hierarchical version to quantify the maximum allowed deviation between potentials. Specifically, we present a unified view on the results for ε-ACP and its hierarchical version and provide a geometric interpretation of ε-equivalence in ℒ_p, thereby making results more interpretable.</p>
      </abstract>
      <kwd-group>
<kwd>Lifted Inference</kwd>
        <kwd>Geometric Interpretability</kwd>
        <kwd>Approximate Model</kwd>
        <kwd>Factor Graph</kwd>
        <kwd>Colour Passing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Intelligent systems tasked with modelling their environment have to contend with uncertainty in the world
as well as objects and relations among them. The former is canonically modelled using probabilistic
graphical models, encoding features as random variables (randvars) and relations between them in a
factorised probability distribution. The latter is often represented using some form of higher-order logic,
describing a world through hard constraints on objects and their relations. Combining both perspectives
has led to various probabilistic relational formalisms: Parametric factor graphs (PFGs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] emerge from
the uncertainty perspective with logical constructs added to model objects and relations among them
under uncertainty. Markov logic networks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] originate from the logic perspective with weights added to
the constraints to denote how likely they are to hold. Both formalisms follow grounding semantics [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ],
yielding a full joint probability distribution over indistinguishable grounded (propositional) randvars.
Over the years, many researchers have focused on developing efficient inference algorithms in such
models, exploiting these first-order structures under the name of lifting [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for episodic reasoning
[
        <xref ref-type="bibr" rid="ref10 ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9, 10</xref>
        ], queries [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], evidence [
        <xref ref-type="bibr" rid="ref13 ref9">9, 13</xref>
        ], temporal reasoning [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], and decision making
[
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19 ref20 ref21 ref22">16, 17, 18, 19, 20, 21, 22</xref>
        ], allowing for tractable inference in the number of objects [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>
        While these works document the impressive progress made over the years, they usually
assume a first-order model lifted from the ground level to start with. How to lift a propositional
model has been a more overlooked research question. Colour Passing (CP) is the state-of-the-art
algorithm to turn a propositional model into a first-order one, specifically lifting a factor graph (FG)
to a PFG, grouping factors with identical potentials [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], based on the Weisfeiler-Leman algorithm for
graphs [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. CP has been revisited over ten years later to further optimise the algorithm, considering
commutativity among the randvars in factors [
        <xref ref-type="bibr" rid="ref25 ref26 ref27">25, 26, 27</xref>
        ] as well as scaling of potentials [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. To deal
with the fact that identical potentials are highly unlikely to emerge naturally in a ground FG, even if the
underlying objects behave almost indistinguishably, -Advanced Colour Passing (-ACP) constitutes an
approximate extension that groups factors whose potentials difer by a factor of at most (1 ± ), with 
as a hyperparameter [
        <xref ref-type="bibr" rid="ref29">29</xref>
] . In a step towards making the algorithm hyperparameter-free, Hierarchical
Advanced Colour Passing (HACP) proceeds to build a hierarchy of ε values that lead to increasingly
more compact encodings at the expense of a higher approximation error [30], while keeping previous
compressions intact to improve explainability.
      </p>
      <p>This paper takes the most recent advances in approximating Advanced Colour Passing (ACP) and
builds toward making the resulting compressed representation more interpretable by considering a
geometric perspective. Specifically, this paper contributes
(i) a unified overview of the results for ε-ACP and HACP,
(ii) proofs of novel properties of ε-equivalence in ℒ_p,
(iii) a discussion of their value for interpretability, and
(iv) a glimpse of potential advances for approximate lifted model construction.</p>
      <p>
        The remainder of this paper is structured as follows. We start with basic definitions and a recap of the results for ε-ACP and HACP. The main part then presents the geometric view of ε-equivalence in the ℒ_p space, including a closer look at the special case of the Euclidean space. The paper ends with a discussion and conclusion about the potential benefits of the new view and the resulting possibilities for further generalisations of ACP and its approximate variants.
2. Factor Graphs and Approximate Colour Passing
To establish a formal foundation for lifted probabilistic inference under approximation, we use the same notation and definitions as Luttermann et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] and its continuation [30]. The following sequence introduces FGs and probabilistic queries, and then progresses towards the central notion of ε-equivalent factors, which builds the backbone of the initial approximate lifted model construction approaches (i.e., ε-ACP and HACP) and the foundation of the geometric perspective developed in Section 3. First, properties of the introduced concepts are presented, followed by a summary of the algorithms ε-ACP and HACP for approximate lifted model construction and their asymptotic behaviour in Section 2.1 and Section 2.2.
Definition 1 (Factor graph [29, Def. 1]). An FG G = (V, E) is an undirected bipartite graph consisting of a node set V = R ∪ Φ, where R = {R_1, . . . , R_n} is a set of randvars and Φ = {φ_1, . . . , φ_m} is a set of factors (functions), as well as a set of edges E ⊆ R × Φ. There is an edge between a randvar R ∈ R and a factor φ ∈ Φ in E if R appears in the argument list of φ. A factor φ(R_φ) defines a function φ: ×_{R ∈ R_φ} range(R) → ℝ_{>0} that maps the ranges of its arguments R_φ (a sequence of randvars from R) to a positive real number, called potential. The term range(R) denotes the possible values a randvar R can take. We further define the joint potential for an assignment v (with v being a shorthand for V = v) as given in Eq. (1), where v_i is the projection of v to the argument list of φ_i. With Z = Σ_v ∏_{i=1}^{m} φ_i(v_i) as the normalisation constant, the full joint probability distribution encoded by G is then given by Eq. (2):
      </p>
      <p>φ_G(v) = ∏_{i=1}^{m} φ_i(v_i),   (1)</p>
      <p>P_G(v) = (1/Z) · ∏_{i=1}^{m} φ_i(v_i) = (1/Z) · φ_G(v).   (2)</p>
      <p>[Figure 1: An FG with randvars A, B, C and factors φ_1(A, B) and φ_2(B, C), shown together with the potential tables of φ_1 and φ_2, which map the assignments (true, true), (true, false), (false, true), and (false, false) to the potentials ϕ_1, ϕ_2, ϕ_3, and ϕ_4, respectively.]</p>
      <p>Example 1 (Factor Graph). Take a look at the FG depicted in Fig. 1. For the sake of the example, it holds that R = {A, B, C}, Φ = {φ_1, φ_2}, E = {{A, φ_1}, {B, φ_1}, {B, φ_2}, {C, φ_2}}, and range(A) = range(B) = range(C) = {true, false}. The potential tables (i.e., function definitions) of the factors φ_1 and φ_2 are shown next to the graph on the right. Specifically, it holds that φ_1(true, true) = ϕ_1, φ_1(true, false) = ϕ_2, and so on, where ϕ_j ∈ ℝ_{>0}, j = 1, . . . , 4, are arbitrary positive real numbers.</p>
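      <p>To make Eqs. (1) and (2) concrete, the following minimal sketch (not the authors' code; the concrete potential values are illustrative assumptions) builds the FG of Example 1 and computes the normalised joint distribution:</p>
      <p>
```python
import itertools

# Minimal sketch (not the authors' code) of the FG from Example 1:
# randvars A, B, C with range {True, False} and factors phi1(A, B), phi2(B, C).
# The concrete potential values are illustrative assumptions.
phi1 = {(True, True): 1.0, (True, False): 2.0, (False, True): 3.0, (False, False): 4.0}
phi2 = {(True, True): 1.0, (True, False): 2.0, (False, True): 3.0, (False, False): 4.0}

def joint_potential(a, b, c):
    """Eq. (1): the product of all factor potentials for the assignment (a, b, c)."""
    return phi1[(a, b)] * phi2[(b, c)]

# Eq. (2): Z sums the joint potential over all assignments, normalising it to P_G.
Z = sum(joint_potential(a, b, c)
        for a, b, c in itertools.product([True, False], repeat=3))

def prob(a, b, c):
    return joint_potential(a, b, c) / Z

total = sum(prob(*v) for v in itertools.product([True, False], repeat=3))
print(round(total, 10))  # 1.0 -- a properly normalised distribution
```
      </p>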
      <p>An FG enables an efficient representation of complex distributions by decomposing them into factors over subsets of variables. Each factor captures an aspect of the joint distribution, giving rise to natural queries that we define next to determine the objectives of probabilistic inference.
Definition 2 (Query [29, Def. 2]). A query P(Q | E_1 = e_1, . . . , E_k = e_k) consists of a query term Q and a set of events {E_j = e_j}_{j=1}^{k}, where Q and all E_j, j = 1, . . . , k, are randvars. To query a specific probability instead of a distribution, the query term is an event Q = q.</p>
      <p>Example 2 (Query). The query P(A | B = true) asks for the probability distribution of A given that the event B = true is observed.</p>
      <p>
        To enable approximate inference and structure compression, the notion of ε-equivalence has been
introduced by Luttermann et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] . ε-equivalence relaxes exact equality between potentials and
enables coarser, more efficient representations under bounded deviation.
      </p>
      <p>Definition 3 (ε-Equivalence [29, Def. 3]). Let ε ∈ ℝ_{>0} be a positive real number. Two potentials ϕ_1, ϕ_2 ∈ ℝ_{>0} are ε-equivalent, denoted as ϕ_1 =_ε ϕ_2, if ϕ_1 ∈ [ϕ_2 · (1 − ε), ϕ_2 · (1 + ε)] and ϕ_2 ∈ [ϕ_1 · (1 − ε), ϕ_1 · (1 + ε)]. Further, two factors φ_1(R_1, . . . , R_n) and φ_2(R′_1, . . . , R′_n) are ε-equivalent, denoted as φ_1 =_ε φ_2, if there exists a permutation π of {1, . . . , n} such that for all assignments (r_1, . . . , r_n) ∈ ×_{i=1}^{n} range(R_i), where φ_1(r_1, . . . , r_n) = ϕ_1 and φ_2(r_{π(1)}, . . . , r_{π(n)}) = ϕ_2, it holds that ϕ_1 =_ε ϕ_2.</p>
      <p>Example 3 (ε-Equivalence). Consider the potentials ϕ_1 = 0.49 and ϕ_2 = 0.5, and let ε = 0.1. Due to ϕ_2 = 0.5 ∈ [ϕ_1 · (1 − ε) = 0.441, ϕ_1 · (1 + ε) = 0.539] and ϕ_1 = 0.49 ∈ [ϕ_2 · (1 − ε) = 0.45, ϕ_2 · (1 + ε) = 0.55], it holds that ϕ_1 and ϕ_2 are ε-equivalent (for ε = 0.1).</p>
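      <p>The mutual interval check of Definition 3 can be sketched for single potentials as follows (an illustrative sketch, not the authors' implementation):</p>
      <p>
```python
def eps_equivalent(p1, p2, eps):
    """Definition 3 for single potentials: mutual containment in the (1 ± eps) intervals."""
    return (p2 * (1 + eps) >= p1 >= p2 * (1 - eps)
            and p1 * (1 + eps) >= p2 >= p1 * (1 - eps))

# Example 3: 0.49 and 0.5 are eps-equivalent for eps = 0.1 ...
print(eps_equivalent(0.49, 0.5, 0.1))   # True
# ... but not for a much tighter eps.
print(eps_equivalent(0.49, 0.5, 0.01))  # False
```
      </p>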
      <p>
        In general, it might happen that indistinguishable randvars are located at different positions in the
argument list of their respective factors, which is the reason the definition of ε-equivalence involves
permutations of arguments. For simplicity, in this paper, we stipulate that π is the identity function
(that is, we assume that for two ε-equivalent factors, all potentials in their potential tables are row-wise
ε-equivalent). However, all results of this paper also apply to any other choice of π [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
      </p>
      <p>While the above definition only indirectly captures the smallest possible value ε_0 for which ε-equivalence holds, Speller et al. [30] introduce easier access to the concept via a vector-based distance measure called the one-dimensional ε-equivalence distance (1DEED), which enables us to quantify deviations pairwise between factors and decide ε-equivalence in a scalable and computationally more accessible manner. In the following, we denote the potential table of a factor φ as a vector in ℝ^n_{>0}, where φ(i) denotes the i-th entry, i.e., the potential associated with the i-th row, in the potential table of φ.</p>
      <p>Definition 4 (One-dimensional ε-equivalence distance [30, Def. 4]). 1DEED, defined as the mapping d_∞: ℝ^n_{>0} × ℝ^n_{>0} → ℝ for two n-dimensional vectors φ_1, φ_2 ∈ ℝ^n_{>0}, is given by
d_∞(φ_1, φ_2) := max_{i=1,...,n} { |φ_1(i) − φ_2(i)| / min{|φ_1(i)|, |φ_2(i)|} }.   (3)</p>
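      <p>A direct transcription of 1DEED (a sketch using the notation above, not the authors' code) makes the ε-equivalence decision of Theorem 1 a one-liner:</p>
      <p>
```python
def d_inf(phi1, phi2):
    """1DEED (Def. 4): the largest entry-wise relative deviation,
    |phi1(i) - phi2(i)| divided by min(|phi1(i)|, |phi2(i)|)."""
    return max(abs(a - b) / min(abs(a), abs(b)) for a, b in zip(phi1, phi2))

# Theorem 1: the factors are eps-equivalent iff d_inf is at most eps,
# so d_inf is also the smallest eps for which equivalence holds.
phi1 = [0.49, 1.0, 2.0]
phi2 = [0.50, 1.05, 1.9]
print(0.1 >= d_inf(phi1, phi2))  # True: eps-equivalent for eps = 0.1
```
      </p>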
      <sec id="sec-1-1">
        <title>Re-Expressing ε-Equivalence via 1DEED</title>
        <p>With Def. 4, the concept of ε-equivalence can be re-expressed in terms of 1DEED, which Speller et al. [30] demonstrate to be equivalent to the original definition of ε-equivalence (Def. 3) via the upcoming theorem.</p>
        <p>Theorem 1 ([30, Thm. 2]). Two vectors φ_1, φ_2 ∈ ℝ^n_{>0} are ε-equivalent (Definition 3) if and only if d_∞(φ_1, φ_2) ≤ ε holds.</p>
        <p>The next lemma provides a brief overview of the main characteristics of 1DEED and ε-equivalence.</p>
        <p>Lemma 2 ([30, Cor. 1, Prop. 9]). The following properties hold.
(i) 1DEED is non-negative and symmetric.
(ii) It holds that d_∞(φ_1, φ_2) = 0 if and only if |φ_1(i) − φ_2(i)| = 0 for all i = 1, . . . , n, which holds if and only if φ_1 = φ_2.
(iii) ε-equivalence is not transitive.</p>
        <p>Since transitivity of ε-equivalence does not hold (Lemma 2, iii), ε-equivalence cannot be considered an equivalence relation and thus is not suitable for defining equivalence classes. This is the main reason why the consideration of a hierarchical variant makes sense, because there is not necessarily a unique ordering of groups for all possible ε values.</p>
        <p>
          For more comprehensive examples of the defined concepts and the proofs of their first properties, we refer the reader to the detailed descriptions given by Luttermann et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] and Speller et al. [30].
        </p>
        <sec id="sec-1-1-1">
          <title>2.1. The Algorithms ε-ACP and HACP</title>
          <p>
            Both the ε-ACP algorithm and the HACP algorithm are primarily based on the ACP algorithm [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ]. The ACP algorithm aims to identify symmetries within an FG, specifically those arising from exactly matching factors (that is, factors whose potentials are strictly equal). In this context, scalar multiples of factors can also be treated as equivalent, as a rescaling to the same magnitude can be achieved via normalisation [31].
          </p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>Approximating Model Construction with ε-Equivalence</title>
        <p>In case the number of detected symmetries is insufficient or an FG with potentials estimated from data is given, for which scalability is desired, the concept of ε-equivalence (Def. 3 and Def. 4) has been introduced to approximate the construction of a lifted (compressed) model. The concept of ε-equivalence forms the foundation for the development of the ε-ACP algorithm.</p>
        <p>Given ε ≥ 0, groups g = {φ_1, . . . , φ_k} of pairwise ε-equivalent factors φ_j ∈ ℝ^n_+ are determined.</p>
        <p>A representative factor is chosen for every group g by computing the optimal solution that minimises the loss function between the representative and every factor in g, where the loss function between two factors φ_i and φ_j is defined as the sum of squared deviations
d(φ_i, φ_j) = Σ_{r_1,...,r_n} (φ_i(r_1, . . . , r_n) − φ_j(r_1, . . . , r_n))^2,   (4)
with r_1, . . . , r_n denoting all possible assignments to the arguments of φ_i and φ_j. The representative φ*_g of each group is chosen as
φ*_g = arg min_φ Σ_{φ_j ∈ g} d(φ, φ_j),   (5)
to minimise the total deviation between φ*_g and the group. All φ ∈ g are replaced by the representative φ*_g, which is computed as the arithmetic mean over all factors in the group g [29, Thm. 1]:
φ*_g(r) = (1/k) · Σ_{j=1}^{k} φ_j(r).   (6)</p>
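        <p>The grouping and representative-selection steps can be sketched as follows. This is a simplified, greedy illustration under the definitions above, not the exact ε-ACP procedure:</p>
        <p>
```python
def d_inf(phi1, phi2):
    # 1DEED from Def. 4; Theorem 1 turns it into an eps-equivalence test.
    return max(abs(a - b) / min(abs(a), abs(b)) for a, b in zip(phi1, phi2))

def greedy_groups(factors, eps):
    """Greedy sketch of the grouping step (illustrative, not the exact eps-ACP
    procedure): put each factor into the first group it is pairwise
    eps-equivalent with, otherwise open a new group."""
    groups = []
    for phi in factors:
        for g in groups:
            if all(eps >= d_inf(phi, other) for other in g):
                g.append(phi)
                break
        else:
            groups.append([phi])
    return groups

def representative(group):
    """Entry-wise arithmetic mean over the group, cf. Eq. (6) and [29, Thm. 1]."""
    k = len(group)
    return [sum(phi[i] for phi in group) / k for i in range(len(group[0]))]

factors = [[1.0, 2.0], [1.05, 1.9], [5.0, 5.0]]
groups = greedy_groups(factors, eps=0.1)
print(len(groups))  # 2: the first two factors share a group
print([round(x, 6) for x in representative(groups[0])])  # [1.025, 1.95]
```
        </p>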
        <p>
          Replacing a group g of factors by a single representative significantly reduces the storage requirements
and thus also drastically speeds up run times for probabilistic inference depending on the size of g,
provided sufficient symmetries are present. For more details, see the work by Luttermann et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
        </p>
        <p>Building on the sorted groups of pairwise ε-equivalent factors, a hierarchical algorithm (HACP) has been introduced [30, Alg. 2] to ensure a consistent comparison of various approximate compressed FGs and to guide the choice of suitable ε values. To this end, an ordering of group memberships is first established based on 1DEED [30, Alg. 1], which can be interpreted as an agglomerative clustering algorithm using 1DEED as a base distance. Only completely pairwise ε-equivalent groups are allowed to be merged (complete linkage within maximal deviation). The design of the implementation of the ε-ACP algorithm and the HACP algorithm guarantees the following useful properties.
Corollary 3 ([29, Cor. 2]). If ε = 0, ε-ACP returns the same PFG as ACP.</p>
        <p>Corollary 4. If ε = 0, HACP returns the same PFG as ACP.</p>
        <p>Proof. Follows directly from [30, Prop. 4], with ε = 0.
2.2. Asymptotic Properties
A key advantage of the HACP algorithm lies in its ability to retain the same upper bound on the deviation of probabilistic queries as established for ε-ACP. For a fixed ε > 0, ε-ACP ensures that the deviation of any query result in the approximate compressed FG from the original distribution remains within a boundary. HACP, although introducing a hierarchical structure, builds upon the same guarantee. To quantify the change in query results between the original and modified (compressed) model, we adopt the symmetric divergence measure introduced by Chan and Darwiche [32], which provides a tight bound on the maximal deviation between two distributions P_G and P_{G′} with respect to any assignment v via
D(P_G, P_{G′}) := ln max_v (P_{G′}(v) / P_G(v)) − ln min_v (P_{G′}(v) / P_G(v)).   (7)</p>
        <p>The next results are proven asymptotic properties for the ε-ACP and HACP algorithms, shown by</p>
      </sec>
      <sec id="sec-1-3">
        <title>Luttermann et al. [29] and Speller et al. [30], respectively.</title>
        <p>Theorem 5 ([29, Thm. 7], [30, Prop. 4]). Let G = (R ∪ Φ, E) be an FG and let G′ be the output of ε-ACP or the output of HACP when run on G. With P_G and P_{G′} being the underlying full joint probability distributions encoded by G and G′, respectively, and m = |Φ|, it holds that
D(P_G, P_{G′}) ≤ ln( ((1 + ε) / (1 + ε)^{−1})^m ) = ln((1 + ε)^{2m}),   (8)
where the bound given in Eq. (8) is optimal (i.e., sharp).</p>
        <p>Even though the worst-case bound can be attained in constructed examples, such cases are rare in practice. Nonetheless, the need to control this bound persists, as the selection of a specific ε for either algorithm can be guided by a prior assessment of the maximum permissible deviation. This enables a more informed decision before executing the actual lifted probabilistic inference algorithm and, consequently, before executing the ε-ACP or HACP algorithm.</p>
        <p>Theorem 6 ([30, Thm. 5]). The maximal absolute deviation between any initial probability p = P_G(r | e) of r given e in model G and the probability p′ = P_{G′}(r | e) in the modified model G′ resulting from running ε-ACP or HACP on G can be bounded by</p>
        <p>max Δ := max_{r, e} |p − p′| ≤ (√(e^d) − 1) / (√(e^d) + 1),   (9)
with d = D(P_G, P_{G′}).</p>
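        <p>The deviation bound of Theorem 6 is straightforward to evaluate numerically; a minimal sketch (our transcription of the Chan and Darwiche [32] bound, not the authors' code):</p>
        <p>
```python
import math

def max_query_deviation(d):
    """Theorem 6 (Chan and Darwiche [32]): given the divergence d = D(P, P'),
    any conditional probability can shift by at most (sqrt(e^d) - 1)/(sqrt(e^d) + 1)."""
    s = math.sqrt(math.exp(d))
    return (s - 1.0) / (s + 1.0)

print(max_query_deviation(0.0))                              # 0.0
print(max_query_deviation(0.5) > max_query_deviation(0.25))  # True: monotone in d
```
        </p>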
        <p>Due to the monotonicity of (√(e^d) − 1)/(√(e^d) + 1) in d, Theorem 6 can easily be converted into the next corollary.
Corollary 7 ([30, Cor. 6]). The change in any probabilistic query in an initial model G and a modified model G′ obtained by running ε-ACP or HACP with ε ∈ (0, 1) is bounded by
max Δ ≤ (√(e^{d_1}) − 1) / (√(e^{d_1}) + 1) with d_1 = D(P_G, P_{G′}),
which is smaller or equal to
(√(e^{d_2}) − 1) / (√(e^{d_2}) + 1) with d_2 = ln((1 + ε)^{2m}).   (10)</p>
        <p>While we have calculated the deviation that follows from a specific choice of ε so far, we can also calculate how large ε can be chosen to obtain a maximal deviation of Δ*_max that we want to allow.
Theorem 8 ([30, Thm. 7]). For any given Δ*_max ∈ (0, 1/2], there is a closed-form value ε_1(Δ*_max), depending only on Δ*_max and the number of factors m [30, Thm. 7], such that running ε-ACP or HACP with any ε ≤ ε_1(Δ*_max) guarantees the bound max Δ ≤ Δ*_max.
3. Geometric View of ε-Equivalence</p>
        <p>Therefore, a calculation of ε_1(Δ*_max) bounds the maximal deviation of ε-ACP and HACP, respectively, before the algorithm has been started.</p>
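        <p>One way to choose ε before running either algorithm is to invert a worst-case bound of the form ((1 + ε)^m − 1)/((1 + ε)^m + 1) ≤ Δ*_max. The sketch below assumes this form (our reading of Corollary 7) and is not the exact closed form of [30, Thm. 7]:</p>
        <p>
```python
def eps_for_max_deviation(delta_max, m):
    """Solve ((1+eps)^m - 1)/((1+eps)^m + 1) == delta_max for eps.
    Assumes the worst-case bound has this form (our reading of Corollary 7);
    the exact closed form of [30, Thm. 7] may differ."""
    assert delta_max > 0 and 0.5 >= delta_max and m >= 1
    return ((1 + delta_max) / (1 - delta_max)) ** (1.0 / m) - 1.0

def max_dev_bound(eps, m):
    t = (1 + eps) ** m
    return (t - 1.0) / (t + 1.0)

eps1 = eps_for_max_deviation(0.1, m=4)
print(0.1 + 1e-12 >= max_dev_bound(eps1, 4))  # True: the target deviation is respected
```
        </p>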
        <p>Although ε-equivalence works well in practice, it introduces a conceptual barrier due to its reduction of the relation between two factors to a single relative distance via 1DEED. This section aims to enhance the interpretability of the concept and to enable potential generalisations for ε-ACP and HACP. We investigate the geometric implications of ε-equivalence and the behaviour of pairwise ε-equivalent groups. All factors are assumed to lie in ℝ^n_+, which corresponds to an orthant of the Euclidean space. However, to allow for a more general setting, we consider the general ℒ_p norm in Section 3.1 rather than sticking to the Euclidean setting.</p>
        <p>Before we start, we summarise the proven results on the relative ordering of lengths directly derived from the original definition of ε-equivalence (Def. 3). Two ε-equivalent factors may deviate maximally in the positive and negative direction, respectively, and due to symmetry, the expression (1 − ε) can more precisely be replaced by the expression 1/(1 + ε).</p>
        <p>Lemma 9 ([29, Lem. 6]). For two ε-equivalent factors φ_1, φ_2 ∈ ℝ^n_+, it holds that φ_1(i) ∈ [φ_2(i) · 1/(1 + ε), φ_2(i) · (1 + ε)] and φ_2(i) ∈ [φ_1(i) · 1/(1 + ε), φ_1(i) · (1 + ε)] for i = 1, . . . , n.</p>
      </sec>
      <sec id="sec-1-4">
        <title>From Lemma 9, we derive a corollary reformulating the same statement in terms of quotients.</title>
        <p>Corollary 10. If φ_1 =_ε φ_2 holds for two factors, then it also holds that
φ_1(i) / φ_2(i) ∈ [1/(1 + ε), 1 + ε] for all i ∈ {1, . . . , n}.   (11)</p>
        <p>Corollary 10 implies that the length deviations per dimension remain relatively small, and that deviations can be interpreted around the multiplicative identity 1.</p>
        <p>Considering distances between two ε-equivalent, normalised factors yields the following result.
Lemma 11. Let φ_1, φ_2 be two factors in ℝ^n_+ ∩ S^{n−1}_p with S^{n−1}_p := {x ∈ ℝ^n : ‖x‖_p = 1} being the unit (n − 1)-sphere in ℒ_p for p ≥ 1. If φ_1 =_ε φ_2 holds, then we have
(i) ‖φ_1 − φ_2‖_p ≤ ε, and
(ii) φ_{3−j} ∈ B_ε(φ_j) ∩ S^{n−1}_p for j = 1, 2, with B_ε(x) := {y ∈ ℝ^n : ‖y − x‖_p ≤ ε} being the closed p-ball with centre x and radius ε > 0.</p>
        <p>Proof. By definition of ε-equivalence, we get for (i) the property
φ_1(i) ∈ [(1 − ε) · φ_2(i), (1 + ε) · φ_2(i)] for all i = 1, . . . , n,   (12)
which is equivalent to
φ_1(i) − φ_2(i) ∈ [−ε · φ_2(i), ε · φ_2(i)] for all i = 1, . . . , n,   (13)
or in shorter form
|φ_1(i) − φ_2(i)| ≤ ε · φ_2(i) for all i = 1, . . . , n.   (14)
Substituting this into the ℒ_p norm yields
‖φ_1 − φ_2‖_p^p = Σ_{i=1}^{n} |φ_1(i) − φ_2(i)|^p ≤ Σ_{i=1}^{n} ε^p · |φ_2(i)|^p = ε^p · ‖φ_2‖_p^p,   (15)
and thus, taking the p-th root,
‖φ_1 − φ_2‖_p ≤ ε · ‖φ_2‖_p.   (16)
Taking into account the normalisation ‖φ_2‖_p = 1, we obtain the desired result ‖φ_1 − φ_2‖_p ≤ ε.</p>
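        <p>Lemma 11 can be checked numerically; the following sketch (with illustrative vectors, not from the paper) normalises two ε-equivalent factors onto the unit p-sphere and confirms property (i):</p>
        <p>
```python
def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def normalise(v, p):
    n = lp_norm(v, p)
    return [x / n for x in v]

def d_inf(phi1, phi2):
    # 1DEED; by Theorem 1, d_inf at most eps means eps-equivalence.
    return max(abs(a - b) / min(abs(a), abs(b)) for a, b in zip(phi1, phi2))

# Two eps-equivalent factors normalised onto the unit p-sphere (illustrative values).
p, eps = 2, 0.1
phi1 = normalise([1.0, 2.0, 3.0], p)
phi2 = normalise([1.05, 1.9, 3.1], p)
print(eps >= d_inf(phi1, phi2))  # True: still eps-equivalent after normalisation
diff_norm = lp_norm([a - b for a, b in zip(phi1, phi2)], p)
print(eps >= diff_norm)          # True: Lemma 11 (i)
```
        </p>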
      </sec>
      <sec id="sec-1-5">
        <title>Property (ii) follows directly from property (i). The previous proof also contains a result for non-normalised factors, presented in the next corollary.</title>
        <p>Corollary 12. Let φ_1, φ_2 be two ε-equivalent factors in ℝ^n_+. Then, ‖φ_1 − φ_2‖_p ≤ ε · ‖φ_j‖_p holds for j = 1, 2.</p>
        <p>Proof. Follows from Eq. (16). By symmetry, the claim holds for j = 2 and also for j = 1.</p>
        <p>In contrast, when viewing the problem from the opposite direction, we can only establish a relatively weak statement about ε-equivalence.</p>
        <p>Lemma 13. Let φ_1, φ_2 be two factors in ℝ^n_+. If ‖φ_1 − φ_2‖_p ≤ ε, then φ_1 =_{ε′} φ_2 holds with ε′ := ε / min_{i=1,...,n, j=1,2} {φ_j(i)}.</p>
      </sec>
      <sec id="sec-1-7">
        <title>Proof.</title>
        <p>Consider ‖φ_1 − φ_2‖_p ≤ ε,   (17)
which raised to the p-th power yields
Σ_{i=1}^{n} |φ_1(i) − φ_2(i)|^p = ‖φ_1 − φ_2‖_p^p ≤ ε^p.   (18)
Since |φ_1(i) − φ_2(i)|^p ≥ 0 for all i, Eq. (18) also holds for every i individually: |φ_1(i) − φ_2(i)| ≤ ε. Therefore, we get
φ_1(i) ≤ ε + φ_2(i) = (1 + ε/φ_2(i)) · φ_2(i)   (19)
and
φ_1(i) ≥ −ε + φ_2(i) = (1 − ε/φ_2(i)) · φ_2(i),
which implies φ_1(i) ∈ [(1 − ε/φ_2(i)) · φ_2(i), (1 + ε/φ_2(i)) · φ_2(i)]. Analogously, we get the opposite inequality and interval inclusion for φ_2(i), which is given by φ_2(i) ∈ [(1 − ε/φ_1(i)) · φ_1(i), (1 + ε/φ_1(i)) · φ_1(i)]. To guarantee ε′-equivalence with a fixed ε′ for j = 1, 2 and for i = 1, . . . , n, we need to choose the largest ε/φ_j(i) value for ε′. It is easy to see that the broadest interval is generated by the smallest denominator among all possibilities, which is given by min_{i=1,...,n, j=1,2} {φ_j(i)}.</p>
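        <p>Lemma 13's weakness shows up numerically: the ε′ guaranteed by a small norm distance can be large when some entry is small. A sketch using the vectors of Example 4:</p>
        <p>
```python
def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def d_inf(phi1, phi2):
    return max(abs(a - b) / min(abs(a), abs(b)) for a, b in zip(phi1, phi2))

# Example 4's vectors: a tiny L_p distance, yet a large smallest eps.
phi1, phi2, eps, p = [0.01, 1.0], [0.02, 1.0], 0.01, 2
print(round(lp_norm([a - b for a, b in zip(phi1, phi2)], p), 6))  # 0.01, i.e. eps
eps_prime = eps / min(min(phi1), min(phi2))  # Lemma 13's eps'
print(eps_prime)                             # 1.0
print(round(d_inf(phi1, phi2), 6))           # 1.0: equivalence only for eps' = 1
```
        </p>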
        <p>For a given ε &gt; 0, we can construct an example that hits the same bound as mentioned in Lemma 13 independently of the choice of the p-norm, because the worst deviation may occur in a single dimension:</p>
        <p>Example 4. Let p be any value in [1, ∞) and ε = 0.01. Then the ℒ_p norm of the difference of the two factors φ_1 = (0.01, 1.0)^T and φ_2 = (0.02, 1.0)^T is given by
‖φ_1 − φ_2‖_p = (Σ_{i=1}^{2} |φ_1(i) − φ_2(i)|^p)^{1/p} = (0.01^p + 0)^{1/p} = 0.01 = ε
and therefore fulfils the conditions of Lemma 13. Still, it leads to a comparably high value for ε′-equivalence via φ_2(1) = 2.0 · φ_1(1) = (1 + ε′) · φ_1(1), i.e., ε′ = 1. This example can be extended with identical values φ_1(i) = φ_2(i) for all i &gt; 1 to arbitrarily large dimensions n. When normalising the two vectors, ‖φ_j‖_p = 1 for j = 1, 2, the differences in one dimension will affect the others, resulting in an ε′ that depends on ε and the number of dimensions n. Moreover, even normalisation cannot improve upon the result of Lemma 13, as entries of a single dimension i can be arbitrarily small, leading to a large discrepancy between φ_1(i) and φ_2(i) in relative terms.</p>
        <p>3.2. Groups of Pairwise ε-Equivalent Factors</p>
        <p>The current implementation of ε-ACP and HACP uses the arithmetic mean over all factors in a group of pairwise ε-equivalent factors. An important property is that the factor defined as the arithmetic mean over a group of pairwise ε-equivalent factors is itself again ε-equivalent to all factors in the group.</p>
        <p>Lemma 14 ([29, Lem. 5]). Let g = {φ_1, . . . , φ_k} denote a group of pairwise ε-equivalent factors in ℝ^n_+ and let φ*(r) = (1/k) · Σ_{j=1}^{k} φ_j(r) for all assignments r. Then, g* = {φ_1, . . . , φ_k, φ*} is a group of pairwise ε-equivalent factors.</p>
        <p>According to Luttermann et al. [29, Thm. 1], the arithmetic mean is the optimal choice for the squared loss objective used for minimisation within the error function d(φ_i, φ_j) (see Eq. (4)). However, one may wonder whether the arithmetic mean is always the optimal choice for any given task. In principle, any weighted mean can be used as a representative ε-equivalent factor, as shown in the next theorem.</p>
        <p>Theorem 15. Let g = {φ_1, . . . , φ_k} denote a group of pairwise ε-equivalent factors and let φ_ω(r) = Σ_{j=1}^{k} ω_j φ_j(r) be the weighted mean with weights ω_j ≥ 0 and Σ_{j=1}^{k} ω_j = 1 for all assignments r. Then, g* = {φ_1, . . . , φ_k, φ_ω} is a group of pairwise ε-equivalent factors.</p>
        <p>Proof. We show the claim in two directions by proving that φ_ω(r) ∈ [φ_j(r) · (1 − ε), φ_j(r) · (1 + ε)] and φ_j(r) ∈ [φ_ω(r) · (1 − ε), φ_ω(r) · (1 + ε)] hold for any assignment r and φ_j ∈ g.</p>
        <p>For the first direction, let r be an arbitrary assignment and let φ_j ∈ g. As all factors in g are pairwise ε-equivalent, it holds that
φ_j(r) · (1 − ε) ≤ min_{φ∈g} φ(r) = min_{φ∈g} φ(r) · Σ_{i=1}^{k} ω_i = Σ_{i=1}^{k} ω_i · min_{φ∈g} φ(r) ≤ Σ_{i=1}^{k} ω_i · φ_i(r) = φ_ω(r),
and analogously also for the upper bound
φ_j(r) · (1 + ε) ≥ max_{φ∈g} φ(r) = max_{φ∈g} φ(r) · Σ_{i=1}^{k} ω_i = Σ_{i=1}^{k} ω_i · max_{φ∈g} φ(r) ≥ Σ_{i=1}^{k} ω_i · φ_i(r) = φ_ω(r).</p>
        <p>For the second direction, it holds that for any assignment r, every φ_j(r) with φ_j ∈ g is contained in the interval [φ_i(r) · (1 − ε), φ_i(r) · (1 + ε)] for any i ∈ {1, . . . , k}. Therefore, we can get the lower bound for a specific j ∈ {1, . . . , k} and obtain
(1 − ε) · φ_ω(r) = (1 − ε) · Σ_{i=1}^{k} ω_i · φ_i(r) = Σ_{i=1}^{k} ω_i · (1 − ε) · φ_i(r) ≤ Σ_{i=1}^{k} ω_i · φ_j(r) = φ_j(r) · Σ_{i=1}^{k} ω_i = φ_j(r),
as well as the upper bound for a specific j:
(1 + ε) · φ_ω(r) = (1 + ε) · Σ_{i=1}^{k} ω_i · φ_i(r) = Σ_{i=1}^{k} ω_i · (1 + ε) · φ_i(r) ≥ Σ_{i=1}^{k} ω_i · φ_j(r) = φ_j(r).
Therefore, we get for every φ_j the property φ_j(r) ∈ [(1 − ε) · φ_ω(r), (1 + ε) · φ_ω(r)].</p>
        <p>Geometrically, this property can be viewed as follows: the convex hull of a group of pairwise ε-equivalent factors g consists entirely of points that are ε-equivalent to each other:
conv(φ_1, . . . , φ_k) := { Σ_{j=1}^{k} ω_j φ_j | ω_j ≥ 0, Σ_{j=1}^{k} ω_j = 1 }.</p>
        <p>Besides the arithmetic mean, this also includes the geometric mean of the original factors, which can most easily be shown by an appropriate choice of weights.</p>
        <p>Corollary 16. Let g = {φ_1, . . . , φ_k} denote a group of pairwise ε-equivalent factors and let φ_gm(r) := (∏_{j=1}^{k} φ_j(r))^{1/k} be the geometric mean for all assignments r. Then, g* = {φ_1, . . . , φ_k, φ_gm} is a group of pairwise ε-equivalent factors.</p>
        <p>Proof. Since φ_gm(r) ∈ [min_{j=1,...,k} φ_j(r), max_{j=1,...,k} φ_j(r)], we can use
1 ≥ ω_1 := (max_{j=1,...,k} φ_j(r) − φ_gm(r)) / (max_{j=1,...,k} φ_j(r) − min_{j=1,...,k} φ_j(r)) ≥ 0 and
1 ≥ ω_2 := (φ_gm(r) − min_{j=1,...,k} φ_j(r)) / (max_{j=1,...,k} φ_j(r) − min_{j=1,...,k} φ_j(r)) ≥ 0,
which summarises to ω_1 + ω_2 = 1, and with Theorem 15, the proof can be completed:
φ_gm(r) = ω_1 · min_{j=1,...,k} φ_j(r) + ω_2 · max_{j=1,...,k} φ_j(r).</p>
        <p>Note 1. For positive real numbers φ_1(r), . . . , φ_k(r), the geometric mean is always smaller than or equal to the arithmetic mean,
φ_gm(r) = (∏_{j=1}^{k} φ_j(r))^{1/k} ≤ (1/k) · Σ_{j=1}^{k} φ_j(r) = φ*(r),
and more robust against outliers.</p>
        <p>The results shown previously in this subsection required no ℒ_p normalisation. For normalised factors, the weighted average localises on a spherical shell.</p>
        <p>Lemma 17. Let g = {φ_1, . . . , φ_k} denote a group of pairwise ε-equivalent factors with φ_j ∈ S^{n−1}_p for j = 1, . . . , k and let φ_ω(r) = Σ_{j=1}^{k} ω_j φ_j(r) be the weighted mean with weights ω_j ≥ 0 and Σ_{j=1}^{k} ω_j = 1 for all assignments r. Then, the length of φ_ω is bounded by ‖φ_ω‖_p ∈ [1/(1 + ε), 1]. In other words, φ_ω ∈ B_1(0) ∖ B̊_{1/(1+ε)}(0) = {x ∈ ℝ^n : 1/(1 + ε) ≤ ‖x‖_p ≤ 1}.</p>
        <p>Proof. For p ≥ 1 we can use the Jensen inequality for the convex function f(x) := |x|^p (also called
convexity of the ℒ_p norm, [33]) and get
‖φ‖_p^p = ‖∑_{i=1}^{n} w_i φ_i‖_p^p = ∑_a (∑_{i=1}^{n} w_i φ_i(a))^p ≤ ∑_a ∑_{i=1}^{n} w_i φ_i(a)^p = ∑_{i=1}^{n} w_i ∑_a φ_i(a)^p = ∑_{i=1}^{n} w_i ‖φ_i‖_p^p = ∑_{i=1}^{n} w_i = 1.</p>
        <p>By the pairwise ε-equivalence of G and Theorem 15, we get φ(a) ≥ φ_j(a) / (1 + ε) for j = 1, . . . , n,
and hence φ(a) ≥ max_{j=1,...,n}{φ_j(a)} / (1 + ε). We therefore obtain
‖φ‖_p^p = ∑_a φ(a)^p ≥ ∑_a (max_{j=1,...,n}{φ_j(a)})^p / (1 + ε)^p ≥ ∑_a φ_1(a)^p / (1 + ε)^p = 1 / (1 + ε)^p.
The last inequality is a reduction of the maximum to one arbitrary, but specific factor (in this case φ_1),
allowing us to use its unit property in ℒ_p. Taking the p-th root yields the desired lower bound and
completes the proof.</p>
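        <p>Lemma 17 can likewise be checked numerically. In the sketch below (illustrative potentials; empirical_eps is a helper introduced here, not defined in the paper), each factor is normalised to unit ℒ_p length and the weighted mean is verified to lie on the spherical shell between radii 1/(1 + ε) and 1:</p>
        <p>
```python
import numpy as np

def empirical_eps(factors):
    """Smallest eps for which the given factors are pairwise eps-equivalent."""
    return max(np.max(np.abs(fi / fj - 1.0)) for fi in factors for fj in factors)

p = 2  # Lemma 17 holds for any p >= 1
rng = np.random.default_rng(2)
base = rng.uniform(1.0, 2.0, size=8)
raw = [base * rng.uniform(0.97, 1.03, size=8) for _ in range(4)]
factors = [f / np.linalg.norm(f, ord=p) for f in raw]  # unit length in L_p

eps = empirical_eps(factors)
w = rng.dirichlet(np.ones(4))
mean = sum(wi * fi for wi, fi in zip(w, factors))
norm = np.linalg.norm(mean, ord=p)
# Lemma 17: the weighted mean lies between radius 1/(1+eps) and radius 1
assert norm >= 1.0 / (1.0 + eps) and 1.0 + 1e-12 >= norm
```
        </p>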
        <p>Consequently, we can naturally combine the idea of the convex hull from Theorem 15 with the
normalisation from Lemma 17 to obtain an intuitive visual perspective. Given a set of factors φ_1, . . . , φ_n ∈ R^m_+,
we normalise them to ensure fair starting conditions such that φ_i^norm := φ_i / ‖φ_i‖_p ∈ R^m_+ ∩ S_p^{m−1}, i.e.,
each lies on the positive orthant of the ℒ_p unit sphere. If they are pairwise ε-equivalent, then any
weighted mean in their convex hull remains pairwise ε-equivalent to them. Additionally, the ℒ_p norm
of any such mean lies between 1/(1 + ε) and 1. However, this norm is in general strictly less than 1 (except
for boundary factors), implying that the normalised property is not preserved under convex
combinations. This leads to the subsequent observation, which prevents the geometric interpretation
from being further simplified to a convex cone.</p>
        <p>Note 2. Let C := cone(G) = {x ∈ R^m_+ : x = ∑_{i=1}^{n} w_i φ_i with w_i ≥ 0, φ_i ∈ G} be the finitely
generated convex cone from a group of factors G = {φ_1, . . . , φ_n} whose normalised set G^norm :=
{φ_i^norm = φ_i / ‖φ_i‖, i = 1, . . . , n} is pairwise ε-equivalent. Then C contains normalised factors ψ^norm ∈ C with
‖ψ^norm‖ = 1 that are not pairwise ε-equivalent to all elements of G^norm. Formally, there exists ψ^norm ∈ C
with ‖ψ^norm‖ = 1 such that for some i ∈ {1, . . . , n}, ψ^norm is not ε-equivalent to φ_i^norm. Example 5 provides a
counterexample that invalidates this geometric interpretation in general.</p>
        <p>Example 5. When normalising any factor within the generated cone, it is not necessarily ε-equivalent to
the original generating normalised factors. This is illustrated by the following example with three factors φ_1, φ_2, φ_3.
We begin by determining the ε-equivalence of the original factors:</p>
        <p>d_∞(φ_1, φ_2) = 0.1 = d_∞(φ_1, φ_3) and d_∞(φ_2, φ_3) = 0.05.</p>
        <p>After normalisation, for instance using the ℒ_2 norm via φ_i^norm := φ_i / ‖φ_i‖_2 for all i, we obtain:
d_∞(φ_1^norm, φ_2^norm) = 0.07244888 = d_∞(φ_1^norm, φ_3^norm) and d_∞(φ_2^norm, φ_3^norm) = 0.05.
Due to Def. 4, we have pairwise ε_0-equivalence with ε_0 := 0.07244888
for this triple of normalised factors. For the convex combination ψ := 0 · φ_1^norm + 0.1 · φ_2^norm + 0.9 · φ_3^norm,
pairwise ε_0-equivalence still holds (see also Theorem 15) with d_∞(ψ, φ_i^norm) ≤ ε_0 for i = 1, 2, 3. However,
the normalised version ψ^norm := ψ / ‖ψ‖_2 has d_∞(ψ^norm, φ_1^norm) = 0.07256301 &gt; ε_0 and is in particular
not ε_0-equivalent to φ_1^norm and automatically not pairwise ε-equivalent to the normalised group anymore.</p>
        <p>
          This example shows that the original idea of finding symmetries within the ACP framework can
be extended from exact symmetries to normalised and also to ε-equivalent approximate symmetries.
However, the latter cannot be interpreted as a finitely generated convex cone. Consequently, scaling
assumes a more profound role than previously anticipated, particularly in determining a unique order
in the hierarchical version HACP, making its choice significant. Nonetheless, the proven results offer
positive prospects by enabling extensions of the ε-ACP and HACP algorithms via alternative loss
functions that permit different optimal representatives for a pairwise ε-equivalent group through
weighted mean selection. In addition, the convex hull of a group of pairwise ε-equivalent factors is the
smallest set containing these weighted elements. If it can be shown that a new factor is also pairwise
ε-equivalent to each factor of the given group, the new convex hull is a superset and extends the possible
representatives while retaining all previous ones.</p>
      </sec>
      <sec>
        <title>3.3. Special Case: Euclidean Perspective</title>
        <p>We now focus on the geometric interpretation in the Euclidean space ℒ_2 as a special case of Section 3.1.
Although the counterexample from Example 5 still applies, this setting has advantages. In particular, it
allows a more interpretable and intuitive understanding of ε-equivalence due to the accessibility of
Euclidean norms and angles, which provide a one-dimensional comparison similar to 1DEED; see also
[
          <xref ref-type="bibr" rid="ref15">15, 31</xref>
          ] for related discussions.
        </p>
        <p>Definition 5. The Cosine distance d_cos for two non-zero vectors φ_1, φ_2 ∈ R^m is defined as
d_cos(φ_1, φ_2) := 1 − ⟨φ_1, φ_2⟩ / (‖φ_1‖_2 · ‖φ_2‖_2).</p>
        <p>In [31], a refined definition of the Cosine distance for factors is given, which is based more precisely
on the assignments . It also includes the notion of exchangeable factors, i.e., two factors that are
identical after scaling and permutation of potentials.</p>
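        <p>A minimal sketch of this plain form of the Cosine distance (without the assignment-based refinement of [31]); the vectors are illustrative:</p>
        <p>
```python
import numpy as np

def d_cos(phi1, phi2):
    """Cosine distance of Definition 5."""
    return 1.0 - phi1 @ phi2 / (np.linalg.norm(phi1) * np.linalg.norm(phi2))

phi = np.array([0.2, 0.5, 0.1, 0.9])
# positive rescaling leaves the distance at zero ...
assert 1e-12 > abs(d_cos(phi, 3.7 * phi))
# ... while orthogonal vectors are at maximal distance 1
assert 1e-12 > abs(d_cos(np.array([1.0, 0.0]), np.array([0.0, 1.0])) - 1.0)
```
        </p>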
        <p>Theorem 18 (Theorem 2, [31]). Let φ_1(R_1, . . . , R_m) and φ_2(R_1, . . . , R_m) denote two factors. If φ_1 and
φ_2 are exchangeable, then it holds that d_cos(φ_1, φ_2) = 0.</p>
        <p>When normalising the factors in ℒ_2, the Cosine distance satisfies the following well-known property.</p>
        <p>Lemma 19. The Cosine distance and the squared Euclidean distance for
two normalised factors φ_1, φ_2 ∈ R^m_{&gt;0} with ‖φ_i‖_2 = 1 for i = 1, 2 are the same besides a scaling factor, i.e.,
2 d_cos(φ_1, φ_2) = ‖φ_1 − φ_2‖_2².</p>
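        <p>The identity of Lemma 19 is easy to verify numerically for illustrative, ℒ_2-normalised positive vectors:</p>
        <p>
```python
import numpy as np

rng = np.random.default_rng(3)
v1, v2 = rng.uniform(0.1, 1.0, size=(2, 6))
phi1 = v1 / np.linalg.norm(v1)  # normalised: Euclidean length 1
phi2 = v2 / np.linalg.norm(v2)

dcos = 1.0 - phi1 @ phi2  # both norms are 1, so no division is needed
# Lemma 19: twice the Cosine distance equals the squared Euclidean distance
assert 1e-12 > abs(2.0 * dcos - np.linalg.norm(phi1 - phi2) ** 2)
```
        </p>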
        <p>As the Cosine distance for positive factors can be seen as an angle between two factors, we obtain a bound on
the maximal angle difference between ε-equivalent factors.</p>
        <p>Lemma 20. The angle α ∈ [0, π/2] of two ε-equivalent factors φ_1, φ_2 ∈ R^m_{&gt;0} ∩ S_2^{m−1} is at most
α(ε) ≤ arccos(1 − ε²/2) for ε ≤ √2 and α(ε) ≤ π/2 for ε &gt; √2.</p>
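        <p>The bound of Lemma 20 can be checked as follows; the sketch assumes, as in the proof, that the Euclidean distance of the two normalised factors serves as ε (all vectors are illustrative):</p>
        <p>
```python
import numpy as np

def angle_bound(eps):
    """Lemma 20's bound on the angle of two eps-equivalent unit factors."""
    if eps > np.sqrt(2):
        return np.pi / 2
    return np.arccos(1 - eps ** 2 / 2)

rng = np.random.default_rng(4)
v = rng.uniform(0.2, 1.0, size=6)
phi1 = v / np.linalg.norm(v)
u = v * rng.uniform(0.98, 1.02, size=6)  # small positive perturbation
phi2 = u / np.linalg.norm(u)

eps = np.linalg.norm(phi1 - phi2)  # distance bound fed into the lemma
alpha = np.arccos(np.clip(phi1 @ phi2, -1.0, 1.0))  # actual angle
assert angle_bound(eps) + 1e-12 >= alpha
```
        </p>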
      </sec>
      <sec id="sec-1-8">
        <title>Proofs of Lemma 19 and Lemma 20</title>
        <p>Proof (Lemma 19). With the normalised vectors ‖φ_i‖_2 = 1 for i = 1, 2, consider
‖φ_1 − φ_2‖_2² = (φ_1 − φ_2)^⊤(φ_1 − φ_2) = ‖φ_1‖_2² − 2⟨φ_1, φ_2⟩ + ‖φ_2‖_2² = 2(1 − ⟨φ_1, φ_2⟩) = 2 d_cos(φ_1, φ_2).</p>
        <p>Proof (Lemma 20). The angle of two vectors within an orthant of the Euclidean space is always at most π/2, which
gives the upper bound for ε &gt; √2. For the other case of 0 ≤ ε ≤ √2, we get from Lemma 11 and
Lemma 19 the inequality
1 − cos(α) = d_cos(φ_1, φ_2) = ‖φ_1 − φ_2‖_2² / 2 ≤ ε² / 2.
In the last step, taking the strictly monotonically decreasing arccos for values in [0, 1], or α ∈ [0, π/2], on
both sides switches the direction of the inequality and yields α ≤ arccos(1 − ε²/2). The operation remains
well-defined if ε ≤ √2.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Discussion</title>
      <p>Understanding of ε-equivalence.</p>
      <p>As an extension of exchangeable factors, the general concept
of ε-equivalence and the more manageable 1DEED seem to be a promising option to take lifted
inference to a new level via approximation of similar factors. Therefore, the understanding of similarity
in this context is key. Building on the maximum metric (Chebyshev distance), 1DEED takes relative
lengths into account, resulting in the lack of transitivity. However, the property of ε-equivalence is
easy to check and leads to multiple practical properties, including the guaranteed asymptotic bounds
of Section 2.2 and a compressed (compact) and more interpretable model depending on the choice
of ε. The hierarchical approach additionally helps to understand the complexity of the given FG by
pre-analysing the 1DEED of all factors.</p>
      <p>Geometric interpretability.</p>
      <p>Understanding differences between factors as an angle between vectors
within the Euclidean space is not a new concept [31]. However, it is one way to interpret the concept of
ε-equivalence. Pairwise ε-equivalent factors can also be seen as a set of m-dimensional vectors, which
bound and generate a compact set (the convex hull) within the ℒ_p space. This geometric object is
basically the minimal set containing the group of ε-equivalent factors. If another factor that is pairwise
ε-equivalent to all factors in a group is added to the group, the resulting set becomes a superset and its
newly generated convex hull is again a set of pairwise ε-equivalent factors containing the old set.</p>
      <p>New potential for future exploration.</p>
      <p>Using weighted means as representatives for groups of
ε-equivalent factors enriches the possibilities of how the algorithms ε-ACP and HACP can be applied.
The usage of weighted means opens the whole concept of approximate lifted model construction based
on ε-equivalence to possible robustifications and generalisations by changing the original squared loss
function and its optimal choice of the arithmetic mean as representative for groups of ε-equivalent
factors. Choosing different loss functions, each with its corresponding optimal representative,
increases the flexibility of the ε-ACP algorithm and the HACP algorithm.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>We presented fundamental properties of ε-equivalence related to lifted model construction under a
unified view and have proven several additional properties for the ℒ_p space. By viewing the concept
of ε-equivalence from a geometric perspective in ℒ_p, we provided a solid foundation for geometric
interpretability and gave a new perspective on the concept of pairwise ε-equivalence via the convex hull
of groups of ε-equivalent factors. Our geometric interpretation opens up advancements with respect
to the ε-ACP algorithm and its hierarchical version HACP in terms of generalisation and robustification
of the previously introduced compression techniques.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work was partially funded by the Ministry of Culture and Science of the German State of North
Rhine-Westphalia. The research of Malte Luttermann was funded by the BMBF project AnoMed
16KISA057.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any generative AI tools.</p>
      <p>Construction, in: Proceedings of the Thirty-Fourth International Joint Conference on Artificial
Intelligence (IJCAI-2025), 2025. https://arxiv.org/abs/2504.20784.</p>
      <p>[30] J. Speller, M. Luttermann, M. Gehrke, T. Braun, Compression versus Accuracy: A Hierarchy of
Lifted Models, 2025. https://arxiv.org/abs/2505.22288.</p>
      <p>[31] M. Luttermann, R. Möller, M. Gehrke, Lifted Model Construction without Normalisation: A
Vectorised Approach to Exploit Symmetries in Factor Graphs, in: Proceedings of the Third
Learning on Graphs Conference (LoG-2024), PMLR, 2025, pp. 46:1–46:17.</p>
      <p>[32] H. Chan, A. Darwiche, A Distance Measure for Bounding Probabilistic Belief Change, International
Journal of Approximate Reasoning 38 (2005) 149–174.</p>
      <p>[33] J. L. W. V. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta
Mathematica 30 (1906) 175–193.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Poole</surname>
          </string-name>
          ,
          <article-title>First-order Probabilistic Inference</article-title>
          ,
          <source>in: Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-2003)</source>
          , IJCAI Organization,
          <year>2003</year>
          , pp.
          <fpage>985</fpage>
          -
          <lpage>991</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          , Markov Logic Networks,
          <source>Machine Learning</source>
          <volume>62</volume>
          (
          <year>2006</year>
          )
          <fpage>107</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          ,
          <article-title>Probabilistic Datalog - A Logic for Powerful Retrieval Methods</article-title>
          , in: SIGIR-95
          <source>Proc. of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , ACM,
          <year>1995</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <article-title>A Statistical Learning Method for Logic Programs with Distribution Semantics</article-title>
          ,
          <source>in: Proc. of the 12th International Conference on Logic Programming</source>
          , MIT Press,
          <year>1995</year>
          , pp.
          <fpage>715</fpage>
          -
          <lpage>729</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ahmadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mladenov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Natarajan</surname>
          </string-name>
          ,
          <article-title>Exploiting Symmetries for Scaling Loopy Belief Propagation and Relational Training</article-title>
          ,
          <source>Machine Learning</source>
          <volume>92</volume>
          (
          <year>2013</year>
          )
          <fpage>91</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          , Lifted Junction Tree Algorithm,
          <source>in: Proceedings of the Thirty-Ninth German Conference on Artificial Intelligence (KI-2016)</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gogate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <article-title>Probabilistic Theorem Proving</article-title>
          ,
          <source>in: Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI-2011)</source>
          , AUAI Press,
          <year>2011</year>
          , pp.
          <fpage>256</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hartwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <article-title>An Extended View on Lifting Gaussian Bayesian Networks</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>330</volume>
          (
          <year>2024</year>
          )
          <fpage>104082</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Taghipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fierens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Blockeel</surname>
          </string-name>
          , Lifted Variable Elimination:
          <article-title>Decoupling the Operators from the Constraint Language</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>47</volume>
          (
          <year>2013</year>
          )
          <fpage>393</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Van den Broeck</surname>
          </string-name>
          , N. Taghipour,
          <string-name>
            <given-names>W.</given-names>
            <surname>Meert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davis</surname>
          </string-name>
          , L. De Raedt,
          <article-title>Lifted Probabilistic Inference by First-order Knowledge Compilation</article-title>
          ,
          <source>in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI-2011)</source>
          , IJCAI Organization,
          <year>2011</year>
          , pp.
          <fpage>2178</fpage>
          -
          <lpage>2185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Parameterised Queries and Lifted Query Answering</article-title>
          ,
          <source>in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-2018)</source>
          , IJCAI Organization,
          <year>2018</year>
          , pp.
          <fpage>4980</fpage>
          -
          <lpage>4986</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>G. Van den Broeck</surname>
          </string-name>
          ,
          <source>Lifted Inference and Learning in Statistical Relational Models, Ph.D. thesis, KU Leuven</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Van den Broeck</surname>
          </string-name>
          , J. Davis,
          <article-title>Conditioning in First-order Knowledge Compilation and Lifted Probabilistic Inference</article-title>
          ,
          <source>in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-2012)</source>
          , AAAI Press,
          <year>2012</year>
          , pp.
          <fpage>1961</fpage>
          -
          <lpage>1967</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Relational Forward Backward Algorithm for Multiple Queries</article-title>
          , in: FLAIRS-19
          <source>Proc. of the 32nd International Florida Artificial Intelligence Research</source>
          Society Conference, AAAI Press,
          <year>2019</year>
          , pp.
          <fpage>464</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Taming Reasoning in Temporal Probabilistic Relational Models</article-title>
          ,
          <source>in: 9th International Workshop on Statistical Relational AI at the 34th AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>U.</given-names>
            <surname>Apsel</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. I. Brafman</surname>
          </string-name>
          ,
          <article-title>Extended Lifted Inference with Joint Formulas</article-title>
          , in: UAI-11
          <source>Proc. of the 27th Conference on Uncertainty in Artificial Intelligence</source>
          , AUAI Press,
          <year>2011</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <article-title>A Language for Relational Decision Theory</article-title>
          , in
          <source>: Proc. of the 6th International Workshop on Statistical Relational Learning</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Lifted Temporal Maximum Expected Utility</article-title>
          ,
          <source>in: Proc. of the 32nd Canadian Conference on Artificial Intelligence</source>
          ,
          <source>Canadian AI</source>
          <year>2019</year>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>380</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Polovina</surname>
          </string-name>
          ,
          <article-title>Restricting the Maximum Number of Actions for Decision Support under Uncertainty</article-title>
          ,
          <source>in: ICCS-20 Proc. of the 25th International Conference on Conceptual Structures</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <article-title>Symbolic Dynamic Programming for First-order POMDPs</article-title>
          , in: AAAI-10
          <source>Proc. of the 24th AAAI Conference on Artificial Intelligence</source>
          , AAAI Press,
          <year>2010</year>
          , pp.
          <fpage>1140</fpage>
          -
          <lpage>1146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>Lifting in Multi-agent Systems under Uncertainty</article-title>
          ,
          <source>in: UAI-22 Proc. of the 38th Conference on Uncertainty in Artificial Intelligence</source>
          , AUAI Press,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          , T. Braun, Lifting Partially Observable Stochastic Games, in: SUM-24
          <source>Proceedings of the 16th International Conference on Scalable Uncertainty Management</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Niepert</surname>
          </string-name>
          , G. Van den Broeck,
          <article-title>Tractability through Exchangeability: A New Perspective on Efficient Probabilistic Inference</article-title>
          ,
          <source>in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2014)</source>
          , AAAI Press,
          <year>2014</year>
          , pp.
          <fpage>2467</fpage>
          -
          <lpage>2475</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>Weisfeiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Leman</surname>
          </string-name>
          ,
          <article-title>A Reduction of a Graph to a Canonical Form and an Algebra Arising during this Reduction</article-title>
          ,
          <source>Nauchno-Technicheskaya Informatsia</source>
          <volume>2</volume>
          (
          <year>1968</year>
          )
          <fpage>12</fpage>
          -
          <lpage>16</lpage>
          . Translated from Russian.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Luttermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          , Colour Passing Revisited:
          <article-title>Lifted Model Construction with Commutative Factors</article-title>
          ,
          <source>in: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2024)</source>
          , AAAI Press,
          <year>2024</year>
          , pp.
          <fpage>20500</fpage>
          -
          <lpage>20507</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Luttermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Machemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <article-title>Efficient Detection of Exchangeable Factors in Factor Graphs</article-title>
          , in: FLAIRS-24
          <source>Proceedings of the 37th International FLAIRS Conference</source>
          , volume
          <volume>37</volume>
          ,
          <publisher-name>Florida Online Journals</publisher-name>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M.</given-names>
            <surname>Luttermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Machemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <article-title>Efficient Detection of Commutative Factors in Factor Graphs</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on Probabilistic Graphical Models (PGM-24)</source>
          , volume
          <volume>246</volume>
          ,
          <publisher-name>PMLR</publisher-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Luttermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <article-title>Lifted Model Construction without Normalisation: A Vectorised Approach to Exploit Symmetries in Factor Graphs</article-title>
          ,
          <source>in: Proceedings of the Third Learning on Graphs Conference (LoG-2024)</source>
          , PMLR,
          <year>2025</year>
          , pp.
          <fpage>46:1</fpage>
          -
          <lpage>46:17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Luttermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Speller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hartwig</surname>
          </string-name>
          , Approximate Lifted Model
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>