Neural Semirings
    Pedro Zuidberg Dos Martires1
1 KU Leuven, Belgium


                                         Abstract
                                         From an abstract algebra perspective, reasoning tasks in symbolic artificial intelligence can be formulated
                                         in terms of commutative semiring operations. Usually, the reasoning task, e.g. probabilistic reasoning
                                         or fuzzy reasoning, is fixed upfront as part of the problem specification. In this paper we relax this
                                         assumption and render the reasoning operators (the semiring operations) subject to learning by replacing
                                         predefined operations with learnable neural networks. This unlocks the learn to reason paradigm in
                                         the semiring reasoning setting.

                                         Keywords
                                         neuro-symbolic, semirings, learn to reason




1. Introduction
At the core of symbolic AI lie Boolean functions consisting of Boolean literals and Boolean
connectives between these literals. An interesting representation of Boolean functions, for our
purposes, is the so-called negation normal form: the only connectives allowed are AND-gates,
OR-gates, and negations. The latter can only occur on the literals themselves. These three gates
are universal, which means they allow us to represent any Boolean function. At an abstract
level, reasoning tasks in symbolic AI are concerned with computing quantities of these logic
functions.
   In a symbolic AI system, the process of reasoning can be formulated using algebraic structures
called (commutative) semirings. This means that a symbolic AI is a computer program that
performs its computations while adhering to the structure imposed by semirings [1, 2, 3, 4].
   A well-known semiring in symbolic AI is the probability semiring. This is the semiring
used for inference in the probabilistic programming language ProbLog [5] but also in its
neuro-symbolic extension DeepProbLog [6]. In the case of the Boolean literals representing
probabilistic Boolean variables, the probability semiring allows for the computation of the
probability of a Boolean function being satisfied:

                               𝑝(⋁_{𝐼 ∈ 𝑀(πœ“)} β‹€_{β„“ ∈ 𝐼} β„“) = βˆ‘_{𝐼 ∈ 𝑀(πœ“)} ∏_{β„“ ∈ 𝐼} 𝑝(β„“)                    (1)

On the left hand-side we have the (probabilistic) Boolean function for which we want to compute
the probability. The symbol πœ“ denotes the entire Boolean function, 𝑀(πœ“ ) denotes the models of
πœ“, and the ℓ’s are the literals, i.e. Boolean atoms and their negation.
15th International Workshop on Neural-Symbolic Learning and Reasoning
pedro.zudo@kuleuven.be (P. Zuidberg Dos Martires)
                                       Β© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
   Note how, on the right, we push the probability functional 𝑝 inside and apply it directly to
the literals instead of the entire formula. This is not always possible but for Boolean functions
that adhere to specific constraints [7] it enables poly-time probabilistic inference (in contrast to
the usual computational hardness of probabilistic reasoning). Note also how the ⋁ changed to
a sum operation and the β‹€ changed to a product operation. For probabilities, these operations
are simply the addition and multiplication of real values between 0 and 1. In the probability
semiring the real numbers between 0 and 1 are called the elements of the semiring.
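   To make Equation 1 concrete, the following minimal sketch (in Python, with a hypothetical three-variable formula and made-up probabilities) enumerates the models of a formula and applies the sum-of-products on the right-hand side directly:

```python
from itertools import product

# Hypothetical example: psi = (a AND b) OR (NOT a AND c), with independent
# probabilistic Boolean variables a, b, c. We enumerate the models of psi and
# apply Equation 1: a sum over models of a product over literal probabilities.
p = {"a": 0.3, "b": 0.8, "c": 0.5}               # p(x = True) for each variable

def psi(a, b, c):
    return (a and b) or ((not a) and c)

def literal_prob(var, value):
    # probability of a literal: p(x) if positive, 1 - p(x) if negated
    return p[var] if value else 1.0 - p[var]

prob = 0.0
for assignment in product([True, False], repeat=3):
    a, b, c = assignment
    if psi(a, b, c):                             # the assignment is a model of psi
        weight = 1.0
        for var, value in zip("abc", assignment):
            weight *= literal_prob(var, value)   # product over the literals
        prob += weight                           # sum over the models
print(prob)                                      # 0.3*0.8 + 0.7*0.5 = 0.59
```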
   Researchers soon enough realized [8, 9, 2, 1] that different reasoning tasks common in AI, such
as probabilistic inference, the Viterbi algorithm, or forward differentiation, can be formulated
as computations in different semirings. We denote this generalized computation by 𝛼 and we
call 𝛼 the labeling functional:

                                𝛼(⋁_{𝐼 ∈ 𝑀(πœ“)} β‹€_{β„“ ∈ 𝐼} β„“) = ⨁_{𝐼 ∈ 𝑀(πœ“)} ⨂_{β„“ ∈ 𝐼} 𝛼(β„“)                   (2)

Comparing Equation 2 to Equation 1, we see that, structurally, nothing changed. The only
difference is the quantity that we compute (denoted by 𝛼(β‹…)) and the generalization of the
summation and the product in the first equation to an abstract summation and multiplication.
This general framework for performing reasoning tasks was coined algebraic model counting (AMC)
in [2], where the authors also give an overview of which reasoning tasks can be formulated
as semiring computations and under which conditions.

Contribution
Learning within the semiring reasoning framework of algebraic model counting has mainly
been concerned with learning a mapping from literals to elements of a given semiring. For
instance, DeepProbLog takes high-dimensional data, such as the image of a handwritten digit, and
uses a neural net to map it to the probability of the image showing the number 7.
   We will lift this restriction and allow the semiring operations (i.e. the reasoning operations)
to be subject to learning as well. We replace predefined semiring operations by neural networks
that adhere to the algebraic properties of semirings, while still being trainable. Swapping out
predefined semiring operations for learnable neural networks unlocks the learn to reason
paradigm [10] in the semiring reasoning framework.

Paper Organization
We start with the necessary mathematical background on algebraic model counting (Section 2),
which also includes a formal definition of semirings. In Section 3, we introduce learnable
semirings by showing how the semiring operations can be represented through neural networks
and in Section 4 we sketch a possible learning set-up for neural semirings. We end the paper by
discussing related work (including graph neural networks) in Section 5, and giving concluding
thoughts in Section 6.
2. Preliminaries on Algebraic Model Counting
Here we give the formal definitions of algebraic structures called monoids and semirings, and
also of algebraic model counting. These will allow us to learn to reason in an algebraic model
counting context by introducing neural semirings.

Definition 2.1 (Monoid). A monoid is an algebraic structure (π•Š, ∘, 𝑒) equipping a set of elements
π•Š with a closed binary operation ∘ ∢ π•Š Γ— π•Š β†’ π•Š and where 𝑒 ∈ π•Š is the so-called neutral element.
Furthermore the following properties have to be satisfied:
   1. associativity: βˆ€π‘Ž, 𝑏, 𝑐 ∈ π•Š it holds that (π‘Ž ∘ 𝑏) ∘ 𝑐 = π‘Ž ∘ (𝑏 ∘ 𝑐).
   2. neutral element: there exists an element 𝑒 ∈ π•Š, called the neutral element, such that βˆ€π‘  ∈ π•Š it holds that
      𝑒 ∘ 𝑠 = 𝑠 ∘ 𝑒 = 𝑠.
Additionally, a monoid is called commutative if the structure (π•Š, ∘, 𝑒) satisfies:
   3. commutativity: βˆ€π‘Ž, 𝑏 ∈ π•Š it holds that π‘Ž ∘ 𝑏 = 𝑏 ∘ π‘Ž.

Definition 2.2 (Semiring). A semiring is an algebraic structure (π•Š, βŠ•, βŠ—, 𝑒 βŠ• , 𝑒 βŠ— ) equipping a set
of elements π•Š with closed binary operations βŠ• and βŠ—, which are usually referred to as addition and
multiplication, respectively. Furthermore the following properties hold:
   1. the algebraic structure (π•Š, βŠ•, 𝑒 βŠ• ) is a commutative monoid.
   2. the algebraic structure (π•Š, βŠ—, 𝑒 βŠ— ) is a monoid.
   3. distributivity: the multiplication distributes over the addition, i.e. βˆ€π‘Ž, 𝑏, 𝑐 ∈ π•Š it holds that
      π‘Ž βŠ— (𝑏 βŠ• 𝑐) = π‘Ž βŠ— 𝑏 βŠ• π‘Ž βŠ— 𝑐.
   4. annihilating element: the neutral element of the addition monoid 𝑒 βŠ• is the annihilating
      element of the multiplication, i.e. βˆ€π‘  ∈ π•Š it holds that 𝑒 βŠ• βŠ— 𝑠 = 𝑠 βŠ— 𝑒 βŠ• = 𝑒 βŠ• .
Additionally, a semiring is called commutative if the structure (π•Š, βŠ—, 𝑒 βŠ— ) is a commutative monoid:
   5. commutativity of multiplication: βˆ€π‘Ž, 𝑏 ∈ π•Š it holds that π‘Ž βŠ— 𝑏 = 𝑏 βŠ— π‘Ž.

Definition 2.3 (Algebraic model counting [2]). Given are a propositional logic theory πœ“ over
a set of variables ℬ, a commutative semiring (π•Š, βŠ•, βŠ—, 𝑒 βŠ• , 𝑒 βŠ— ), and a labeling function 𝛼 ∢ 𝕃 β†’ π•Š,
with 𝕃 being the set of variables in ℬ and their negations. The algebraic model count of a theory πœ“
is then defined as:

                                  𝐴𝑀𝐢(πœ“ , 𝛼|ℬ) = ⨁ ⨂ 𝛼(β„“)                                           (3)
                                                    𝐼 βˆˆπ‘€(πœ“ ) β„“βˆˆπΌ

   While elegant in its formulation, computing the algebraic model count is generally a compu-
tationally hard problem, #P-hard to be precise [11]. This is where knowledge compilation [12]
comes to our rescue.
   Knowledge compilation is the process of transforming a propositional logic formula into
a form that allows for polytime computation of algebraic model counts. Although the
knowledge compilation step is itself computationally hard, the overall procedure yields a net
benefit when a logic formula has to be evaluated multiple times, possibly with different labeling
functionals. Combining AMC and knowledge compilation is a perfect match in an iterative
learning setting such as stochastic gradient descent, which has already been demonstrated in
the neuro-symbolic literature [13, 6].
   A well-known target representation for logic formulas used in knowledge compilation is the
so-called binary decision diagram (BDD) [14]. The advantage of using BDDs over the flat
disjunction-of-conjunctions form of Equation 2 is that we can alternate conjunctions and disjunctions,
which in turn means that we can compute algebraic model counts by repeatedly nesting βŠ• and
βŠ— operations. Fundamentally, knowledge compilation turns the computation of an algebraic
model count into a dynamic programming algorithm, where the target representation of the
propositional formula, e.g. a BDD, determines the order in which the βŠ• and βŠ— operations are
performed. We refer the reader to [2] for a detailed account.
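   To give a flavor of how a compiled representation turns AMC into dynamic programming, the sketch below evaluates a small circuit of alternating OR/AND nodes bottom-up. The nested-tuple encoding is a hypothetical stand-in for an actual BDD or similar compiled data structure:

```python
# A sketch of evaluating an algebraic model count over a compiled circuit.
# The circuit is a nested tuple ("OR", ...), ("AND", ...), or a leaf carrying
# a semiring label; this is a hypothetical stand-in for a BDD node.
def eval_circuit(node, oplus, otimes):
    kind = node[0]
    if kind == "LEAF":
        return node[1]                                   # alpha(literal)
    children = [eval_circuit(c, oplus, otimes) for c in node[1:]]
    combine = oplus if kind == "OR" else otimes          # alternate (+) and (x)
    result = children[0]
    for value in children[1:]:
        result = combine(result, value)
    return result

# (a AND b) OR (NOT a AND c), with probability labels attached to the leaves.
circuit = ("OR",
           ("AND", ("LEAF", 0.3), ("LEAF", 0.8)),
           ("AND", ("LEAF", 0.7), ("LEAF", 0.5)))
print(eval_circuit(circuit, lambda a, b: a + b, lambda a, b: a * b))  # 0.59
```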


3. Neural Semirings
Commutative neural semirings are algebraic structures of the form (π•Š, βŠ•, βŠ—, 𝑒 βŠ• , 𝑒 βŠ— ) (cf. Def-
inition 2.2), where the binary operations βŠ— and βŠ• are neural networks, and which adhere
to the properties of commutative semirings. In order to construct neural semirings we draw
inspiration from so-called deep sets [15, 16]. Deep sets are neural networks that are invariant
under permutations of the inputs:

                                   𝑓 (π‘₯1 , … , π‘₯𝑀 ) = 𝑓 (π‘₯πœ‹(1) , … , π‘₯πœ‹(𝑀) )                       (4)

for all possible permutations πœ‹. We can now explicitly construct a permutation-invariant
function 𝑓 by assuming sum-decomposability:

                                  𝑓(𝑋 = {π‘₯1 , … , π‘₯𝑀}) = 𝜌(βˆ‘_{π‘₯ ∈ 𝑋} πœ™(π‘₯))                          (5)

Here, πœ™(π‘₯) is a vector and the summation is performed element-wise. The tuple (𝜌, πœ™) is
called a sum-decomposition of 𝑓. Reducing the number of inputs to two immediately yields
commutative binary operations that we can use for the definition of the βŠ— and βŠ• operations of
neural semirings:

             βŠ•(𝛼1 , 𝛼2 ) = πœŒβŠ• (πœ™βŠ• (𝛼1 ) + πœ™βŠ• (𝛼2 )) = πœŒβŠ• (πœ™βŠ• (𝛼2 ) + πœ™βŠ• (𝛼1 )) = βŠ•(𝛼2 , 𝛼1 )       (6)
             βŠ—(𝛼1 , 𝛼2 ) = πœŒβŠ— (πœ™βŠ— (𝛼1 ) + πœ™βŠ— (𝛼2 )) = πœŒβŠ— (πœ™βŠ— (𝛼2 ) + πœ™βŠ— (𝛼1 )) = βŠ—(𝛼2 , 𝛼1 )       (7)

where 𝛼1 , 𝛼2 ∈ π•Š. Assuming sum-decomposability of the neural semiring operations thus yields
commutativity by construction.
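   The following minimal sketch illustrates Equations 5 to 7 with untrained toy maps (random weights, numpy): commutativity of the resulting binary operator holds by construction, irrespective of the parameters of πœ™ and 𝜌:

```python
import numpy as np

# A toy sum-decomposed operator (Equations 5-7): phi and rho are small,
# untrained maps with random weights; commutativity holds by construction
# because the two encoded arguments are combined with an element-wise sum.
rng = np.random.default_rng(0)
d, h = 4, 16                                   # semiring-element and latent dims
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(d, h))

def phi(x):                                    # encoder: R^d -> R^h
    return np.tanh(W1 @ x)

def rho(z):                                    # decoder: R^h -> R^d
    return W2 @ z

def oplus(a1, a2):                             # Equation 6
    return rho(phi(a1) + phi(a2))

a1, a2 = rng.normal(size=d), rng.normal(size=d)
print(np.allclose(oplus(a1, a2), oplus(a2, a1)))   # True: commutative
```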
  The next property that we would like the neural semiring operators to satisfy is associativity.
That is, we would like the following equation to hold:

      βŠ•(𝛼1 , βŠ•(𝛼2 , 𝛼3 )) = βŠ•(βŠ•(𝛼1 , 𝛼2 ), 𝛼3 )                                                    (8)

   ⇔ πœŒβŠ• [πœ™βŠ• (𝛼1 ) + πœ™βŠ• (πœŒβŠ• [πœ™βŠ• (𝛼2 ) + πœ™βŠ• (𝛼3 )])] = πœŒβŠ• [πœ™βŠ• (πœŒβŠ• [πœ™βŠ• (𝛼1 ) + πœ™βŠ• (𝛼2 )]) + πœ™βŠ• (𝛼3 )] (9)
We satisfy Equation 9 by imposing πœŒβŠ• = πœ™βŠ•β»ΒΉ. Imposing additionally an analogous condition for the
βŠ— operation, we get the following sum-decompositions of our neural semiring operators:

                               βŠ•(𝛼1 , 𝛼2 ) = πœ™βŠ•β»ΒΉ[πœ™βŠ•(𝛼1 ) + πœ™βŠ•(𝛼2 )]                          (10)
                               βŠ—(𝛼1 , 𝛼2 ) = πœ™βŠ—β»ΒΉ[πœ™βŠ—(𝛼1 ) + πœ™βŠ—(𝛼2 )]                          (11)

Writing the neural semiring operations in this fashion alludes to an encoder/decoder interpreta-
tion. Let us consider the βŠ— operation: πœ™βŠ— takes an input 𝛼, which is an element of the semiring,
and encodes it into a latent space where the βŠ—-operation reduces to a mere element-wise
addition. πœ™βŠ—β»ΒΉ then takes the result of the element-wise addition and decodes it back into the
space of semiring elements. This interpretation opens up an avenue for implementing asso-
ciative neural operators. We could, for instance, fall back on an encoder/decoder architecture.
While this is a standard architecture, we would only have an approximation of associativity:
the decoder will never be the exact inverse of the encoder. Alternatively, we could enforce
𝜌 = πœ™ βˆ’1 by construction. The key challenge in this case would be the efficient construction of
the inverse of πœ™. Here, techniques from the neural network community [17] or the normalizing
flow community [18] could be used to construct neural networks that are easily invertible.
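   As a minimal illustration of enforcing 𝜌 = πœ™β»ΒΉ by construction, the sketch below uses a closed-form invertible element-wise map in place of a learnable invertible network; associativity (Equation 8) then holds exactly:

```python
import numpy as np

# A minimal sketch of rho = phi^{-1} enforced by construction. Here phi is a
# strictly monotone element-wise map with a closed-form inverse; in a learnable
# setting an invertible-network or normalizing-flow layer would play this role.
def phi(x):                         # strictly increasing, hence invertible
    return np.sinh(x)

def phi_inv(z):
    return np.arcsinh(z)

def oplus(a1, a2):                  # Equation 10 with rho = phi^{-1}
    return phi_inv(phi(a1) + phi(a2))

rng = np.random.default_rng(1)
a1, a2, a3 = rng.normal(size=(3, 4))
lhs = oplus(a1, oplus(a2, a3))      # Equation 8, left-hand side
rhs = oplus(oplus(a1, a2), a3)      # Equation 8, right-hand side
print(np.allclose(lhs, rhs))        # True: associativity holds exactly
```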
   Next, we turn our attention to the neutral elements and more specifically to their existence.
Assuming the sum-decomposition in Equation 10 we have to satisfy the following equality:

                                          βŠ•(𝛼, 𝑒 βŠ• ) = 𝛼                                      (12)

This is satisfied for every 𝛼 if we simply take πœ™βŠ•(𝑒^βŠ•) = 0βƒ—. We then obtain the neutral element
of the addition as πœ™βŠ•β»ΒΉ(0βƒ—). This means that a neutral element exists if 0βƒ— is included in the
image of πœ™βŠ•. A similar reasoning applies to the βŠ—-operation.
   Until now we have only considered properties of neural semiring operations that concern
one of the two operations at a time. However, what distinguishes semirings from monoids
(and groups, for that matter, which are monoids with invertible elements) is the presence of
properties that intertwine the βŠ• and the βŠ— operations. Let us first have a look at the
annihilating property of semirings and then study the distributivity property.
   With the commutativity of the βŠ—-operation already holding, we need to have 𝑒^βŠ• βŠ— 𝛼 = 𝑒^βŠ• in
order to satisfy the annihilating property. Plugging in the sum-decomposition of Equation 11
we get:

                 𝑒^βŠ• βŠ— 𝛼 = 𝑒^βŠ• ⇔ πœ™βŠ—β»ΒΉ[πœ™βŠ—(𝑒^βŠ•) + πœ™βŠ—(𝛼)] = 𝑒^βŠ•                                 (13)
                               ⇔ πœ™βŠ—(𝑒^βŠ•) + πœ™βŠ—(𝛼) = πœ™βŠ—(𝑒^βŠ•)                                   (14)
                               ⇔ πœ™βŠ—(𝑒^βŠ•)_𝑖 = ±∞                                              (15)

The index 𝑖 in the last line indexes the elements of the vector πœ™βŠ—(𝑒^βŠ•), and the equation itself
tells us that each element of this vector has to be either plus or minus infinity. At first this
result may seem surprising. However, remembering that the neutral element of the addition in the
log-probability semiring is also βˆ’βˆž makes it plausible: Equation 15 is the vector version
of the neutral element of the addition in the log-probability semiring.
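   The small numpy check below makes Equation 15 concrete for the probability semiring, using the illustrative choice πœ™βŠ— = log and πœ™βŠ—β»ΒΉ = exp:

```python
import numpy as np

# Concretizing Equation 15 for the probability semiring: with phi_otimes = log
# (and its inverse exp), the additive neutral element e_oplus = 0 is mapped to
# -inf, and adding -inf in the latent space annihilates any other label.
phi_otimes, phi_otimes_inv = np.log, np.exp

e_oplus = 0.0                                    # neutral element of +, i.e. p = 0
alpha = 0.7                                      # an arbitrary semiring element
with np.errstate(divide="ignore"):
    log_e_oplus = phi_otimes(e_oplus)            # log(0) = -inf, cf. Equation 15
latent = log_e_oplus + phi_otimes(alpha)         # -inf + log(0.7) = -inf
print(log_e_oplus, phi_otimes_inv(latent))       # -inf 0.0  (back to e_oplus)
```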
  The trickiest property to enforce by construction in neural semirings is the distributivity of
the multiplication over the addition, or more formally:

                   πœ™βŠ—β»ΒΉ[πœ™βŠ—(𝛼1 ) + πœ™βŠ—(πœ™βŠ•β»ΒΉ(πœ™βŠ•(𝛼2 ) + πœ™βŠ•(𝛼3 )))]
                = πœ™βŠ•β»ΒΉ[πœ™βŠ•(πœ™βŠ—β»ΒΉ(πœ™βŠ—(𝛼1 ) + πœ™βŠ—(𝛼2 ))) + πœ™βŠ•(πœ™βŠ—β»ΒΉ(πœ™βŠ—(𝛼1 ) + πœ™βŠ—(𝛼3 )))]              (16)

where we again used the sum-decompositions from Equations 10 and 11. One possibility to
                                                    βˆ’1 , πœ™ , πœ™ βˆ’1 and πœ™ that satisfy Equation 16.
enforce the properties is to, again pick functions πœ™βŠ—     βŠ— βŠ•          βŠ•
For instance, we could pick:
                          πœ™βŠ—β»ΒΉ(𝛼) = exp(𝛼)                          πœ™βŠ—(𝛼) = log(𝛼)
                          πœ™βŠ•β»ΒΉ(𝛼) = |𝛼|^(1/𝑝)                       πœ™βŠ•(𝛼) = 𝛼^𝑝

Here we assume that 𝛼 is a vector and that the functions operate element-wise. In this case the
βŠ—-operation is equivalent to an element-wise multiplication of two vectors and the addition is
in fact an element-wise p-norm. Note that each dimension of the 𝛼-vector can have a different
parameter 𝑝. This means that every dimension is aggregated using a different p-norm. In a
neural semiring it would be this vector of parameters that would be learned, possibly via a
neural network.
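   A small numpy check of this exp/log and p-norm construction, with an illustrative per-dimension choice of 𝑝, confirms that distributivity holds element-wise for non-negative vectors:

```python
import numpy as np

# The exp/log and p-norm choice from the text, applied element-wise to
# non-negative alpha vectors; p can differ per dimension and would be the
# learnable parameter of such a neural semiring.
p = np.array([1.0, 2.0, 4.0])                       # one exponent per dimension

otimes = lambda a, b: np.exp(np.log(a) + np.log(b))        # element-wise product
oplus  = lambda a, b: np.abs(a**p + b**p) ** (1.0 / p)     # element-wise p-norm

rng = np.random.default_rng(2)
a1, a2, a3 = rng.uniform(0.1, 1.0, size=(3, 3))
lhs = otimes(a1, oplus(a2, a3))                     # a1 (x) (a2 (+) a3)
rhs = oplus(otimes(a1, a2), otimes(a1, a3))         # (a1 (x) a2) (+) (a1 (x) a3)
print(np.allclose(lhs, rhs))                        # True: distributivity holds
```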
   Alternatively, we could enforce distributivity through an extra loss term. Assume therefore
that we have two neural networks πœ™βŠ— and πœ™βŠ• and we know how to efficiently compute their
inverses πœ™βŠ—β»ΒΉ and πœ™βŠ•β»ΒΉ. Distributivity can then be enforced approximately by adding a term of
the form:

  ℒ𝑑𝑖𝑠𝑑(𝛼1 , 𝛼2 , 𝛼3 ) = [ πœ™βŠ—β»ΒΉ[πœ™βŠ—(𝛼1 ) + πœ™βŠ—(πœ™βŠ•β»ΒΉ(πœ™βŠ•(𝛼2 ) + πœ™βŠ•(𝛼3 )))]
                          βˆ’ πœ™βŠ•β»ΒΉ[πœ™βŠ•(πœ™βŠ—β»ΒΉ(πœ™βŠ—(𝛼1 ) + πœ™βŠ—(𝛼2 ))) + πœ™βŠ•(πœ™βŠ—β»ΒΉ(πœ™βŠ—(𝛼1 ) + πœ™βŠ—(𝛼3 )))] ]Β²

to the learning objective, which we would like to optimize. Here we picked the squared error
loss but other options might be viable as well. We will elaborate more on this distributivity loss
term in the next section.
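   A possible sketch of such a distributivity loss is shown below. The two closed-form invertible maps merely stand in for the neural networks πœ™βŠ— and πœ™βŠ• and are chosen only to keep the snippet self-contained:

```python
import numpy as np

# A sketch of the distributivity loss term for an arbitrary pair of invertible
# maps; simple closed-form maps stand in for neural phi_otimes and phi_oplus.
phi_x, phi_x_inv = np.log, np.exp          # stand-in for phi_otimes and its inverse
phi_p, phi_p_inv = np.sinh, np.arcsinh     # stand-in for phi_oplus and its inverse

def otimes(a, b): return phi_x_inv(phi_x(a) + phi_x(b))
def oplus(a, b):  return phi_p_inv(phi_p(a) + phi_p(b))

def dist_loss(a1, a2, a3):
    lhs = otimes(a1, oplus(a2, a3))                 # a1 (x) (a2 (+) a3)
    rhs = oplus(otimes(a1, a2), otimes(a1, a3))     # (a1 (x) a2) (+) (a1 (x) a3)
    return np.sum((lhs - rhs) ** 2)                 # squared-error penalty

rng = np.random.default_rng(3)
a1, a2, a3 = rng.uniform(0.1, 1.0, size=(3, 4))
print(dist_loss(a1, a2, a3))   # > 0: this particular pair of maps is not distributive
```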


4. Learning with Neural Semirings
The conceptually easiest learning paradigm applicable to learning with neural semirings is fully
supervised learning. In such a setting we assume to be given a set of literals and computation
graphs as input and a corresponding set of labels as supervision. The computation graph of
each training example first takes the literals and maps (encodes) them to a set of vectors. These
vectors are then combined using learnable sum-product operations. Figure 1 gives a conceptual
overview of the learning setup. An open question is where we obtain the computation graphs
from. One possibility would be to have as input a weighted Boolean formula (for instance
representing a weighted graph) and knowledge-compiling the formula into a tractable and
differentiable representation (cf. end of Section 2).
Figure 1: Schematic overview of a neural semiring computation. The diagram is read left to right. We
have as input the literals ℓ𝑖 , which are passed through the labeling functional 𝛼. The middle part abstractly
represents recursively nested sum-product computations, i.e. it represents the computation graph
given as input to the learning problem. In the end, we obtain the value of the query π‘ž by passing
the output of the nested sum-product through a learnable decoding function 𝛽 whose result we can
compare to the supervision (the label in the training data set).




   Using the notation presented in Figure 1, supervised learning in a neural semiring setting
consists of minimizing a loss function where we compare 𝛽(β„“1 , … , ℓ𝑛 ) and the corresponding label
𝐿 for each datum in the training data set. The parameters to be optimized are the parameters of
the encoder 𝛼, of the βŠ• and βŠ— operators, and of the decoder 𝛽. We call this loss function the
commutative semiring computation graph loss (CSCG-loss) and denote it by ℒ𝐢𝑆𝐢𝐺 .
   At the end of Section 3 we discussed the possibility of relaxing the semiring properties on the
βŠ• and βŠ— operator by enforcing distributivity through an extra term in the loss. The CSCG-loss
is then written as:

              ℒ𝐢𝑆𝐢𝐺(β„“1 , … , ℓ𝑛 , 𝐿) β‰ˆ β„’πΆπ‘†πΆπΊβˆ—(β„“1 , … , ℓ𝑛 , 𝐿) + βˆ‘_{ℓ𝑖 , ℓ𝑗 , β„“π‘˜} ℒ𝑑𝑖𝑠𝑑(𝛼(ℓ𝑖 ), 𝛼(ℓ𝑗 ), 𝛼(β„“π‘˜ ))      (17)

In the above equation, β„’πΆπ‘†πΆπΊβˆ— denotes the CSCG-loss without distributivity being enforced
by construction on the semiring operators. Furthermore, we can interpret the distributivity loss
(the rightmost term) as a regularization term for β„’πΆπ‘†πΆπΊβˆ—.
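   The sketch below shows a single forward pass of this supervised setup with untrained placeholder maps for 𝛼, βŠ•, βŠ—, and 𝛽. The computation-graph encoding and all parameters are illustrative assumptions; an actual implementation would make these components neural networks and train them with automatic differentiation:

```python
import numpy as np

# A forward-pass sketch of the CSCG loss for one training example: alpha maps
# raw literal features to vectors, a given computation graph nests (toy)
# oplus/otimes operators, and beta decodes the result into a scalar that is
# compared to the label L. All maps here are untrained placeholders.
rng = np.random.default_rng(4)
d = 4
W_alpha = rng.normal(size=(d, 8))          # alpha: literal features -> R^d
w_beta = rng.normal(size=d)                # beta: R^d -> scalar prediction

alpha = lambda lit: np.tanh(W_alpha @ lit)
beta = lambda v: float(w_beta @ v)
oplus = lambda a, b: np.arcsinh(np.sinh(a) + np.sinh(b))    # associative by construction
otimes = lambda a, b: np.cbrt(a**3 + b**3)                  # associative by construction

def eval_graph(node, labels):              # node: nested ("OR"/"AND", ...) tuple
    if node[0] == "LEAF":
        return labels[node[1]]
    op = oplus if node[0] == "OR" else otimes
    values = [eval_graph(c, labels) for c in node[1:]]
    out = values[0]
    for v in values[1:]:
        out = op(out, v)
    return out

literal_features = {i: rng.normal(size=8) for i in range(3)}      # raw inputs
labels = {i: alpha(x) for i, x in literal_features.items()}       # semiring elements
graph = ("OR", ("AND", ("LEAF", 0), ("LEAF", 1)), ("LEAF", 2))    # computation graph
L = 1.0                                                           # supervision label
loss_cscg_star = (beta(eval_graph(graph, labels)) - L) ** 2
print(loss_cscg_star)
```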


5. Related Work
Historically, artificial intelligence can be divided into two main camps: the so-called symbolic
camp and the sub-symbolic or connectionist camp. The synthesis of the two streams has been
dubbed neuro-symbolic AI [19, 20, 21].
   Formulating neuro-symbolic AI as semiring computations has usually been done in ap-
proaches originating in the symbolic camp, such as DeepProbLog [6], TensorLog [22], Modular
Neural Networks [23], and Logic Tensor Networks [24].
   All four approaches have in common that they are able to learn the labeling functional, which
we denote by 𝛼, but also that they pre-define the reasoning operations (βŠ• and βŠ—). For example,
in TensorLog the labeling functional 𝛼 produces vectors (elements of the semiring). The βŠ• and
βŠ— operations are then defined as element-wise addition and multiplication of these vectors.
This effectively means the embedding space in which reasoning is performed is chosen up front.
   A disadvantage of fixing the semiring operations (i.e. reasoning operations) up front is that
this severely restricts the embedding spaces that can be learned. For example, the modular
neural networks proposed by Andreas et al. [23] subdivide an image by overlaying a grid. Each
grid cell then corresponds to an entry in a tensor. In semiring terminology this is the labeling
step producing elements of the semiring. Elements are combined using predefined point-wise
tensor operations. This restricts reasoning and learning to grid worlds.
   In recent years neuro-symbolic AI has also experienced a revival in the connectionist camp.
The most prominent approach is probably graph neural networks [25], which were born out of
the necessity to use structured data within neural networks. Inference and learning in graph neural
networks are performed by running a message passing algorithm on the structure provided by
a graph. Interestingly, message passing is nothing but a sum-product algorithm that can be
formulated as a semiring computation.
   In comparison to the research originating in the symbolic camp, we make two interesting
observations for GNNs and related approaches [26, 27]. 1) Contrary to symbolic approaches to
neuro-symbolic AI, connectionist approaches have been paying attention to learnable aggre-
gation functions, i.e. a learnable addition operator [15, 16, 28]. 2) Although the connectionist
camp has started to formulate deep learning in terms of abstract algebra, cf. geometric deep
learning [29], this has only been done with regard to groups (monoids with invertible
elements). Commutative semirings have been ignored so far.


6. Conclusions and Future Work
This paper introduces the concept of neural semirings, a framework that has the potential to
unlock the learn to reason paradigm in a semiring setting. In contrast to prior work, the semiring
operations are no longer fixed up front but learned from data. In contrast to geometric deep
learning, we formulate neuro-symbolic AI in terms of semirings instead of groups, which
introduces the additional complication of distributivity.
   Future work will obviously include an experimental evaluation of neural semirings and a more
detailed study of their ingredients. For instance, we might want to investigate alternatives
to sum-decompositions of the semiring operators, such as product-decompositions.
We will also have to study in more detail the connections of neural semirings to approaches in
the connectionist camp of neuro-symbolic AI, in particular the work of Xu et al. [10].


Acknowledgments
Pedro Zuidberg Dos Martires has received support from the KU Leuven Special Research Fund.


References
 [1] Z. Li, J. Eisner, First-and second-order expectation semirings with applications to minimum-
     risk training on translation forests, in: Conference on Empirical Methods in Natural
     Language Processing, 2009.
 [2] A. Kimmig, G. Van den Broeck, L. De Raedt, Algebraic model counting, Journal of Applied
     Logic (2017).
 [3] F. Orsini, P. Frasconi, L. De Raedt, kproblog: an algebraic prolog for machine learning,
     Machine Learning (2017).
 [4] V. Belle, L. De Raedt, Semiring programming: A semantic framework for generalized sum
     product problems, International Journal of Approximate Reasoning (2020).
 [5] D. Fierens, G. Van den Broeck, J. Renkens, D. Shterionov, B. Gutmann, I. Thon, G. Janssens,
     L. De Raedt, Inference and learning in probabilistic logic programs using weighted boolean
     formulas, Theory and Practice of Logic Programming (2015).
 [6] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, L. De Raedt, Deepproblog: Neural
     probabilistic logic programming, in: Advances in Neural Information Processing Systems,
     2018.
 [7] A. Darwiche, Modeling and reasoning with Bayesian networks, Cambridge university
     press, 2009.
 [8] J. Goodman, Semiring parsing, Computational Linguistics (1999).
 [9] A. Kimmig, G. Van den Broeck, L. De Raedt, An algebraic prolog for reasoning about
     possible worlds, in: AAAI, 2011.
[10] K. Xu, J. Li, M. Zhang, S. S. Du, K.-i. Kawarabayashi, S. Jegelka, What can neural networks
     reason about?, in: ICLR, 2020.
[11] L. G. Valiant, The complexity of computing the permanent, Theoretical Computer Science
     8 (1979) 189–201.
[12] A. Darwiche, P. Marquis, A knowledge compilation map, Journal of Artificial Intelligence
     Research 17 (2002) 229–264.
[13] J. Xu, Z. Zhang, T. Friedman, Y. Liang, G. Broeck, A semantic loss function for deep
     learning with symbolic knowledge, in: International Conference on Machine Learning,
     PMLR, 2018, pp. 5502–5511.
[14] R. E. Bryant, Graph-Based Algorithms for Boolean Function Manipulation, IEEE Transac-
     tions on Computers 35 (1986).
[15] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. Salakhutdinov, A. J. Smola, Deep sets,
     in: NIPS, 2017.
[16] E. Wagstaff, F. Fuchs, M. Engelcke, I. Posner, M. A. Osborne, On the limitations of
     representing functions on sets, in: ICML, 2019.
[17] Y. Song, C. Meng, S. Ermon, Mintnet: Building invertible neural networks with masked
     convolutions, in: Advances in Neural Information Processing Systems, 2019.
[18] D. Rezende, S. Mohamed, Variational inference with normalizing flows, in: ICML, 2015.
[19] T. R. Besold, A. d. Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K.-U. KΓΌhnberger,
     L. C. Lamb, D. Lowd, P. M. V. Lima, et al., Neural-symbolic learning and reasoning: A
     survey and interpretation, arXiv preprint arXiv:1711.03902 (2017).
[20] A. Garcez, M. Gori, L. Lamb, L. Serafini, M. Spranger, S. Tran, Neural-symbolic computing:
     An effective methodology for principled integration of machine learning and reasoning,
     Journal of Applied Logics (2019).
[21] A. d. Garcez, L. C. Lamb, Neurosymbolic ai: The 3rd wave, arXiv preprint arXiv:2012.05876
     (2020).
[22] W. Cohen, F. Yang, K. R. Mazaitis, Tensorlog: A probabilistic database implemented using
     deep-learning infrastructure, Journal of Artificial Intelligence Research (2020).
[23] J. Andreas, M. Rohrbach, T. Darrell, D. Klein, Learning to compose neural networks for
     question answering, in: NAACL, 2016.
[24] I. Donadello, L. Serafini, A. D. Garcez, Logic tensor networks for semantic image interpre-
     tation, IJCAI (2017).
[25] M. Gori, G. Monfardini, F. Scarselli, A new model for learning in graph domains, in: IEEE
     International Joint Conference on Neural Networks, 2005.
[26] P. VeličkoviΔ‡, G. Cucurull, A. Casanova, A. Romero, P. LiΓ², Y. Bengio, Graph attention
     networks, in: International Conference on Learning Representations, 2018.
[27] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural message passing for
     quantum chemistry, in: ICML, 2017.
[28] G. Pellegrini, A. Tibo, P. Frasconi, A. Passerini, M. Jaeger, Learning aggregation functions,
     arXiv preprint arXiv:2012.08482 (2020).
[29] M. M. Bronstein, J. Bruna, T. Cohen, P. VeličkoviΔ‡, Geometric deep learning: Grids, groups,
     graphs, geodesics, and gauges, arXiv preprint arXiv:2104.13478 (2021).