<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inference in Relational Neural Machines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Marra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michelangelo Diligenti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Gori</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lapo Faggi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Maggini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International</institution>
          ,
          <addr-line>CC BY 4.0</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>KU Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Information Engineering and Science, University of Siena</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Integrating logic reasoning and deep learning from sensory data is a key challenge in developing artificial agents able to operate in complex environments. Whereas deep learning can operate at a large scale thanks to recent hardware advancements (GPUs) as well as other important technical advancements like Stochastic Gradient Descent, logic inference cannot be executed over large reasoning tasks, as it requires considering a combinatorial number of possible assignments. Relational Neural Machines (RNMs) have recently been introduced to co-train a deep learning machine and a first-order probabilistic logic reasoner in a fully integrated way. In this context, it is crucial to prevent logic inference from becoming a bottleneck that would preclude the application of the methodology to large scale learning tasks. This paper proposes and compares different inference schemata for Relational Neural Machines, together with some preliminary results showing the effectiveness of the proposed methodologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Empowering machine learning with explicit reasoning capabilities
is a key step toward a trustworthy and human-centric AI, where
the decisions of the learners are explainable and with
human-understandable guarantees. While sub-symbolic approaches like
deep neural networks have achieved impressive results in several
tasks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], standard neural networks can struggle to represent
relational knowledge on different input patterns or relevant output
structures. Recently, some work has been done to learn and inject
relational features into the learning process [
        <xref ref-type="bibr" rid="ref16 ref17">17, 16</xref>
        ]. Symbolic
approaches [
        <xref ref-type="bibr" rid="ref18 ref3">3, 18</xref>
        ] based on probabilistic logic reasoners can
perform an explicit inference process in the presence of uncertainty.
Another related line of research studies hybrid approaches leveraging
deep learning schemas and neural networks to learn the structure of
the reasoning process, as done, for instance, by Neural Theorem
Provers [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] or TensorLog [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        However, bridging the gap between symbolic and sub-symbolic
levels is still an open problem which has been recently addressed by
neuro-symbolic approaches [
        <xref ref-type="bibr" rid="ref12 ref20">12, 20</xref>
        ]. Hu et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] inject the prior
knowledge into the network weights via a distillation process but
with no guarantee that the logic will be properly generalized to the
test cases. Deep Structured Models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Hazan et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] impose
statistical structure on the output predictions. The Semantic Loss [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
defines a loss which encodes the desired output structure. However,
the loss does not define a probabilistic reasoning process, limiting
the flexibility of the approach. Deep ProbLog [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] extends the
probabilistic logic programming language ProbLog [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with predicates
implemented by a deep learner. This approach is powerful but
limited to cases where exact inference is possible. Deep Logic
Models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] improve over related approaches like Semantic-based
Regularization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and Logic Tensor Networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], but they fail to
perfectly match the discrimination abilities of a purely supervised learner.
      </p>
      <p>
        Relational Neural Machines (RNM) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] can perfectly replicate
the effectiveness of training deep architectures from supervised
data, while integrating the full expressivity and rich reasoning
process of Markov Logic Networks [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. However, like any probabilistic
reasoner based on a graphical model representing the fully grounded
FOL knowledge, RNMs are strongly limited in the scale at which they
can operate. Indeed, the large combinatorial number of possible
assignments, together with the complex causal dependency structure of
the inference, requires devising appropriate approximate inference
algorithms. This paper proposes and studies different new
inference solutions designed to be effective for RNMs.
      </p>
      <p>The outline of the paper is as follows. Section 2 presents the model
and how it can be used to integrate logic and learning. Sections 3 and
4 study tractable approaches to perform training and inference with
the model, respectively. Section 5 shows the experimental evaluation
of the proposed ideas on various datasets. Finally, Section 6 draws
some conclusions and highlights planned future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Model</title>
      <p>A Relational Neural Machine establishes a probability distribution
over a set of n output variables of interest y = {y1, …, yn}, given
a set of predictions made by one or multiple deep architectures, and
the model parameters. In this paper the output variables are assumed
to be binary, i.e. yi ∈ {0, 1}.</p>
      <p>Unlike standard neural networks which compute the output via a
simple forward pass, the output computation in an RNM can be
decomposed into two stages: a low-level stage processing the input
patterns, and a subsequent semantic stage, expressing constraints over
the output and performing higher level reasoning. The first stage
processes D input patterns x = {x1, …, xD}, returning the values f
using the network with parameters w. The higher layer takes as input
f and applies reasoning using a set of constraints, whose parameters
are indicated as λ, and then it returns the set of output variables y.</p>
      <p>An RNM model defines a conditional probability distribution in the
exponential family:
p(y|f, λ) = (1/Z) exp( Σ_c λc Φc(f, y) )
where Z is the partition function and the C potentials Φc express
some properties of the input and output variables. The parameters
λ = {λ1, …, λC} determine the strength of the potentials Φc.</p>
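      <p>For intuition, the distribution above can be evaluated exactly by enumeration when y contains only a handful of binary variables. The following sketch uses hypothetical potentials (an agreement term and a soft rule), not the paper's implementation:</p>

```python
import itertools
import math

def rnm_prob(y, f, potentials, lambdas):
    # p(y|f, lambda) = exp(sum_c lambda_c * Phi_c(f, y)) / Z, with the
    # partition function Z summed over all 2^n binary assignments,
    # so this is tractable only for small n.
    n = len(y)
    def score(assignment):
        return sum(lam * phi(f, assignment)
                   for lam, phi in zip(lambdas, potentials))
    Z = sum(math.exp(score(a)) for a in itertools.product((0, 1), repeat=n))
    return math.exp(score(tuple(y))) / Z

# Hypothetical potentials: phi0 rewards agreement with the network outputs f,
# phi1 is 1 when the soft rule "y[0] implies y[1]" is satisfied, 0 otherwise.
def phi0(f, y):
    return sum(fi * yi for fi, yi in zip(f, y))

def phi1(f, y):
    return 0.0 if (y[0] == 1 and y[1] == 0) else 1.0

p = rnm_prob([1, 1], f=[2.0, -0.5], potentials=[phi0, phi1], lambdas=[1.0, 2.0])
```

      <p>By construction, the probabilities of the four assignments of two binary variables sum to one.</p>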
      <p>
        In a classical and purely supervised learning setup, the patterns are
i.i.d.; it is therefore possible to split y, f into disjoint sets
grouping the variables of each pattern, forming separate cliques. Let us
indicate as y(x), f(x) the portion of the output and function variables
referring to the processing of an input pattern x. A single potential
Φ0, corresponding to the dot product between y and f, is needed to
represent supervised learning, and this potential decomposes over the
patterns, yielding the distribution
p0(y|f, λ) = (1/Z) exp( Σ_{x∈S} Φ0(y(x), f(x)) )   (1)
where S is the set of supervised patterns. As shown by Marra et
al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the cross entropy loss over sigmoidal or softmax outputs can
be exactly recovered by maximizing the log-likelihood of an RNM for
one-label and multi-label classification tasks, respectively.
      </p>
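      <p>This equivalence can be checked numerically in the binary case: with Φ0(y(x), f(x)) = Σ_i yi fi, the per-pattern partition function factorizes and the negative log-likelihood coincides with the sigmoid cross-entropy on the logits. A small sketch, not the authors' code:</p>

```python
import math

def rnm_supervised_loglik(y, f):
    # Phi_0(y, f) = sum_i y_i * f_i; with independent binary outputs the
    # per-pattern partition function factorizes as prod_i (1 + exp(f_i)).
    return sum(yi * fi - math.log(1.0 + math.exp(fi)) for yi, fi in zip(y, f))

def bce_with_logits(y, f):
    # Standard sigmoid cross-entropy, for comparison.
    def sig(x):
        return 1.0 / (1.0 + math.exp(-x))
    return -sum(yi * math.log(sig(fi)) + (1 - yi) * math.log(1.0 - sig(fi))
                for yi, fi in zip(y, f))
```

      <p>For any y and f the two quantities agree up to sign, so maximizing the log-likelihood minimizes the cross-entropy.</p>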
      <p>Neuro-symbolic integration can be obtained by employing one
potential Φ0 enforcing consistency with the supervised data,
together with potentials representing the logic knowledge. Following an
approach similar to Markov Logic Networks, a set of First-Order Logic
(FOL) formulas is input to the system, and a potential Φc for each
formula is considered. It is assumed that some (or all) of the predicates
in a KB are unknown and need to be learned together with the
parameters driving the reasoning process.</p>
      <p>In the following, a grounded expression (the same
applies to atoms and predicates) is a FOL rule whose variables are
assigned to specific constants. It is assumed that the undirected
graphical model is built such that each grounded atom corresponds to a
node in the graph, and all the nodes corresponding to grounded atoms
co-occurring in at least one rule are connected in the graph. As a
result, there is one clique (and hence one potential) for each grounding gc of
the formula in y. It is assumed that all the potentials resulting from
the c-th formula share the same weight λc; therefore the potential
Φc is the sum over all groundings of the c-th formula in the world y:
Φc(y) = Σ_{y_{c,g}} φc(y_{c,g}), where φc(y_{c,g}) assumes a value equal to 1
or 0 if the grounded formula holds true or false, respectively. This yields the
probability distribution:</p>
      <p>p(y|f, λ) = (1/Z) exp( Φ0(f, y) + Σ_c λc Σ_{y_{c,g}} φc(y_{c,g}) )
This will allow developing the data embeddings as part of training by
enforcing the consistency between the reasoner and network outputs,
while distilling the logical knowledge into the network weights.</p>
      <p>Figure 1 shows the graphical model obtained for a simple
multiclass image classification task. The goal of the training process is to
train the classifiers approximating the predicates, but also to establish
the relevance of each rule. For example, in an image classification
task, the formula ∀x Antelope(x) ∧ Lion(x) is likely to be
associated with a higher weight than ∀x PolarBear(x) ∧ Lion(x), since the
latter predicates are unlikely to correlate in the data.</p>
    </sec>
    <sec id="sec-3">
      <title>Training</title>
      <p>
        The computation of the partition function requires a summation over
all possible assignments of the output variables, which is intractable
in all but trivial cases. A particularly interesting case is when the
partition function is assumed to factorize over the potentials, as
done in piecewise likelihood [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]:
Z ≈ Π_c Zc = Π_c [ Σ_{y'c} exp( λc Φc(f, y'c) ) ]   (2)
where yc is the subset of variables in y that are involved in the
computation of Φc. Then the piecewise-local probability for the c-th
constraint can be expressed as:
pc^PL(yc|f, λ) = exp( λc Φc(f, yc) ) / Zc
      </p>
      <p>Under this assumption, the factors can be distributed over the
potentials, giving the following generalized piecewise likelihood:
p(y|f, λ) ≈ Π_c p(yc|y∖yc, f, λc) = Π_c pc^PL(yc|f, λ)</p>
      <p>If the variables in y are binary, the computation of Z requires a
summation over all possible assignments, which has O(2^|y|) complexity.
Using the local decomposition, this is reduced to O(|y_ĉ| · 2^{n_ĉ}), where
ĉ is the index of the formula corresponding to the potential with the
largest number n_ĉ of variables to ground.</p>
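      <p>The complexity reduction can be made concrete: each local partition function Zc only enumerates the assignments of its own clique. A minimal sketch, with the potential signature Φc(f, yc) assumed for illustration:</p>

```python
import itertools
import math

def local_partition(phi, lam, f, n_vars):
    # Z_c enumerates only the 2^{n_c} assignments of the clique variables,
    # instead of the 2^{|y|} assignments required by the global Z.
    return sum(math.exp(lam * phi(f, a))
               for a in itertools.product((0, 1), repeat=n_vars))

def p_local(phi, lam, f, y_c):
    # Piecewise-local probability p_c^PL(y_c | f, lambda).
    return math.exp(lam * phi(f, y_c)) / local_partition(phi, lam, f, len(y_c))
```

      <p>Each local distribution is normalized over its clique only, so it sums to one over the 2^{n_c} clique assignments.</p>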
      <p>If the c-th constraint is factorized using the PL partitioning, the
derivative of the log-likelihood with respect to the model potential
weights is:
∂ log p(y|f, λ) / ∂λc = Φc(f, y) − E_{pc^PL}[Φc]
and with respect to the learner parameters:
∂ log p(y|f, λ) / ∂w = Σ_i ( yi − E_{y'∼p0^PL}[y'i] ) ∂fi/∂w</p>
      <p>EM. When the world is not fully observed during training, an
iterative Expectation Maximization (EM) schema can be used to
marginalize over the unobserved data in the expectation step, using
the inference methodologies described in the next section. Then
the average constraint satisfaction can be recomputed and, finally,
the λ, w parameters can be updated in the maximization step. This
process is then iterated until convergence.</p>
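      <p>In the maximization step, each λc can be moved along the gradient above; the expectation E_{pc^PL}[Φc] is computable by enumerating the clique assignments. A sketch under the same illustrative potential signature Φc(f, yc) as before:</p>

```python
import itertools
import math

def lambda_gradient(phi, lam, f, y_c):
    # d log p / d lambda_c = Phi_c(f, y_c) - E_{p_c^PL}[Phi_c], where the
    # expectation is taken under the piecewise-local distribution.
    assignments = list(itertools.product((0, 1), repeat=len(y_c)))
    weights = [math.exp(lam * phi(f, a)) for a in assignments]
    Zc = sum(weights)
    expected = sum(phi(f, a) * w / Zc for a, w in zip(assignments, weights))
    return phi(f, y_c) - expected

# One gradient-ascent step on lambda_c (maximization step of the EM schema):
# lam = lam + eta * lambda_gradient(phi, lam, f, observed_y_c)
```
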
    </sec>
    <sec id="sec-4">
      <title>Inference</title>
      <p>This section proposes some general methodologies that can be
used to make RNM inference tractable.</p>
      <p>Inference tasks can be sub-categorized into different groups. In
particular, MAP inference methods search for the most probable
assignment of y given the evidence and the fixed parameters w, λ. The
problem of finding the best assignment y* to the unobserved query
variables given the evidence ye and the current parameters can be stated
as:
y* = argmax_{y'} Σ_c λc Φc(f, [y', ye])   (3)
where [y', ye] indicates a full assignment to the y variables, split
into the query and evidence sets.</p>
      <p>On the other hand, MARG inference methods compute the
marginal probability of a set of random variables given some
evidence. MARG inference sums the probability of an assignment of a
query variable yq over all possible worlds. MARG inference for a
single query variable is defined as:
p(yq|λ, f, ye) = Σ_{y∖yq} p(y|f, λ, ye)</p>
      <p>Both MAP and MARG inference are intractable in the most
general cases, as they require considering all possible assignments.
Therefore, approximate methods must be devised to tackle the
most interesting applications. This section proposes a few inference
solutions that can be naturally applied to RNMs.</p>
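      <p>As a baseline for the approximations that follow, both tasks can be solved exactly by enumeration on toy-sized problems. A hypothetical helper, with the potentials passed as callables Φc(f, y):</p>

```python
import itertools
import math

def exact_map(potentials, lambdas, f, n):
    # MAP by enumeration: argmax_y sum_c lambda_c Phi_c(f, y). Costs O(2^n).
    def score(a):
        return sum(lam * phi(f, a) for lam, phi in zip(lambdas, potentials))
    return max(itertools.product((0, 1), repeat=n), key=score)

def exact_marginal(potentials, lambdas, f, n, q):
    # MARG by enumeration: p(y_q = 1 | f, lambda), summing over all worlds.
    def weight(a):
        return math.exp(sum(lam * phi(f, a)
                            for lam, phi in zip(lambdas, potentials)))
    worlds = list(itertools.product((0, 1), repeat=n))
    Z = sum(weight(a) for a in worlds)
    return sum(weight(a) for a in worlds if a[q] == 1) / Z
```

      <p>With only the supervision potential, the exact marginal of a variable reduces to the sigmoid of its logit, which is a useful sanity check.</p>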
      <p>Piecewise MAP Inference. MAP inference in RNMs requires
evaluating all possible 2^|y| assignments, which is generally
intractable. A possible solution is to employ the piecewise
approximation (Equation 2) to separately optimize each single factor,
reducing the complexity to 2^{n_ĉ}, with n_ĉ the size of the largest
factor. The main issue with this solution is that the same variable can
be present in different factors, and the piecewise assignments can be
inconsistent across the factors. The assignment of a variable shared
across factors can be determined by selecting the value chosen
by the majority of the factors.</p>
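      <p>A sketch of this piecewise MAP with majority-vote reconciliation, where each factor is represented as a (λc, Φc, variable-indices) triple (a hypothetical interface, not the paper's):</p>

```python
import itertools
from collections import Counter

def piecewise_map(factors, f):
    # Each factor (lam, phi, idx) is optimized independently over the 2^{n_c}
    # assignments of its own variables; a variable shared across factors is
    # then set to the value selected by the majority of those factors.
    votes = {}
    for lam, phi, idx in factors:
        best = max(itertools.product((0, 1), repeat=len(idx)),
                   key=lambda a: lam * phi(f, a))
        for i, v in zip(idx, best):
            votes.setdefault(i, []).append(v)
    return {i: Counter(vs).most_common(1)[0][0] for i, vs in votes.items()}
```
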
      <p>
        Fuzzy Logic MAP Inference. The y values can be relaxed into
the [0, 1] interval, assuming that each potential Φc(f, y) has a
fuzzy-logic [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] continuous surrogate Φc^s(f, y) which collapses into
the original potential when the y assume crisp values and is
continuous with respect to each yi. When the potentials are relaxed to accept
continuous variables, the MAP problem stated by Equation 3 can
be solved by gradient-based techniques, by computing the derivative
of Σ_c λc Φc^s(f, y) with respect to each output variable yi.
Different t-norm fuzzy logics have been considered as continuous
relaxations of logic formulas [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], e.g. Gödel, Łukasiewicz and
Product logics. Furthermore, a fragment of the Łukasiewicz logic [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has
recently been proposed for translating logic inference into a convex
optimization problem.
      </p>
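      <p>A minimal sketch of the relaxation: variables constrained to [0, 1] by clipping, updated by gradient ascent. The surrogate used in the example is a simple linear supervision term, chosen purely for illustration:</p>

```python
def fuzzy_map(grad_fn, n, steps=200, lr=0.1):
    # Gradient-ascent MAP over relaxed variables y in [0, 1]^n; grad_fn(y)
    # returns the gradient of sum_c lambda_c * Phi_c^s(f, y) w.r.t. y.
    y = [0.5] * n
    for _ in range(steps):
        g = grad_fn(y)
        y = [min(1.0, max(0.0, yi + lr * gi)) for yi, gi in zip(y, g)]
    return y

# Illustrative surrogate: supervision term sum_i f_i * y_i, whose gradient
# is simply f, so the relaxed MAP saturates y_i to 1 where f_i is positive
# and to 0 where it is negative.
f = [2.0, -1.0]
y_star = fuzzy_map(lambda y: f, n=2)
```
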
      <p>Piecewise MARG. The piecewise approximation defined by
Equation 2 allows marginalizing the probability of an assignment
over the single factors, allowing MARG inference to be performed
efficiently as:
p(yq|λ, f, ye) ≈ Σ_{yu=y∖(ye,yq)} [ p0^PL(yu, yq|f) Π_c pc^PL(yu, yq|λc, f, ye) ]
Finally, the shared variables can be reconciled by selecting, for each
shared variable, the assignment with the highest marginal probability.
Piecewise Gibbs. Gibbs sampling can be used to sample from the
distribution and then select the sample with the highest probability:
y* = argmax_{y'∼p} Σ_c λc Φc(f, [y', ye])
In order to speed up the process, a blocked Gibbs sampler may
be considered, grouping many variables and sampling from
their joint distribution. For instance, piecewise approaches suggest
grouping the variables belonging to the same potential, exploiting the
constraints expressed by the potential on the samples. To further speed
up the process, a flip can be accepted only if it yields a strictly greater
probability value (Monotonic Gibbs sampling).</p>
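      <p>A sketch of the monotonic variant, where single-variable flips are accepted only when they strictly increase the unnormalized score Σc λc Φc(f, y); the function names and interface are illustrative:</p>

```python
import random

def monotonic_gibbs(score, n, iters=500, seed=0):
    # Greedy Gibbs-style sampler: propose single-variable flips and accept a
    # flip only if it yields a strictly greater (unnormalized) score.
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(iters):
        i = rng.randrange(n)
        proposal = list(y)
        proposal[i] = 1 - proposal[i]
        if score(proposal) > score(y):
            y = proposal
    return y

# For a separable score, the accepted flips drive each variable to its
# individually best value.
best = monotonic_gibbs(lambda y: 2.0 * y[0] - 1.0 * y[1] + 3.0 * y[2], n=3)
```
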
      <sec id="sec-4-1">
        <title>Piecewise Gibbs with Fuzzy MAP Inference</title>
        <p>Gibbs sampling generally requires a high number of samples to converge to
the correct distribution. Hybrid inference methodologies can be used
to reduce the burn-in time by starting the Gibbs sampler from the
MAP solution found using efficient approximate inference methods
like Fuzzy Logic MAP. The sampler then modifies the solution by
iteratively sampling from the piecewise local distributions. Combining
Fuzzy Logic MAP and a Gibbs sampler makes it possible to escape the
low-quality local minima where fuzzy MAP solutions can get stuck, while
speeding up the convergence of Gibbs sampling.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Relax, Compensate and Recover</title>
        <p>
          Relax, Compensate &amp; Recover (RCR) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is an algorithm that transforms a graphical model into a
simplified model where inference is tractable. This simplified model is
changed while running the algorithm, by computing compensations
to recover a solution as close as possible to the correct distribution.
Graph Neural Networks. A few recent works show that inference
in probabilistic models can be approximated by using Graph
Neural Networks [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. This would allow defining an end-to-end neural
RNM formulation.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>The experimental evaluation aims at testing some of the proposed
inference methods. The evaluation is still partial as more methods
are currently being implemented.</p>
      <p>The evaluation is carried out on a toy task designed to
highlight the capability of RNMs to learn and employ soft rules that
hold only for a sub-portion of the whole dataset. The MNIST
dataset contains images of single handwritten digits. In this task it
is assumed that additional relational logic knowledge is available in
the form of a binary predicate link connecting image pairs. Given
two images x, y, whose corresponding digits are denoted by i, j, a
link between x and y is established if the second digit follows the
first one, i.e. j = i + 1. However, the link predicate can be noisy,
such that there is some probability that link(x, y) is
established even when j ≠ i + 1. The knowledge about the link predicate can be
represented by the FOL formulas:
∀x ∀y link(x, y) ∧ digit(x, i) ⇒ digit(y, i + 1),   i = 0, …, 8,
where digit(x, i) is a binary predicate indicating whether i is the
digit class of the image x. Since the link predicate also holds for
pairs of non-consecutive digits, the above rule is violated by a certain
percentage of digit pairs. Therefore, the manifolds established by the
formulas can help drive the predictions, but the noisy links force
the reasoner to be flexible about how to employ the knowledge. The
training set is created by randomly selecting 1000 images from the
MNIST dataset and adding the link relation with an
incremental degree of noise. For each degree of noise in the training set, we
created an equally sized test set with the same degree of noise. A
neural network with 100 hidden ReLU units is used to process the
input images.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Accuracy of the RNM inference methods against the neural network (NN) baseline for increasing levels of link noise.</p>
        </caption>
        <table>
          <thead>
            <tr><th>%link noise</th><th>NN</th><th>Fuzzy MAP</th><th>Piecewise Gibbs</th><th>Fuzzy MAP EM</th></tr>
          </thead>
          <tbody>
            <tr><td>0.0</td><td>0.78</td><td>1.00</td><td>1.00</td><td>1.00</td></tr>
            <tr><td>0.2</td><td>0.78</td><td>1.00</td><td>1.00</td><td>1.00</td></tr>
            <tr><td>0.4</td><td>0.78</td><td>0.99</td><td>0.98</td><td>0.96</td></tr>
            <tr><td>0.6</td><td>0.78</td><td>0.89</td><td>0.88</td><td>0.96</td></tr>
            <tr><td>0.8</td><td>0.78</td><td>0.86</td><td>0.64</td><td>0.86</td></tr>
            <tr><td>0.9</td><td>0.78</td><td>0.78</td><td>0.28</td><td>0.78</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 1 reports the results of the RNM inference methods
against the baseline provided by the neural network, varying the
percentage of links that are predictive of one digit following another.
The EM-based schema tends to consistently outperform the
other methods, because EM also improves the underlying
neural network by back-propagating the reasoning predictions
to the learner. Other inference methods will be tested within EM.
Fuzzy MAP tends to find good solutions, consistently improving over
the baseline. It is important to notice that RNM is robust with
respect to high link noise levels: even when the relational data does
not carry any useful information, the final solution still matches the
baseline.</p>
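      <p>For reference, a noisy link relation of the kind described above can be generated along these lines; the noise model is an assumption for illustration, not the authors' exact generator:</p>

```python
import random

def make_links(digits, noise, seed=0):
    # link(a, b) holds when digit(b) = digit(a) + 1; with probability `noise`
    # the target b is replaced by a random image, producing links that
    # violate the successor rule.
    rng = random.Random(seed)
    links = []
    for a, da in enumerate(digits):
        for b, db in enumerate(digits):
            if db == da + 1:
                if rng.random() > noise:
                    links.append((a, b))
                else:
                    links.append((a, rng.randrange(len(digits))))
    return links
```
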
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>This paper presents different inference methods for Relational Neural
Machines, a novel framework providing a tight integration between
learning from supervised data and logic reasoning. As future work,
we plan to undertake a larger experimental exploration of RNM for
more structured problems using all the inference methods proposed
by this paper.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This project has received funding from the European Union’s
Horizon 2020 research and innovation program under grant agreement
No 825619.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Liang-Chieh</surname>
            <given-names>Chen</given-names>
          </string-name>
          , Alexander Schwing, Alan Yuille, and Raquel Urtasun, '
          <article-title>Learning deep structured models'</article-title>
          ,
          <source>in International Conference on Machine Learning</source>
          , pp.
          <fpage>1785</fpage>
          -
          <lpage>1794</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Choi</surname>
          </string-name>
          and Adnan Darwiche, '
          <article-title>Relax, compensate and then recover'</article-title>
          ,
          <source>in JSAI International Symposium on Artificial Intelligence</source>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>180</lpage>
          . Springer, (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Luc</given-names>
            <surname>De Raedt</surname>
          </string-name>
          , Angelika Kimmig, and Hannu Toivonen, '
          <article-title>Problog: A probabilistic prolog and its application in link discovery'</article-title>
          ,
          <source>in Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI'07</source>
          , pp.
          <fpage>2468</fpage>
          -
          <lpage>2473</lpage>
          , San Francisco, CA, USA, (
          <year>2007</year>
          ). Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Michelangelo</given-names>
            <surname>Diligenti</surname>
          </string-name>
          , Marco Gori, and Claudio Sacca, '
          <article-title>Semanticbased regularization for learning and inference'</article-title>
          ,
          <source>Artificial Intelligence</source>
          ,
          <volume>244</volume>
          ,
          <fpage>143</fpage>
          -
          <lpage>165</lpage>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I</given-names>
            <surname>Donadello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          and A.S. d'Avila Garcez, '
          <article-title>Logic tensor networks for semantic image interpretation'</article-title>
          ,
          <source>in IJCAI International Joint Conference on Artificial Intelligence</source>
          , pp.
          <fpage>1596</fpage>
          -
          <lpage>1602</lpage>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Giannini</surname>
          </string-name>
          , Michelangelo Diligenti, Marco Gori, and Marco Maggini, '
          <article-title>On a convex logic fragment for learning and reasoning'</article-title>
          ,
          <source>IEEE Transactions on Fuzzy Systems</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Yoshua Bengio, and Aaron Courville,
          <article-title>Deep learning</article-title>
          , volume
          <volume>1</volume>
          , MIT press Cambridge,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Petr</given-names>
            <surname>Hájek</surname>
          </string-name>
          <article-title>Metamathematics of fuzzy logic</article-title>
          , volume
          <volume>4</volume>
          ,
          Springer Science &amp; Business Media
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Tamir</given-names>
            <surname>Hazan</surname>
          </string-name>
          , Alexander G Schwing, and Raquel Urtasun, '
          <article-title>Blending learning and inference in conditional random fields'</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>8305</fpage>
          -
          <lpage>8329</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Zhiting</surname>
            <given-names>Hu</given-names>
          </string-name>
          , Xuezhe Ma, Zhengzhong Liu,
          <string-name>
            <surname>Eduard H. Hovy</surname>
          </string-name>
          , and Eric P. Xing, '
          <article-title>Harnessing deep neural networks with logic rules'</article-title>
          ,
          <source>in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12</source>
          ,
          <year>2016</year>
          , Berlin, Germany, Volume
          <volume>1</volume>
          :
          Long Papers
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>William</surname>
            <given-names>W Cohen</given-names>
          </string-name>
          , Fan Yang, and Kathryn Rivard Mazaitis, 'Tensorlog:
          <article-title>Deep learning meets probabilistic databases'</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>1</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Navdeep</surname>
            <given-names>Kaur</given-names>
          </string-name>
          , Gautam Kunapuli, Tushar Khot, Kristian Kersting, William Cohen, and Sriraam Natarajan, '
          <article-title>Relational restricted boltzmann machines: A probabilistic logic learning approach'</article-title>
          ,
          <source>in International Conference on Inductive Logic Programming</source>
          , pp.
          <fpage>94</fpage>
          -
          <lpage>111</lpage>
          . Springer, (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Robin</surname>
            <given-names>Manhaeve</given-names>
          </string-name>
          , Sebastijan Dumancˇic´, Angelika Kimmig, Thomas Demeester, and Luc De Raedt, 'Deepproblog:
          <article-title>Neural probabilistic logic programming'</article-title>
          , arXiv preprint arXiv:1805.10872
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Giuseppe</surname>
            <given-names>Marra</given-names>
          </string-name>
          , Francesco Giannini, Michelangelo Diligenti, and Marco Gori, '
          <article-title>Integrating learning and reasoning with deep logic models'</article-title>
          ,
          <source>in Proceedings of the European Conference on Machine Learning</source>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Marra</surname>
          </string-name>
          , Francesco Giannini, Michelangelo Diligenti, Marco Maggini, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Gori</surname>
          </string-name>
          , '
          <article-title>Relational neural machines'</article-title>
          ,
          <source>in Proceedings of the European Conference on Artificial Intelligence (ECAI)</source>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Maximilian</given-names>
            <surname>Nickel</surname>
          </string-name>
          , Lorenzo Rosasco, and Tomaso Poggio, '
          <article-title>Holographic embeddings of knowledge graphs'</article-title>
          ,
          <source>in Thirtieth AAAI Conference on Artificial Intelligence</source>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Niepert</surname>
          </string-name>
          , '
          <article-title>Discriminative gaifman models'</article-title>
          ,
          <source>in Advances in Neural Information Processing Systems</source>
          , pp.
          <fpage>3405</fpage>
          -
          <lpage>3413</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Richardson</surname>
          </string-name>
          and Pedro Domingos, '
          <article-title>Markov logic networks'</article-title>
          ,
          <source>Machine learning</source>
          ,
          <volume>62</volume>
          (
          <issue>1</issue>
          ),
          <fpage>107</fpage>
          -
          <lpage>136</lpage>
          , (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          and Sebastian Riedel, '
          <article-title>End-to-end differentiable proving'</article-title>
          ,
          <source>in Advances in Neural Information Processing Systems</source>
          , pp.
          <fpage>3788</fpage>
          -
          <lpage>3800</lpage>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Gustav</given-names>
            <surname>Sourek</surname>
          </string-name>
          , Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka, '
          <article-title>Lifted relational neural networks: Efficient learning of latent relational structures'</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>62</volume>
          ,
          <fpage>69</fpage>
          -
          <lpage>100</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Charles</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          , '
          <article-title>Piecewise pseudolikelihood for efficient training of conditional random fields'</article-title>
          ,
          <source>in Proceedings of the 24th international conference on Machine learning</source>
          , pp.
          <fpage>863</fpage>
          -
          <lpage>870</lpage>
          . ACM, (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Jingyi</surname>
            <given-names>Xu</given-names>
          </string-name>
          , Zilu Zhang, Tal Friedman, Yitao Liang, and Guy Van den Broeck, '
          <article-title>A semantic loss function for deep learning with symbolic knowledge'</article-title>
          ,
          <source>in Proceedings of the 35th International Conference on Machine Learning (ICML)</source>
          , (July
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>KiJung</given-names>
            <surname>Yoon</surname>
          </string-name>
          , Renjie Liao, Yuwen Xiong, Lisa Zhang, Ethan Fetaya, Raquel Urtasun, Richard Zemel, and Xaq Pitkow, '
          <article-title>Inference in probabilistic graphical models by graph neural networks'</article-title>
          , arXiv preprint arXiv:1803.07710
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>