<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Logic Tensor Networks for Top-N Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tommaso Carraro</string-name>
          <email>tcarraro@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Daniele</string-name>
          <email>daniele@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Aiolli</string-name>
          <email>aiolli@math.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luciano Serafini</string-name>
          <email>serafini@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Knowledge Management, Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Via Sommarive, 18, 38123 Povo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics, University of Padova</institution>
          ,
          <addr-line>Via Trieste, 63, 35131 Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Despite being studied for more than twenty years, state-of-the-art recommendation systems still suffer from important drawbacks which limit their usage in real-world scenarios. Among the well-known issues of recommender systems are data sparsity and the cold-start problem. These limitations can be addressed by providing some background knowledge to the model to compensate for the scarcity of data. Following this intuition, we propose to use Logic Tensor Networks (LTNs) to tackle the top-n item recommendation problem. In particular, we show how LTNs can be used to easily and effectively inject commonsense recommendation knowledge inside a recommender system. We evaluate our method on MindReader, a knowledge graph-based movie recommendation dataset containing plentiful side information. In particular, we perform an experiment to show how the benefits of the knowledge increase with the sparsity of the dataset. Finally, a comparison with a standard Matrix Factorization approach reveals that our model is able to reach and, in many cases, outperform state-of-the-art performance.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>top-n recommendation</kwd>
        <kwd>logic tensor networks</kwd>
        <kwd>neural-symbolic integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Factorization [14] and Factorization Machines [15, 16] have been proposed recently. These
models make it possible to effectively extend the user-item matrix by adding new dimensions containing
content (e.g., movie genres, demographic information) and/or contextual side information (e.g.,
location, time). Though these techniques have been shown to improve the recommendation
performance, they are usually specifically designed for one type of side information (e.g., the
user or item content) and lack explainability [17, 18]. Novel recommendation datasets (e.g.,
[19]) provide manifold side information (e.g., ratings on movie genres, actors, directors), and
hence models which can exploit all the available information are required.</p>
      <p>Neural-Symbolic Integration (NeSy) [20] and Statistical Relational Learning (SRL) [21]
represent good candidates to incorporate knowledge with learning. These two branches of Artificial
Intelligence study approaches for the integration of some form of prior knowledge, usually
expressed through First-Order Logic (FOL), with statistical models. The integration has been
shown to be beneficial for addressing data scarcity [22].</p>
      <p>In this paper, we propose to use a Logic Tensor Network (LTN) [23] to inject commonsense
knowledge into a standard Matrix Factorization model for the top-n item recommendation task.
LTN is a NeSy framework that allows using logical formulas to instruct the learning of a neural
model. We propose to use the MindReader dataset [19] to test our model. This dataset includes
a variety of information, such as users’ tastes across movie genres, actors, and directors. In
this work, we show how LTN can naturally and effectively exploit all this varied information
to improve the generalization capabilities of the MF model. In addition, an experiment that
drastically reduces the density of the training ratings reveals that our model can effectively
mitigate the sparsity of data, outperforming the standard MF model, especially in the most
challenging scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>The integration of logical reasoning and learning in RSs is still in its early stages. Among the
NeSy approaches for RSs, the most prominent is NCR [24]. In this work, the recommendation
problem is formalized into a logical reasoning problem. In particular, the user’s ratings are
represented using logical variables; then, logical operators are used to construct formulas
that express facts about them. Afterward, NCR maps the variables to logical embeddings and
the operators to neural networks which act on those embeddings. By doing so, each logical
expression can be equivalently organized as a neural network, so that logical reasoning and
prediction can be conducted in a continuous space. In [25], the idea of NCR is applied to
knowledge graphs for RSs, while [26] uses a NeSy approach to tackle the explainability of RSs.</p>
      <p>The seminal approach that successfully applied SRL to RSs is HyPER [27], which is
based on Probabilistic Soft Logic (PSL) [28]. In particular, HyPER exploits the expressiveness of
FOL to encode knowledge from a wide range of information sources, such as multiple user and
item similarity measures, content, and social information. Then, Hinge-Loss Markov Random
Fields are used to learn how to balance the different information types. HyPER is closely related
to our work since the logical formulas that we use resemble the ones used in HyPER. After
HyPER, other SRL approaches have been proposed for RSs [29, 30].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <sec id="sec-3-1">
        <title>3.1. Notation</title>
        <p>This section provides useful notation and terminology used in the remainder of the paper.
Bold notation is used to differentiate between vectors, e.g., $\mathbf{x} = [3.2, 2.1]$, and scalars, e.g., $x = 5$.
Matrices and tensors are denoted with upper case bold notation, e.g., $\mathbf{X}$. Then, $\mathbf{X}_i$ is used to
denote the $i$-th row of $\mathbf{X}$, while $\mathbf{X}_{i,j}$ denotes the entry at row $i$ and column $j$. We refer to the
set of users of a RS with $\mathcal{U}$, where $|\mathcal{U}| = n$. Similarly, the set of items is referred to as $\mathcal{I}$ such
that $|\mathcal{I}| = m$. We use $\mathcal{D}$ to denote a dataset. $\mathcal{D}$ is defined as a set of $N$ triples $\mathcal{D} = \{(u, i, r)^{(j)}\}_{j=1}^{N}$,
where $u \in \mathcal{U}$, $i \in \mathcal{I}$, and $r \in \mathbb{N}$ is a rating. We assume that a user $u$ cannot give more than one
rating to an item $i$, namely $\nexists\, r_1, r_2 \in \mathbb{N}, r_1 \neq r_2 : \{(u, i, r_1)\} \cup \{(u, i, r_2)\} \subseteq \mathcal{D}$. $\mathcal{D}$ can be reorganized
in the so-called user-item matrix $\mathbf{R} \in \mathbb{N}^{n \times m}$, where users are on the rows and items on the
columns, such that $\mathbf{R}_{u,i} = r$ if $(u, i, r) \in \mathcal{D}$, 0 otherwise.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Matrix Factorization</title>
        <p>Matrix Factorization (MF) is a Latent Factor Model that aims at factorizing the user-item matrix
$\mathbf{R}$ into the product of two lower-dimensional rectangular matrices, denoted as $\mathbf{U}$ and $\mathbf{I}$. $\mathbf{U} \in \mathbb{R}^{n \times k}$
and $\mathbf{I} \in \mathbb{R}^{m \times k}$ are matrices containing the users’ and items’ latent factors, respectively, where $k$
is the number of latent factors. The objective of MF is to find $\mathbf{U}$ and $\mathbf{I}$ such that $\mathbf{R} \approx \mathbf{U} \cdot \mathbf{I}^\top$. An
effective way to learn the latent factors is by using gradient-descent optimization. Given the
dataset $\mathcal{D}$, an MF model seeks to minimize the following loss function:</p>
        <p>$$L(\theta) = \frac{1}{N} \sum_{(u, i, r) \in \mathcal{D}} \|\tilde{r} - r\|^2 + \lambda \|\theta\|^2 \qquad (1)$$</p>
        <p>where $N$ is the number of triples in $\mathcal{D}$, $\tilde{r} = \mathbf{U}_u \cdot \mathbf{I}_i^\top$, and $\theta = \{\mathbf{U}, \mathbf{I}\}$. The first term of Equation (1) is the Mean Squared Error (MSE)
between the predicted and target ratings, while the second one is an $L_2$ regularization term. $\lambda$
is a hyper-parameter that sets the strength of the regularization.
        </p>
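        <p>As an illustration of Equation (1), the following minimal sketch evaluates the MF loss on toy data using plain Python lists. The function name mf_loss, the toy matrices, and the value of the regularization strength are assumptions made for this example; they are not taken from the authors' implementation, which is based on PyTorch.</p>
        <preformat>
```python
def mf_loss(ratings, U, I, lam):
    """Equation (1): mean squared error plus an L2 penalty on U and I."""
    mse = 0.0
    for u, i, r in ratings:
        # Predicted rating: dot product of the user and item latent factors.
        pred = sum(a * b for a, b in zip(U[u], I[i]))
        mse += (pred - r) ** 2
    mse /= len(ratings)
    # Squared L2 norm of all the latent factors.
    l2 = sum(x * x for row in U + I for x in row)
    return mse + lam * l2


# Toy data (hypothetical): 2 users, 2 items, k = 2 latent factors.
U = [[1.0, 0.0], [0.0, 1.0]]
I = [[1.0, 0.0], [0.5, 0.5]]
ratings = [(0, 0, 1), (1, 1, 0)]  # (user index, item index, rating)
loss = mf_loss(ratings, U, I, lam=0.1)
print(loss)
```
        </preformat>
        <p>In practice the loss would be minimized with respect to the latent factors by gradient descent; the sketch only shows how its value is computed for fixed factors.</p>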
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Logic Tensor Networks</title>
        <p>Logic Tensor Networks (LTNs) [23] are a Neural-Symbolic framework that enables effective
integration of deep learning and logical reasoning. The framework allows defining a knowledge base
composed of a set of logical axioms and using them as the objective of a neural model. To
define the knowledge base, LTN uses a specific first-order language, called Real Logic, which
forms the basis of the framework. It is fully differentiable and has a concrete semantics that
allows mapping every symbolic expression into the domain of real numbers. Thanks to Real
Logic, LTN can convert logical formulas into computational graphs that enable gradient-based
optimization based on fuzzy logic semantics.</p>
        <p>Real Logic is defined on a first-order language $\mathcal{L}$ with a signature that contains a set $\mathcal{C}$
of constant symbols, a set $\mathcal{X}$ of variable symbols, a set $\mathcal{F}$ of functional symbols, and a set
$\mathcal{P}$ of predicate symbols. A term is constructed recursively from constants, variables, and
functional symbols. An expression formed by applying a predicate symbol to some term(s) is
called an atomic formula. Complex formulas are constructed recursively using connectives (i.e.,
$\neg, \land, \lor, \implies, \leftrightarrow$) and quantifiers (i.e., $\forall, \exists$).</p>
        <p>To emphasize the fact that symbols are grounded onto real-valued features, we use the term
grounding<sup>1</sup>, denoted by $\mathcal{G}$. In particular, each individual is grounded as a tensor of real features,
functions as real functions, and predicates as real functions that specifically project onto a
value in the interval $[0, 1]$. A variable $x$ is grounded to a sequence of $n_x$ individuals from a
domain, with $n_x \in \mathbb{N}^+$, $n_x &gt; 0$. As a consequence, a term $t(x)$ or a formula $P(x)$, constructed
recursively with a free variable $x$, will be grounded to a sequence of $n_x$ values too. Afterward,
connectives are grounded using fuzzy semantics, while quantifiers are grounded using special aggregation
functions. In this paper, we use the product configuration, which is better suited for
gradient-based optimization [31]. Specifically, conjunctions are grounded using the product t-norm
$T_{prod}$, negations using the standard fuzzy negation $N_S$, implications using the Reichenbach implication
$I_R$, and the universal quantifier using the generalized mean w.r.t. the error values
$ME_p$. The other connectives and quantifiers are not used in this paper, hence not reported.</p>
        <p>$$T_{prod}(u, v) = u \cdot v, \qquad N_S(u) = 1 - u, \qquad I_R(u, v) = 1 - u + u \cdot v, \qquad ME_p(u_1, \dots, u_n) = 1 - \Big( \frac{1}{n} \sum_{i=1}^{n} (1 - u_i)^p \Big)^{\frac{1}{p}}, \; p \geq 1$$</p>
        <p>Connective operators are applied element-wise to the tensors in input, while aggregators
aggregate the dimension of the tensor in input that corresponds to the quantified variable. Real
Logic also provides a special type of quantification, called diagonal quantification, denoted
as $\mathrm{Diag}(x_1, \dots, x_n)$. It applies only to variables that have the same number of individuals (i.e.,
$n_{x_1} = n_{x_2} = \dots = n_{x_n}$) and allows quantifying over specific tuples of individuals, such that the $i$-th
tuple contains the $i$-th individual of each of the variables in the argument of $\mathrm{Diag}$. An intuition
about how these operations work in practice is given in Appendix D.</p>
        <p>Given a Real Logic knowledge base $\mathcal{K} = \{\phi_1, \dots, \phi_n\}$, where $\phi_1, \dots, \phi_n$ are closed formulas,
LTN allows learning the grounding of constants, functions, and predicates appearing in them.
In particular, if constants are grounded as embeddings, and functions/predicates onto neural
networks, their grounding $\mathcal{G}$ depends on some learnable parameters $\theta$. We denote a parametric
grounding as $\mathcal{G}(\cdot \mid \theta)$. In LTN, the learning of parametric groundings is obtained by finding
parameters $\theta^*$ that maximize the satisfaction of $\mathcal{K}$:</p>
        <p>$$\theta^* = \operatorname{argmax}_{\theta} \operatorname{SatAgg}_{\phi \in \mathcal{K}} \mathcal{G}(\phi \mid \theta) \qquad (2)$$</p>
        <p>where $\operatorname{SatAgg} : [0, 1]^* \mapsto [0, 1]$ is a formula aggregating operator, often defined using $ME_p$.</p>
        <p>Because Real Logic grounds expressions in real and continuous domains, LTN attaches
gradients to every sub-expression and consequently learns through gradient-descent optimization.</p>
        <p><sup>1</sup> Notice that this is different from the common use of the term
grounding in logic, which indicates the operation of
replacing the variables of a term or formula with constants or terms containing no variables. To avoid confusion,
we use the synonym instantiation for this purpose.</p>
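        <p>To make the product configuration more concrete, the sketch below implements the four operators named above on plain Python floats and uses them to evaluate a universally quantified formula. The function names, the example formula, and the truth degrees are hypothetical and chosen only for illustration; LTN itself applies these operators to tensors inside a computational graph.</p>
        <preformat>
```python
def t_prod(u, v):
    """Product t-norm: fuzzy conjunction."""
    return u * v


def n_s(u):
    """Standard fuzzy negation."""
    return 1.0 - u


def i_r(u, v):
    """Reichenbach implication."""
    return 1.0 - u + u * v


def pme(values, p=2):
    """Generalized mean w.r.t. the error: fuzzy universal quantifier."""
    mean_error = sum((1.0 - v) ** p for v in values) / len(values)
    return 1.0 - mean_error ** (1.0 / p)


# Truth value of "forall x: (A(x) and not B(x)) implies C(x)" over three
# individuals, with hypothetical truth degrees for A, B, and C.
A = [0.9, 0.2, 1.0]
B = [0.1, 0.5, 0.0]
C = [0.8, 0.3, 0.5]
per_individual = [i_r(t_prod(a, n_s(b)), c) for a, b, c in zip(A, B, C)]
satisfaction = pme(per_individual, p=2)
print(per_individual, satisfaction)
```
        </preformat>
        <p>The connectives act on one individual at a time, while the aggregator collapses the list of per-individual truth values into a single degree of satisfaction.</p>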
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>Our approach uses a Logic Tensor Network to train a basic Matrix Factorization (MF) model
for the top-n item recommendation task. The LTN is trained using a Real Logic knowledge
base containing commonsense knowledge facts about the movie recommendation domain. This
section formalizes the knowledge base used by our model, how the symbols appearing in it are
grounded in the real field, and how the learning of the LTN takes place.</p>
      <sec id="sec-4-1">
        <title>4.1. Knowledge base</title>
        <p>The Real Logic knowledge base that our model seeks to maximally satisfy is composed of the
following axioms.</p>
        <p>$$\phi_1 : \forall\, \mathrm{Diag}(\mathit{user}, \mathit{movie}, \mathit{rating}) \big( \mathrm{Sim}(\mathrm{Likes}(\mathit{user}, \mathit{movie}), \mathit{rating}) \big) \qquad (3)$$</p>
        <p>$$\phi_2 : \forall (\mathit{user}, \mathit{movie}, \mathit{genre}) \big( \neg \mathrm{LikesGenre}(\mathit{user}, \mathit{genre}) \land \mathrm{HasGenre}(\mathit{movie}, \mathit{genre}) \implies \mathrm{Sim}(\mathrm{Likes}(\mathit{user}, \mathit{movie}), \mathit{rating}_{-}) \big) \qquad (4)$$</p>
        <p>where $\mathit{user}$, $\mathit{movie}$, $\mathit{rating}$, and $\mathit{genre}$ are variable symbols to denote the users of the system,
the items of the system, the ratings given by the users to the items, and the genres of the
movies, respectively. $\mathit{rating}_{-}$ is a constant symbol denoting the negative rating. $\mathrm{Likes}(u, i)$
is a functional symbol returning the prediction for the rating given by user $u$ to movie $i$. $\mathrm{Sim}(r_1, r_2)$
is a predicate symbol measuring the similarity between two ratings, $r_1$ and $r_2$. $\mathrm{LikesGenre}(u, g)$ is
a predicate symbol denoting whether the user $u$ likes the genre $g$. $\mathrm{HasGenre}(i, g)$
is a predicate symbol denoting whether the movie $i$ belongs to the genre $g$.</p>
        <p>
          Notice the use of the diagonal quantification on Axiom (3). When $\mathit{user}$, $\mathit{movie}$, and $\mathit{rating}$ are
grounded with three sequences of values, the $j$-th value of each variable matches with the $j$-th values
of the other variables. This is useful in this case since the dataset $\mathcal{D}$ comes as a set of triples.
Diagonal quantification allows forcing the satisfaction of Axiom (3) for these triples only, rather
than any combination of users, items, and ratings in $\mathcal{D}$.
        </p>
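        <p>The difference between diagonal and standard quantification can be sketched with ordinary Python sequences. The three lists below are hypothetical groundings of the user, movie, and rating variables; zip stands in for diagonal quantification and itertools.product for standard quantification over all combinations.</p>
        <preformat>
```python
from itertools import product

# Hypothetical groundings: the j-th entries form the j-th dataset triple.
users = [0, 0, 1]
movies = [5, 7, 5]
ratings = [1, 0, 1]

# Diagonal quantification: only the triples that occur in the dataset.
diagonal = list(zip(users, movies, ratings))

# Standard quantification: every combination of user, movie, and rating.
full = list(product(users, movies, ratings))

print(diagonal)   # three tuples, one per dataset triple
print(len(full))  # 3 * 3 * 3 combinations
```
        </preformat>
        <p>With standard quantification the axiom would also be evaluated on combinations such as (0, 5, 0), which pair a user and a movie with a rating that was never given.</p>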
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Grounding of the knowledge base</title>
        <p>The grounding defines how the symbols of the language are mapped onto the real field,
and hence how they can be used to construct the architecture of the LTN. In particular, given
$\mathcal{D} = \{(u, i, r)^{(j)}\}_{j=1}^{N}$, $\mathcal{G}(\mathit{user}) = \langle u^{(j)} \rangle_{j=1}^{N}$, namely $\mathit{user}$ is grounded as a sequence of the $N$ user
indexes in $\mathcal{D}$. $\mathcal{G}(\mathit{movie}) = \langle i^{(j)} \rangle_{j=1}^{N}$, namely $\mathit{movie}$ is grounded as a sequence of the $N$ movie
indexes in $\mathcal{D}$. $\mathcal{G}(\mathit{rating}) = \langle r^{(j)} \rangle_{j=1}^{N}$ with $r^{(j)} \in \{0, 1\}\ \forall j$, namely $\mathit{rating}$ is grounded as a sequence
of the $N$ ratings in $\mathcal{D}$, where 0 denotes a negative rating and 1 a positive one. $\mathcal{G}(\mathit{rating}_{-}) = 0$,
namely $\mathit{rating}_{-}$ is grounded as the negative rating. $\mathcal{G}(\mathit{genre}) = \langle 1, \dots, N_g \rangle$, namely $\mathit{genre}$ is
grounded as a sequence of $N_g$ genre indexes, where $N_g$ is the number of genres appearing in
the movies of $\mathcal{D}$. $\mathcal{G}(\mathrm{Likes} \mid \mathbf{U}, \mathbf{I}) : u, i \mapsto \mathbf{U}_u \cdot \mathbf{I}_i^\top$, namely Likes is grounded onto a function that
takes as input a user index $u$ and a movie index $i$ and returns the prediction of the MF model
for user at index $u$ and movie at index $i$, where $\mathbf{U} \in \mathbb{R}^{n \times k}$ and $\mathbf{I} \in \mathbb{R}^{m \times k}$
are the matrices of the users’
and items’ latent factors, respectively. $\mathcal{G}(\mathrm{LikesGenre}) : u, g \mapsto \{0, 1\}$, namely LikesGenre is
grounded onto a function that takes as input a user index $u$ and a genre index $g$ and returns 1 if
the user $u$ likes the genre $g$ in the dataset, 0 otherwise. Similarly, $\mathcal{G}(\mathrm{HasGenre}) : i, g \mapsto \{0, 1\}$,
namely HasGenre is grounded onto a function that takes as input a movie index $i$ and a genre
index $g$ and returns 1 if the movie $i$ belongs to genre $g$ in the dataset, 0 otherwise. Finally,
$\mathcal{G}(\mathrm{Sim}) : \tilde{r}, r \mapsto \exp(-\alpha \|\tilde{r} - r\|^2)$, namely Sim is grounded onto a function that computes the
similarity between a predicted rating $\tilde{r}$ and a target rating $r$. The use of the exponential allows
treating Sim as a predicate since the output is restricted to the interval $[0, 1]$. The square is used
to penalize larger errors more heavily in the optimization. $\alpha$ is a hyper-parameter that changes the
smoothness of the function.</p>
        <p>
          Intuitively, Axiom (3) states that for each user-movie-rating triple in the dataset $\mathcal{D} = \{(u, i, r)^{(j)}\}_{j=1}^{N}$,
the prediction computed by the MF model for the user $u$ and movie $i$ should be
similar to the target rating $r$ provided by the user $u$ for the movie $i$. Instead, Axiom (4) states
that for each possible combination of users, movies, and genres, taken from the dataset, if the
user $u$ does not like a genre of the movie $i$, then the prediction computed by the MF model for
the user $u$ and movie $i$ should be similar to the negative rating $\mathit{rating}_{-}$, namely the user should
not like the movie $i$. By forcing the satisfaction of Axiom (3), the model learns to factorize the
user-item matrix using the ground truth, while Axiom (4) acts as a kind of regularization for
the latent factors of the MF model.
        </p>
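        <p>The groundings of Likes and Sim, and the way they combine in a single instantiation of Axiom (4), can be sketched as follows. This is a simplified scalar illustration under the product configuration; the function names, the toy latent factors, and the chosen value of the smoothness hyper-parameter are assumptions for the example and do not reproduce the authors' LTNtorch code.</p>
        <preformat>
```python
import math


def likes(U, I, u, i):
    """Grounding of Likes: dot product of user and movie latent factors."""
    return sum(a * b for a, b in zip(U[u], I[i]))


def sim(pred, target, alpha):
    """Grounding of Sim: exp(-alpha * (pred - target)^2)."""
    return math.exp(-alpha * (pred - target) ** 2)


def axiom2_instance(U, I, u, i, likes_genre, has_genre, alpha):
    """Truth value of Axiom (4) for one user, one movie, and one genre."""
    # not LikesGenre(u, g) and HasGenre(i, g), with the product t-norm.
    antecedent = (1.0 - likes_genre) * has_genre
    # Sim(Likes(u, i), rating-), where the negative rating is grounded as 0.
    consequent = sim(likes(U, I, u, i), 0.0, alpha)
    # Reichenbach implication.
    return 1.0 - antecedent + antecedent * consequent


# Toy latent factors (hypothetical): one user, one movie, k = 2.
U = [[1.0, 0.5]]
I = [[0.8, 0.4]]
dislikes = axiom2_instance(U, I, 0, 0, likes_genre=0.0, has_genre=1.0, alpha=0.1)
likes_it = axiom2_instance(U, I, 0, 0, likes_genre=1.0, has_genre=1.0, alpha=0.1)
print(dislikes, likes_it)
```
        </preformat>
        <p>When the user likes the genre, the antecedent is false and the instance is fully satisfied regardless of the prediction. When the user dislikes a genre of the movie, the instance is satisfied only to the degree that the predicted rating is close to the negative rating.</p>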
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Learning of the LTN</title>
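        <p>The objective of our LTN is to learn the latent factors in $\mathbf{U}$ and $\mathbf{I}$ such that
the axioms in the knowledge base $\mathcal{K} = \{\phi_1, \phi_2\}$ are maximally satisfied, namely
$\operatorname{argmax}_{\theta} \operatorname{SatAgg}_{\phi \in \mathcal{K}} \mathcal{G}_{(\mathit{user}, \mathit{movie}, \mathit{rating}) \leftarrow \mathcal{D}}(\phi \mid \theta)$<sup>2</sup>, where $\theta = \{\mathbf{U}, \mathbf{I}\}$. In practice, this objective
corresponds to the following loss function:</p>
        <p>
          $$L(\theta) = \big( 1 - \operatorname{SatAgg}_{\phi \in \mathcal{K}} \mathcal{G}_{(\mathit{user}, \mathit{movie}, \mathit{rating}) \leftarrow \mathcal{B}}(\phi \mid \theta) \big) + \lambda \|\theta\|^2 \qquad (5)$$
where $\mathcal{B}$ denotes a batch of training triples randomly sampled from $\mathcal{D}$. An $L_2$ regularization
term has been added to the loss to prevent overfitting. The hyper-parameter $\lambda$ defines the
strength of the regularization. Notice that the loss does not specify how the variable $\mathit{genre}$ is
grounded. Its grounding depends on the sampled batch $\mathcal{B}$. In our experiments, we grounded it
with the sequence of genres of the movies in the batch.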
        </p>
        <p>It is worth highlighting that the loss function depends on the semantics used to approximate
the logical connectives, quantifiers, and formula aggregating operator. In our experiments, we
used the stable product configuration, a stable version of the product configuration introduced
in [23]. Then, we selected $ME_p$ as formula aggregating operator, with $p = 2$.</p>
        <p><sup>2</sup> In the notation, $(\mathit{user}, \mathit{movie}, \mathit{rating}) \leftarrow \mathcal{D}$ means that variables $\mathit{user}$, $\mathit{movie}$, and $\mathit{rating}$
are grounded with the triples taken from the dataset $\mathcal{D}$, namely $\mathit{user}$ takes the sequence of user indexes,
$\mathit{movie}$ the sequence of movie indexes, and $\mathit{rating}$ the sequence of ratings.</p>
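        <p>The structure of the loss in Equation (5) can be illustrated with a few lines of Python. The sketch assumes that the truth values of the axioms on a batch have already been computed, and it uses the plain generalized mean w.r.t. the error rather than the stable variant used in the experiments. The function names and the numbers are hypothetical.</p>
        <preformat>
```python
def pme(values, p):
    """Generalized mean w.r.t. the error, used here as SatAgg."""
    mean_error = sum((1.0 - v) ** p for v in values) / len(values)
    return 1.0 - mean_error ** (1.0 / p)


def ltn_loss(axiom_truths, theta, lam, p=2):
    """Equation (5): one minus the aggregated satisfaction, plus L2 penalty."""
    sat = pme(axiom_truths, p)
    l2 = sum(x * x for x in theta)
    return (1.0 - sat) + lam * l2


# Hypothetical truth values of the two axioms on a batch, and a flattened
# list standing in for the latent factors in U and I.
axiom_truths = [0.9, 0.7]
theta = [0.5, -0.5]
loss = ltn_loss(axiom_truths, theta, lam=0.001)
print(loss)
```
        </preformat>
        <p>A knowledge base that is fully satisfied gives a satisfaction of 1, so the loss reduces to the regularization term alone.</p>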
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>This section presents the experiments we have performed with our method. They have been
executed on an Apple MacBook Pro (2019) with a 2.6 GHz 6-Core Intel Core i7. The model has
been implemented in Python using PyTorch. In particular, we used the LTNtorch<sup>3</sup> library. Our
source code is available online<sup>4</sup>.</p>
      <sec id="sec-5-1">
        <title>5.1. Dataset</title>
        <p>
          In our experiments, we used the MindReader [19] dataset. It contains 102,160 explicit ratings
collected from 1,174 real users on 10,030 entities (e.g., movies, actors, movie genres) taken
from a knowledge graph in the movie domain. The explicit ratings in the dataset can be of
three types: like (1), dislike (−1), or unknown (0). The dataset is subdivided into 10 splits. In our
experiments, we used split 0. Each split has a training set, a validation set, and a test set. The
training set contains both ratings given on movies and on the other entities, while validation
and test sets contain only ratings given on movies. The validation and test sets are built in
such a way as to perform a leave-one-out evaluation. In particular, for each user of the training
set, one random positive movie rating is held out for the validation set, and one for the test
set. The validation/test example of the user is completed by adding 100 randomly sampled
negative movie ratings from the dataset. To improve the quality of the dataset, we removed the
unknown ratings. Moreover, we removed the top 2% of popular movies from the test set to see
how the model performs on non-trivial recommendations, as suggested in [19]. Afterward, we
considered only the training ratings given on movies and movie genres since our model uses
only this information. After these steps, we converted the negative ratings from −1 to 0. Our
final dataset contains 962 users, 3,034 movies, 164 genres, 16,351 ratings on movies, and 10,889
ratings on movie genres. The density of the user-movie ratings is 0.37%.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experimental setting</title>
        <p>
          In our experiments, we compared the performance of three models: (1) a standard MF model
trained on the movie ratings of MindReader using Equation (1), denoted as MF, (2) an LTN model
trained on the movie ratings of MindReader using Equation (5) with $\mathcal{K} = \{\phi_1\}$, denoted as LTN,
and (3) an LTN model trained on the movie and genre ratings of MindReader using Equation (5)
with $\mathcal{K} = \{\phi_1, \phi_2\}$, denoted as LTNgenres. To compare the performance of the models, we used
two widely used ranking-based metrics, namely hit@k and ndcg@k, explained in Appendix A. In
our experiments, we used the following procedure: (1) we generated additional training sets by
randomly sampling 80%, 60%, 40%, and 20% of the movie ratings of each user from the entire
training set, referred to as 100%. Then, (2) for each training set $Tr \in \{100\%, 80\%, 60\%, 40\%, 20\%\}$
and for each model $M \in \{\mathrm{MF}, \mathrm{LTN}, \mathrm{LTNgenres}\}$: (2a) we performed a grid search of model $M$ on
training set $Tr$ to find the best hyper-parameters on the validation set using hit@10 as validation
metric; then, (2b) we tested the performance of the best model on the test set in terms of hit@10
and ndcg@10. We repeated this procedure 30 times using seeds from 0 to 29. The test metrics
have been averaged across these runs and reported in Table 1. Due to computational time, the
grid search has been computed only for the first run. Starting from the second run, step (2a) is
replaced with the training of model $M$ on the training set $Tr$ with the best hyper-parameters
found during the first run. A description of the hyper-parameters tested in the grid searches as
well as the training details of the models is given in Appendix B.
        </p>
        <p><sup>3</sup> https://github.com/logictensornetworks/LTNtorch</p>
        <p><sup>4</sup> https://github.com/tommasocarraro/LTNrec</p>
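        <p>Step (1) of the procedure, the per-user subsampling of the training ratings, could be sketched as follows. The rounding rule and the choice to keep at least one rating per user are assumptions made for this example, since the text does not specify them; the function name and the toy data are hypothetical as well.</p>
        <preformat>
```python
import random
from collections import defaultdict


def subsample_per_user(ratings, fraction, seed):
    """Keep roughly `fraction` of the movie ratings of each user."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for triple in ratings:
        by_user[triple[0]].append(triple)
    sampled = []
    for user in sorted(by_user):
        user_ratings = by_user[user]
        # Assumption: round to the nearest integer and keep at least one rating.
        keep = max(1, round(fraction * len(user_ratings)))
        sampled.extend(rng.sample(user_ratings, keep))
    return sampled


# Toy training set (hypothetical): user 0 has 10 ratings, user 1 has 5.
toy = [(0, i, 1) for i in range(10)] + [(1, i, 0) for i in range(5)]
subset = subsample_per_user(toy, fraction=0.2, seed=0)
print(len(subset))
```
        </preformat>
        <p>Fixing the seed makes each of the reduced training sets reproducible across runs.</p>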
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>A comparison between MF, LTN, and LTNgenres is reported in Table 1. The table reports the
performance of the three models on a variety of tasks with different sparsity.
The table shows that LTN outperforms MF in all five tasks.
In particular, for the dataset with 20% of the training ratings, the improvement is drastic (27.33%
on hit@10). We want to emphasize that the two models differ only in the loss function. This
demonstrates that the loss based on the fuzzy logic semantics of LTN is beneficial for dealing with the
sparsity of data. Then, with the addition of knowledge regarding the users’ tastes across the
movie genres, it is possible to further improve the results, as shown in the last column of the
table. LTNgenres outperforms the other models on almost all the tasks. For the dataset with
20% of the ratings, the hit@10 of LTNgenres is slightly worse compared to LTN. This could
be related to the quality of the training ratings sampled from the original dataset. This is also
suggested by the higher standard deviation associated with the datasets with higher sparsity.
For considerations about the training times of the models, refer to Appendix C.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>In this paper, we proposed to use Logic Tensor Networks to tackle the top-n recommendation
task. We showed how, by design, LTN makes it easy to integrate side information inside a
recommendation model. We compared our LTN models with a standard MF model, in a variety
of tasks with different sparsity, showing the benefits provided by the background knowledge,
especially when the task is challenging due to data scarcity.</p>
      <p>[13] M. Polato, F. Aiolli, Exploiting sparsity to build efficient kernel based collaborative filtering for top-n item recommendation, Neurocomputing 268 (2017) 17–26. URL: https://www.sciencedirect.com/science/article/pii/S0925231217307592. doi:10.1016/j.neucom.2016.12.090, advances in artificial neural networks, machine learning and computational intelligence.</p>
      <p>[14] P. Bhargava, T. Phan, J. Zhou, J. Lee, Who, what, when, and where: Multi-dimensional collaborative recommendations using tensor factorization on sparse user-generated data, in: Proceedings of the 24th International Conference on World Wide Web, WWW ’15, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 2015, p. 130–140. URL: https://doi.org/10.1145/2736277.2741077. doi:10.1145/2736277.2741077.</p>
      <p>[15] S. Rendle, Factorization machines, in: 2010 IEEE International Conference on Data Mining, 2010, pp. 995–1000. doi:10.1109/ICDM.2010.127.</p>
      <p>[16] X. Xin, B. Chen, X. He, D. Wang, Y. Ding, J. Jose, Cfm: Convolutional factorization machines for context-aware recommendation, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization, 2019, pp. 3926–3932. URL: https://doi.org/10.24963/ijcai.2019/545. doi:10.24963/ijcai.2019/545.</p>
      <p>[17] Y. Zhang, X. Chen, Explainable recommendation: A survey and new perspectives, Foundations and Trends® in Information Retrieval 14 (2020) 1–101. URL: https://doi.org/10.1561/1500000066. doi:10.1561/1500000066.</p>
      <p>[18] T. Carraro, M. Polato, F. Aiolli, A look inside the black-box: Towards the interpretability of conditioned variational autoencoder for collaborative filtering, in: Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’20 Adjunct, Association for Computing Machinery, New York, NY, USA, 2020, p. 233–236. URL: https://doi.org/10.1145/3386392.3399305. doi:10.1145/3386392.3399305.</p>
      <p>[19] A. H. Brams, A. L. Jakobsen, T. E. Jendal, M. Lissandrini, P. Dolog, K. Hose, Mindreader: Recommendation over knowledge graph entities with explicit user ratings, CIKM ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 2975–2982. URL: https://doi.org/10.1145/3340531.3412759. doi:10.1145/3340531.3412759.</p>
      <p>[20] T. R. Besold, A. d. Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K.-U. Kuehnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, G. Zaverucha, Neural-symbolic learning and reasoning: A survey and interpretation, 2017. URL: https://arxiv.org/abs/1711.03902. doi:10.48550/ARXIV.1711.03902.</p>
      <p>[21] L. D. Raedt, K. Kersting, Statistical Relational Learning, Springer US, Boston, MA, 2010, pp. 916–924. URL: https://doi.org/10.1007/978-0-387-30164-8_786. doi:10.1007/978-0-387-30164-8_786.</p>
      <p>[22] A. Daniele, L. Serafini, Neural networks enhancement with logical knowledge, 2020. URL: https://arxiv.org/abs/2009.06087. doi:10.48550/ARXIV.2009.06087.</p>
      <p>[23] S. Badreddine, A. d'Avila Garcez, L. Serafini, M. Spranger, Logic tensor networks, Artificial Intelligence 303 (2022) 103649. URL: https://doi.org/10.1016/j.artint.2021.103649. doi:10.1016/j.artint.2021.103649.</p>
      <p>[24] H. Chen, S. Shi, Y. Li, Y. Zhang, Neural collaborative reasoning, in: Proceedings of the Web Conference 2021, ACM, 2021. URL: https://doi.org/10.1145/3442381.3449973. doi:10.1145/3442381.3449973.</p>
      <p>[25] H. Chen, Y. Li, S. Shi, S. Liu, H. Zhu, Y. Zhang, Graph collaborative reasoning, in: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, WSDM ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 75–84. URL: https://doi.org/10.1145/3488560.3498410. doi:10.1145/3488560.3498410.</p>
      <p>[26] Y. Xian, Z. Fu, H. Zhao, Y. Ge, X. Chen, Q. Huang, S. Geng, Z. Qin, G. de Melo, S. Muthukrishnan, Y. Zhang, Cafe: Coarse-to-fine neural symbolic reasoning for explainable recommendation, in: Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management, CIKM ’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 1645–1654. URL: https://doi.org/10.1145/3340531.3412038. doi:10.1145/3340531.3412038.</p>
      <p>[27] P. Kouki, S. Fakhraei, J. Foulds, M. Eirinaki, L. Getoor, Hyper: A flexible and extensible probabilistic framework for hybrid recommender systems, RecSys ’15, Association for Computing Machinery, New York, NY, USA, 2015, p. 99–106. URL: https://doi.org/10.1145/2792838.2800175. doi:10.1145/2792838.2800175.</p>
      <p>[28] A. Kimmig, S. Bach, M. Broecheler, B. Huang, L. Getoor, A short introduction to probabilistic soft logic, Mansinghka, Vikash, 2012, pp. 1–4. URL: https://lirias.kuleuven.be/retrieve/204697.</p>
      <p>[29] R. Catherine, W. Cohen, Personalized recommendations using knowledge graphs: A probabilistic logic programming approach, in: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, Association for Computing Machinery, New York, NY, USA, 2016, p. 325–332. URL: https://doi.org/10.1145/2959100.2959131. doi:10.1145/2959100.2959131.</p>
      <p>[30] M. Gridach, Hybrid deep neural networks for recommender systems, Neurocomputing 413 (2020) 23–30. URL: https://www.sciencedirect.com/science/article/pii/S0925231220309966. doi:10.1016/j.neucom.2020.06.025.</p>
      <p>[31] E. van Krieken, E. Acar, F. van Harmelen, Analyzing differentiable fuzzy logic operators, Artificial Intelligence 302 (2022) 103602. URL: https://doi.org/10.1016/j.artint.2021.103602. doi:10.1016/j.artint.2021.103602.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Metrics</title>
      <p>To validate and test our models, we selected two widely used ranking-based metrics, namely
• hit@k: Hit Ratio measures whether a testing item is placed in the top-k positions of the
ranking, considering the presence of an item as a hit;
• ndcg@k: Normalized Discounted Cumulative Gain measures the quality of the
recommendation based on the position of the target item in the ranking. In particular, it uses a
monotonically increasing discount to emphasize the importance of higher ranks versus
lower ones.</p>
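      <p>Formally, let us define ω(r) as the item at rank r, 𝕀[⋅] as the indicator function, and I<sub>u</sub> as the set
of held-out items for user u. hit@k for user u is defined as
<disp-formula><tex-math>\mathrm{hit@}k(u, \omega) := \mathbb{I}\left[\left(\sum_{r=1}^{k} \mathbb{I}[\omega(r) \in I_u]\right) \geq 1\right].</tex-math></disp-formula>
Truncated discounted cumulative gain (dcg@k) for user u is defined as
<disp-formula><tex-math>\mathrm{dcg@}k(u, \omega) := \sum_{r=1}^{k} \frac{2^{\mathbb{I}[\omega(r) \in I_u]} - 1}{\log(r + 1)}.</tex-math></disp-formula>
ndcg@k is dcg@k divided by the best possible dcg@k, which is obtained when
all the held-out items are ranked at the top. Notice that in this paper |I<sub>u</sub>| = 1.</p>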
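      <p>The two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the experiments: the function names, the list-based ranking, and the use of the natural logarithm are assumptions made here.</p>
      <preformat>
```python
import math


def hit_at_k(ranking, held_out, k):
    """Return 1 if at least one of the top-k ranked items is held out, else 0."""
    return int(any(item in held_out for item in ranking[:k]))


def dcg_at_k(ranking, held_out, k):
    """Truncated DCG with binary gains: sum of (2^rel - 1) / log(r + 1)."""
    return sum(
        (2 ** int(item in held_out) - 1) / math.log(r + 1)
        for r, item in enumerate(ranking[:k], start=1)
    )


def ndcg_at_k(ranking, held_out, k):
    """dcg@k divided by the best possible dcg@k (held-out items ranked first)."""
    n_ideal = min(k, len(held_out))
    ideal = sum(1.0 / math.log(r + 1) for r in range(1, n_ideal + 1))
    return dcg_at_k(ranking, held_out, k) / ideal if ideal else 0.0


# Hypothetical ranking for one user with a single held-out item.
ranking = ["a", "b", "c", "d"]
held_out = {"c"}
hit = hit_at_k(ranking, held_out, k=3)
ndcg = ndcg_at_k(ranking, held_out, k=3)
```
      </preformat>
      <p>With a single held-out item, as in this paper, ndcg@k reduces to log(2) / log(r + 1) when the target item is at rank r within the top k, and to 0 otherwise, so the base of the logarithm does not affect the result.</p>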
    </sec>
    <sec id="sec-9">
      <title>B. Training details</title>
      <p>
        The hyper-parameters tested during the grid searches explained in Section 5.2 vary depending on
the model. For all the models, we tried a number of latent factors k ∈ {1, 5, 10, 25}, regularization
coefficient λ ∈ {0.001, 0.0001}, batch size in {32, 64}, and models both with and without users’ and
items’ biases. For LTN and LTNgenres, we tried α ∈ {0.05, 0.1, 0.2} for the predicate
Sim and used p = 2 for the aggregator ME of Axiom (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ). For LTNgenres, we tried p ∈ {2, 5}
for the aggregator ME of Axiom (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ). Notice that lim<sub>p→∞</sub>
ME<sub>p</sub>(u<sub>1</sub>, …, u<sub>n</sub>) = min{u<sub>1</sub>, …, u<sub>n</sub>}.
      </p>
      <p>Intuitively, p offers flexibility to account for outliers in the data. The higher the
value of p, the more
the model focuses on the outliers.</p>
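      <p>The effect of this hyper-parameter can be checked numerically. The sketch below assumes the usual p-mean-error form of the aggregator from the Logic Tensor Networks literature, 1 − ((1/n) Σ (1 − u<sub>i</sub>)<sup>p</sup>)<sup>1/p</sup>, which is not restated in this appendix; the function name and the sample truth values are hypothetical.</p>
      <preformat>
```python
import numpy as np


def p_mean_error(truth_values, p=2):
    """Assumed p-mean-error aggregator: 1 - (mean((1 - u_i)^p))^(1/p)."""
    u = np.asarray(truth_values, dtype=float)
    return 1.0 - np.mean((1.0 - u) ** p) ** (1.0 / p)


# Two well-satisfied truth values and one outlier.
values = [0.9, 0.8, 0.2]

# Aggregated truth value for increasing p: it moves from the plain mean
# (p = 1) towards the minimum of the inputs as p grows.
aggregated = {p: p_mean_error(values, p) for p in (1, 2, 5, 200)}
```
      </preformat>
      <p>With p = 1 the aggregator equals the arithmetic mean of the truth values, while large values of p push the result towards the least satisfied input, which is the sense in which a higher p puts more focus on outliers.</p>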
      <p>For all the models, the latent factors U and I, for users and items, respectively, have been
randomly initialized using the Glorot initialization, while the biases have been initialized with values sampled from a
normal distribution with zero mean and unit variance. All the models have been trained for at most
200 epochs by using the Adam optimizer with a learning rate of 0.001. For each training run, we
used early stopping to halt learning if no improvement was found on the
validation metric (i.e., hit@10) after 20 epochs.</p>
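      <p>The early-stopping rule can be sketched as follows. This is an illustrative loop under stated assumptions, not the training code used for the experiments: run_epoch and validate are hypothetical callables standing for one pass over the training data and for the computation of the validation metric.</p>
      <preformat>
```python
def train_with_early_stopping(run_epoch, validate, max_epochs=200, patience=20):
    """Train until max_epochs, or until the validation score has not
    improved for `patience` consecutive epochs.

    Returns (best score, epoch of best score, number of epochs run).
    """
    best_score, best_epoch = float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        run_epoch()            # hypothetical: one training epoch
        score = validate()     # hypothetical: validation metric, higher is better
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break              # no improvement for `patience` epochs
    return best_score, best_epoch, epoch
```
      </preformat>
      <p>In practice the model parameters at the best epoch would also be saved, so that the checkpoint with the highest validation score is the one evaluated on the test set.</p>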
    </sec>
    <sec id="sec-10">
      <title>C. Training time</title>
      <p>A comparison of the training times required by the models on the different datasets is presented
in Table 2. The models have been trained for 200 epochs with a learning rate of 0.001, batch
size of 64, one latent factor (i.e., k = 1), without bias terms, and without early stopping. The
other hyper-parameters do not affect training time. In particular, LTNgenres increases the time
complexity considerably. This is due to Axiom 4, which has to be evaluated for each possible
combination of users, items, and genres. This drawback can limit the application of the model
to datasets with a higher number of users and items. However, it is possible to reduce training
time by using GPUs or by designing logical axioms that make use of diagonal quantification.</p>
    </sec>
    <sec id="sec-11">
      <title>D. Intuition of Real Logic grounding</title>
      <p>In Real Logic, differently from first-order logic, a variable x is grounded as a sequence of n<sub>x</sub>
individuals (i.e., tensors) from a domain, with n<sub>x</sub> ∈ ℕ<sup>+</sup>, n<sub>x</sub> &gt; 0. As a direct consequence, a term
t(x) or a formula P(x), with a free variable x, is grounded to a sequence of n<sub>x</sub> values too. For
example, P(x) returns a vector in [0, 1]<sup>n<sub>x</sub></sup>, namely ⟨P(x<sub>i</sub>)⟩ for i = 1, …, n<sub>x</sub>, where x<sub>i</sub> is the i-th individual of
x. Similarly, t(x) returns a matrix in ℝ<sup>n<sub>x</sub> × z</sup>, assuming that t maps to individuals in ℝ<sup>z</sup>. This
formalization is intuitively extended to terms and formulas with arity greater than one. In
such cases, Real Logic organizes the output tensor in such a way that it has a dimension for
each free variable involved in the expression. For instance, t<sub>2</sub>(x, y) returns a tensor in ℝ<sup>n<sub>x</sub> × n<sub>y</sub> × z</sup>,
assuming that t<sub>2</sub> maps to individuals in ℝ<sup>z</sup>. In particular, at position (i, j) there is the evaluation
of t<sub>2</sub>(x<sub>i</sub>, y<sub>j</sub>), where x<sub>i</sub> denotes the i-th individual of x and y<sub>j</sub> the j-th individual of y. Similarly,
P<sub>2</sub>(x, y) returns a tensor in [0, 1]<sup>n<sub>x</sub> × n<sub>y</sub></sup>, where at position (i, j) there is the evaluation of P<sub>2</sub>(x<sub>i</sub>, y<sub>j</sub>).</p>
      <p>The connective operators are applied element-wise to the input tensors. For instance,
¬P<sub>2</sub>(x, y) returns a tensor in [0, 1]<sup>n<sub>x</sub> × n<sub>y</sub></sup>, where at position (i, j) there is the evaluation of
¬P<sub>2</sub>(x<sub>i</sub>, y<sub>j</sub>), namely N (i.e., ¬) is applied to each truth value in the tensor P<sub>2</sub>(x, y) ∈ [0, 1]<sup>n<sub>x</sub> × n<sub>y</sub></sup>.
For binary connectives, the behavior is similar. For instance, let Q be a predicate symbol and
u a variable. Then, P<sub>2</sub>(x, y) ∧ Q(x, u) returns a tensor in [0, 1]<sup>n<sub>x</sub> × n<sub>y</sub> × n<sub>u</sub></sup>, where at position (i, j, k)
there is the evaluation of the formula on the i-th individual of x, j-th individual of y, and k-th
individual of u.</p>
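      <p>The shapes described above can be reproduced with plain NumPy broadcasting. The sketch below is only an illustration of the idea, not the LTN implementation: the sizes, the sigmoid-of-dot-product predicates, the standard negation 1 − a, and the product t-norm used for the conjunction are all assumptions chosen for brevity.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y, n_u, z = 4, 3, 2, 5        # hypothetical numbers of individuals and feature size
x = rng.normal(size=(n_x, z))        # grounding of variable x: n_x individuals in R^z
y = rng.normal(size=(n_y, z))        # grounding of variable y
u = rng.normal(size=(n_u, z))        # grounding of variable u


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical binary predicates: one truth value per pair of individuals.
p2_xy = sigmoid(x @ y.T)             # P2(x, y), shape (n_x, n_y)
q_xu = sigmoid(x @ u.T)              # Q(x, u),  shape (n_x, n_u)

# Negation is applied element-wise (standard negation assumed).
not_p2 = 1.0 - p2_xy                 # shape (n_x, n_y)

# Conjunction (product t-norm assumed): the shared variable x is aligned on
# axis 0 and the other variables are broadcast, one axis per free variable.
p2_and_q = p2_xy[:, :, None] * q_xu[:, None, :]   # shape (n_x, n_y, n_u)
```
      </preformat>
      <p>Entry (i, j, k) of the last tensor is the truth value of the conjunction evaluated on the i-th individual of the first variable, the j-th of the second, and the k-th of the third, matching the layout described in the text.</p>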
      <p>The quantifiers aggregate the dimension that corresponds to the quantified variable. For
instance, ∀x P<sub>2</sub>(x, y) returns a tensor in [0, 1]<sup>n<sub>y</sub></sup>, namely the aggregation is performed across the
dimension of x. Since y is the only free variable remaining in the expression, the output has
one single dimension, corresponding to the dimension of y. Specifically, the framework
computes P<sub>2</sub>(x, y) ∈ [0, 1]<sup>n<sub>x</sub> × n<sub>y</sub></sup> first, then it aggregates the dimension corresponding to x. Similarly,
∀(x, y) P<sub>2</sub>(x, y) returns a scalar in [0, 1], namely the aggregation is performed across the
dimensions of both variables x and y. In the case of diagonal quantification, the framework behaves
differently. For instance, ∀ Diag(w, v) P<sub>2</sub>(w, v), where w and v are two variables with the same
number of individuals n<sub>w</sub> = n<sub>v</sub>, returns a scalar in [0, 1], which is the result of the aggregation
of n<sub>w</sub> truth values, namely P<sub>2</sub>(w<sub>1</sub>, v<sub>1</sub>), P<sub>2</sub>(w<sub>2</sub>, v<sub>2</sub>), …, P<sub>2</sub>(w<sub>n<sub>w</sub></sub>, v<sub>n<sub>v</sub></sub>). Without diagonal
quantification (i.e., ∀(w, v) P<sub>2</sub>(w, v)), the framework performs an aggregation across the dimensions of
both variables, involving n<sub>w</sub><sup>2</sup> values, namely P<sub>2</sub>(w<sub>1</sub>, v<sub>1</sub>), P<sub>2</sub>(w<sub>1</sub>, v<sub>2</sub>), …, P<sub>2</sub>(w<sub>n<sub>w</sub></sub>, v<sub>n<sub>v</sub>−1</sub>), P<sub>2</sub>(w<sub>n<sub>w</sub></sub>, v<sub>n<sub>v</sub></sub>).
Intuitively, ∀(w, v) aggregates all the values in [0, 1]<sup>n<sub>w</sub> × n<sub>v</sub></sup>, while ∀ Diag(w, v) aggregates only the
values in the diagonal.</p>
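      <p>The three quantification patterns differ only in which entries of the truth-value tensor are aggregated. The sketch below illustrates this with NumPy, assuming the p-mean-error aggregator with p = 2; the function name forall and the random truth values are hypothetical, and a real implementation of diagonal quantification would evaluate only the paired individuals instead of building the full matrix first.</p>
      <preformat>
```python
import numpy as np


def forall(truth, axis=None, p=2):
    """Assumed p-mean-error aggregator applied along `axis` (all axes if None)."""
    truth = np.asarray(truth, dtype=float)
    return 1.0 - np.mean((1.0 - truth) ** p, axis=axis) ** (1.0 / p)


rng = np.random.default_rng(0)
n_w = 4                                    # hypothetical size, with n_w = n_v
p2_wv = rng.uniform(size=(n_w, n_w))       # grounding of P2(w, v): axis 0 is w, axis 1 is v

forall_w = forall(p2_wv, axis=0)           # forall w: one value per individual of v
forall_wv = forall(p2_wv)                  # forall (w, v): scalar over n_w * n_w values
forall_diag = forall(np.diag(p2_wv))       # forall Diag(w, v): scalar over n_w values
```
      </preformat>
      <p>Quantifying over a single variable removes one axis, quantifying over both variables collapses the whole tensor to a scalar, and diagonal quantification aggregates only the n<sub>w</sub> entries that pair the i-th individual of one variable with the i-th individual of the other.</p>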
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          ,
          <source>Recommender Systems: Introduction and Challenges</source>
          ,
          <string-name>
            <surname>Springer</surname>
            <given-names>US</given-names>
          </string-name>
          , Boston, MA,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          . URL: https://doi.org/10.1007/978-1-4899-7637-6_1. doi:10.1007/978-1-4899-7637-6_1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Khoshgoftaar</surname>
          </string-name>
          ,
          <article-title>A survey of collaborative filtering techniques</article-title>
          , Adv. in Artif. Intell.
          <year>2009</year>
          (
          <year>2009</year>
          ). URL: https://doi.org/10.1155/2009/421425. doi:10.1155/2009/421425.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bell</surname>
          </string-name>
          , Advances in Collaborative Filtering, Springer, Boston, MA,
          <year>2011</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>186</lpage>
          . URL: https://doi.org/10.1007/978-0-387-85820-3_5. doi:10.1007/978-0-387-85820-3_5.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Aiolli</surname>
          </string-name>
          ,
          <article-title>Efficient top-n recommendation for very large scale binary rated datasets</article-title>
          ,
          <source>in: Proceedings of the 7th ACM Conference on Recommender Systems</source>
          , RecSys '13,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2013</year>
          , p.
          <fpage>273</fpage>
          -
          <lpage>280</lpage>
          . URL: https://doi.org/10.1145/2507157.2507189. doi:10.1145/2507157.2507189.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          ,
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          , in: 2008
          <source>Eighth IEEE International Conference on Data Mining</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          . doi:10.1109/ICDM.2008.22.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ning</surname>
          </string-name>
          , G. Karypis, Slim:
          <article-title>Sparse linear methods for top-n recommender systems</article-title>
          ,
          <source>in: 2011 IEEE 11th International Conference on Data Mining</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>497</fpage>
          -
          <lpage>506</lpage>
          . doi:10.1109/ICDM.2011.134.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aiolli</surname>
          </string-name>
          ,
          <article-title>Boolean kernels for collaborative filtering in top-n item recommendation</article-title>
          ,
          <source>Neurocomput</source>
          .
          <volume>286</volume>
          (
          <year>2018</year>
          )
          <fpage>214</fpage>
          -
          <lpage>225</lpage>
          . URL: https://doi.org/10.1016/j.neucom.2018.01.057. doi:10.1016/j.neucom.2018.01.057.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          , T. Jebara,
          <article-title>Variational autoencoders for collaborative filtering</article-title>
          ,
          <source>in: Proceedings of the 2018 World Wide Web Conference</source>
          , WWW '18,
          <string-name>
            <given-names>International</given-names>
            <surname>World Wide Web Conferences Steering Committee</surname>
          </string-name>
          , Republic and Canton of Geneva, CHE,
          <year>2018</year>
          , p.
          <fpage>689</fpage>
          -
          <lpage>698</lpage>
          . URL: https://doi.org/10.1145/3178876.3186150. doi:10.1145/3178876.3186150.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Shenbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alekseev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Malykh</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. I. Nikolenko</surname>
          </string-name>
          ,
          <article-title>RecVAE: A new variational autoencoder for top-n recommendations with implicit feedback</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Web Search and Data Mining, ACM</source>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.1145/3336191.3371831. doi:10.1145/3336191.3371831.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Steck</surname>
          </string-name>
          ,
          <article-title>Embarrassingly shallow autoencoders for sparse data</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          , WWW '19,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>3251</fpage>
          -
          <lpage>3257</lpage>
          . URL: https://doi.org/10.1145/3308558.3313710. doi:10.1145/3308558.3313710.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          , T.-S. Chua,
          <article-title>Neural collaborative filtering</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on World Wide Web, WWW '17, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE</source>
          ,
          <year>2017</year>
          , p.
          <fpage>173</fpage>
          -
          <lpage>182</lpage>
          . URL: https://doi.org/10.1145/3038912.3052569. doi:10.1145/3038912.3052569.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Carraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aiolli</surname>
          </string-name>
          ,
          <article-title>Conditioned variational autoencoder for top-n item recommendation</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2004.11141. doi:10.48550/ARXIV.2004.11141.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>