<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Gradient-Based Supported Model Computation in Vector Spaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akihiro Takemura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katsumi Inoue</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Graduate University for Advanced Studies</institution>
          ,
          <addr-line>SOKENDAI</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose an alternative method for computing supported models of normal logic programs in vector spaces using gradient information. To this end, we first translate the program into a definite program and embed the program into a matrix. We then define a loss function based on the implementation of the immediate consequence operator T_P by matrix-vector multiplication with a suitable thresholding function. We propose an almost everywhere differentiable alternative to the non-linear thresholding operation, and incorporate penalty terms into the loss function to avoid undesirable results. We report the results of several experiments where our method shows improvements over an existing method in terms of success rates of finding correct supported models of normal logic programs.</p>
      </abstract>
      <kwd-group>
        <kwd>Logic Programming</kwd>
        <kwd>Supported Model Computation</kwd>
        <kwd>Differentiable Logic Programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Performing logical inference with linear algebraic methods has been studied as an attractive
alternative to traditional symbolic inference methods [1, 2, 3]. Linear algebraic approaches to
logical inference offer promising characteristics compared to the traditional methods, for
example, efficiency and scalability: since linear algebra is at the heart of many scientific applications,
there are existing highly-efficient optimized algorithms for performing basic operations in linear
algebra, and these new methods can benefit from the high computational power offered by modern
systems in the form of multi-core CPUs and GPUs. In this work, we present a gradient-based
method for computing supported models in vector spaces.</p>
      <p>Sakama et al. introduced the matrix representations of definite, disjunctive and normal logic
programs, and a linear algebraic method for computing the least models and stable models [2].
More recently Nguyen et al. proposed an improved version of the algorithm by taking advantage
of sparsity [4]. Sato et al. proposed a method for computing supported models in vector spaces
via 3-valued completion models of a normal logic program [5]. The matrix encoding method
used in this work is influenced by the aforementioned works, and we extend the previous works
by proposing an encoding that is more suitable towards our goal of computing supported models
as fixed points in vector spaces. Another difference is that while our method uses gradient
information to find supported models, previous works use non-differentiable operations to find
models.</p>
      <p>Gradient-based methods have been considered, for example, for abduction [6], SAT and
probabilistic inference [7]. Our design of the loss function follows the direction of previous
works by Sato [8, 7], and we introduce additional terms for computing supported models.
Although one should be able to compute supported models by obtaining a completion of the
program and solving it with a SAT solver, e.g. using the matrix-based SAT solver in [7], our
method requires neither completion nor a SAT solver. Blair et al. introduced an extension of
2-valued semantics in vector spaces using a polynomial representation of normal programs,
and used gradient descent to solve the dynamical system where the solutions correspond to
the supported models [9]. In contrast, we employ a matrix representation of the program, a
thresholding function, and gradient descent for minimizing the loss function.</p>
      <p>The closest to our work is Aspis et al.’s method [10], as both our algorithm and theirs use a
gradient-based method to find supported models of normal logic programs in vector spaces.
Their method uses a matrix representation of the program reduct and Newton’s method
for root finding to find fixed points. Some differences are: (i) While our method uses a normal
to definite program translation and a matrix representation of the program, their method
incorporates the notion of reduct into the program matrix. (ii) Our loss function includes a
thresholding term and additionally regularization terms for dealing with fractional assignments
and facts, while in their method the convergence depends solely on the fixed point criterion. The
use of regularization to penalize fractional-valued interpretations is critical to avoid converging
to unwanted middle points. Our goal in this paper is to improve upon Aspis et al.’s method, by
extending Sato et al.’s earlier work on using loss functions for logical inference, and incorporating
new terms for supported model computation.</p>
      <p>The structure of this paper is as follows: Section 2 covers the necessary background and
definitions. Section 3 presents a method for representing logic programs with matrices. Section
4 introduces the thresholding function and loss function, as well as the gradient-based search
algorithm for finding supported models. Section 5 presents the results of experiments designed
to test the ability of the algorithm and compare against known results in the literature. Finally,
Section 6 presents the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>We consider a language ℒ that contains a finite set of propositional variables defined over a
finite alphabet and the logical connectives ¬, ∧, ∨ and ←. The Herbrand base, B_P, is the set of
all propositional variables that appear in a logic program P.</p>
      <p>
        A definite program is a set of rules of the form (1) or (2), where h and b_i are propositional
variables (atoms) in ℒ:
h ← b_1 ∧ b_2 ∧ · · · ∧ b_m (m ≥ 0)   (1)
h ← b_1 ∨ b_2 ∨ · · · ∨ b_m (m ≥ 0)   (2)
We refer to (2) as an OR-rule, which is a shorthand for m rules: h ← b_1, h ← b_2, . . . , h ← b_m.
For each rule r we define head(r) = h and body(r) = {b_1, . . . , b_m}. A rule r is referred to as a
fact if body(r) = ∅.
      </p>
      <p>
        A normal program is a set of rules of the form (3), where h and b_i are propositional variables
in ℒ:
h ← b_1 ∧ · · · ∧ b_m ∧ ¬b_{m+1} ∧ ¬b_{m+2} ∧ · · · ∧ ¬b_n (n ≥ m ≥ 0)   (3)
We refer to the positive and negative occurrences of atoms in the body as body⁺(r) = {b_1, . . . , b_m}
and body⁻(r) = {b_{m+1}, . . . , b_n}, respectively. A normal program is a definite program if
body⁻(r) = ∅ for every rule r ∈ P.
      </p>
      <p>
        An Herbrand interpretation I of a normal program P is a subset of B_P. A model M of P
is an interpretation of P where for every rule r ∈ P of the form (3), body⁺(r) ⊆ M and
body⁻(r) ∩ M = ∅ imply h ∈ M. A program is called consistent if it has a model. A supported
model M is a model of P where for every p ∈ M there exists a rule r ∈ P such that p = head(r),
body⁺(r) ⊆ M and body⁻(r) ∩ M = ∅ [11, 12].
      </p>
      <p>As we shall show later, in this paper we transform normal logic programs into definite
programs for searching for supported models. Thus we use the following definition of the immediate
consequence operator T_P. T_P : 2^{B_P} → 2^{B_P} is a function on Herbrand interpretations. For
a definite program P, we have: T_P(I) = {h | h ← b_1 ∧ · · · ∧ b_m ∈ P and {b_1, . . . , b_m} ⊆
I} ∪ {h | h ← b_1 ∨ · · · ∨ b_m ∈ P and {b_1, . . . , b_m} ∩ I ≠ ∅}. It is known that a supported model
M of a program P is a fixed point of T_P, i.e., T_P(M) = M [11].</p>
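      <p>As a concrete illustration of the T_P operator above, the following is a minimal Python sketch (ours, not part of the original formalism); the triple-based rule format, with kind marking conjunctive and OR-rules, is an assumption of this sketch.</p>
      <preformat>
# One application of T_P to an interpretation (a set of atoms).
def t_p(program, interpretation):
    out = set()
    for head, body, kind in program:
        if kind == "and" and set(body).issubset(interpretation):
            out.add(head)   # conjunctive rule: all body atoms hold
        elif kind == "or" and set(body).intersection(interpretation):
            out.add(head)   # OR-rule: at least one body atom holds
    return out

# {} is a fixed point of T_P for P = {p ← q, q ← p}, hence a supported model.
P = [("p", ["q"], "and"), ("q", ["p"], "and")]
assert t_p(P, set()) == set()
      </preformat>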
      <p>Definition 1 (Singly-Defined (SD) program). A normal program P is called an SD program if
head(r_1) ≠ head(r_2) for any two rules r_1 and r_2 in P where r_1 ≠ r_2.</p>
      <p>Any normal program P can be converted into an SD program P′ in the following manner.
If there is more than one rule with the same head (h ← body(r_1), . . . , h ← body(r_k), where
k &gt; 1), then replace them with a set of new rules {h ← q_1 ∨ ... ∨ q_k, q_1 ← body(r_1), ..., q_k ←
body(r_k)} containing new atoms {q_1, ..., q_k}. This is a stricter condition than the multiple
definitions condition (MD-condition) [2]: for any two rules r_1 and r_2 in the program, head(r_1) =
head(r_2) implies |body(r_1)| ≤ 1 and |body(r_2)| ≤ 1. Consequently, all SD programs satisfy the
MD condition. Unless noted otherwise, we assume all programs in this paper are SD programs.</p>
      <p>
        Given a normal program P, it is transformed into a definite program by replacing the negated
literals in rules of the form (3) and rewriting:
h ← b_1 ∧ · · · ∧ b_m ∧ b̄_{m+1} ∧ b̄_{m+2} ∧ · · · ∧ b̄_n (n ≥ m ≥ 0)   (4)
where b̄_j are new atoms associated with the negated b_j. A collection of rules of the form (4) is
referred to as the positive form P⁺, where B_{P⁺} = B_P ∪ {b̄ | b ∈ B_P}. For transformed rules
of the form (4), we refer to {b_1, . . . , b_m} as the positive part and {b̄_{m+1}, . . . , b̄_n} as the negative
part. After transformation, the program should contain rules of the forms (1), (2), or (4).
      </p>
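      <p>The transformation to the positive form is mechanical. The following sketch uses our own rule representation and the naming convention neg_x for the new atom x̄ (both assumptions of this sketch, not the paper's notation):</p>
      <preformat>
# A normal rule is (head, positive_body, negative_body); the positive form
# turns p ← q ∧ ¬r into the definite rule p ← q ∧ neg_r, as in form (4).
def to_positive_form(normal_program):
    positive = []
    for head, pos, neg in normal_program:
        body = list(pos) + ["neg_" + b for b in neg]
        positive.append((head, body, "and"))
    return positive

print(to_positive_form([("p", ["q"], ["r"])]))
# [('p', ['q', 'neg_r'], 'and')]
      </preformat>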
      <p>By an interpretation I⁺ of P⁺, we mean any set of atoms I⁺ ⊆ B_{P⁺} that satisfies the
condition: for any atom p ∈ B_P, precisely one of either p or p̄ belongs to I⁺.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Representing Logic Programs with Matrices</title>
      <p>In this section, we first describe the relationship between the positive forms of normal logic
programs and their supported models. Then we describe the matrix encoding of the positive forms,
which in turn represents the original normal logic programs in vector spaces.</p>
      <sec id="sec-3-1">
        <title>3.1. Relationship between Positive Forms and Supported Models</title>
        <p>Consider an (inconsistent) program p ← ¬p, and its positive form p ← p̄. P⁺ is a definite
program, but it has no models in this case due to the restriction we place on the interpretation:
if p ∈ I⁺ then p̄ ∉ I⁺ and vice versa. In this case, the implication is that there are no
fixed points of T_{P⁺} for I⁺ that satisfy the condition (p̄ ∈ I⁺ if p ∉ I⁺). On the other hand,
when a model M of P exists, we can show that the corresponding M⁺ is a model of P⁺.
Proposition 1. Let P be a normal program, and let P⁺ be its positive form. If M is a model of
P, then M′ = M ∪ {p̄ | p ∈ B_P ∖ M} is a model of P⁺. Conversely, if M⁺ is a model of P⁺,
then M⁺ ∩ B_P is a model of P.</p>
        <p>Proof. Follows from the definitions of M′ and P⁺. Consider M′. Since p̄ ∉ M′ if p ∈ M′
and vice versa, for each rule r ∈ P⁺, body(r) ⊆ M′ implies head(r) = p ∈ M′. Thus, M′
is a model of P⁺. Now consider M⁺. Let M = M⁺ ∩ B_P. Given that M⁺ is a model of P⁺
and p ∈ M if p ∈ M⁺, for each rule r ∈ P, body⁺(r) ⊆ M and body⁻(r) ∩ M = ∅ imply
head(r) = p ∈ M. Thus, M is a model of P.</p>
        <p>Proposition 2. Let M be a supported model of P, and put M′ = M ∪ {p̄ | p ∈ B_P ∖ M}. Then,
T_{P⁺}(M′) = M.</p>
        <p>Proof. Suppose p ∈ M. Since M is a supported model, there exists a rule r ∈ P such that
head(r) = p, body⁺(r) ⊆ M and body⁻(r) ∩ M = ∅. Correspondingly, there exists a rule
r′ ∈ P⁺ such that head(r′) = p, body⁺(r′) ⊆ M′ and body⁻(r′) ⊆ M′. That is, p ∈ T_{P⁺}(M′).
Hence, M ⊆ T_{P⁺}(M′).</p>
        <p>Conversely, suppose p ∈ T_{P⁺}(M′). Then, there exists a rule r′ ∈ P⁺ such that head(r′) = p
and body(r′) ⊆ M′. Since M′ is a model of P⁺ by Proposition 1, body(r′) ⊆ M′ implies
head(r′) = p ∈ M′. Because p is a positive literal, p ∈ M holds. Hence, T_{P⁺}(M′) ⊆ M.
Therefore, M = T_{P⁺}(M′).</p>
        <p>Proposition 3. Let I⁺ be an interpretation of P⁺. If T_{P⁺}(I⁺) = I⁺ ∩ B_P holds, then
M = I⁺ ∩ B_P is a supported model of P.</p>
        <p>Proof. Suppose T_{P⁺}(I⁺) = I⁺ ∩ B_P. Because I⁺ ∩ B_P recovers the positive literals
from I⁺, for each p ∈ (I⁺ ∩ B_P), there exists a rule r ∈ P such that head(r) = p,
body⁺(r) ⊆ (I⁺ ∩ B_P) and body⁻(r) ∩ (I⁺ ∩ B_P) = ∅. Thus, M = I⁺ ∩ B_P is a
supported model of P.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Matrix Encoding of Logic Programs</title>
        <p>We first introduce the matrix and vector notations used in this paper, before showing our
method for encoding normal programs into matrices. Matrices and vectors are represented as
bold upper-case letters, e.g. M, and lower-case letters, e.g. v, respectively. A 1-vector with length N is
represented by 1_N. The indices of the entries of matrices and vectors appear in the subscript;
for example, M_{ij} refers to the element at the i-th row and j-th column of a matrix M, and v_i refers
to the i-th element of a column vector v. Let M_{i:} and M_{:j} denote the i-th row slice and j-th
column slice of M, respectively. We denote the horizontal concatenation of matrices M1 and
M2 as [M1 M2], and denote the vertical concatenation of column vectors v1 and v2 as [v1; v2].
Some matrices and vectors in this paper have a superscript to distinguish them from others, e.g.
M^s, and it should not be confused with indices appearing in the subscript; for example, M^s_{ij}
refers to the (i, j)-th entry of M^s.</p>
        <p>Let P be a normal program with size |P| = N, P⁺ its positive form and B_{P⁺} the Herbrand
base of P⁺. Then we have |B_{P⁺}| = 2N, since for every p ∈ B_P we add its negated version
p̄. To encode P⁺ into a program matrix Q, we encode the atoms appearing in the bodies of the
rules with one-hot encoding. After performing this encoding for all rules in P⁺, we obtain an
(N × 2N) binary program matrix Q, which is equivalent to a vertical concatenation of row
vectors, each encoding a rule r ∈ P⁺ with one-hot encoding.</p>
        <p>Definition 2 (Program Matrix). Let P be a normal program with |P| = N and P⁺ its positive
form with |B_{P⁺}| = 2N. Then P⁺ is represented by a matrix Q ∈ Z^{N×2N} such that for each
element Q_{ij} (1 ≤ i ≤ N, 1 ≤ j ≤ 2N) in Q,
• Q_{ij} = 1 if atom p_j ∈ B_{P⁺} (1 ≤ j ≤ 2N) appears in the body of the rule r_i (1 ≤ i ≤ N);
• Q_{ij} = 0, otherwise.</p>
        <p>The i-th row of Q corresponds to the atom p_i appearing in the head of the rule r_i, and the
j-th column corresponds to the atom p_j (1 ≤ j ≤ 2N) appearing in the body of the rules r_i
(1 ≤ i ≤ N). Atoms that do not appear in the head of any of the rules in P⁺ are encoded as
zero-only row vectors in Q.</p>
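        <p>A small NumPy sketch of Definition 2 follows; the column ordering (p_1, . . . , p_N, p̄_1, . . . , p̄_N) and the dict-based rule format (one rule per head, as in an SD program) are assumptions of this sketch:</p>
        <preformat>
import numpy as np

def program_matrix(rules, atoms):
    """rules: dict head -> body over P+ ('neg_x' encodes x̄);
    atoms: the N positive atoms, fixing the column order."""
    n = len(atoms)
    col = {a: j for j, a in enumerate(atoms)}
    col.update({"neg_" + a: n + j for j, a in enumerate(atoms)})
    q = np.zeros((n, 2 * n), dtype=int)
    for i, a in enumerate(atoms):        # row i: rule with head p_i
        for b in rules.get(a, []):       # headless atoms keep zero rows
            q[i, col[b]] = 1
    return q
        </preformat>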
        <p>This definition is different from [2] or [10], in that we do not explicitly include ⊤ and ⊥
in the program matrix, and we do not use fractional values to encode long rules. In fact, our
encoding method is similar to that of [5], except that we do not use (2N × 2N) space for the
program matrix, since we do not encode rules with p̄ ∈ B_{P⁺} in the head.¹
Definition 3 (Interpretation Vector). Let P be a definite program and B_P = {p_1, . . . , p_N}.
Then an interpretation I ⊆ B_P is represented by a vector v = (v_1, . . . , v_N)⊺ ∈ Z^N where each
element v_i (1 ≤ i ≤ N) represents the truth value of the proposition p_i, such that v_i = 1 if
p_i ∈ I, otherwise v_i = 0. We assume propositional variables share the common index, such that
v_i corresponds to p_i, and we write var(v_i) = p_i.</p>
        <p>
          Recall that the positive form P⁺ of a normal program is a definite program, and all negated
literals in the body are replaced by new atoms, e.g. in (4) ¬b_{m+1} is replaced by b̄_{m+1}. We now
extend the definition of interpretation vectors to include relations between the positive and
negative occurrences of atoms, in order to maintain that whenever we have b_j ∈ I, b̄_j ∉ I, and
vice versa.
        </p>
        <p>Definition 4 (Companion Vector). Let B⁺_{P⁺} ⊆ B_{P⁺} denote the positive part of B_{P⁺},
and B⁻_{P⁺} ⊆ B_{P⁺} the negative part of B_{P⁺}, with size |B⁺_{P⁺}| = |B⁻_{P⁺}| = N. Let
v ∈ Z^N be a vector representing truth assignments for p ∈ B⁺_{P⁺}, such that v_i = 1 if p_i ∈ I
and v_i = 0 otherwise. Define a companion vector w ∈ Z^{2N}, representing an interpretation
I⁺ ⊆ B_{P⁺}, as follows: w = [v; 1_N − v].</p>
        <p>¹In [5], the program matrix encodes the completion of P⁺.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Gradient Descent For Computing Supported Models</title>
      <p>In this section, we first show how the evaluation of the T_P operator can be carried out in
vector spaces. Then we introduce our loss function and its gradient, and finally we introduce
an algorithm for finding supported models in vector spaces.</p>
      <sec id="sec-4-0">
        <title>4.1. Computing the T_P Operator in Vector Spaces</title>
        <p>Sakama et al. [2] showed that the T_P operator can be computed in vector spaces using a
thresholding function, which they called θ-thresholding. Here we introduce a variant of θ-
thresholding to accommodate the new program encoding method as well as the differentiability
requirement. First we initialize the parameter vector t.</p>
        <p>
          Definition 5 (Parameter Vector t). A set of parameters to the θ-thresholding is represented by a
column vector t ∈ Z^N such that for each element t_i (1 ≤ i ≤ N) in t,
• t_i = |body(r_i)| if the rule r_i ∈ P⁺ is a conjunctive rule, e.g. (1), (4);
• t_i = 1 if the rule r_i ∈ P⁺ is a disjunctive rule, e.g. (2);
• t_i = 0, otherwise.
        </p>
        <p>In some previous works [2, 13], the information about the nature of the rules was also stored
in the program matrix Q alongside the atom occurrences; conjunctive rules with |body(r_i)| &gt; 1
had fractional values Q_{ij} = 1/|body(r_i)|, and disjunctive bodies had integer values Q_{ij} = 1.
Here we follow the approach taken by Sato et al. [5], and only store information about the
occurrences of atoms in P⁺, keeping supplementary information in the parameter vector t to
recover the original program.</p>
        <p>
          Definition 6 (Parameterized θ-thresholding). Let w ∈ Z^{2N} be a companion vector representing
I⁺ ⊆ B_{P⁺}. Given a parameter vector t ∈ Z^N, a program matrix Q ∈ Z^{N×2N}, and a vector
y = Qw where y ∈ Z^N, we apply the thresholding function element-wise as follows:
θ_t(y) = min(max(0, y − (t − 1)), 1) if t ≥ 1, and θ_t(y) = 0 if t &lt; 1.   (5)
        </p>
        <p>
          This thresholding function resembles hardtanh, which is an activation function developed
for use in natural language processing [14]. In the original hardtanh function, the range of the
linear region is [−1, 1] (Figure 1(a)), but here we define the linear region between [t − 1, t]
(Figure 1(b)). This function is almost everywhere differentiable, except at y = t − 1 and
y = t. The special case t &lt; 1 in Equation (5) corresponds to the case t = 0, where the head
does not appear in the program P⁺ and is assumed to be false.
        </p>
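        <p>A NumPy sketch of Equation (5) (ours, for illustration): the linear region is shifted to [t − 1, t] per rule, and rows whose t_i is below 1 are forced to 0.</p>
        <preformat>
import numpy as np

def theta(y, t):
    """Element-wise parameterized thresholding of Equation (5)."""
    out = np.minimum(np.maximum(0.0, y - (t - 1.0)), 1.0)
    return np.where(t >= 1, out, 0.0)
        </preformat>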
      <p>Intuitively, for the head of a conjunctive rule to be true it is sufficient to check whether all
literals in the body hold, otherwise the rule evaluates to false. Similarly, for a disjunctive rule
it is sufficient to check whether at least one of the literals in the body holds for the head to hold.
This is formalized in Proposition 4.</p>
      <p>Figure 1: (a) hardtanh; (b) parameterized thresholding with t = 1.</p>
      <p>Proposition 4 (Thresholded T_P Operator). Let P⁺ be the positive form of a normal program
P and Q ∈ Z^{N×2N} its matrix representation. Let v ∈ Z^N be a subset of an interpretation vector,
representing p_i ∈ I ⊆ B⁺_{P⁺} if v_i = 1 for i ∈ {1, . . . , N}. Let w ∈ Z^{2N} be a companion vector
to v. Then z = [u; 1_N − u] ∈ Z^{2N} is a vector representing J = T_P(I) satisfying the condition
(p̄ ∈ J if p ∉ J), if u = θ_t(Qw).</p>
      <p>Proof. Consider u = θ_t(Qw). For u = (u_1, . . . , u_N)⊺, by the definition of the thresholding
function, u_i = 1 (1 ≤ i ≤ N) if u′_i ≥ t_i in u′ = Qw. Take a row slice Q_{i:}, then u′_i =
Q_{i:}w = Q_{i1}w_1 + · · · + Q_{i,2N}w_{2N}, and u_i = 1 if u′_i ≥ t_i. Both Q_{i:} and w are 0-1 vectors,
so it follows that there are at least t_i elements where Q_{ij} = w_j = 1 (1 ≤ j ≤ 2N). The first
N elements of w represent p_j ∈ I ⊆ B⁺_{P⁺} if w_j = 1, and if p_j ∈ I then p̄_j ∉ I,
which is maintained through the definition of the companion vector w. 1) For a conjunctive
rule p_i ← p_1 ∧ · · · ∧ p_k (1 ≤ k ≤ 2N), {p_1, . . . , p_k} ⊆ I implies p_i ∈ T_P(I). 2) For an
OR-rule p_i ← p_1 ∨ · · · ∨ p_k (1 ≤ k ≤ 2N), {p_1, . . . , p_k} ∩ I ≠ ∅ implies p_i ∈ T_P(I). p_i ∈ J
is represented by z_i = 1 (1 ≤ i ≤ 2N). Then by putting J = {var(z_i) | z_i = 1}, J = T_P(I)
holds.</p>
      <p>Consider J = T_P(I). For v = (v_1, . . . , v_N)⊺ representing I ⊆ B⁺_{P⁺}, w = (v_1, . . . , v_N, 1 −
v_1, . . . , 1 − v_N)⊺ is a vector representing I ⊆ B_{P⁺} if we set I = {var(w_j) | w_j = 1}. u′ = Qw
is a vector such that u′_i ≥ t_i (1 ≤ i ≤ N) if var(u_i) ∈ T_P(I). Define u = (u_1, . . . , u_N)⊺
such that u_i = 1 (1 ≤ i ≤ N) if u′_i ≥ t_i in Qw, and u_i = 0 otherwise. Define an
interpretation J ⊆ B_{P⁺} such that it can be partitioned into subsets of positive and negative
occurrences of atoms (J⁺ ∪ J⁻) = J ⊆ B_{P⁺}. Since only positive atoms occur in the head, u
represents a positive subset of interpretation J⁺ ⊆ J ⊆ B_{P⁺} by setting J⁺ = {var(u_i) | u_i =
1} (1 ≤ i ≤ N). If p_i ∈ T_P(I) then u_i = 1, and p̄_i ∉ J is represented by 1 − u_i = 0.
Conversely, if p_i ∉ T_P(I) then u_i = 0, and 1 − u_i = 1 represents p̄_i ∈ J. Thus 1_N − u
represents J⁻ ⊆ J ⊆ B_{P⁺}. z = [u; 1_N − u] is then a vector representing J⁺ ∪ J⁻ = J if we
set J = {var(z_i) | z_i = 1} (1 ≤ i ≤ 2N). Thus z = [u; 1_N − u] represents J = T_P(I) if
u = θ_t(Qw).</p>
      <p>
        Example 1. Consider the following program:
P = {p ← q,  q ← p ∧ r,  r ← ¬p}   (6)
This program has one supported (stable) model: {r}. We have B_P = {p, q, r} and B_{P⁺} =
{p, q, r, p̄, q̄, r̄}, and the matrix representation Q and parameter vector t are (with columns
ordered p, q, r, p̄, q̄, r̄):
Q = [0 1 0 0 0 0; 1 0 1 0 0 0; 0 0 0 1 0 0], t = (1 2 1)⊺   (7)
Suppose an assignment v_{r} = (0 0 1)⊺ is given. Then the companion vector w is:
w = [v_{r}; 1_3 − v_{r}] = (0 0 1 1 1 0)⊺   (8)
Compute the matrix multiplication product Qw and apply the parameterized thresholding:
u = θ_t(Qw) = θ_t((0 1 1)⊺) = (0 0 1)⊺ = v_{r}   (9)
Let z be a companion vector to u, i.e. z = [u; 1_3 − u], then we have
z = (0 0 1 1 1 0)⊺   (10)
Define J = {var(z_i) | z_i = 1}; then we have J = {r, p̄, q̄}, and J ∩ B_P = {r}.
      </p>
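      <p>Using the theta sketch above, the computation in Example 1 can be replayed numerically (with the rule bodies as written in (6)):</p>
      <preformat>
import numpy as np
# Columns ordered (p, q, r, p̄, q̄, r̄); one row per rule head p, q, r.
Q = np.array([[0, 1, 0, 0, 0, 0],     # p ← q
              [1, 0, 1, 0, 0, 0],     # q ← p ∧ r
              [0, 0, 0, 1, 0, 0]])    # r ← ¬p
t = np.array([1, 2, 1])
v = np.array([0, 0, 1])               # candidate assignment {r}
w = np.concatenate([v, 1 - v])        # companion vector, as in (8)
u = theta(Q @ w, t)                   # θ_t((0 1 1)⊺) = (0 0 1)⊺ = v
print(u)                              # [0. 0. 1.], a fixed point
      </preformat>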
      <p>
        Proposition 5 (Supported Model Computation with Thresholded T_P). Let v ∈ Z^N be a
0-1 vector representing a subset of an interpretation I ⊆ B_{P⁺}, and z = [v; 1_N − v] be its
companion vector representing I ⊆ B_{P⁺} satisfying (p̄ ∈ I if p ∉ I). Given a program matrix
Q representing a program P⁺ and a thresholding function θ_t parameterized by a vector t, the
fixed points of T_{P⁺} are represented by 0-1 binary vectors z* = [v*; 1_N − v*] ∈ Z^{2N} where
v* = θ_t(Qz*). Then z* are vectors representing models M⁺ of P⁺ satisfying (p̄ ∈ M⁺ if
p ∉ M⁺) if θ_t(Qz*) = v*. When such a 0-1 binary vector z* exists, M⁺ ∩ B_P = M is a
supported model of P.
      </p>
      <p>Proof. Let M ⊆ B_{P⁺} be a model of P⁺, represented by z*. Consider two cases: (i) T_P(M) = M
and (ii) v* = θ_t(Qz*). In both cases, by Propositions 2, 3 and 4, if a supported model of P
exists, the results hold.</p>
      <p>
        Example 2. Consider Program (6) in Example 1, and the interpretation vector z in (10), representing
an interpretation I. We show that I = {r, p̄, q̄} is indeed a model of the positive form, and
corresponds to the supported model of P. Construct a vector v_{r} and companion vector w as
in (8). By comparing (8) and (10), we see that w = z, and this is consistent with the fixed point
definition M = T_P(M). I ∩ B_P results in {r}, which is the supported model of P.
      </p>
      </sec>
      <sec id="sec-4-1">
        <title>4.2. Loss Function for Computing Supported Models</title>
        <p>By the fixed point definition of supported models, a supported model v satisfies
v = θ_t(Q[v; 1_N − v]). We now use this definition to design a loss function which can be
minimized using gradient descent. Gradient descent is a method for minimizing an objective
function (loss function) by updating the parameters in the opposite direction of the gradient of
the objective function with respect to the parameters. The size of the update is determined by
the gradient and the step size α. Intuitively, this algorithm follows the direction of the slope of
the loss surface until it reaches the bottom of a valley (local minimum).</p>
        <p>Let u be an (N × 1) vector representing an assignment to the heads of the rules in P⁺. Define
w = [u; 1_N − u] as the (2N × 1) interpretation vector that stores assignments to each atom
p ∈ I ⊆ B_{P⁺}. We define an (N × 1) vector f which stores information about occurrences of
facts in the program P⁺. This vector will be used later during the minimization step to ensure
that facts are not forgotten.</p>
        <p>Definition 7 (Fact Vector f). The set of facts in the program P⁺ is represented by a column
vector f ∈ Z^N such that for each element f_i (1 ≤ i ≤ N) in f,
• f_i = 1 if the rule r_i is a fact: p ← ;
• f_i = 0, otherwise.</p>
        <p>
          Definition 8 (Loss Function). Given a program matrix Q, a candidate vector x, thresholding
function θ_t, and constants λ1 and λ2, define the loss function as follows:
L(x) = 1/2 (‖θ_t(Q[x; 1_N − x]) − x‖²_F + λ1‖x ⊙ (x − 1_N)‖²_F + λ2‖f − (x ⊙ f)‖²_F)   (11)
where ‖·‖_F denotes the Frobenius norm and ⊙ the element-wise product.
        </p>
        <p>The first term is derived directly from the fixed point definition of supported models, and
should be 0 if x is a supported model of P⁺. The second term, which resembles a regularization
term often used in the machine learning literature, is added to penalize candidate vectors x that
contain fractional values, and is 0 if and only if x is a 0-1 vector. The third term will be 0 if and
only if the facts are preserved, and will be positive if any part of the assignment is
lost, i.e. by assigning 0 (false) to a fact where f_i = 1.</p>
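        <p>A direct NumPy sketch of Equation (11) (lam1 and lam2 stand for λ1 and λ2; for vectors the Frobenius norm reduces to the Euclidean norm):</p>
        <preformat>
import numpy as np

def loss(x, Q, t, f, lam1=1.0, lam2=1.0):
    z = np.concatenate([x, 1.0 - x])
    fix = theta(Q @ z, t) - x          # fixed-point residual
    frac = x * (x - 1.0)               # fractional-value penalty
    fact = f - x * f                   # forgotten-facts penalty
    return 0.5 * (fix @ fix + lam1 * (frac @ frac) + lam2 * (fact @ fact))
        </preformat>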
        <p>We introduce submatrices of Q, Q^pos and Q^neg, that correspond to the positive-body and
negative-body columns of the matrix, respectively, such that Q = [Q^pos Q^neg] (horizontal
concatenation of submatrices). Both Q^pos and Q^neg have the shape (N × N).</p>
        <p>Definition 9 (Gradient of the Loss Function). The gradient of the loss function with respect to x
is given by:
∂L(x)/∂x = (Q^pos − Q^neg)⊺ ((θ_t(Qz_x) − x) ⊙ θ′_t(Qz_x)) − (θ_t(Qz_x) − x)
+ λ1 (x ⊙ (1_N − x) ⊙ (1_N − 2x)) + λ2 (x ⊙ f − f)   (12)
where z_x = [x; 1_N − x] ∈ R^{2N} and θ′_t denotes the element-wise derivative of the thresholding
function:
θ′_t(y) = 1 if (t ≥ 1) and (t − 1) ≤ y ≤ t, and θ′_t(y) = 0 otherwise.   (13)
        </p>
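        <p>Equations (12) and (13) translate to NumPy as follows (a sketch under our reconstruction of the gradient above; the derivative at the two kink points is taken to be 1):</p>
        <preformat>
import numpy as np

def grad(x, Q, t, f, lam1=1.0, lam2=1.0):
    n = x.shape[0]
    Qpos, Qneg = Q[:, :n], Q[:, n:]            # Q = [Qpos Qneg]
    z = np.concatenate([x, 1.0 - x])
    y = Q @ z
    res = theta(y, t) - x                      # θ_t(Qz_x) − x
    inside = (t >= 1) * (y >= t - 1.0) * np.less_equal(y, t)
    dtheta = inside.astype(float)              # Equation (13)
    return ((Qpos - Qneg).T @ (res * dtheta) - res
            + lam1 * x * (1.0 - x) * (1.0 - 2.0 * x)
            + lam2 * (x * f - f))
        </preformat>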
        <p>
          We can update x iteratively using, for example, gradient descent or a quasi-Newton method,
to reduce the loss to 0. Here we show an example of an update rule for gradient descent. Let α be
the step size; then the gradient descent update rule is given by:
x ← x − α ∂L(x)/∂x   (14)
Using this update rule we can design an algorithm to find supported models, as shown in
Algorithm 1.
        </p>
        <p>The convergence characteristics of the gradient descent algorithm are well-known [15].
Assuming at least one 0-1 vector representing a supported model exists for Q, all we require for
Algorithm 1 to converge to the supported model is that the initial vector xinit be in the region
surrounding the supported model where the slope points towards the model. When there are
multiple supported models (correspondingly multiple local minima), we expect the algorithm to
converge to different models depending on the choice of initial vector xinit. However, it is often
not known a priori which particular values or regions of xinit lead to which models. Therefore
we implement a uniform sampling strategy, where we sample uniformly from [0, 1], and make
no assumptions with regard to the existence of supported models for Q.</p>
        <p>Depending on the program, an optimal 0-1 vector interpretation may not exist, so we limit the
number of iterations to max_iter before assuming non-convergence. With gradient descent, it
is often time-consuming to reduce the loss function completely to zero. We therefore implement
a "peeking at a solution" heuristic, similar to the one presented in [7], where, while updating
x, we round x to a 0-1 vector to see whether the resulting x is a solution (Lines 7-11). The
output is sensitive to the choice of initial vector xinit, and a poor choice of xinit may result
in non-convergence to optimal solutions. We alleviate this dependency on the initial vector
by introducing the max_retry parameter and changing the initial vector on each try. This
algorithm declares failure to find any supported models (returns FALSE) when both max_retry
and max_iter are exhausted.</p>
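        <p>Putting the sketches together, the following is our reading of the overall search (the pseudocode of Algorithm 1 is garbled in this copy; clipping the iterate to [0, 1] and the parameter defaults are our assumptions):</p>
        <preformat>
import numpy as np

def find_supported_model(Q, t, f, alpha=0.1, eps=1e-4,
                         max_iter=1000, max_retry=100, seed=0):
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    for _ in range(max_retry):                 # restart with a fresh x_init
        x = rng.uniform(0.0, 1.0, size=n)      # uniform sampling of x_init
        for _ in range(max_iter):
            xr = np.round(x)                   # peek at the rounded vector
            if eps >= loss(xr, Q, t, f):       # rounded vector is a solution
                return xr                      # 0-1 supported-model vector
            x = np.clip(x - alpha * grad(x, Q, t, f), 0.0, 1.0)
    return None                                # exhausted: report failure
        </preformat>
        <p>On the matrices of Example 1 (with f set to the zero vector, since Program (6) has no facts), this sketch should return the 0-1 vector for the supported model {r}.</p>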
        <p>An initial implementation of the proposed method was written in Python 3.7. Our experimental
environment is a Linux container limited to 8 virtual CPU cores and 16GB RAM hosted on a
desktop with Ubuntu 18.04, Intel Core i9-9900K 3.6GHz and 64GB RAM.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In this section, we first compare the performance of our method with the one presented
by Aspis et al. [10] in terms of success rate of finding correct supported models. Then we
qualitatively show the effect of the rounding heuristic by plotting the trajectories of the candidate
interpretation vector.</p>
      <sec id="sec-5-1">
        <title>5.1. Comparing Success Rates with Aspis et al.</title>
        <p>Aspis et al. [10] encode the program reduct into a matrix and employ Newton’s method for
root finding to find fixed points of the program. The gradient is calculated by a Jacobian matrix,
and the thresholding operation is carried out with a parameterized sigmoid. They present two
types of sampling methods for setting the initial vector: uniform sampling, similarly to our
method, where the values are sampled uniformly from [0, 1], and semantic sampling², where the
values are sampled uniformly from [0, ε⊥] ∪ [ε⊤, 1]. Semantic sampling results in an initial
vector that is semantically equivalent to an interpretation.</p>
        <p>
          Firstly, we consider the “N-negative loops” task, which involves programs of the following
form: for 1 ≤ i ≤ N,
p_i ← not q_i
q_i ← not p_i   (15)
For our algorithm, we disabled automatic retry with max_retry = 1, so that the result depends
solely on a single application of the algorithm. Other parameters were set as follows: max_iter =
10³, ε = 10⁻⁴, λ1 = λ2 = 1, α = 10⁻¹.
        </p>
        <p>
          For Aspis et al.’s algorithm, we used the following settings: max_iter = 10³, ε = 10⁻⁴,
λ = 0.5, γ = 0.087.³ We generated programs of the form of Program (15) with N up to 50, and
then applied the algorithms exactly once in each of 1000 trials. We measured the rate of success
of converging to supported models. Figure 2(a) shows our method with uniform initialization
outperforms their uniform sampling method, although the improvement over their semantic
sampling method is marginal.
        </p>
        <p>Secondly, we consider the “choose 1 out of N” task, where the task is to choose exactly 1 out
of N options. The programs in this class have the following form:
p_1 ← not p_2, not p_3, . . . , not p_N
p_2 ← not p_1, not p_3, . . . , not p_N
. . .
p_N ← not p_1, not p_2, . . . , not p_{N−1}   (16)</p>
        <p>²ε⊥ is an upper bound on false values that variables can take, and ε⊤ is a lower bound on true values.</p>
        <p>³We used γ = 0.087, which was their best performing setting, resulting in 89% success and near 100% success
for uniform and semantic sampling, respectively, for N = 1 loop.</p>
        <p>Figure 2: Success rates for (a) the “N-negative loops” task and (b) the “choose 1 out of N” task;
the compared settings are Ours (1-try), Ours (n-try), Aspis+ semantic, and Aspis+ uniform.</p>
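        <p>For reference, the two benchmark families can be generated in the normal-rule format used in the earlier sketches (the format itself is our assumption):</p>
        <preformat>
def negative_loops(n):
    """N negative loops: p_i ← not q_i, q_i ← not p_i (Program (15))."""
    prog = []
    for i in range(1, n + 1):
        prog += [(f"p{i}", [], [f"q{i}"]), (f"q{i}", [], [f"p{i}"])]
    return prog

def choose_one_of(n):
    """choose 1 out of N: p_i ← not p_j for all j ≠ i (Program (16))."""
    return [(f"p{i}", [], [f"p{j}" for j in range(1, n + 1) if j != i])
            for i in range(1, n + 1)]
        </preformat>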
        <p>We generated programs for N between 2 and 9, applied the algorithms using the same
parameters as the “N-negative loops” task, and repeated the process 1000 times for each N.⁴
For our algorithm, we consider the following scenarios: when the algorithm is allowed (i) only
1 try (max_retry = 1, 1-try in Fig. 2(b)), and (ii) retries up to 1000 times (max_retry = 1000,
n-try in Fig. 2(b)).</p>
        <p>From Figure 2(b), we observe that the success rate of our algorithm (1-try) starts to drop
rapidly past N = 5, and approaches the same level of success rate (10-20% range) as Aspis et al.
by N = 7. We can allow multiple retries to help the algorithm find the supported models, as
indicated by the high success rate of the n-try case; however, this means that the run time will
potentially be significantly longer compared to max_retry = 1. In this example, there is a global
constraint that exactly one of {p_1, ..., p_N} is chosen, but neither our method nor Aspis et al.’s
method can capitalize on this information. Handling of global constraints is left for future work.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Effect of the Rounding Heuristic on the Trajectories</title>
        <p>
          Here we qualitatively demonstrate the effect of the rounding heuristic (Lines 7-11 in Algorithm
1) by plotting the trajectories of the interpretation vector. We consider a simple negative loop
program, i.e., Program (15) with N = 1, where we have exactly two supported models {p} and
{q}. The contours in the following figures represent the value of the loss function at each
(p, q)-coordinate. The dots and arrows represent the trajectories of the algorithm starting from
various initial vectors. Note that the lowest points on this landscape are at (p, q) = (1, 0) ({p})
and (p, q) = (0, 1) ({q}).
        </p>
        <p>⁴For each program whose maximum rule length is N − 1, we estimate the upper bound of γ according to their
method, and subtract 10⁻³ from this value and use it as γ. This ensures that γ is smaller than the upper bound.</p>
        <p>From Figure 3(a) we observe that once the algorithm reaches the bottom of the valley, the
updates become smaller and the convergence is slow. From Figure 3(b) we see that the rounding
heuristic (peeking at a solution) avoids the bottom of the valley entirely, and after several iterations
the algorithm jumps straight to the 0-1 interpretation, which makes it converge faster.</p>
        <p>In both cases we see the algorithm converging to the middle point {p = 0.5, q = 0.5} when
starting from initial vectors with (p = q). Currently we do not break this symmetry in the
algorithm, for example by adding a very small noise to the vector during the update, and instead
use a retry function which resets the initial vector to a random value on restart.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work we presented a new method for computing supported models of a normal logic
program, using a matrix representation of the program and a gradient-based search algorithm.
We defined a loss function based on the implementation of the T_P operator by matrix-vector
multiplication with a suitable thresholding function. We incorporated penalty terms into the
loss function to avoid undesirable results. We showed the results of several experiments where
our method showed improvements over an existing method in terms of success rates of finding
correct supported models of normal logic programs.</p>
      <p>Raw computational performance is not the focus of this paper, and we leave thorough
benchmarking for future work. Our method does not compute stable models, except in cases where
supported and stable models coincide. For computing stable models, we may consider using our
method as a preprocessor to the method of [4], where our method can be used to find supported
models as promising candidates, which can then be refined for better performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Sato, Embedding Tarskian semantics in vector spaces, in: The Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, California, USA, volume WS-17 of AAAI Workshops, AAAI Press, 2017.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] C. Sakama, K. Inoue, T. Sato, Linear algebraic characterization of logic programs, in: G. Li, Y. Ge, Z. Zhang, Z. Jin, M. Blumenstein (Eds.), Knowledge Science, Engineering and Management, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2017, pp. 520-533. doi:10.1007/978-3-319-63558-3_44.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Y. Aspis, K. Broda, A. Russo, Tensor-based abduction in Horn propositional programs, in: ILP 2018 - 28th International Conference on Inductive Logic Programming, CEUR Workshop Proceedings, 2018, pp. 68-75.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] T. Q. Nguyen, K. Inoue, C. Sakama, Enhancing linear algebraic computation of logic programs using sparse representation, in: F. Ricca, A. Russo, S. Greco, N. Leone, A. Artikis, G. Friedrich, P. Fodor, A. Kimmig, F. A. Lisi, M. Maratea, A. Mileo, F. Riguzzi (Eds.), Proceedings 36th International Conference on Logic Programming (Technical Communications), UNICAL, Rende (CS), Italy, 18-24th September 2020, volume 325 of EPTCS, 2020, pp. 192-205. doi:10.4204/EPTCS.325.24. arXiv:2009.10247.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] T. Sato, C. Sakama, K. Inoue, From 3-valued semantics to supported model computation for logic programs in vector spaces, in: 12th International Conference on Agents and Artificial Intelligence, 2020, pp. 758-765.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] T. Sato, K. Inoue, C. Sakama, Abducing relations in continuous spaces, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, Stockholm, Sweden, 2018, pp. 1956-1962. doi:10.24963/ijcai.2018/270.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] T. Sato, R. Kojima, Logical inference as cost minimization in vector spaces, in: A. El Fallah Seghrouchni, D. Sarne (Eds.), Artificial Intelligence. IJCAI 2019 International Workshops, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2020, pp. 239-255. doi:10.1007/978-3-030-56150-5_12.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] T. Sato, A linear algebraic approach to Datalog evaluation, Theory and Practice of Logic Programming 17 (2017) 244-265. doi:10.1017/S1471068417000023.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] H. A. Blair, F. Dushin, D. W. Jakel, A. J. Rivera, M. Sezgin, Continuous models of computation for logic programs: importing continuous mathematics into logic programming's algorithmic foundations, in: K. R. Apt, V. W. Marek, M. Truszczynski, D. S. Warren (Eds.), The Logic Programming Paradigm: A 25-Year Perspective, Artificial Intelligence, Springer, Berlin, Heidelberg, 1999, pp. 231-255. doi:10.1007/978-3-642-60085-2_10.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Y. Aspis, K. Broda, A. Russo, J. Lobo, Stable and supported semantics in continuous vector spaces, in: Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, 2020, pp. 59-68. doi:10.24963/kr.2020/7.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] W. Marek, V. Subrahmanian, The relationship between stable, supported, default and autoepistemic semantics for general logic programs, Theoretical Computer Science 103 (1992) 365-386. doi:10.1016/0304-3975(92)90019-C.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] K. R. Apt, H. A. Blair, A. Walker, Towards a theory of declarative knowledge, in: Foundations of Deductive Databases and Logic Programming, Elsevier, 1988, pp. 89-148.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] H. D. Nguyen, C. Sakama, T. Sato, K. Inoue, An efficient reasoning method on logic programming using partial evaluation in vector spaces, Journal of Logic and Computation 31 (2021) 1298-1316. doi:10.1093/logcom/exab010.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research 12 (2011) 2493-2537.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, USA, 2004.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>