<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Symbolic Reinforcement Learning Framework with Incremental Learning of Rule-based Policy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kinjal Basu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elmer Salazar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huaduo Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joaquín Arias</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Parth Padalkar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gopal Gupta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CETINIA, Universidad Rey Juan Carlos</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Texas at Dallas</institution>
          ,
          <addr-line>Richardson</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In AI research, Relational Reinforcement Learning (RRL) is a widely discussed area that combines reinforcement learning with relational or inductive learning. One of the key challenges of inductive learning through rewards and actions is to learn the relations incrementally, that is, for an agent to closely mimic the human learning process, in which we start with a very naive belief about a concept and gradually refine it over time into a more concrete hypothesis. In this paper, we address this challenge and show that an automatic theory revision component can be developed efficiently to update the existing hypothesis based on the rewards the agent collects by applying it. We present a symbolic reinforcement learning framework with an automatic theory revision component for incremental learning. This theory revision component would not be possible to build without the goal-directed execution engine of answer set programming (ASP), s(CASP). The current work demonstrates a proof of concept of the RL framework, and the work is still in progress.</p>
      </abstract>
      <kwd-group>
<kwd>Logic Programming</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Incremental Learning</kwd>
        <kwd>Goal Directed Execution (GDE)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        One of the key goals of AI is to teach machines how to mimic human learning techniques.
Learning through examples is one of the techniques that humans employ in their day-to-day lives. We often form beliefs after seeing a small number of examples and, gradually, when
we encounter more examples, we try to reason using our existing beliefs. If we succeed, our
beliefs get stronger, however, if we fail, we update our beliefs to accommodate the new example.
For instance, let’s say a child, who lives in a tropical area, holds a glass of hot water. Holding
a glass of hot water in hot weather increases the child’s discomfort. With this experience,
the child quickly learns to not hold a glass of hot water in hot weather. However, when the
child experiences very cold weather, then the same hot water glass may feel comforting to
hold. The child may then update the prior belief to “do not hold a glass of hot water unless
the surrounding weather is cold”. This action- and reward-based learning technique of humans
closely resembles Reinforcement Learning (RL) in the realm of Machine Learning. In RL, while
exploring an environment, an agent gets positive/negative rewards based on its actions. From
these rewards, the agent may learn a policy to take better actions in its operating environment.
A symbolic policy is a policy learned as a set of logic rules that the agent can apply the next
time the same situation is encountered. One way of learning such policies is Inductive Logic
Programming (ILP), which uses the environment state and the agent's actions, along with the
rewards received from the environment, to learn the symbolic rules inductively. Then, with
the help of a reasoner/solver, the agent applies the learned policies to
another state in the environment and collects rewards. Again, after receiving multiple rewards,
the agent employs ILP to learn a new set of policies. In this way, the cycle of learning policies
and applying them goes on, and the agent’s understanding about the environment improves
with more experience. Recent works [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] using this idea show strong performance on
text-based games. A key issue, however, lies with the ILP algorithms: most of them can learn
new rules but are not capable of updating existing ones. Like a human, we expect an RL agent
to learn policies incrementally, both by updating existing beliefs and by learning new ones.
      </p>
      <p>In this paper, we introduce a symbolic reinforcement learning framework that can perform
inductive learning using ILP to learn new rules as well as use incremental learning to update the
existing rules (policy). We believe our framework closely mimics the human learning mechanism
where we can learn new concepts in parallel with updating the existing beliefs. Note also that
humans perform non-monotonic reasoning with the help of default rules and exceptions. In
other words, we quickly come to a conclusion after seeing only a few examples and later
correct our beliefs when we encounter an exceptional scenario. As we can easily model these
defaults and exceptions using Answer Set Programming (ASP), a logic programming paradigm,
we heavily rely on ASP for symbolic policy representation. By representing the policy in ASP,
an agent can also reason using an ASP solver to obtain a possible action given a state. The novelty
of our work is incremental learning by revising the existing hypothesis, which is only possible
because we use a goal-directed implementation of an ASP solver, s(CASP). With the s(CASP)
system, we can get a proof tree of any reasoning task it performs. This tree can be analyzed and
used to update the rules, in a manner very similar to how humans revise their beliefs. We discuss
our approach in more detail in the following sections.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Answer Set Programming</title>
        <sec id="sec-2-1-1">
          <title>An answer set program is a collection of rules of the form</title>
          <p>l0 ← l1, ... , lm, not lm+1, ... , not ln.</p>
          <p>
            Each li is a literal of classical logic [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. In an ASP rule, the left-hand side is called the
head and the right-hand side the body. Constraints are ASP rules without a head, whereas
facts are rules without a body. Variables start with an uppercase letter, while predicates and
constants begin with a lowercase letter. We will follow this convention throughout the paper.
The semantics of ASP is based on the stable model semantics of logic programming [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. ASP
supports negation as failure [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ], allowing it to elegantly model commonsense reasoning, default
rules with exceptions, etc.
          </p>
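          <p>For instance, the child's hot-water belief from the introduction can be written as a default rule with an exception (the predicate names here are ours and purely illustrative):
avoid(hot_water_glass) :- not ab(hot_water_glass).
ab(hot_water_glass) :- cold_weather.
The first rule is the default (avoid holding the glass of hot water), the second captures the exception (unless the surrounding weather is cold), and not is negation as failure.</p>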
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. s(CASP)</title>
        <p>
          s(CASP) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is a query-driven, goal-directed implementation of ASP that includes constraint solving over reals. The goal-directed execution of s(CASP) is indispensable for automating commonsense reasoning, as traditional grounding- and SAT-solver-based implementations of ASP may not scale. There are three major advantages of using the s(CASP) system: (i) it does not ground the program, which makes our framework scalable, (ii) it only explores the parts of the knowledge base that are needed to answer a query, and (iii) it provides a natural language justification (a proof tree) for an answer [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The key component of this paper, automatic theory revision, has been developed by exploiting the justifications produced by s(CASP); the s(CASP) system is therefore indispensable for our work.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. ILP: FOLD Family of Algorithms</title>
        <p>Inductive Logic Programming (ILP) [?] is a sub-field of machine learning that learns models in the form of logic programming clauses comprehensible to humans. The problem is formally defined as:
Given
1. A background theory B, in the form of an extended logic program, i.e., clauses of the form h ← l1, ... , lm, not lm+1, ... , not ln, where h, l1, ... , ln are positive literals and not denotes negation-as-failure (NAF) as described in [?]. For reasons of efficiency, we restrict B to be stratified [?].
2. Two disjoint sets of ground target predicates E+ and E−, known as positive and negative examples, respectively.
3. A hypothesis language of function-free predicates L, and a refinement operator ρ under θ-subsumption [?] (for more details see [?]). The hypothesis language L is also assumed to be stratified.</p>
        <sec id="sec-2-2-1">
          <title>Find a set of clauses H such that:</title>
          <p>1. ∀e ∈ E+, B ∪ H ⊨ e.
2. ∀e ∈ E−, B ∪ H ⊭ e.
3. B ∧ H is consistent.</p>
          <p>The target predicate is the predicate whose definition we want to learn as a stratified normal
logic program. The positive and negative examples are grounded target predicates, i.e., suppose
we want to learn the concept of which creatures can fly, then we will give positive examples
E+ = {fly(tweety), fly(sam), . . . } and negative examples E− = {fly(kitty),
fly(polly), . . . }, where tweety, sam, . . . , are names of creatures that can fly, and kitty,
polly, . . . , are names of creatures that cannot fly.</p>
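          <p>For this fly example, one hypothesis H satisfying the above conditions could be the default theory below (a sketch only, assuming the background theory B contains suitable bird/1 and penguin/1 facts for the creatures mentioned):
fly(X) :- bird(X), not ab_bird(X).
ab_bird(X) :- penguin(X).
With this H, B ∪ H entails fly(tweety) but does not entail fly(kitty) or fly(polly), as required.</p>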
          <p>
            The FOIL algorithm by [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] is a popular top-down inductive logic programming algorithm for
classification. The FOLD algorithm by [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] is a novel top-down algorithm inspired by FOIL that
learns default rules along with exceptions that closely model human thinking. The FOLD-R++
algorithm by Wang and Gupta is a scalable ILP algorithm that builds upon the FOLD
algorithm to address the efficiency and scalability issues of the FOIL and FOLD algorithms.
It can deal with mixed type (numerical and categorical) data and generate much simpler rule
sets compared to its predecessors. The FOLD-R++ algorithm by [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] is also competitive in
performance with the widely used XGBoost and Multi-Layer Perceptron (MLP) algorithms. The
FOLD-RM algorithm by [9] builds upon FOLD-R++ to handle multi-class classification tasks
while retaining all of its features.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. RL Framework</title>
      <p>In this section we present our Symbolic Reinforcement Learning framework that can learn the
hypothesis incrementally. Figure 1 illustrates our framework showing how an agent learns the
rules by interacting with the environment. In the diagram, State (S_t) represents an instance
of the environment at time stamp ‘t’, Action (A_t) denotes the agent’s act at time ‘t’, and
Reward (R_t) is the reward the agent receives from the environment at time ‘t’ by taking
action A_t. Through iterative interaction between the environment and the agent, the agent learns
the rules of the world and performs better over time. In the given framework, rule learning is
divided into two categories: (i) learning new rules, and (ii) updating the existing rules. The
Action Generator component utilizes the s(CASP) engine to generate actions based on an action
strategy. The agent’s actions can be of two types: (i) actions that help the agent explore the
environment, and (ii) actions that exploit the learned rules to perform better in the environment.
In the framework, the exploration actions are denoted A_R, where ‘R’ stands for random.
Using a random action generator we can build a very naive explorer; however, a neural agent
can also be trained to explore very efficiently. A_S denotes the actions that exploit the learned
hypothesis (here ‘S’ stands for symbolic). Based on these two types of actions we have two
different rule-learning strategies. Using the &lt;S_t, A_R, R_t&gt; triples, the agent learns new policies by
employing the FOLD family of ILP algorithms. Additionally, when we apply the learned rules
to generate an action (A_S) and do not receive the expected reward, we update the existing
rules by revising them; this is done inside the Theory Revision component. Incorporating
this automatic Theory Revision component (discussed in the next section) is the main novelty of
the paper.</p>
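      <p>To make the exploitation step concrete, the learned policy is a set of ASP rules that the Action Generator can query through s(CASP). The following is a minimal sketch only; the predicates state/1, action/1, and blocked/1 are our own illustrative naming, not a fixed interface of the framework:
action(open(Door)) :- state(door_closed(Door)), not blocked(Door).
?- action(A).
Each answer binds A to a candidate exploitation action A_S supported by the current state and the learned rules; if no answer exists, the agent can fall back to a random exploration action A_R.</p>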
    </sec>
    <sec id="sec-4">
      <title>4. Finding Defeaters</title>
      <p>As discussed earlier, we used s(CASP), an ASP solver, to build our ‘Action Generator’ module.
The s(CASP) system is a top-down, goal-directed system. This means that for each successful
query, it finds a proof. If we expected this query to fail (for example, because the reward for the
previously chosen action using the existing hypothesis is negative), then we can try to figure
out how to change the rules such that the query will fail instead of succeeding. This change
essentially defeats the argument that led to the query’s success and is called a defeater. Next, we
give an example to elaborate the problem statement, followed by a solution that can perform
automatic theory revision [10].</p>
      <sec id="sec-4-1">
        <title>4.1. An Example</title>
        <p>Consider a house that has sensors installed to protect it from fires and floods. To protect from
fire, a fire sensor is installed that will automatically turn on water sprinklers (also installed in the
house) if fire is detected. Likewise, a water leak detection sensor in the house will automatically
turn off the water supply if no one is present in the house and water is sensed on the floor/carpet.
The following answer set program models these rules:
fireDetected :- fire.
turnSprinklerOn :- fireDetected.
sprinklerOn :- turnSprinklerOn.
water :- sprinklerOn.
sprinklerOff :- waterSupplyOff.
waterSupplyOff :- turnWaterSupplyOff.
turnWaterSupplyOff :- houseEmpty, waterLeakDetected.
waterLeakDetected :- water.
houseFloods :- water, not waterSupplyOff.
houseBurns :- fireDetected, sprinklerOff.</p>
        <p>houseSafe :- not houseFloods, not houseBurns.</p>
        <p>The program is self-explanatory. It models fluents (sprinklerOn, sprinklerOff,
waterLeakDetected, fireDetected, fire, water, houseEmpty) and actuators
(turnWaterSupplyOff, turnSprinklerOn). For simplicity, time is not considered, though
it must be taken into account if we want to model the system faithfully. This is because there
will always be a time lag as actuators are activated and fluents change in response to them.
Note that ‘fire’ means fire broke out in the house and ‘water’ means that a water leak occurred
in the house.</p>
        <p>Given the theory above, if we add the fact fire. to it, we will find that the property
houseBurns defined above will succeed. This is because occurrence of fire eventually leads to
sprinklers being turned on, which causes water to spill on the floor, which, in turn, causes the
flood protection system to turn on and shut off the water supply. We want houseBurns to fail.
To ensure that it fails, we have to recognize that the water supply should indeed be turned off due
to a water leak in an empty house, unless fire is present:
turnWaterSupplyOff :- houseEmpty, waterLeakDetected,
                      not fireDetected.</p>
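        <p>As a quick check (the file name is ours, and the invocation is simply how one would typically run s(CASP)), we can save the patched theory together with the fact fire. and the query ?- houseBurns. and run it:
scasp house_patched.pl
The query should now fail, because turnWaterSupplyOff is blocked whenever fireDetected holds, so the sprinklers are never switched off and houseBurns has no proof.</p>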
        <p>So, a simple patch to the theory shown above will ensure that houseBurns fails in all
situations. By adding not fireDetected we are subtracting knowledge, preventing houseBurns from
succeeding. Likewise, note that the house can be flooded if water leaks (“water.” is added as a
fact) and people are present in the house. Analysis of the proof tells us the offending conjunct is
(water, not houseEmpty). Since neither water nor not houseEmpty can be forced to be
false, we will have to detect such a situation and raise an alarm. So here the idea will be to sound
an alarm that will alert people present in the house. Thus, in situations where a successful proof
cannot be falsified by altering a theory, we may want to identify critical components of the
proof (via Craig interpolation perhaps) and identify their conjunction as a distinct condition.</p>
        <p>If we want our house to be safe against theft as well, then we need to augment the theory
with rules that will automatically lock down the house if the sensor indicates that no one is
present in the house. Note that to solve the theory revision problem we should be able to reason
with success as well as failure of proofs. Answer set programming provides a good mechanism
for doing so, as it allows actions to be taken if a proof fails within the theory itself.</p>
        <p>We would like to detect revisions to a theory automatically. We outline methods to do so
below.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Automatic Theory Revision</title>
        <p>The “system” we want to check needs to first be encoded as a s(CASP) program. Currently, only
propositional programs are supported. Once the “system” is encoded, we must identify state
knowledge that the “system” does not control. This knowledge will become abducibles. An
abducible is a proposition for which we have the choice of making it true or false, as needed.
Finally, any requirements or constraints we wish to enforce need to be encoded.</p>
        <p>Figure 2 shows the ASP program of the example discussed above. We can run the program
with s(CASP), querying what we want to fail. In this example, that would be “?- burndown.”.
We will use s(CASP)’s “--tree” option to get the proof tree, and look for propositions with
rules that we can change. In this example those propositions are “turn_sprinkler_on” and
“turn_water_off”.</p>
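        <p>Concretely, assuming the encoding of Figure 2 is saved as burndown.pl (the file name is ours) with the query ?- burndown. inside it, this step amounts to an invocation along the lines of:
scasp --tree burndown.pl
The resulting justification tree is then scanned for subtrees rooted at propositions, such as turn_sprinkler_on and turn_water_off, whose defining rules we are allowed to change.</p>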
        <p>Once the subtrees we can use are identified, we look for a proposition (or its negation) that
is false in the model associated with the tree, as provided by s(CASP). If the subtree and its
ancestors are not dependent on that proposition, it becomes a suggested defeater. So, for
our example (figure 3 presents the s(CASP) proof tree of the program) we have the following
possible defeaters:</p>
        <p>• For rule: turn_water_off :- leak, not home.
– not fire</p>
        <p>• For rule: turn_sprinkler_on :- fire.
– not water_off OR
– not leak OR
– home</p>
        <p>Each suggestion, when used to modify the program, will ensure that the proof cannot succeed.
This makes no guarantee the query will not still succeed, nor that the changes will make sense
according to our interpretation of the program. To combat the former case, we run the above
algorithm for every proof tree the query produces. Each of these trees represents a different way the
query can succeed. The generated suggestions are grouped together with their proof tree and
model and used to generate a knowledge base to be used with s(CASP). This knowledge base can
then be combined with some commonsense and domain-specific knowledge to reason about the
“best” defeater. Once we have the “best” defeaters, we can modify the program and try again.
After all, the change itself may introduce a new way for the query to succeed.</p>
        <p>The second case, of suggestions that do not make sense, can be easily encountered. In the
example above, we can ensure this proof fails by ensuring turn_sprinkler_on fails. This is
counterproductive. We know that by not turning on the sprinklers when there is a fire, the
house will always burn down – regardless of the rest of the state. To make a more intelligent
choice, the possible defeaters, along with the associated model and the original program, are
combined, as data, with another s(CASP) program. The purpose of this program is to filter out
the irrelevant defeaters. For the above example, we may add a rule that discards the second
set of defeaters. There are three parts to this stage. The first is the defeaters, models, and program
as stated above; these are presented as data to the program that analyzes and processes them.
The second is a driver program containing non-domain-specific knowledge, which poses the
following query: ?- suggest(A). As of this writing, the driver program provides a default
implementation for suggest that simply provides each defeater, as is. The third part is the
domain specific logic. This program is defined for the specific system and defines suggest/1.
For the above example, we can provide the following implementation for suggest/1:
suggest(A) :- defeaters(_,Defs), find_suggestions(Defs,A).
find_suggestions([H|T], H) :- H=defeater(proof(neg,_,_),_).
find_suggestions([H|T], H) :- H=defeater(proof(pos,Pred,_),_),
                              Pred\=turn_sprinkler_on.
find_suggestions([_|T], S) :- find_suggestions(T,S).</p>
        <p>The first rule defines suggest/1. The call to defeaters/2, provided when generating
the knowledgebase, gets the list of defeaters for a proof tree. The find_suggestions/2 call
loops through these defeaters binding A to a valid defeater. Each binding of A is a possible
way of making the proof fail. The other three rules define what we think are valid suggestions.
The third rule is a simple recursive case that allows us to traverse the list of defeaters. The
decision is being made by the first two rules. The first rule for find_suggestions/2 allows
suggestions for duals (the negation of a proposition). The second rule considers positive literals
(propositions without negation). However, it only accepts such defeaters if the proposition
is not turn_sprinkler_on. Since we know that turn_sprinkler_on can only be true if
there is a fire, by disallowing it from being made false we are enforcing the constraint “The
sprinkler must turn on when there is a fire". Using this definition of suggest/1, the second set
of defeaters (from the output given above) will not be produced. This gives us only the suggestion:</p>
        <p>• For rule: turn_water_off :- leak, not home.
– not fire</p>
        <p>This answer makes the most sense according to our interpretation of the code.</p>
        <p>The above example illustrates how the system behaves when a proposition needs to be made
false. However, it is possible that, instead of falsifying a proposition, we will want to falsify its
dual.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. An Example</title>
      <p>Next, we provide an example to elaborate the process of rule learning by an agent and then
revising them through more experience. Let us assume an RL agent is trying to learn the
concept of “which animals can fly?”. Figure 4 shows a set of examples about different animals,
including feature details, and the rules learned by the FOLD algorithm.</p>
      <p>After learning a set of rules, the agent applies them in the environment and discovers that
Charlie the ostrich is a type of bird that cannot fly and this is not covered by the rules. Now,
our theory revision module will correct the existing hypothesis by learning a new exception
(shown in figure 5).</p>
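      <p>Figures 4 and 5 are not reproduced here, but the shape of the revision is easy to sketch (the predicate names below are ours and purely illustrative). Suppose the initially learned rules are:
fly(X) :- bird(X), not ab_bird(X).
ab_bird(X) :- penguin(X).
After the negative experience with Charlie, the theory revision module only needs to add one new exception clause,
ab_bird(X) :- ostrich(X).
rather than relearning the whole hypothesis from scratch.</p>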
      <p>While this is a simple example, it illustrates our logic-based symbolic reinforcement learning
framework. The goal here is to emulate humans, who can learn quite effectively from a small
amount of data by forming an initial hypothesis and then correcting it over time as more
instances are encountered.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work and Conclusion</title>
      <p>With the current proof of concept, we believe we are one step closer to building a fully
explainable symbolic reinforcement learning framework that can generate the relations (that the agent
learns by interacting with the environment) in a human-understandable way in first-order logic.
As discussed above, the current implementation of our theory revision effort is in propositional
logic, and our next task is to generalize it to first-order logic. This should allow us to perform
end-to-end testing on different state-of-the-art reinforcement learning datasets. We believe our
work will not only perform well in terms of accuracy but will also be explainable.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murugesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kapanipathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Talamadupula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Klinger</surname>
          </string-name>
          , M. Campbell,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>A hybrid neuro-symbolic approach for text-based games using inductive logic programming</article-title>
          ,
          <source>in: Combining Learning and Reasoning: Programming Languages, Formalisms, and Representations</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <article-title>Knowledge representation, reasoning, and the design of intelligent agents: The answer-set programming approach</article-title>
          , Cambridge University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          ,
          <article-title>The stable model semantics for logic programming</article-title>
          ,
          <source>in: ICLP/SLP</source>
          , volume
          <volume>88</volume>
          ,
          <year>1988</year>
          , pp.
          <fpage>1070</fpage>
          -
          <lpage>1080</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Salazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Marple</surname>
          </string-name>
          , G. Gupta,
          <article-title>Constraint answer set programming without grounding</article-title>
          ,
          <source>TPLP</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>337</fpage>
          -
          <lpage>354</lpage>
          . doi:10.1017/S1471068418000285.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Gupta,
          <article-title>Justifications for goal-directed constraint answer set programming</article-title>
          ,
          <source>arXiv preprint arXiv:2009.10238</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          ,
          <article-title>Learning logical definitions from relations</article-title>
          ,
          <source>Machine Learning</source>
          <volume>5</volume>
          (
          <year>1990</year>
          )
          <fpage>239</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Shakerin</surname>
          </string-name>
          , E. Salazar,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>A new algorithm to automate inductive learning of default theories</article-title>
          ,
          <source>TPLP</source>
          <volume>17</volume>
          (
          <year>2017</year>
          )
          <fpage>1010</fpage>
          -
          <lpage>1026</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , G. Gupta,
          <article-title>FOLD-R++: A scalable toolset for automated inductive learning of default theories from mixed data</article-title>
          ,
          <source>in: Functional and Logic Programming</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>224</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Shakerin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>FOLD-RM: A scalable, efficient, and explainable inductive learning algorithm for multi-category classification of mixed data</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . doi:10.1017/S1471068422000205.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Salazar</surname>
          </string-name>
          ,
          <article-title>Theory revision with goal-directed ASP</article-title>
          ,
          <source>in: ICLP Workshops</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>