<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A comparison of exploration strategies used in reinforcement learning for building an intelligent tutoring system</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jezuina Koroveshi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Ktona</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Tirana, Faculty of Natural Sciences</institution>
          ,
          <addr-line>Tirana</addr-line>
          ,
          <country country="AL">Albania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Reinforcement learning is a form of machine learning where an intelligent agent learns to make decisions by interacting with some environment. The agent may have no prior knowledge of the environment and discovers it through interaction. For every action that the agent takes, the environment gives a reward signal that is used to measure how good or bad that action was. In this way, the agent learns which actions are more favorable to take in every state of the environment. There are different approaches to solving a reinforcement learning problem, but one challenge that arises during this process is the tradeoff between exploration and exploitation. In this work we focus on studying different exploration strategies and compare their effect on the performance of an intelligent tutoring system that is modeled as a reinforcement learning problem. An intelligent tutoring system is a system that helps in the process of teaching and learning by adapting to student needs and behaving differently for each student. We train this system using reinforcement learning and different exploration strategies and compare the performance of training and testing to find which strategy is best.</p>
      </abstract>
      <kwd-group>
        <kwd>Reinforcement learning</kwd>
        <kwd>exploration strategies</kwd>
        <kwd>intelligent tutoring system</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Intelligent educational systems are systems that apply techniques from the field of
Artificial Intelligence to provide better support
for the users of the system [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Web-based
Adaptive and Intelligent Educational Systems
provide intelligence and student adaptability,
inheriting properties from Intelligent Tutoring
Systems (ITS) and Adaptive Hypermedia
Systems (AHS) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] defines an Intelligent
Tutoring System (ITS) as a computer-aided
instructional system with models of
instructional content that specify what to
teach, and teaching strategies that specify how
to teach.
        </p>
      <p>Traditional tutoring systems use the
one-to-many way of presenting the learning materials
to the students.</p>
        <p>
          In this approach every student is given the
same materials to learn regardless of his/her
needs and preferences. These systems are not
well suited for all students because they may
come from different backgrounds, may have
different learning styles and do not absorb the
lessons at the same pace. An intelligent
tutoring system customizes the learning
experience that the student perceives by taking
into consideration factors such as pre-existing
knowledge, learning style and student
progress. According to [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] an intelligent
tutoring system usually has the following
modules: the student module that manages all
the information related to the student during
the learning process; the domain module that
contains all the information related to the
knowledge to teach, such as topics, tasks,
relations between them and difficulty; the
pedagogical module, also called the tutor module,
that decides what, how and when to teach the
learning materials; and the graphical user interface
module that facilitates the communication
between the system and the student. Different
techniques from artificial intelligence can be
applied in order to make these systems more
“intelligent”, but our study is focused on the
use of reinforcement learning (RL).
        </p>
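      <p>Purely as an illustration of this decomposition (our sketch, not an implementation from [4]), the four modules can be thought of as cooperating components; all class and attribute names below are hypothetical:</p>
      <preformat>
from dataclasses import dataclass, field

@dataclass
class StudentModule:
    # Information tracked about the student during learning.
    known_concepts: set = field(default_factory=set)

@dataclass
class DomainModule:
    # Knowledge to teach: topics, tasks, relations, difficulty.
    lessons: dict = field(default_factory=dict)

class PedagogicalModule:
    def next_lesson(self, student, domain):
        """Decides what, how and when to teach; this is the part trained with RL."""
        raise NotImplementedError

class GUIModule:
    def show(self, lesson):
        """Mediates communication between the system and the student."""
        print(lesson)
</preformat>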
        <p>
          Reinforcement learning is a form of
machine learning that is based on learning
from experience. The learner is exposed to
some environment, for which it may or may
not have prior information, starts making decisions,
and gets feedback that tells it
how good or bad each decision was.
Based on the feedback from the environment
the learner learns which decisions are more
favorable to take. This class of machine
learning has been used in modeling and
building intelligent tutoring systems such as in
the works from [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>The remainder of this paper is organized as
follows: in section 2 we give an overview of
reinforcement learning; in section 3 we
describe the model that we have used to build
an intelligent tutoring system; in section 4 we
give the experimental results of training the
model using different exploration strategies;
and in section 5 we give the conclusions of our
work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Reinforcement learning</title>
      <p>
        Reinforcement learning is a form of machine learning in which the learner learns
some sequence of actions by interacting with the environment. The learner is in a
state of the environment, takes some action that moves it from that state to another,
and after each action the environment gives a reward signal. This reward signal is
used to learn which are the best states to be in, and therefore which actions to take
in order to reach those states. A reinforcement learning problem can be modeled as a
Markov Decision Process (MDP). An MDP is a stochastic process that satisfies the
Markov property. In a finite MDP, the sets of states, actions and rewards have a
finite number of elements. Formally, a finite MDP can be defined as a tuple
M = (S, A, P, R, γ), where:
        • S is the set of states: S = (s1, s2, …, sn).
        • A is the set of actions: A = (a1, a2, …, an).
        • P defines the probability of transitioning from s to s’ when taking action a
in state s: Pss’ = Pr{st+1 = s’ | st = s, at = a}.
        • R defines the reward function for each of the transitions, i.e. the reward we
get if we take action a in state s and end up in state s’:
Rss’ = E{rt+1 | st = s, at = a, st+1 = s’}.
        • γ ∈ [0, 1] is the discount factor and is used to control the weight of future
rewards in comparison to immediate rewards.
      </p>
      <p>
        The goal of the agent is to maximize the total reward it receives. The agent
should maximize the total cumulative reward it receives in the long run, not just the
immediate reward [11]. The expected discounted reward is defined as follows by [11]:
Gt = Rt+1 + γRt+2 + γ^2 Rt+3 + … = Σk≥0 γ^k Rt+k+1.
      </p>
      <p>
        The sequence of states that ends in a terminal state is called an episode. The
general process of RL may be defined as follows:
        1. At each time step t, the agent is in a state s(t).
        2. The agent chooses one of the possible actions in this state, a(t), and applies
that action.
        3. After applying the action, the agent transitions to a new state s(t+1) and gets
a numerical reward r(t) from the environment.
        4. If the new state is not terminal, the agent repeats step 2; otherwise the
episode is finished.
      </p>
      <sec id="sec-2-1">
        <title>2.1 Exploration and exploitation dilemma</title>
        <p>One challenge of reinforcement learning is
the tradeoff between exploration and
exploitation [11]. As given by [11]: “To obtain
a lot of reward, a reinforcement learning agent
must prefer actions that it has tried in the past
and found to be effective in producing reward.
But to discover such actions, it has to try
actions that it has not selected before. The
agent has to exploit what it has already
experienced in order to obtain reward, but it
also has to explore in order to make better
action selections in the future”. There are
different strategies that can be used to handle
this problem:
        1. Random policy: during the training process the agent always chooses random
actions. This means that it always explores and does not exploit what it has already
learned.
        2. Greedy policy: during the training process the agent always chooses the
action that gives the best reward. In this way, it is always exploiting the knowledge
it has gained and uses it to choose the action that gives the best reward.
        3. Epsilon-greedy: this method balances the tradeoff between exploration and
exploitation. With probability ε it chooses a random action, and with probability
1 − ε it chooses the best action. The epsilon value decreases with time, reducing
exploration and increasing exploitation in order to make use of the knowledge that is
gained.
        4. Boltzmann (soft-max) exploration: one problem of the epsilon-greedy method is
that the exploration action is selected uniformly at random from the set of actions.
This means that it is equally likely to choose the worst-appearing action and the
second-best-appearing action. Boltzmann exploration uses the Boltzmann distribution
[12] to assign a probability Pt(a) to each action:
Pt(a) = e^(Qt(a)/T) / Σb e^(Qt(b)/T),
where T is a temperature parameter. When T = 0 the agent does not explore at all, and
when T → ∞ the agent selects actions randomly.</p>
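        <p>For concreteness, the following minimal Python sketch (our illustration, not code from the paper) implements the two value-based selection rules above; q stands for a vector of estimated action values Qt(a) in the current state, and all names are hypothetical. In practice, a decaying ε or T schedule then shifts the agent from exploration toward exploitation as training progresses.</p>
        <preformat>
import numpy as np

def epsilon_greedy(q, epsilon):
    """With probability epsilon pick a uniformly random action, else the best one."""
    if np.random.rand() &lt; epsilon:
        return np.random.randint(len(q))
    return int(np.argmax(q))

def boltzmann(q, temperature):
    """Sample an action with probability P_t(a) = exp(Q_t(a)/T) / sum_b exp(Q_t(b)/T)."""
    z = np.asarray(q, dtype=float) / temperature
    z -= z.max()                      # shift values for numerical stability
    p = np.exp(z) / np.exp(z).sum()   # Boltzmann (soft-max) probabilities
    return int(np.random.choice(len(q), p=p))
</preformat>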
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed model</title>
      <p>The model that we propose focuses on the
pedagogical module of the intelligent tutoring
system. This is a system for teaching lessons
of the Python programming language based on
concepts and student knowledge. The learning
material is composed of lessons. Every lesson
teaches some concepts and may require some
previous concepts to be known by the student.
In [14] we give a definition of lessons,
concepts, student knowledge and how they are
related to each other.</p>
      <p>The student starts learning the course
material. The system gives the student a lesson
that teaches some concepts. Depending on the
student's ability to learn, he/she may learn
these concepts or not. If the student does not
learn all the concepts given by the current
lesson, the system cannot give him/her a new
lesson. So, the system should make sure that
the student has absorbed all the material given
by the current lesson before giving the next
one. We propose the use of reinforcement
learning to train the pedagogical module,
which, based on student knowledge and the
concepts that are taught by each lesson,
decides what lesson to give him/her. The
system will start by giving the first lesson, and
then, following the student's progress, will
give every other lesson until the end of the
course. To model this as a reinforcement
learning problem, we need to define the set of
states, actions and rewards. In [15] we have
given a definition of those elements that
creates a framework for doing the training
using a reinforcement learning approach.</p>
      <p>One problem that arises when dealing with
reinforcement learning is the fact that, in order
to do the training, a relatively large number of
iterations and data is required. This cannot be
achieved using real students, because the
process would be very long. In [15] we have
proposed the use of a simulated student during
the training process. The student has some
ability to learn, which is given in the form of a
learning probability, and this defines his/her
ability to learn every concept that is taught by
the lessons of the course.</p>
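      <p>As a toy illustration of the simulated student (our own simplified sketch, not the exact model of [15]), assume each lesson teaches a set of concepts and the student absorbs each not-yet-known concept independently with his/her learning probability; the class and method names are hypothetical.</p>
      <preformat>
import random

class SimulatedStudent:
    def __init__(self, learning_probability):
        self.learning_probability = learning_probability
        self.known_concepts = set()

    def take_lesson(self, lesson_concepts):
        """Try to absorb each concept; return True if the whole lesson was learned."""
        for concept in lesson_concepts:
            if concept not in self.known_concepts:
                if random.random() &lt; self.learning_probability:
                    self.known_concepts.add(concept)
        return set(lesson_concepts).issubset(self.known_concepts)
</preformat>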
    </sec>
    <sec id="sec-exp">
      <title>4. Experimental results</title>
      <p>We have done the training in a simulated
environment by simulating the behavior of the
student. For every episode the student starts
with knowing random concepts, and the
system tries to learn what is the next lesson to
give. We have used the DQN algorithm as
given by [13], using memory replay and a
target network. Figure 1 gives the architecture
of the target and train networks. The
hyperparameters used during the training are
given in Figure 2. The training is done using
different exploration strategies for the same
number of episodes. For each of the strategies
we give the total reward received for every
episode during the training process in figures
3, 4, 5 and 6.</p>
      <p>After we performed the training, we tested
the performance of each of the learned models
by using them in simulations, for 100 episodes
with a student that knows random concepts
and with the same learning probability as the
one used during the training process. For each
of the tests, we show the total reward received
and the length of each episode in figures 7 to
14.</p>
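      <p>The core of this training scheme can be summarized by the following condensed sketch (ours, not the system's actual code), combining memory replay and a target network in the spirit of [13]; q_network and target_network stand in for the networks of Figure 1, and their predict, train_on_batch and get_weights/set_weights methods are assumed, Keras-style placeholder interfaces.</p>
      <preformat>
import random
from collections import deque
import numpy as np

memory = deque(maxlen=10000)   # replay buffer of (s, a, r, s_next, done) tuples

def replay_step(q_network, target_network, batch_size=32, gamma=0.99):
    """One DQN-style update over a random minibatch from the replay buffer."""
    batch = random.sample(memory, batch_size)
    for s, a, r, s_next, done in batch:
        # predict(s) is assumed to return the vector of Q-values for state s.
        target = q_network.predict(s)
        # Bootstrap from the frozen target network, not the online network.
        target[a] = r if done else r + gamma * np.max(target_network.predict(s_next))
        q_network.train_on_batch(s, target)

# Periodically the target network is refreshed from the online network, e.g.:
# target_network.set_weights(q_network.get_weights())
</preformat>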
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this work we have compared the
performance of different exploration strategies
used in training an intelligent tutoring system
using reinforcement learning. We took into
consideration four strategies: random, greedy,
epsilon-greedy and Boltzmann (soft-max). For
each of the strategies used, we have considered
the reward gained for every episode during
training and testing, to evaluate which one
performed better. We saw that during the
training phase, the random and greedy
strategies performed worst.</p>
      <p>The reward was negative for every episode,
which means that they chose the worst action
for most of the time. For the random policy
this means that it always explores and never
exploits the knowledge. For the greedy policy
this means that it always tries to exploit its
knowledge, but it never explores for new
actions that may be more profitable. On the
other hand, the epsilon-greedy and Boltzmann
strategies performed best during the training
phase, with Boltzmann strategy getting slightly
higher rewards. These strategies use a
combination of exploration and exploitation,
which makes them perform better.</p>
      <p>During the testing phase we see that the
greedy policy performs worse than every other
policy. This shows that the system has not
learned anything during the training phase.
Random and epsilon-greedy policies
performed well during the testing phase with
almost the same reward gained. Even though the
random policy performed poorly during the
training phase, it did quite well during testing,
meaning that its high level of exploration still
allowed it to learn some good actions. The Boltzmann
policy was the best during the testing phase,
getting the highest reward values. This shows
that this policy learned better which actions
are best to take. Also, comparing the
episode lengths during the testing phase, the
Boltzmann strategy has the shortest episode
lengths. This shows that it finishes each
episode without reaching the episode length
limit, meaning that it finishes the episode
faster because it takes the right actions.</p>
    </sec>
    <sec id="sec-5">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Peylo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Adaptive and Intelligent Web-based Educational Systems</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education (IJAIED)</source>
          ,
          <volume>13</volume>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>172</lpage>
          . ⟨hal-00197315⟩
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>An Experience Applying Reinforcement Learning in a Web-Based Adaptive and Intelligent Educational System</article-title>
          .
          <source>Informatics in Education</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>223</fpage>
          -
          <lpage>240</lpage>
          . https://doi.org/10.15388/infedu.2003.17
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Wenger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>1987</year>
          ).
          <source>Artificial Intelligence and Tutoring Systems</source>
          . Morgan Kaufman
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Burns</surname>
            ,
            <given-names>H. L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Capps</surname>
            ,
            <given-names>C. G.</given-names>
          </string-name>
          (
          <year>1988</year>
          )
          <article-title>Foundations of intelligent tutoring systems: an introduction</article-title>
          .
          In
          <source>Foundations of Intelligent Tutoring Systems</source>
          (eds M. C. Polson &amp; J. J. Richardson). Lawrence Erlbaum, London, pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Malpani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravindran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Murthy</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Personalized Intelligent Tutoring System using Reinforcement Learning</article-title>
          .
          In
          <source>Florida Artificial Intelligence Research Society Conference</source>
          . Retrieved from https://aaai.org/ocs/index.php/FLAIRS/FLAIRS11/paper/view/2597/3105
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>K. N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Arroyo</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>AgentX: Using Reinforcement Learning to Improve the Effectiveness of Intelligent Tutoring Systems</article-title>
          .
          <source>Intelligent Tutoring Systems</source>
          ,
          <fpage>564</fpage>
          -
          <lpage>572</lpage>
          . https://doi.org/10.1007/978-3-540-30139-4_53
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Nasir</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Fellus</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Pitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>SPEAKY Project: Adaptive Tutoring System based on Reinforcement Learning for Driving Exercizes and Analysis in ASD Children</article-title>
          . ICDL-EpiRob Workshop on “Understanding Developmental Disorders: From Computational Models to Assistive Technologies”. Tokyo, Japan. ⟨hal-01976660⟩
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Sarma</surname>
            ,
            <given-names>B. H. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ravindran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Intelligent Tutoring Systems using Reinforcement Learning to teach Autistic Students. Home Informatics and Telematics: ICT for The Next Billion</article-title>
          ,
          <volume>241</volume>
          ,
          <fpage>65</fpage>
          -
          <lpage>78</lpage>
          . https://doi.org/10.1007/978-0-387-73697-6_5
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Shawky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Badawi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>A Reinforcement Learning-Based Adaptive Learning System</article-title>
          .
          <source>The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018)</source>
          ,
          <fpage>221</fpage>
          -
          <lpage>231</lpage>
          . https://doi.org/10.1007/978-3-319-74690-6_22
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] Wang, F. (
          <year>2018</year>
          ).
          <article-title>Reinforcement Learning in a POMDP Based Intelligent Tutoring System for Optimizing Teaching Strategies</article-title>
          .
          <source>International Journal of Information and Education Technology</source>
          ,
          <volume>8</volume>
          (
          <issue>8</issue>
          ),
          <fpage>553</fpage>
          -
          <lpage>558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] Sutton, R. S., &amp; Barto, A. G. (
          <year>2018</year>
          ).
          <article-title>Reinforcement Learning: An Introduction (2nd Edition, in preparation)</article-title>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Barto, A. G., Bradtke, S. J., &amp; Singh, S. P. (
          <year>1991</year>
          ).
          <article-title>Real-time learning and control using asynchronous dynamic programming</article-title>
          . University of Massachusetts at Amherst, Department of Computer and Information Science.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., &amp; Hassabis, D. (
          <year>2015</year>
          ).
          <article-title>Human-level control through deep reinforcement learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>518</volume>
          (
          <issue>7540</issue>
          ),
          <fpage>529</fpage>
          -
          <lpage>533</lpage>
          . https://doi.org/10.1038/nature14236
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] Koroveshi, J., &amp; Ktona, A. (
          <year>2020</year>
          ).
          <article-title>Modelling an Intelligent Tutoring System Using Reinforcement Learning</article-title>
          .
          <source>Knowledge International Journal</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ),
          <fpage>483</fpage>
          -
          <lpage>487</lpage>
          . Retrieved from https://ikm.mk/ojs/index.php/KIJ/article/view/4745
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] Koroveshi, J., &amp; Ktona, A. (
          <year>2021</year>
          ).
          <article-title>Training an Intelligent Tutoring System Using Reinforcement Learning</article-title>
          .
          <source>International Journal of Computer Science and Information Technology</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          . http://doi.org/10.5281/zenodo.4661455
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>