<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Novelty Detection and Adaptation: A Domain Agnostic Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marina Haliem</string-name>
          <email>mwadea@purdue.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vaneet Aggarwal</string-name>
          <email>vaneet@purdue.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharat Bhargava</string-name>
          <email>bbshail@purdue.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Purdue University</institution>
          ,
          <addr-line>West Lafayette, IN</addr-line>
          ,
          <country>USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Novelties are surprises that a system encounters. A system must learn about their characteristics and detect, understand, and adapt to novelty not only in the environment but also in the agents that interact with it. The context, timing, duration, and extent of a novelty must be considered in the agent's adaptation and accommodation. This research contributes towards building AI/ML systems that can adapt to fluid novelties in the open world. Many real-world problems are stochastic and encounter sudden novelties, which results in highly dynamic environments. Therefore, a robust framework is needed to identify the various novelties that can occur, recognize the changes in the underlying environment, and adapt policies to maximize the long-term cumulative reward. To achieve this, we propose adopting a change point detection algorithm to detect changes in the distribution of experiences, and developing an agent that is capable of recognizing novelties and making informed decisions according to the changes in the underlying environment. These ideas can be adapted to various domains by tuning the agent's objective function, while still capturing the changes in the corresponding underlying environment. This research contributes to the SAIL-ON effort [Ted Senator, 2019].</p>
      </abstract>
      <kwd-group>
        <kwd>Novelties</kwd>
        <kwd>Novelty Generation</kwd>
        <kwd>Decision-Making</kwd>
        <kwd>Change Point Detection</kwd>
        <kwd>Dirichlet Processes</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Novelties occur in many systems and environments, and agents must learn about them and accommodate them. We list a few examples to provide understanding and a basis for research ideas.</p>
      <p>• A car going up a steep hill in the dark and rain. The car is not on a main road. A main road on flat terrain in good weather would be normal; a steep hill, darkness, rain/snow, and a road with weak soil and vegetation would be novelties.</p>
      <p>• A person from the USA driving in India. Many novelties occur: no stop and yield signs, left-hand drive, a mix of traffic vehicles (bicycles, rickshaws, horse/bullock/oxen-driven carts, scooters, and three-wheelers, along with trucks and buses), and narrow single-lane roads and unpaved roads. How can a driver visiting from the USA train, learn, and adjust to drive safely in India?</p>
      <p>• Cheating or a sudden change in the rules of a game such as chess, basketball, or Monopoly while the game is being played. In addition, a novelty may be that the objective of a player changes: instead of winning or losing, the objective becomes a tie.</p>
      <p>• An attack, malicious activities, and threats, cyber or otherwise. How can a child or an older person deal with the novelties of pickpockets, scoundrels, thieves, purse snatchers, etc.? The objective may become the survival of the person. How can a system continue to operate in unknown adverse conditions and situations, such as collaborative attacks in cyberspace?</p>
      <p>• A man walking a cat, rhinoceros, or hippopotamus (walking a dog, elephant, or horse is not a novelty).</p>
      <p>We consider the scenario of dynamic environments, where a novelty occurs that alters the dynamics of a system and its model, and the system transforms itself to incorporate the novelty. The environment changes between the pre-novelty model and the post-novelty model dynamically, as shown in Figure 1.</p>
      <p>The implication of the non-stationary environment [Kaplanis et al., 2019] is as follows. When the agent exercises a control a_t at time t, the next state s_{t+1} as well as the reward r_t are functions of the active environment model dynamics. We assume the knowledge that the environment switches from a pre-novelty model to a post-novelty model due to an unexpected change in the world state. However, neither the context information of each model nor the change points at which the change occurs are known to the agent. In the Open World, environments are characterized by their high dynamicity, where novelties can occur and alter the representation of the world W, and thus the state space S and the action space A. We assume the environment is partially observable by our agent, so the agent has knowledge of the state representation s ∈ S that is part of the surrounding world W. Different types of novelties impose different levels of difficulty when it comes to the ability to detect and adapt to these changes, as well as the time consumed until detection and adaptation. We investigate the different types of novelties and discuss approaches that allow the agent to detect and adapt to these changes.</p>
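      <p>For concreteness, the following minimal Python sketch (our illustration, not a model from the paper) shows an environment whose dynamics switch from a pre-novelty model to a post-novelty model at a change step that is hidden from the agent; the Gaussian dynamics, drift values, and change step are illustrative assumptions.</p>
      <preformat><![CDATA[
import numpy as np

class SwitchingEnvironment:
    """Non-stationary environment: dynamics change at a hidden change point."""

    def __init__(self, change_step=500, seed=0):
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self.change_step = change_step  # unknown to the agent
        self.state = np.zeros(2)

    def step(self, action):
        self.t += 1
        if self.t < self.change_step:
            drift, noise = 0.1, 0.05   # pre-novelty model dynamics
        else:
            drift, noise = -0.3, 0.20  # post-novelty model dynamics
        self.state = self.state + drift * action + self.rng.normal(0, noise, 2)
        reward = -np.linalg.norm(self.state)  # illustrative reward
        return self.state.copy(), reward
      ]]></preformat>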
    </sec>
    <sec id="sec-2">
      <title>2. Decision-making memory with replay</title>
      <p>In decision-making, the task is for the agent, at each time step t, to select an action a_t ∈ A(s_t) based on the current state of the environment s_t ∈ S, where S is the state representation of the environment that is observable by our agent, and A(s_t) is the finite set of possible actions in state s_t. The agent selects the action that maximizes its objective function at each time step. After an action is executed, the agent receives a reward r_t, and the state of the environment is updated to s_{t+1}. Transitions of the form (s_t, a_t, r_t, s_{t+1}) are stored in a cyclic buffer, known as the "replay buffer" [Lin, 1992]. This buffer enables the agent to randomly sample from, and train on, prior observations.</p>
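      <p>A minimal Python sketch of such a cyclic replay buffer [Lin, 1992] is shown below; the capacity and batch size are illustrative choices, not values from the paper.</p>
      <preformat><![CDATA[
import random
from collections import deque

class ReplayBuffer:
    """Cyclic buffer of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # Uniform random sampling over prior observations for training.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
      ]]></preformat>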
    </sec>
    <sec id="sec-3">
      <title>3. Novelty Types</title>
      <p>The main components that affect the decision-making process of an agent are the state space, the action space, and the transition probabilities that it learns in order to reach the optimal policy. Novelties can be categorized to deal with the following:</p>
      <p>1. State space changes: novelties that alter the environment representation will require the state space to be expanded so it can accommodate these changes. For example, a new state in the environment that is different from every state in the agent's experience memory.</p>
      <p>2. Action space changes: novelties in the dynamic interactions or context may lead to a different action space, which will be modified and fed back to the agent.</p>
      <p>3. No state/action set change, transition probability changes: novelties that change the set of rules that govern the environment dynamics, for instance: rolling a 6 on the dice gives an additional turn rather than stopping there, or rolling a 1 on the dice moves 3 steps rather than 1.</p>
      <p>4. No state/action set change, reward function changes: goal-related novelties will require a re-design of the reward function to reflect the new objectives of the system. For example, forcing a draw in a game is equivalent to a win.</p>
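      <p>A minimal sketch (our illustration, not from the paper) of how these four categories, and the detection approach each maps to in Section 4, might be represented:</p>
      <preformat><![CDATA[
from enum import Enum

class NoveltyType(Enum):
    STATE_SPACE_CHANGE = 1   # new states outside the agent's experience memory
    ACTION_SPACE_CHANGE = 2  # allowable actions/moves are modified
    TRANSITION_CHANGE = 3    # same states/actions, new environment dynamics
    REWARD_CHANGE = 4        # same states/actions, new objective

def detection_approach(novelty: NoveltyType) -> str:
    # Types 1-2 are handled by environment observation,
    # types 3-4 by change point detection (Section 4).
    if novelty in (NoveltyType.STATE_SPACE_CHANGE, NoveltyType.ACTION_SPACE_CHANGE):
        return "environment observation"
    return "change point detection"
      ]]></preformat>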
    </sec>
    <sec id="sec-4">
      <title>4. Model-Free</title>
      <p>Adaptation</p>
    </sec>
    <sec id="sec-5">
      <title>Detection and</title>
      <p>•</p>
      <p>Some novel events that occur in the environment may lead to combinations of the types of novelties mentioned above. For example, in a game of Monopoly: adding a credit score to player profiles, impacting the ability to set up real estate on properties, or adding a risk attribute to hotels and properties, affecting rent-collecting potential. To handle such dynamicity, a model-free approach is proposed, where the agent learns the dynamics of the environment through real-time interactions with the environment rather than having a rigid pre-defined model fed to it. Depending on the consequences of the occurrence of a novelty, the agent might take a longer or shorter time to detect the change and adjust to it, which results in a higher or lower difficulty level. To achieve this goal, the agent collects experience tuples while simultaneously following a model-free learning algorithm to learn an approximately optimal policy. Instead of assuming any specific structure, the model-free approach allows the agent to dynamically learn the change. Two of the approaches that the agent can follow are:</p>
      <p>• Environment Observation: For novelties of types 1 and 2 as discussed earlier, a modification is needed in the state and action spaces. This can be done online as the agent detects the new states/actions, but it might cause a delay in adaptation until the agent learns the new space. The change can be only from the agent's perspective, not a universal change in the world state W: at time step t, the agent gets a representation of the environment that is identified as unknown. This can be caused by a new state in the environment that is different from every state in the agent's experience memory, or by a change in the set of rules that affects the set of allowable actions/moves that the agent can take (a minimal check of this kind is sketched below).</p>
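      <p>A hedged Python sketch of the environment-observation check: a state is flagged as novel when it is farther than a threshold from every state in the agent's experience memory. The distance metric and threshold are our illustrative assumptions, not prescribed by the paper.</p>
      <preformat><![CDATA[
import numpy as np

def is_novel_state(state, experience_states, threshold=1.0):
    """Return True if `state` differs from every remembered state."""
    if len(experience_states) == 0:
        return True
    # Euclidean distance to every state in the experience memory.
    distances = np.linalg.norm(np.asarray(experience_states) - state, axis=1)
    return bool(distances.min() > threshold)
      ]]></preformat>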
      <p>• Change Point Detection: For novelties that lead to types 3 and 4, the proposed method works in tandem with a change point detection algorithm to get information about the changes in the environment [Haliem et al., 2020]. The learning begins by obtaining experience tuples according to the dynamics and reward function of the currently active model. The state and reward obtained are stored as experience tuples, since the model information is not known. The samples can be analysed for context changes in batch mode or online mode. We adopt the online parametric Dirichlet change point (ODCP) detection algorithm proposed in [Singh et al., 2019] to examine the data D consisting of experience tuples. This algorithm transforms any discrete or continuous data into compositional data and utilizes Dirichlet parameter likelihood testing to detect change points. Although ODCP requires the multivariate data to be i.i.d. samples from a distribution, the justification in [Padakandla et al., 2019] explains the utilization of ODCP in the Markovian setting, where the data obtained does not consist of independent samples. The full Dirichlet change point detection algorithm is shown in Algorithm 1 below, where the input is the data D consisting of the experience tuples stored by our agent. In this algorithm, the maximum likelihood estimate of the Dirichlet distribution parameters is calculated for the cumulative data stored through experience tuples using equation 1 below:</p>
      <p>
        <disp-formula id="eq1">
          <label>[1]</label>
          <tex-math><![CDATA[
\alpha^{*} = \arg\max_{\alpha}\; N \Big[ \log \Gamma\Big(\sum_{k=1}^{K} \alpha_{k}\Big)
- \sum_{k=1}^{K} \log \Gamma(\alpha_{k})
+ \sum_{k=1}^{K} (\alpha_{k} - 1)\, \overline{\log x_{k}} \Big],
\quad \text{where } \overline{\log x_{k}} = \frac{1}{N} \sum_{i=1}^{N} \log x_{ik}
          ]]></tex-math>
        </disp-formula>
      </p>
      <p>Then, the log likelihood of the data given the Dirichlet parameters is calculated using equation 2 below:</p>
      <p>
        <disp-formula id="eq2">
          <label>[2]</label>
          <tex-math><![CDATA[
\mathrm{LL}(D \mid \alpha) = \sum_{i=1}^{N} \Big[ \log \Gamma\Big(\sum_{k=1}^{K} \alpha_{k}\Big)
- \sum_{k=1}^{K} \log \Gamma(\alpha_{k})
+ \sum_{k=1}^{K} (\alpha_{k} - 1) \log x_{ik} \Big],
\quad \text{where } N = |D|,\; x_{ik} \ge 0,\; \sum_{k=1}^{K} x_{ik} = 1
          ]]></tex-math>
        </disp-formula>
      </p>
      <p>Then, at each time step t that is seen as a potential change point, we split the data into two parts (prior to and after this time step t), and we estimate the maximum likelihood parameters as well as the sum of the log likelihoods for both partitions using the equations above. Finally, the algorithm returns the point in time t* associated with the maximum log likelihood as the potential change point. If the difference between this value and the log likelihood of our unsplit original data turns out to be greater than our threshold, then we declare that a change has been detected at time t*.</p>
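      <p>A hedged Python sketch of this split-and-test procedure, implementing equations [1] and [2] with a generic numerical optimizer: each row of the data must be a strictly positive compositional sample (summing to 1), e.g. transformed experience tuples. The optimizer, threshold value, and minimum segment length are our assumptions; see [Singh et al., 2019] for the full ODCP algorithm.</p>
      <preformat><![CDATA[
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def dirichlet_loglik(data, alpha):
    # Equation [2]: log likelihood of the N samples under Dirichlet(alpha).
    n = data.shape[0]
    return (n * (gammaln(alpha.sum()) - gammaln(alpha).sum())
            + ((alpha - 1.0) * np.log(data).sum(axis=0)).sum())

def dirichlet_mle(data):
    # Equation [1]: maximum likelihood Dirichlet parameters, found here by
    # numerically maximizing the log likelihood over alpha > 0
    # (parameterized as exp(log_a) to keep alpha positive).
    k = data.shape[1]
    res = minimize(lambda log_a: -dirichlet_loglik(data, np.exp(log_a)),
                   x0=np.zeros(k), method="L-BFGS-B")
    return np.exp(res.x)

def detect_change_point(data, threshold=10.0, min_seg=5):
    # Log likelihood of the unsplit data under a single fitted model.
    base = dirichlet_loglik(data, dirichlet_mle(data))
    best_t, best_ll = None, -np.inf
    for t in range(min_seg, len(data) - min_seg):
        before, after = data[:t], data[t:]
        ll = (dirichlet_loglik(before, dirichlet_mle(before))
              + dirichlet_loglik(after, dirichlet_mle(after)))
        if ll > best_ll:
            best_t, best_ll = t, ll
    # Declare a change at t* only if splitting improves on the unsplit
    # fit by more than the threshold.
    if best_t is not None and best_ll - base > threshold:
        return best_t
    return None
      ]]></preformat>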
      <p>After the agent detects that a change has occurred, it restarts the decision-making process, accounting for that change. At every time step t, it obtains a representation of the environment s_t and calculates the reward r_t associated with each possible action in the action space according to the dynamics and reward function of the currently active model (whether it is the pre- or post-novelty model). Based on this information, the agent takes the action for which the expected discounted future reward is maximized. One approach for dealing with novelty can be seen in [Haliem et al., 2020 - 2], where it is applied to a multi-agent ridesharing system.</p>
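      <p>A brief sketch (our illustration, not the paper's algorithm) of how detection and decision-making can be tied together: the agent acts greedily with respect to its current value estimates and restarts learning once a change is detected, reusing detect_change_point from the sketch above. The hooks env.observe, env.actions, agent.q_values, agent.learn, agent.reset_for_new_model, and to_composition are hypothetical.</p>
      <preformat><![CDATA[
import numpy as np

def run(agent, env, steps=10_000, check_every=100):
    history = []  # compositional summaries of experience tuples (hypothetical)
    for t in range(steps):
        state = env.observe()
        # Take the action maximizing the expected discounted future reward.
        action = max(env.actions(state), key=lambda a: agent.q_values(state, a))
        next_state, reward = env.step(action)
        agent.learn(state, action, reward, next_state)
        history.append(to_composition(state, action, reward))
        if t % check_every == 0 and len(history) > 2 * check_every:
            if detect_change_point(np.asarray(history)) is not None:
                agent.reset_for_new_model()  # restart decision-making
                history.clear()              # keep post-novelty data only
      ]]></preformat>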
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>In this paper, we established a theory of novelties occurring in an Open World setting. In our method, we propose utilizing a change point detection algorithm, in addition to environment observation, to allow the agent to detect the change that occurred in the underlying environment and recognize what types of novelties took place. This is a crucial step that then allows the agent to adjust accordingly and start learning and performing well in the modified environment. Our approach can be tailored to fit various domains by designing the objective function of the agent to reflect the specific goals of the domain. For example, such a setting could be utilized in a ride-sharing system as in [Haliem et al., 2020 - 2] by utilizing the agent's reward function proposed by the authors. In [Boult et al., 2020], the authors are developing a unifying framework for formal theories of novelty; more information about the thought process for understanding novelties is available in their AAAI-2021 paper. Terry Boult is a speaker in the workshop on Novelties in Open World during the ISIC conference organized by Prof. Bharat Bhargava on Feb 25, 2021.</p>
      <p>Many universities and organizations are working on the characterization of novelties, languages to express them, and novelty hierarchies and evaluation in the SAIL-ON effort. Some of these ideas will be discussed in the workshop.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Acknowledgements</title>
      <p>This research is supported, in part, by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under contract number W911NF2020003. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, AFRL, or the U.S. Government. We thank our team members on this project for all the discussions that helped develop this paper. Some of the ideas in this paper are based on our learning from the SAIL-ON meetings.</p>
    </sec>
    <sec id="sec-8">
      <title>7. References</title>
      <p>[Ted Senator, 2019] https://www.darpa.mil/program/science-of-artificial-intelligence-and-learning-for-open-world-novelty</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Kaplanis et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Christos</given-names>
            <surname>Kaplanis</surname>
          </string-name>
          , Murray Shanahan, and
          <string-name>
            <given-names>Claudia</given-names>
            <surname>Clopath</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Policy consolidation for continual reinforcement learning</article-title>
          .
          <source>arXiv preprint arXiv:1902.00255</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Mnih et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Volodymyr</given-names>
            <surname>Mnih</surname>
          </string-name>
          , Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, et al.
          <year>2015</year>
          .
          <article-title>Human-level control through deep reinforcement learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>518</volume>
          (
          <issue>7540</issue>
          ):
          <fpage>529</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Lin,
          <year>1992</year>
          ]
          <string-name>
            <given-names>Long-Ji</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Self-improving reactive agents based on reinforcement learning, planning and teaching</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          - 4):
          <fpage>293</fpage>
          -
          <lpage>321</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Haliem et al.,
          <year>2020</year>
          - 1]
          <string-name>
            <given-names>Marina</given-names>
            <surname>Haliem</surname>
          </string-name>
          , Ganapathy Mani, Vaneet Aggarwal, and
          <string-name>
            <given-names>Bharat</given-names>
            <surname>Bhargava</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>A Distributed Model-Free Ride-Sharing Algorithm with Pricing using Deep Reinforcement Learning</article-title>
          . Computer Science in Cars Symposium. Association for Computing Machinery, New York, NY, USA, Article
          <volume>5</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . DOI: https://doi.org/10.1145/3385958.3430484
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Haliem et al.,
          <year>2020</year>
          - 2]
          <string-name>
            <given-names>Marina</given-names>
            <surname>Haliem</surname>
          </string-name>
          , Vaneet Aggarwal, and
          <string-name>
            <given-names>Bharat</given-names>
            <surname>Bhargava</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>AdaPool: An Adaptive Model-Free Ride-Sharing Approach for Dispatching using Deep Reinforcement Learning</article-title>
          .
          <source>Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys 2020)</source>
          . Association for Computing Machinery (ACM),
          <fpage>304</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Padakandla et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Sindhu</given-names>
            <surname>Padakandla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shalabh</given-names>
            <surname>Bhatnagar</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Reinforcement learning in non-stationary environments</article-title>
          .
          <source>arXiv preprint arXiv:1905.03970</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Singh et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Nitin</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pankaj</given-names>
            <surname>Dayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vinayaka</given-names>
            <surname>Pandit</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Change Point Detection for Compositional Multivariate Data</article-title>
          .
          <source>arXiv preprint arXiv:1901.04935</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Boult et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Grabowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Prijatelj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Holder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alspector</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jafarzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Dhamija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vondrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <article-title>Towards a Unifying Framework for Formal Theories of Novelty</article-title>
          ,
          <source>Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), February 2-9</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>