<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Proceedings of the AAAI 2021 Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE 2021), Stanford University, Palo Alto, California, USA, March 2021</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Batch-like Online Learning for More Robust Hybrid Artificial Intelligence: Deconstruction as a Machine Learning Process</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Schmid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lancaster University Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universität Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>2</volume>
      <fpage>2</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>Continuous streams of data are a common, yet challenging phenomenon of modern information processing. Traditional approaches to adapting machine learning techniques to this setting, such as offline and online learning, have demonstrated several critical drawbacks. In order to avoid known disadvantages of both approaches, we propose to combine their complementary advantages in a novel machine learning process called deconstruction. Similar to supervised and unsupervised learning, this novel process provides a fundamental learning functionality modeled after human learning. This functionality integrates mechanisms for partitioning training data, managing learned knowledge representations and integrating newly acquired knowledge with previously learned knowledge representations. A prerequisite for this concept is that learning data can be partitioned and that resulting knowledge partitions may be accessed by formal means. In the proposed approach, this is achieved by the recently introduced Constructivist Machine Learning framework, which makes it possible to create, exploit and maintain a knowledge base. In this work, we highlight the design concepts for the implementation of such a deconstruction process. In particular, we describe required subprocesses and how they can be combined.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Online Learning</kwd>
        <kwd>Constructivist Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Offline learning requires repeated re-training on all previously seen as well as
new data and therefore increases computational costs. Online learning allows for a
straightforward handling of data streams, but may be slow and subject to unintended semantic shifts in
the underlying model. New data, e.g., may induce bias into the classification and may slowly
alter the model. Moreover, the question arises of how to deal with data that only partially lead to
good results, e.g., if training is only successful for data from a certain time span.</p>
      <p>
        With such partial effects, overall performance decreases. Traditionally, learning would still
be continued, and performance may even decrease further. But what if splitting the training
data or omitting parts of it would increase prediction quality? In order to assess such
alternative scenarios and avoid performance drawbacks from continuous online learning, we have
recently introduced the concept of Constructivist Machine Learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Following both the
philosophical paradigm of constructivism and the technological paradigm of hybrid intelligent
systems, this framework allows evaluating and automatically adjusting training data partitioning
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As a consequence, semantic building blocks - or Stachowiak-like models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], respectively
- can be identified, related to each other in a hierarchical scheme and updated when necessary.
      </p>
      <p>Here, we focus on the handling of data streams by implementing deconstruction as a
machine learning process. We point out that metadata is required for semantic updating in an
online learning scenario and present an algorithmic scheme that makes it possible to resolve ambiguity,
generalize models or create abstracted models. This scheme combines supervised and
unsupervised machine learning techniques and exploits temporal and other metadata. As a result, this
algorithmic deconstruction process allows not only to create a hierarchical knowledge base,
but in particular to handle streams of data with respect to temporal validity.</p>
    </sec>
    <sec id="sec-2">
      <title>Principles of Constructivist Machine Learning</title>
      <p>
        According to dominant modern educational concepts, human learning takes place through
construction, reconstruction or deconstruction processes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Following this paradigm, we have
introduced concepts to implement such learning processes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To put all three processes into
practice, a corresponding knowledge base is required that consists of Stachowiak-like models;
in addition, a data management process is employed in order to organize efficient learning.
      </p>
      <p>
        Data Management. Assuming a data stream as input, the starting point for Constructivist
Machine Learning is a set or batch of samples possessing pragmatic metadata (Fig. 1). Such
metadata describe temporal validity (Τ) as well as validity regarding subjects (Σ) and purposes (Ζ) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
From this initial batch, subsets are identified and re-grouped into learn blocks. Depending on
whether samples match regarding Σ and Ζ, Τ and Σ, Τ and Ζ or all metadata (ΤΣΖ), ΣΖ-, ΤΣ-,
ΤΖ- or completely related learn blocks may be identified. For efficiency, only the largest learn
block of a given batch will be used. Not all forms of relationship, however, are equally suitable
for a model construction. Constructions from ΤΖ- or ΤΣΖ-related learn blocks, e.g., offer little
added value. Learn blocks of ΤΣΖ-related samples that are divergent even represent a source of
serious error. Learn blocks of ΤΖ-related models allow the generation of new models, which,
however, will represent an intersubjective reconstruction rather than a construction process.
For constructions, learn blocks of ΣΖ-related vector models are therefore preferred. Once a
learn block is selected, it will undergo a sequence of learning processes. After these processes
have terminated, the knowledge base is updated accordingly, which may imply storing a newly
reconstructed model as well as modifying or deleting existing models from the knowledge base.
If further batches exist, this sequence of selecting and processing data is repeated.
      </p>
      <fig id="fig1">
        <caption>
          <p>Figure 1: Data management in Constructivist Machine Learning. From a data set or stream, a batch of raw data and metadata is formed; learn blocks are selected and learned, and the resulting knowledge representations (models and their metadata) are integrated into the knowledge base or modify it.</p>
        </caption>
      </fig>
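      <p>For illustration, this grouping step can be sketched in Python as follows. This is a minimal, hypothetical sketch (plain dictionaries instead of conML data structures; the keys T, S and Z stand for Τ, Σ and Ζ), not the actual framework code.</p>
      <preformat preformat-type="code">
# Hypothetical sketch: grouping a batch into learn blocks by pragmatic metadata.
from collections import defaultdict

RELATIONS = {
    "complete": ("T", "S", "Z"),
    "SZ": ("S", "Z"),
    "TS": ("T", "S"),
    "TZ": ("T", "Z"),
}

def learn_blocks(batch, relation):
    """Group samples whose metadata agree on the keys of the given relation."""
    blocks = defaultdict(list)
    for sample in batch:
        signature = tuple(sample["meta"][key] for key in RELATIONS[relation])
        blocks[signature].append(sample)
    return list(blocks.values())

def largest_learn_block(batch):
    """For efficiency, only the largest learn block of a batch is used."""
    candidates = [block for rel in RELATIONS for block in learn_blocks(batch, rel)]
    return max(candidates, key=len)

batch = [
    {"x": [0.1, 0.2], "meta": {"T": 1, "S": "s1", "Z": "z1"}},
    {"x": [0.3, 0.1], "meta": {"T": 2, "S": "s1", "Z": "z1"}},
    {"x": [0.7, 0.9], "meta": {"T": 2, "S": "s2", "Z": "z1"}},
]
print(len(largest_learn_block(batch)))  # 2: the SZ-related block of subject s1
      </preformat>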
      <p>Representation Learning. Various combinations of learning processes are possible for a
given learn block. If the knowledge base is still empty and target values are defined for the
learn block, e.g., only a reconstruction process is carried out and the resulting model is stored
in the knowledge base. In an educational context, reconstruction implies in general
application, repetition or imitation, in particular the search for order, patterns or models [9, p. 145].
Similarly, the reconstruction of a machine model is here understood as supervised learning
from given examples. In contrast to classical supervised learning, however, competing models
are generated and evaluated for intersubjective validity. If target values are undefined for the
learn block, such targets are produced in a construction process preceding the reconstruction
process. In an educational context, construction is in general associated with creativity,
innovation and production, and in particular with the search for new variations, combinations or
transfers [9, p. 145]. For machine models, this is interpreted as unsupervised learning that
identifies or defines alternative outputs to a set of incomplete vector models.
Thereby, competing model candidates are created that are evaluated in a following
reconstruction process. The rationale behind this is that it is a priori unclear which of the models constructed
from a learn block can be reconstructed with the best accuracy and intersubjectivity.</p>
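      <p>The interplay of construction and reconstruction may be sketched as follows. The choice of k-means for constructing targets and of a decision tree with cross-validated scoring for reconstruction is an illustrative assumption; the framework itself does not prescribe these algorithms.</p>
      <preformat preformat-type="code">
# Hypothetical sketch: construction proposes competing target assignments for a
# learn block without targets; reconstruction re-trains and evaluates them.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def construct(X, candidate_ks=(2, 3, 4)):
    """Unsupervised step: define alternative outputs for incomplete vector models."""
    return [KMeans(n_clusters=k, n_init=10).fit_predict(X) for k in candidate_ks]

def reconstruct(X, y):
    """Supervised step: learn from given examples and estimate model quality."""
    model = DecisionTreeClassifier(max_depth=3)
    score = cross_val_score(model, X, y, cv=3).mean()
    return model.fit(X, y), score

X = np.random.RandomState(0).rand(60, 4)
# It is a priori unclear which constructed candidate reconstructs best,
# so all candidates are evaluated and the best one is kept.
best_model, best_score = max((reconstruct(X, y) for y in construct(X)),
                             key=lambda pair: pair[1])
print(round(best_score, 2))
      </preformat>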
        <p>Knowledge Integration. After successful reconstruction, further mechanisms are applied
to the respective knowledge representation in order to manage integration into the knowledge
base. In particular, a deconstruction process is carried out to avoid redundancies and
contradictions, if related models exist in the knowledge base. In an educational context, deconstruction
in general means the re-assessment of an already existing construct regarding incompleteness,
the unforeseen and the unconscious, and in particular the search for possible omissions,
simplifications, additions and criticism [9, p. 145]. In Constructivist Machine Learning,
deconstruction is in particular associated with automated re-training of models and creating abstracted
models, which may result in modifying or discarding models of the knowledge base.</p>
    </sec>
    <sec id="sec-3">
      <title>Deconstruction as a Machine Learning Process</title>
      <p>The aim of the deconstruction process is to integrate new and old knowledge representations
in a way that not only avoids ambiguity but also allows to abstract novel knowledge
representations automatically. The key subprocesses of deconstruction are representation modification
and representation generation (Fig. 2), of which at maximum one will be carried out for a
given batch. Prerequisite for these subprocesses to be executed is that an existing knowledge
representation or model, respectively, has been identified from the corresponding knowledge
base that exhibits a pragmatic relationship to a newly reconstructed model. In the event that
two or more related models are identified for a newly reconstructed model, they may either be
deconstructed consecutively or the deconstruction process is aborted as soon as a complete,
ΣΖ-, ΤΖ- or ΤΣ-deconstruction was successful.</p>
      <p>Whether a representation modification or generation is applied to a given batch depends on
the type of relationship between the new model M and a model from the existing knowledge
base. The initial task of the deconstruction process is therefore to determine this relationship.
If no relationship can be identified, the new model M is integrated unaltered into the knowledge
base. In case of completely and ΣΖ-related models, however, this relationship is assessed by
model re-training, which makes use of the reconstruction process. In case of ΤΣ-related
models, deconstruction is carried out in terms of a knowledge abstraction procedure, which makes
use of the construction process. The case of ΤΖ-related models would reflect that models with
the same purpose and same temporal validity but differing subjects have been identified, which
under a fixed intersubjective reconstruction scheme is not possible; therefore, this relationship
is not explicitly handled in the following. If a newly reconstructed model shows a complete
relationship to an existing model from the knowledge base, this may introduce error and
contradiction into the knowledge base. Therefore, this case is handled with priority.</p>
      <fig id="fig2">
        <caption>
          <p>Figure 2: Overview of the general deconstruction process with its subprocesses, representation modification and representation generation.</p>
        </caption>
      </fig>
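      <p>A schematic dispatcher for this case distinction might look as follows. This is a sketch, not the conML implementation: the knowledge base is modeled as a plain list and the three subprocesses appear only as stubs.</p>
      <preformat preformat-type="code">
# Hypothetical sketch: dispatching deconstruction by the pragmatic relationship
# between a new model m and a related model m_old from the knowledge base.
def relation(m, m_old):
    """Return the shared pragmatic metadata, e.g. "TSZ", "SZ", "TS" or "TZ"."""
    return "".join(k for k in "TSZ" if m["meta"][k] == m_old["meta"][k])

def deconstruct(m, m_old, kb):
    if m_old is None:
        kb.append(m)                 # no relationship: integrate m unaltered
    elif relation(m, m_old) == "TSZ":
        disambiguate(m, m_old, kb)   # complete relation: handled with priority
    elif relation(m, m_old) == "SZ":
        generalize(m, m_old, kb)     # extend temporal validity
    elif relation(m, m_old) == "TS":
        abstract(m, m_old, kb)       # construct on the next higher level
    else:
        kb.append(m)                 # TZ and weaker relations are not handled

# Stubs standing in for the subprocesses described in the following:
def disambiguate(m, m_old, kb): kb.append(m)
def generalize(m, m_old, kb):   kb.append(m)
def abstract(m, m_old, kb):     kb.append(m)

kb = [{"meta": {"T": 1, "S": "s1", "Z": "z1"}}]
deconstruct({"meta": {"T": 2, "S": "s1", "Z": "z1"}}, kb[0], kb)  # SZ-related
      </preformat>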
      <sec id="sec-3-1">
        <title>Representation Modification.</title>
        <p>With ΣΖ-related models, the aim of deconstruction is to
extend or replace the existing model from the knowledge base. In particular, it is assessed
whether the temporal validity of the existing model can be expanded according to the temporal
validity of the new model. Both models are fused into a new model that is re-trained via the
reconstruction process. If successful, the old model is replaced by the fused model; otherwise
the fused model is discarded. For ΤΣΖ-related models, re-training may be initiated by model
fusion as well as by model differentiation. Model differentiation means that it is tested whether
the fused model may be split in two or more submodels of more limited temporal validity.</p>
        <p>In contrast to ΣΖ relationships, deconstruction of ΤΣΖ-related models can not only extend
but also falsify the validity of these models. If the model fusion is falsified, the differentiation of
the fused model is executed or, if necessary, one of the contradicting models is discarded. The
disposal of models is carried out according to a user-defined regime, which makes a distinction
between a conservative (M′ retained, M discarded) and an integrative (M′ discarded,
M added to knowledge base) regime. Alternatively, if M is based on a larger set of vector
models than M′, M is added to the knowledge base and M′ is discarded; otherwise,
M′ is retained and M discarded. This regime is referred to as opportunistic.</p>
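        <p>The resulting control flow can be sketched as follows, with fuse, retrain and differentiate as toy stand-ins for the operations described above; only the conservative and integrative regimes are shown.</p>
        <preformat preformat-type="code">
# Hypothetical control-flow sketch of representation modification.
def fuse(m, m_old):
    """Toy fusion: combine the underlying sample sets of both models."""
    return {"data": m["data"] + m_old["data"], "ok": m["ok"] and m_old["ok"]}

def retrain(model):
    """Stand-in for re-training via the reconstruction process."""
    return model["ok"]

def differentiate(model):
    """Stand-in for splitting into submodels of limited temporal validity."""
    return [model]

def modify(m, m_old, kb, rel, regime="conservative"):
    fused = fuse(m, m_old)
    if retrain(fused):
        kb.remove(m_old); kb.append(fused)   # replace the old model
    elif rel == "TSZ":                       # complete relation: fusion falsified
        blocks = differentiate(fused)
        if len(blocks) &gt; 1:
            kb.extend(b for b in blocks if retrain(b))
        elif regime == "conservative":
            pass                             # keep m_old, discard the new m
        elif regime == "integrative":
            kb.remove(m_old); kb.append(m)
    # for SZ-related models, a falsified fusion is simply discarded

m_old = {"data": [1, 2], "ok": True}
kb = [m_old]
modify({"data": [3], "ok": False}, m_old, kb, rel="TSZ")
print(len(kb))  # 1: the conservative regime keeps the old model
        </preformat>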
        <p>Representation Generation. A ΤΣ relationship provides the basis to construct a new
model on the next higher level of the knowledge base. In this case, both models share a
congruent temporal validity and a common set of model subjects while differing in their model
purpose. First, the newly reconstructed model is stored to the knowledge base. The old model
from the knowledge base is left unaltered. Using the outputs, or target values respectively, of
the ΤΣ-related models, a new learn block without target values is formed. This learn block is
assigned a higher level than the underlying models possess in the knowledge base and
transferred to a construction process, from which all further learning processes may be passed.
Thereby, repeated abstraction from a single learn block is possible. Knowledge abstraction
may be limited by a user-defined maximum of knowledge levels.</p>
        <p>a) Model Disambiguation</p>
        <p>A ΤΣΖ-based deconstruction, or model disambiguation respectively, is the most critical
subprocess of deconstruction. Whenever a newly reconstructed model M is identified to be related
to a model M′ within the knowledge base with respect to Τ as well as to Σ and Ζ, model
disambiguation will eliminate conflicting models and assure that the knowledge base is kept free
from ambiguity. In order to achieve this, M or M′ may even be disposed from the knowledge
base or from being integrated into it, respectively. The three main operations of the ΤΣΖ-based
deconstruction are model fusion, model differentiation and the application of a user-defined
disambiguation policy. Fig. 3 gives an overview of how they are connected with each other.</p>
        <fig id="fig3">
          <caption>
            <p>Figure 3: Model disambiguation. This subprocess is embedded within the general deconstruction process (Fig. 2). Dashed arrows indicate that this deconstruction subprocess employs a reconstruction process as defined by [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].</p>
          </caption>
        </fig>
        <p>Model Fusion. With the model fusion, options are sought for merging the underlying data
of the new model M and the ΤΣΖ-related M′ from the knowledge base. The outcome may
be a fused model that is assessed by re-training, the return of the unaltered M to the general
deconstruction process (with M′ left untouched) or the necessity to resolve contradictions
between M and M′. As a first step, input features are sought that are used by both M and
M′. If two or more such features exist, both models are combined by reducing the underlying
datasets to match the identified feature intersection and concatenating these new samples from
both models to a new, fused model; this fused model will then undergo re-training through a
reconstruction process as described by Schmid [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and the originating models M and M′ as
well as all models of the knowledge base depending on them will be removed. Else, a time-based
intersection is sought by identifying samples from both models with matching Τ or an
identical timestamp, respectively. If the size of such a time-based intersection does not match the
minimal size requirement to form a new learn block, the new model M is returned untouched
to the general deconstruction process (Fig. 2). If enough samples with matching timestamp
are identified, re-sampling may be applied to form an alternative model. Prerequisite for this,
however, is that the outputs of M and of M′ – for the given timestamps of the matched samples – are
in good agreement, which is quantified by determining Krippendorff's α. In case the
agreement is not sufficient, the ambiguity inherent to these two contradicting models is resolved
by a user-defined disambiguation strategy. Otherwise, re-sampling is applied and model
re-training will be carried out by employing a reconstruction process. The originating models M
and M′ as well as all models depending on them will be removed from the knowledge base
before the fused model enters the reconstruction process.</p>
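        <p>The agreement test at the core of this operation can be sketched in plain Python. The function below implements Krippendorff's α for the special case of nominal outputs from exactly two models without missing values; the threshold is a hypothetical user parameter.</p>
        <preformat preformat-type="code">
# Hypothetical sketch: Krippendorff's alpha for the outputs of two models at
# matching timestamps (nominal data, two raters, no missing values).
from collections import Counter

def krippendorff_alpha_nominal(a, b):
    """Inter-rater agreement between two label sequences of equal length."""
    n = len(a)
    values = Counter(a) + Counter(b)      # value frequencies over all 2n labels
    total = 2 * n
    d_observed = sum(x != y for x, y in zip(a, b)) / n
    d_expected = sum(values[c] * values[k]
                     for c in values for k in values if c != k) / (total * (total - 1))
    return 1.0 if d_expected == 0 else 1 - d_observed / d_expected

# Outputs of M and M' for samples with identical timestamps (toy data):
y_new = [0, 0, 1, 1, 2, 2]
y_old = [0, 0, 1, 1, 2, 1]
ALPHA_MIN = 0.8                           # hypothetical agreement threshold
if krippendorff_alpha_nominal(y_new, y_old) &gt;= ALPHA_MIN:
    print("agreement sufficient: re-sample and re-train the fused model")
else:
    print("contradiction: apply the user-defined disambiguation strategy")
        </preformat>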
        <p>Disambiguation. By disambiguation, it is decided which of the contradicting models M
and M′ will be part of the knowledge base. Depending on user preferences, one of three
disambiguation strategies is applied: conservative, integrative or opportunistic. A conservative
strategy will keep M′, which is already part of the knowledge base, and will dispose the new
M. An integrative strategy will proceed the other way round and dispose M′ (and all models
depending on it hierarchically) from the knowledge base while integrating the new M into
the knowledge base before returning to the general deconstruction process. Using an
opportunistic strategy, the less intersubjective model, i.e. the model yielding a lower value for the
inter-rater reliability coefficient Krippendorff's α, will be disposed while the other model will
be part of the knowledge base when returning to the general deconstruction process (Fig. 2).</p>
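        <p>The three strategies reduce to a simple selection rule, sketched below; alpha() stands in for a function returning a model's inter-rater reliability.</p>
        <preformat preformat-type="code">
# Hypothetical sketch: the three disambiguation strategies for contradicting
# models m (new) and m_old (already in the knowledge base).
def disambiguate(m, m_old, strategy, alpha):
    """Return the model that remains in (or enters) the knowledge base."""
    if strategy == "conservative":
        return m_old                      # dispose the new m
    if strategy == "integrative":
        return m                          # dispose m_old and its dependents
    if strategy == "opportunistic":       # dispose the less intersubjective model
        return m if alpha(m) &gt; alpha(m_old) else m_old
    raise ValueError(strategy)

# alpha() stands in for a model's inter-rater reliability (Krippendorff's alpha):
keep = disambiguate({"id": "new"}, {"id": "old"}, "opportunistic",
                    alpha=lambda model: 0.9 if model["id"] == "new" else 0.7)
print(keep["id"])  # new
        </preformat>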
      </sec>
      <sec id="sec-3-2">
        <title>Model Differentiation.</title>
        <p>If model re-training by reconstruction fails, the fused model (as
well as M and M′) is regarded falsified. It will, however, not be disposed immediately. Instead,
it is assessed whether it is possible to cluster the underlying data into two or more temporally
defined sublearnblocks. Following the previously defined ideas of learnblock identification [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
the number of clusters for Τ is determined via density estimation. Actual clusters are then
identified via one-dimensional clustering, which is known to allow for optimal convergence
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. If two or more learnblocks are identified (n &gt; 1), re-sampling is applied to form new
model candidates that will undergo re-training by reconstruction. Where only one temporal
cluster is identified, the fused model (or the previously split model, respectively) is disposed;
in particular, it is not returned to the general deconstruction process (Fig. 2).</p>
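        <p>A minimal sketch of this temporal differentiation is given below. It estimates the number of modes from a histogram and uses ordinary k-means as a stand-in for the optimal one-dimensional clustering of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].</p>
        <preformat preformat-type="code">
# Hypothetical sketch of model differentiation: estimate the number of temporal
# clusters via a crude density estimate, then split timestamps by 1-d k-means
# (a stand-in for the optimal one-dimensional clustering of Ckmeans.1d.dp).
import numpy as np
from sklearn.cluster import KMeans

def temporal_sublearnblocks(timestamps, bins=10):
    t = np.asarray(timestamps, dtype=float).reshape(-1, 1)
    hist, _ = np.histogram(t, bins=bins)
    # count local maxima of the histogram as an estimate of the number of modes
    n_modes = sum(1 for i in range(len(hist))
                  if hist[i] &gt; 0
                  and (i == 0 or hist[i] &gt;= hist[i - 1])
                  and (i == len(hist) - 1 or hist[i] &gt; hist[i + 1]))
    n_modes = max(n_modes, 1)
    labels = KMeans(n_clusters=n_modes, n_init=10).fit_predict(t)
    return [t[labels == c].ravel() for c in range(n_modes)]

blocks = temporal_sublearnblocks([1, 2, 2, 3, 40, 41, 42, 43])
print(len(blocks))  # 2 temporal clusters: candidates for re-training
        </preformat>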
        <p>b) Model Generalization</p>
        <p>A ΣΖ-based deconstruction, or model generalization respectively, is a deconstruction
subprocess that aims at extending existing models regarding their temporal validity. If a newly
reconstructed model M is identified to be related to a model M′ within the knowledge base with
respect to Σ and Ζ, the model generalization will try to combine both models. In particular,
it is assessed whether they can be fused into a single model with a timespan Τ covering the
timespans of M and M′. The main operations of the ΣΖ-based deconstruction are model
fusion, disposal of the fused model and replacement of the old model with the fused model. Fig.
4 gives an overview of how these operations are connected with each other.</p>
        <p>Model Fusion. During model fusion, the possibility of merging the underlying data of the
old model M′ and the new model M along the temporal dimension Τ is investigated. The
outcomes of this operation may be either a novel, larger model representing more samples (or
a larger timespan, respectively) or the return to the general deconstruction process (Fig. 2) with
M being integrated into the knowledge base and M′ left unaltered. Prerequisite for merging
M and M′ is that an intersection of the respective input feature spaces is possible. Moreover,
this intersection should contain more than one feature in order to avoid trivial models. If no
appropriate feature intersection can be identified, M and M′ will be part of the knowledge
base. If the model fusion can be completed, on the other hand, the fused model will be assessed
by employing a reconstruction process [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. At the end of this model fusion, M′ is still part of
the knowledge base, while M and the fused model are still kept in competition to each other,
until the reconstruction process is completed and the winner model is selected from M and the
fused model. Depending on the success of the reconstruction process, either M or the fused
model will be integrated into the knowledge base.
        </p>
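        <p>The data-level part of this fusion can be sketched as follows, assuming the samples of both models are held in tables with a timestamp column t and named feature columns (a hypothetical layout).</p>
        <preformat preformat-type="code">
# Hypothetical sketch: fusing two models' datasets along the temporal dimension,
# restricted to their shared input features ("t" is the timestamp column).
import pandas as pd

def fuse_along_time(df_old, df_new):
    shared = [c for c in df_old.columns if c in df_new.columns and c != "t"]
    if len(shared) &lt; 2:          # more than one feature is required,
        return None               # otherwise the fused model would be trivial
    cols = ["t"] + shared
    return pd.concat([df_old[cols], df_new[cols]], ignore_index=True)

df_old = pd.DataFrame({"t": [1, 2], "f1": [0.1, 0.2], "f2": [1.0, 0.9], "f3": [5, 6]})
df_new = pd.DataFrame({"t": [3, 4], "f1": [0.3, 0.4], "f2": [0.8, 0.7]})
fused = fuse_along_time(df_old, df_new)
print(fused.shape)  # (4, 3): both timespans, shared features f1 and f2
        </preformat>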
        <p>Disposal of the Fused Model. If the reconstruction of the fused model cannot be
completed successfully, it will be disposed. In particular, it will not be integrated into the knowledge
base. Instead, M will be integrated into the knowledge base. The old, ΣΖ-related model is left
unaltered and as part of the knowledge base.</p>
        <p>Replacement of the Old Model. If the reconstruction of the fused model is successful, it
will be integrated into the knowledge base while M and M′ will be disposed. In particular, not
only M′, but also all models on higher hierarchy levels that depend on M′ will be disposed.</p>
        <fig id="fig4">
          <caption>
            <p>Figure 4: Model generalization with its operations re-sampling (merging), reconstruction and replacement. This subprocess is embedded within the general deconstruction process (Fig. 2). Dashed arrows indicate that this deconstruction subprocess employs a reconstruction process as defined by [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].</p>
          </caption>
        </fig>
        <p>c) Model Abstraction</p>
        <p>A ΤΣ-based deconstruction, or model abstraction respectively, is a subprocess of the
deconstruction process aiming at creating novel models of a higher hierarchy order or higher
abstraction level, respectively. If a newly reconstructed model M is identified to be related to a
model M′ within the knowledge base with respect to Τ and Σ, the model abstraction will try
to create novel models on the next higher level of abstraction. In particular, this new model or
models are fed as model candidates into the general process of Constructivist Machine
Learning [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which consists of interconnected processes of construction, reconstruction and
deconstruction. The main operation of the ΤΣ-based deconstruction is the creation of a novel
learnblock that subsequently undergoes a construction process as described by [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Fig. 5 gives
an overview of this subprocess and its operations.</p>
        <fig id="fig5">
          <caption>
            <p>Figure 5: Model abstraction. This subprocess is embedded within the general deconstruction process (Fig. 2). Dashed arrows indicate that this deconstruction subprocess invokes a cycle of construction, reconstruction and deconstruction processes (as defined by [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]) for a newly generated learnblock.</p>
          </caption>
        </fig>
        <p>Learnblock Generation. Combining two ΤΣ-related models differs significantly from the
model fusion operations for ΣΖ- and ΤΣΖ-related models. First of all, both the recently
reconstructed model M and the existing M′ from the knowledge base are under no circumstances
altered. M′ is left within the knowledge base, M is integrated unaltered into it. Instead of
fusing models, a novel learnblock is generated - if the hierarchy level of M matches the hierarchy
level of M′. As a user-defined minimum number of matching samples is required to form a
novel learnblock, the number of underlying data for M and M′ with identical timestamps is
determined. If this number of matching samples is sufficient, the similarity of the
corresponding outputs of M and M′ for the identified timestamps is assessed by Krippendorff's α. If
they yield no significant match, these output features are combined with the timestamps into a
novel learnblock with unknown Σ and unknown Ζ. In particular, Σ and Ζ are filled with
placeholders and treated as a regular learnblock in the following. The level of hierarchy, however,
will of course not be the level on which M and M′ are located, but the next higher one.
        </p>
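        <p>Learnblock generation can be sketched as follows. For brevity, a simple proportion of matching outputs stands in for Krippendorff's α, and the minimum block size is a hypothetical user parameter.</p>
        <preformat preformat-type="code">
# Hypothetical sketch of learnblock generation for TS-related models: outputs of
# m and m_old at identical timestamps become input features of a new learnblock
# with placeholder subject and purpose, one hierarchy level above both models.
MIN_SAMPLES = 3                    # hypothetical user-defined minimum block size

def abstraction_learnblock(m, m_old, alpha, alpha_max=0.6):
    if m["level"] != m_old["level"]:
        return None
    ts = sorted(set(m["outputs"]) &amp; set(m_old["outputs"]))  # shared timestamps
    if len(ts) &lt; MIN_SAMPLES:
        return None
    y1 = [m["outputs"][t] for t in ts]
    y2 = [m_old["outputs"][t] for t in ts]
    if alpha(y1, y2) &gt;= alpha_max:            # significant match: nothing to abstract
        return None
    return {"T": ts, "X": list(zip(y1, y2)),  # outputs become input features
            "S": None, "Z": None,             # placeholders for subject and purpose
            "level": m["level"] + 1}

# Toy usage with a simple proportion of matching outputs in place of alpha:
block = abstraction_learnblock(
    {"level": 1, "outputs": {1: 0, 2: 1, 3: 0, 4: 1}},
    {"level": 1, "outputs": {1: 1, 2: 1, 3: 0, 4: 0}},
    alpha=lambda a, b: sum(x == y for x, y in zip(a, b)) / len(a))
print(block["level"])  # 2: one level above the source models
        </preformat>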
        <p>
          New Learn Sequence. The ΤΣ-based deconstruction is basically completed successfully
when a new learn block has been generated. Yet, this new learn block will be further processed
in a newly entered general deconstruction process (Fig. 2). In particular, the newly
generated learn block undergoes the sequence of construction, reconstruction and deconstruction
processes described by Schmid [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Processing the new learn block in a new cycle of learning
processes may be implemented as an independent learnblock cycle or in a recursive manner.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>We review advances and challenges of the concept of deconstruction as an alternative to
traditional online learning. In particular, we discuss handling concept drifts and out-of-distribution
effects. Finally, we sketch the relationship of this approach to knowledge engineering.</p>
      <p>
        Concept Drift and Out-of-Distribution Detection. Data for a given machine learning task
changing over time may render a model built on old data inconsistent with new data. This
phenomenon is known as concept drift and has been known for decades [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Even more
challenging than sudden concept drifts are so-called gradual drifts that evolve slowly [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. To
this end, the batch-like online learning implemented by Constructivist Machine Learning
provides a more robust approach. In particular, consistency is sought not only within all individual
batches being processed, but also in terms of temporal generalization. If a gradual concept drift
does not allow for a long-term generalization, the approach will keep short-term or even
batchwise knowledge representations only. This approach follows the ideas of classical concepts of
instance selection in order to avoid concept drifts [
        <xref ref-type="bibr" rid="ref11 ref13 ref14">13, 11, 14</xref>
        ]. More recently, the goal to avoid
concept drift in practice has been approached by so-called out-of-distribution detection. The
underlying idea is to assess test data before the actual testing and to reject testing if the data does
not match the data distribution of the model. This approach has been widely investigated for
neural networks [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ], where many limitations of this approach have been found, too [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Due
to the fact that Constructivist Machine Learning employs metadata to determine the feasibility of
testing data, out-of-distribution detection may be omitted.
      </p>
      <p>Automated Knowledge Base Management. In the context of knowledge engineering, a
knowledge representation is typically a mathematical formalization like a logic, rule, frame or
semantic net related to real-world aspects [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Organizing such representations in a
knowledge base is a central, yet time-consuming task in knowledge engineering. Managing
knowledge bases may be described by typical life cycle phases. Following Martinez-Gil (2015), a
creation phase is characterized by acquisition, representation, storage and manipulation of
knowledge, while an exploitation phase focuses on knowledge reasoning, retrieval and
sharing; the maintenance phase is concerned with integration, validation and meta-modeling of
knowledge. Most work on operating knowledge bases uses a semi-automated approach,
leaving much space for more effective and efficient automated management strategies [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. To this
end, Constructivist Machine Learning provides a scalable approach for automatic generation
of knowledge bases. Moreover, with the implementation of an algorithmic deconstruction
process, it also provides automatic selection, combination and/or tuning of maintenance strategies.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        After introducing our constructivist approach to the machine learning and knowledge
engineering community [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and outlining how to operationalize this idea [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we have now further
specified the implementation of the deconstruction process. This process is central to
Constructivist Machine Learning as it lays the foundation of interpretable and modular machine
learning. Deconstruction employs existing machine learning techniques in order to integrate
knowledge representations into, or modify knowledge representations from, a knowledge base.
By combining machine learning and knowledge engineering concepts, Constructivist Machine
Learning pursues ideas of hybrid intelligent systems. Yet, the ability to operate a knowledge
base automatically is a significant step ahead. To this end, the deconstruction process is a key
concept, which will be further investigated and assessed on real-world datasets.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Download</title>
      <p>In order to facilitate the application of Constructivist Machine Learning in practice, we
implemented this concept as a multi-language framework called conML. The current version is
available as open-source software at www.constructivist.ml/download.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>I would like to thank Florian Große, Dmitrij Denisenko, Dennis Carrer and Michael
Hermelschmidt for reviewing and discussing implementation details and working on Python, R
and Julia re-implementations of the original prototype implementation.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Makridakis</surname>
          </string-name>
          ,
          <article-title>A survey of time series</article-title>
          ,
          <source>International Statistical Review</source>
          <volume>44</volume>
          (
          <year>1976</year>
          )
          <fpage>29</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>L'Heureux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Grolinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Elyamany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. M.</given-names>
            <surname>Capretz</surname>
          </string-name>
          ,
          <article-title>Machine learning with big data: Challenges and approaches</article-title>
          ,
          <source>IEEE Access 5</source>
          (
          <year>2017</year>
          )
          <fpage>7776</fpage>
          -
          <lpage>7797</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Parker</surname>
          </string-name>
          ,
          <article-title>Unexpected challenges in large scale machine learning</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pérez-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Fontenla-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Guijarro-Berdiñas</surname>
          </string-name>
          ,
          <article-title>A review of adaptive online learning for artificial neural networks</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>49</volume>
          (
          <year>2018</year>
          )
          <fpage>281</fpage>
          -
          <lpage>299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <article-title>Deconstructing the final frontier of artificial intelligence: Five theses for a constructivist machine learning</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hinkelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gerber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019)</source>
          ,
          <year>2019</year>
          . URL: http://ceur-ws.org/Vol-2350/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <article-title>Using learning algorithms to create, exploit and maintain knowledge bases: Principles of constructivist machine learning</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hinkelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gerber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Stachowiak</surname>
          </string-name>
          , Allgemeine Modelltheorie, Springer,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Reich</surname>
          </string-name>
          ,
          <article-title>Systemisch-konstruktivistische Didaktik. Eine allgemeine Zielbestimmung</article-title>
          ,
          <source>Die Schule neu erfinden</source>
          (
          <year>1996</year>
          )
          <fpage>70</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Reich</surname>
          </string-name>
          ,
          <source>Konstruktivistische Didaktik. Lehren und Lernen aus interaktionistischer Sicht</source>
          , 2nd ed., Luchterhand, Munich,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming</article-title>
          ,
          <source>The R Journal</source>
          <volume>3</volume>
          (
          <year>2011</year>
          )
          <fpage>29</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Widmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kubat</surname>
          </string-name>
          ,
          <article-title>Learning in the presence of concept drift and hidden contexts</article-title>
          ,
          <source>Machine Learning</source>
          <volume>23</volume>
          (
          <year>1996</year>
          )
          <fpage>69</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Stanley</surname>
          </string-name>
          ,
          <article-title>Learning concept drift with a committee of decision trees</article-title>
          ,
          <source>Technical Report UTAI-TR-03-302</source>
          , Department of Computer Sciences, University of Texas at Austin, Austin, Texas, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kubat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Widmer</surname>
          </string-name>
          ,
          <article-title>Adapting to drift in continuous domains</article-title>
          ,
          <source>Technical Report ÖFAI-TR-94-27</source>
          , Austrian Research Institute for Artificial Intelligence, Vienna, Austria,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Salganicoff</surname>
          </string-name>
          ,
          <article-title>Tolerating concept and sampling shift in lazy learning using prediction error context switching</article-title>
          , in: Lazy learning, Springer,
          <year>1997</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bastien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bergeron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Boulanger-Lewandowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chherawala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cisse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Côté</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eustache</surname>
          </string-name>
          , et al.,
          <article-title>Deep learners benefit more from out-of-distribution examples</article-title>
          ,
          <source>in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>164</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>DeVries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Learning confidence for out-of-distribution detection in neural networks</article-title>
          , arXiv preprint arXiv:1802.04865 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fertig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Snoek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Poplin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Depristo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lakshminarayanan</surname>
          </string-name>
          ,
          <article-title>Likelihood ratios for out-of-distribution detection</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>14707</fpage>
          -
          <lpage>14718</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kirichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Izmailov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <article-title>Why normalizing flows fail to detect out-of-distribution data</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shrobe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szolovits</surname>
          </string-name>
          ,
          <article-title>What is a knowledge representation?</article-title>
          ,
          <source>AI Magazine</source>
          <volume>14</volume>
          (
          <year>1993</year>
          )
          <fpage>17</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Gil</surname>
          </string-name>
          ,
          <article-title>Automated knowledge base management: A survey</article-title>
          ,
          <source>Computer Science Review</source>
          <volume>18</volume>
          (
          <year>2015</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>