CEUR Workshop Proceedings Vol-2600, paper 11 — https://ceur-ws.org/Vol-2600/paper11.pdf — dblp: https://dblp.org/rec/conf/aaaiss/Schmid20
   Using Learning Algorithms to Create, Exploit and Maintain Knowledge Bases:
                  Principles of Constructivist Machine Learning
                                                          Thomas Schmid
                                                        Universität Leipzig
                                                     Machine Learning Group
                                            Augustusplatz 10, D-04109 Leipzig, Germany
                                                schmid@informatik.uni-leipzig.de


                           Abstract

  Recently, interest has grown in connecting modern machine learning approaches with traditional expert systems. This can mean, e.g., to identify patterns with neural networks and integrate them with knowledge graphs. While such combined systems offer a variety of advantages, few domain-independent approaches are known to make a hybrid artificial intelligence applicable without human interaction. To this end, we present the implementation of a constructivist machine learning framework (conML). This novel paradigm uses machine learning to manage a knowledge base and thereby allows for both raw data-based and symbolic information processing on the same internal knowledge representation. Based on axioms for a constructivist machine learning, we describe which operations are required to create, exploit and maintain a knowledge base and how these operations may be implemented with machine learning techniques. The major practical obstacle in this approach is to implement an automated deconstruction process that avoids ambiguity, handles continuous learning and allows knowledge abstraction. As we demonstrate, however, these obstacles can be overcome and constructivist machine learning can be put into practice.

Combining machine learning and knowledge engineering is currently considered a potentially game-changing advancement in artificial intelligence. Neural networks and other machine learning techniques have proven strength in adapting to highly complex patterns and relationships, but are unable to represent existing knowledge explicitly and in an abstract fashion as expert systems can. Expert systems, on the other hand, operate on human-understandable knowledge representations but are highly domain-specific and, moreover, unable to process real-world data directly as machine learning can. Therefore, it is expected that joining both fields will produce a hybrid artificial intelligence that is “explainable, compliant and grounded in domain knowledge” (Martin et al. 2019). Such systems may, e.g., be able to identify patterns with neural networks and integrate them with knowledge graphs (Subasic, Yin, and Lin 2019).

In fact, the idea of a hybrid artificial intelligence has been discussed for more than 30 years (Gallant 1988; Hendler 1989; Skeirik 1990; Levey 1991; Morik et al. 1993). So far, however, most research in this field focuses on specific knowledge or application domains like medical diagnosis (Hudson, Cohen, and Anderson 1991; Karabatak and Ince 2009; Herrmann 1995). This is to a large extent due to the fact that knowledge bases are typically created manually, which is a highly time-consuming task that requires detailed knowledge of the domain (Kidd 2012). No less time-consuming are exploitation and maintenance of knowledge bases, which are typical follow-up phases within the life cycle of a knowledge base. While some progress has been made in employing algorithms for these tasks, several major challenges for an automated management of knowledge bases are still considered unresolved (Martinez-Gil 2015).

Considering recent performance advancements in machine learning, manually managed knowledge bases obviously constitute a serious bottleneck in creating efficient hybrid systems. For truly automated systems, however, an implementable semantic interface between inductive machine learning and deductive expert systems is required. To this end, we have introduced a constructivist machine learning paradigm (Schmid 2019) based on the concept of learnable models and their storage in a knowledge base. While machine learning is currently dominated by neuro-inspired approaches, constructivist theories root in educational research (Fox 2001) and, so far, few actual implementations of a constructivist machine learning have been proposed (Drescher 1989; Quartz 1993). The central challenge for putting this into practice is the implementation of an automated deconstruction process, which to the best of our knowledge has only once been addressed successfully (Schmid 2018).

Based on this paradigm, we designed a prototype for a constructivist machine learning that employs a meta data-based knowledge base. Here, we present the underlying operationalizations and concepts required to put constructivist machine learning into practice. The rest of the paper is organized as follows: In section I, we lay out guidelines for automated knowledge base management. In section II, we define Stachowiak-like models as building blocks for knowledge representations. In section III, we introduce principles for constructivist machine learning processes. In section IV, we summarize our approach and point out future goals.

Copyright 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020.
[Figure 1 schematic: raw data and meta data from a data set or stream are selected into blocks; each block is learned into an ML model with meta data; the resulting representation is integrated into, or modifies, the knowledge base.]
Figure 1: Transformation of real-world data into an abstract knowledge base. Using a constructivist machine learning approach,
real-world data is processed block-wise by learning algorithms with the aim of identifying an optimal representation for a given
block. Each representation is then integrated into an existing knowledge base consisting of previously identified representations.



              I. Knowledge Management

In the context of knowledge engineering, a knowledge representation is typically a mathematical formalization like a logic, rule, frame or semantic net related to real-world aspects (Davis, Shrobe, and Szolovits 1993). We have recently argued that any such formalization should be regarded as a model in the sense of Stachowiak’s General Model Theory (Schmid 2019). This implies that a formalization is not only a representation and an abstraction, but also limited to certain temporal constraints, certain subjects and a certain purpose (Stachowiak 1973). Here, we represent and use these three-dimensional limitations explicitly by employing meta data acquired together with raw data (Fig. 1).

   Hierarchical Knowledge. By a basic definition, a knowledge base can be simply viewed as “a set of formulas” (Lifschitz, Morgenstern, and Plaisted 2008). In the present work, we use an extended definition and regard a set of meta data-enriched models as a knowledge base (Schmid 2019). We further assume a meta data-based hierarchical ordering of this set, as human knowledge is, from an educational perspective, assumed to be organized in distinct levels (Bloom 1956). Findings from neurobiology also indicate a hierarchical organization for cognitive brain areas (Markov and Kennedy 2013). Using machine learning-based models as knowledge representations, we reflect a hierarchical ordering by using the output of other such models as input.

   Knowledge Domains. A revision of Bloom’s taxonomy suggests that apart from levels also domains of human cognition should be distinguished (Anderson and Krathwohl 2001). Conceptual knowledge, e.g., may be described as knowledge about classifications, categories and structures. Procedural knowledge, in contrast, may be described as knowledge about subject-specific abilities, algorithms or selection criteria. We suggest using individual knowledge bases for individual knowledge domains, i.e., factual, conceptual, procedural and metacognitive models. In the present work we will focus on the conceptual knowledge domain and the mechanisms involved with this type of knowledge.

   Automated Knowledge Base Management. Managing knowledge bases may be described by typical life cycle phases. Following Martinez-Gil (2015), a creation phase is characterized by acquisition, representation, storage and manipulation of knowledge, while an exploitation phase focusses on knowledge reasoning, retrieval and sharing; the maintenance phase is concerned with integration, validation and meta-modeling of knowledge. Issues arising within these phases have been recognized and discussed (Richardson and Domingos 2003; Guisado-Gámez, Dominguez-Sal, and Larriba-Pey 2013; Falkner and Haselböck 2013). Most work on operating knowledge bases uses a semi-automated approach, leaving much space for more effective and efficient automated management strategies (Martinez-Gil 2015). Important issues to be addressed include, on the one hand, automatic generation of large knowledge bases as well as automatic selection, combination and/or tuning of maintenance strategies. On the other hand, efficiency and explainability of knowledge exploitation should be improved, too.

   Employment of Machine Learning. Here, machine learning techniques will be used for automatic generation of knowledge bases as well as for automatic maintenance. In creation phases, machine learning algorithms are employed to identify and/or manipulate optimal knowledge representations. In maintenance phases, machine learning algorithms are used for validating such knowledge representations and for supporting their integration into the knowledge base. To this end, a major objective of maintenance is to keep the knowledge base ambiguity-free. For knowledge exploitation, machine learning-based models of such a knowledge base may be applied to new input data. This is due to the design aspect that each model is represented by a supervised learning algorithm, i.e. a classifier or regressor. Consequently, the underlying classifier or regressor may be used on new data after training. Matching and mismatching new data to a model can be achieved by the respective meta data. In particular, application of the knowledge base can be rejected if no knowledge is available for a given input.
   II. Models as Knowledge Representations

In the following, models will be used to represent acquired knowledge. A model here is understood to be a pragmatic model in the sense of Stachowiak’s General Model Theory (Stachowiak 1973). This includes mathematical functions as well as their representation or approximation by machine learning techniques. More importantly, however, Stachowiak-like models feature meta data about the validity of the model regarding subject, purpose and time.

   The author, user or subject σ of a model may in natural sciences be a sensor or a measuring device, and in observational studies or content analyses typically a human evaluator. The set of all model subjects σi for which a given model M is valid is called ΣM and is defined as a subset of the (infinite) set Σ of all possible subjects:

                           ΣM ⊂ Σ                          (1)

   The target parameter of a model is referred to as purpose ζ. The set of all purposes ζi for which a given model M is valid is called ZM and is defined as a subset of the (infinite) set Z of all possible model purposes:

                           ZM ⊂ Z                          (2)

   The temporal validity of a given model M is in general represented by a time span TM or a minimum limit τmin and a maximum limit τmax, respectively:

                     TM = [τmin, τmax]                     (3)

   In contrast to Stachowiak’s model concept, we limit our approach to two types of models: vector models on the one hand and algorithmically generated machine models on the other. For both, a distinction is made between models with and without explicitly defined pragmatic properties.

a) Vector Models

In supervised machine learning, a training vector consists of an m-dimensional input vector I = (i0, ..., im−1) and an n-dimensional output vector O = (o0, ..., on−1). Moreover, a mapping between I and O is implicitly assumed. Such vectors are referred to as a (complete) vector model V:

             V   =   (I, O)                                (4)
                 =   (i0, ..., im−1, o0, ..., on−1)        (5)

   If a given I is assigned an empty output vector, O = ∅, the corresponding V = (I, ∅) is termed an incomplete vector model. Typical incomplete vector models are training vectors used for an unsupervised machine learning process.

   If the pragmatic properties T, Σ and Z are explicitly defined for a complete vector model V, the resulting representation is called a pragmatically defined vector model V*:

                   V* = (V, TV, ΣV, ZV)                    (6)

   Note that the time span TV, within which V is valid, is defined by the time of data collection. In the following, we assume that error tolerances during data collection are negligible and that minimum and maximum borders are identical:

                   TV = τmin = τmax                        (7)

b) Machine Models

If a finite set of j complete vector models is approximated by a machine learning algorithm, the resulting approximation is referred to as a machine model M:

                   M ∼ {V0, ..., Vj−1}                     (8)

   A machine model M with given TM, ΣM and ZM is called a pragmatically defined machine model M*:

                   M* = (M, TM, ΣM, ZM)                    (9)

   The temporal validity TM of a machine model M can only be assumed to be hypothetical and defined by means of hypothetical interval limits. These interval limits are derived from the underlying n vector models V* (Schmid 2018), which were used to train the machine learning algorithm:

     TM = [min(TV0*, ..., TVn−1*), max(TV0*, ..., TVn−1*)]  (10)

   ΣM defines the machine learning algorithms involved in creating and applying M. In order to allow for automated model creation, we use generic descriptors. For a standard machine model, ΣM will be a set containing only one element. If |ΣM*| > 1 holds true for a given M*, i.e., if the model is valid for more than one machine learning algorithm, M* is called an intersubjective machine model.

   ZM defines the target parameters of a machine model M. In most cases, ZM will be a set containing only one element. In order to allow for automated model creation, we use generic descriptors that are a combination of the corresponding knowledge domain, knowledge level and type of task (e.g. binary classification). If M* is abstracted from machine models M0, ..., Mn−1, a higher level of knowledge is defined for M* than for M0, ..., Mn−1.

c) Model Relationships

The pragmatic features T, Σ, Z of Stachowiak-like models may be employed to match and discriminate models automatically. With vector models, e.g., this makes it possible to identify sets of pragmatically related vector models and to define appropriate learning strategies for each relationship.

   The degree of relationship between two given Stachowiak-like models Ma and Mb is termed
1. complete (TΣZ), if TMa = TMb, ΣMa = ΣMb, ZMa = ZMb;
2. subjective-intentional (ΣZ), if TMa ≠ TMb, ΣMa = ΣMb, ZMa = ZMb;
3. temporal-intentional (TZ), if TMa = TMb, ΣMa ≠ ΣMb, ZMa = ZMb;
4. temporal-subjective (TΣ), if TMa = TMb, ΣMa = ΣMb, ZMa ≠ ZMb.

   Such matching and discriminating is also a prerequisite for automating a deconstruction process for machine models. Depending on the underlying pragmatic relationship, procedures for a ΣZ, TZ, TΣ or complete deconstruction can be defined (section III).

   When applying existing machine models, pragmatic features also indicate applicability for a given task or input.
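Since the four degrees of relationship depend only on equality of the pragmatic meta data, they can be decided mechanically. A minimal sketch (the `PragmaticModel` container and its field names are our own shorthand, not part of the paper’s formalism):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PragmaticModel:
    """Pragmatic meta data of a Stachowiak-like model."""
    T: tuple          # temporal validity (tau_min, tau_max)
    Sigma: frozenset  # model subjects
    Z: frozenset      # model purposes

def relationship(a: PragmaticModel, b: PragmaticModel) -> str:
    """Classify the degree of relationship per section II c)."""
    same = (a.T == b.T, a.Sigma == b.Sigma, a.Z == b.Z)
    return {
        (True, True, True): "complete (TSZ)",
        (False, True, True): "subjective-intentional (SZ)",
        (True, False, True): "temporal-intentional (TZ)",
        (True, True, False): "temporal-subjective (TS)",
    }.get(same, "unrelated")
```

Any pair differing in more than one pragmatic dimension falls outside the four named degrees and is treated as unrelated here.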
[Figure 2 flowchart: data management (select and learn blocks), representation learning (construction if no known targets, then reconstruction), and knowledge integration (deconstruction for ΣZ-, TΣ- or TΣZ-related models, then knowledge base update), repeated as long as further blocks exist.]
Figure 2: Principle of Constructivist Machine Learning. A given block of data and meta data is used to select and learn an optimal
representation. This representation is then integrated into a knowledge base (for simplicity not depicted here) and/or modified accordingly.



      III. Constructivist Machine Learning

According to modern educational concepts, human learning takes place through construction, reconstruction or deconstruction of models. Following this paradigm, we develop concepts to implement such learning processes. To put construction, reconstruction and deconstruction into practice, we require a corresponding knowledge base consisting of Stachowiak-like models (section II) and employ a data management process in order to organize for efficient learning.

   Data Management. As Fig. 2 depicts, the starting point for a constructivist machine learning procedure is an arbitrary set of pragmatically defined vector models (called a block). From these samples, subsets of pragmatically related vector models are identified and re-grouped into learn blocks. Depending on the sample relationship, ΣZ-, TΣ-, TZ- or completely related learn blocks may be found. The size of these learn blocks determines the following learning. Not all forms of relationship, however, are equally suitable for a model construction. Especially constructions based on completely and TZ-related learn blocks offer little added value. Learn blocks of completely related vector models that are not redundant but divergent even represent a serious source of error. Learn blocks of TZ-related models basically allow the generation of new models, which then, however, does not represent a construction process but an intersubjective reconstruction process. Therefore, learn blocks of ΣZ-related vector models are preferred for constructions.

   If at least one learn block exceeds a user-defined minimum number of samples, the largest learn block is selected to undergo one or more learning processes. All other learn blocks are discarded. After the learning processes for this learn block have terminated, the knowledge base is updated according to the results of the learning processes. This may imply storing a newly reconstructed model as well as modifying or deleting existing models from the knowledge base. As long as further blocks exist, this sequence of selecting and processing data is repeated.

   Representation Learning. Various combinations of learning processes are possible for a given learn block. In the simplest case, e.g., if the knowledge base contains no models yet and target values are defined for the learn block, only a reconstruction is carried out and the resulting machine model is stored in the knowledge base. In an educational context, reconstruction implies in general application, repetition or imitation, in particular the search for order, patterns or models (Reich 2004, p. 145). Similarly, the reconstruction of a machine model is here understood as supervised learning from given examples. In contrast to classical supervised learning, however, competing machine models are generated and evaluated with regard to their intersubjective validity.

   If no target values are defined for the learn block, such targets are produced in a construction process before the resulting model candidates enter the reconstruction process. In an educational context, construction is in general associated with creativity, innovation and production, and in particular with the search for new variations, combinations or transfers (Reich 2004, p. 145). For machine models this is interpreted as an unsupervised learning that identifies or defines alternative n-dimensional outputs to a set of incomplete vector models. Thereby, competing model candidates are created
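The data management step, grouping pragmatically related vector models into learn blocks and selecting the largest block that exceeds a user-defined minimum size, can be sketched as follows (hypothetical record layout; for brevity, grouping here uses only ΣZ-relatedness, the relationship preferred for constructions):

```python
from collections import defaultdict

def select_learn_block(vector_models, min_size):
    """Re-group vector models into SigmaZ-related learn blocks and
    pick the largest block above the minimum size, or None."""
    blocks = defaultdict(list)
    for v in vector_models:
        # samples sharing subjects and purposes form one learn block
        blocks[(v["Sigma"], v["Z"])].append(v)
    eligible = [b for b in blocks.values() if len(b) >= min_size]
    if not eligible:
        return None  # nothing to learn from yet; wait for further blocks
    return max(eligible, key=len)  # all other learn blocks are discarded
```

A fuller implementation would also form TΣ-related blocks and route TZ- and completely related samples to intersubjective reconstruction instead of construction.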
[Figure 3 schematic: a learn block passes through unsupervised learning (k-Means and SOM, each run for k = 2, ..., k clusters) and the resulting candidates pass through candidate filtering.]
Figure 3: Construction process for conceptual knowledge. A given learn block is analyzed by k-Means and self-organizing
maps (SOM) for cluster numbers 2,...,k. The resulting clusterings (s02,...,s0k,k02,...,k0k) are filtered on a user-defined basis.



that are evaluated in a following reconstruction process. The rationale behind this is that it is a priori unclear which of the models constructed from a learn block can be reconstructed with best accuracy and intersubjectivity.

   Knowledge Integration. After successful reconstruction, mechanisms are needed to manage integration into the knowledge base. In particular, a deconstruction process is carried out to avoid redundancies and contradictions, if pragmatically related models exist in the knowledge base. In an educational context, deconstruction in general means the investigation of an already existing construct for incompleteness, for the unforeseen and the unconscious, and in particular the search for possible omissions, simplifications, additions and criticism (Reich 2004, p. 145). In constructivist machine learning, deconstruction is in particular associated with automated re-training of models and creating abstracted models. Deconstruction may result in modifying or discarding models of the knowledge base.

a) Construction

The aim of the construction process is to provide alternative interpretations, or model candidates, with alternative model purposes for a given learn block. In particular, more than one model candidate is created for the same data during construction and sent to a following reconstruction process. The key components of the construction process are unsupervised learning and candidate filtering (Fig. 3).

   Unsupervised Learning. Depending on the knowledge domain under consideration, different types of unsupervised algorithms are employed. For conceptual knowledge in the sense of Bloom’s taxonomy (section I), or knowledge about classifications, categories and structures, respectively, clustering algorithms are employed. The purpose of clustering in this case is to identify distinguishable categories within a learn block. In order to create diverse model candidates, it is desirable to identify as many different machine models as possible in as many different ways as possible. In a basic conceptual construction setting, the well-known k-Means clustering as well as the neuro-inspired self-organizing map (Baçao, Lobo, and Painho 2005) are used as alternative approaches. In a basic procedural construction setting, feature clustering (Chavent et al. 2012) and autoencoders (Hinton and Salakhutdinov 2006) may be employed. If the algorithm requires the number k of clusters to be defined in advance, all possible clusterings between 2 and k are tested. This maximum number of clusters is called maximum categorical complexity κk in the following. With each clustering method κk − 1 machine models with k = {2, ..., κk} clusters or categories are generated.

   Candidate Filtering. A prerequisite for many clustering methods is the prior definition of the number of clusters to be determined. Usually, several runs with different cluster numbers are carried out with the same procedure and the clusterings obtained are evaluated with an external procedure (Jain 2010). Here, optimal clustering is determined by reconstructing model candidates. Before entering the reconstruction process, however, clusterings are filtered by user-defined settings for minimal cluster size and minimal clustering error (e.g. minimal intra cluster error, if applicable).

b) Reconstruction

The aim of the reconstruction process is to validate model candidates, assign model subjects and guarantee intersubjectivity. The key components of this process are preprocessing, supervised learning and intersubjectivity evaluation (Fig. 4). If more than one model candidate enters the reconstruction process from the construction process, only one model is selected as optimal learn block representation and transferred into the subsequent deconstruction process. All other reconstructed models are discarded.
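The construction loop described above — running a clustering method for every k between 2 and κk and filtering the resulting κk − 1 candidates by minimal cluster size — may be sketched as follows (the `cluster_fn` callback stands in for k-Means or a SOM; function and parameter names are our own):

```python
from collections import Counter

def construct_candidates(learn_block, cluster_fn, kappa_k, min_cluster_size):
    """Generate and filter clustering candidates for a learn block.

    cluster_fn(data, k) -> list of cluster labels, one per sample;
    kappa_k is the maximum categorical complexity.
    """
    # one candidate clustering per cluster number k = 2, ..., kappa_k
    candidates = [cluster_fn(learn_block, k) for k in range(2, kappa_k + 1)]
    # candidate filtering: drop clusterings with undersized clusters
    return [labels for labels in candidates
            if min(Counter(labels).values()) >= min_cluster_size]
```

In practice a filter on the clustering error (e.g. intra-cluster error) would be applied as well; the surviving candidates then enter the reconstruction process as alternative target definitions.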
[Figure 4 schematic: a model candidate passes through preprocessing (feature selection if complexity > κ), supervised learning (neural network, random forest, ...), and intersubjectivity assessment via Krippendorff’s α; candidates below the required agreement level are discarded, others become intersubjective models M with pragmatic features T, Σ, Z.]


Figure 4: Reconstruction process. After initial preprocessing, supervised learning algorithms are applied on a given model
candidate. The predictions of these algorithms for test data is then used to evalute intersubjetivity of the model candidate.



   Preprocessing. For a given learn block or set of vector models, respectively, first the number of input variables is assessed. If this number is greater than a user-defined maximum model complexity κ, an algorithmic feature selection is carried out in order to reduce the complexity of the model to ≤ κ. In principle, filter as well as wrapper and embedded methods could be used for this. The most convenient decision criterion for this is the amount of data to be processed or the selection processes to be performed. For large amounts of data, filter methods are preferable due to their efficiency and low computation effort, but they do not always provide optimal feature subsets. Embedded feature selection methods, on the other hand, promise a particularly careful selection of features, but require considerably more computational effort than filters for large amounts of data.
   For this reason, a hybrid approach is used here. As a basic principle, if an input data set consists of a smaller set of low-dimensional vector models, an embedded method is applied. Conversely, a filter is used if an input data set consists of a particularly large set of particularly high-dimensional vector models. Whether an embedded method is used or not is decided by means of user-defined auxiliary parameters for the maximum allowed number of input dimensions and the maximum allowed number of vector models. By default, Correlation-based Feature Selection (CFS) is used as filter, and a Random Forest as embedded method.
   Supervised Learning. Preprocessing is followed by application of at least two alternative supervised learning algorithms that are carried out in parallel. It is important to note that these algorithms act independently, but aim to achieve agreement. In order to enable broad application, these methods should be able to solve both classification and regression tasks. In order to facilitate automated configuration, they should also be non-parametric, i.e. they should not require a priori assumptions about the density distribution of the data. Furthermore, a fundamental diversity of the procedures is desirable in the sense of different procedural approaches. For this purpose, a biologically inspired and a statistically motivated learning procedure are applied in parallel. Considering these criteria as well as the availability of suitable implementations, the methods used for the reconstruction process are multi-layer perceptrons and random forests. In addition, further methods like support vector machines could be used to increase the diversity of methods.
   Intersubjectivity. Each supervised learning algorithm yields individual target values for each input vector. Consequently, the question arises to what extent these competing methods agree. Analogously to empirical studies, this is quantified and evaluated with the interrater reliability coefficient Krippendorff's α. This coefficient can be calculated for both nominal and metric scales and can therefore be used for both classification and regression (Krippendorff 1970). In contrast to other reliability coefficients, it can also be applied to any number of raters. An α value of 1 indicates optimal reliability, while a value less than or equal to 0 implies that there is no match between scores. Krippendorff's α has repeatedly been proposed as a standard measure for quantifying interrater reliability (Krippendorff 2004; Hayes and Krippendorff 2007). Those reconstructed models that have been trained successfully, but whose α value does not exceed a user-defined threshold value, are discarded. If none of the reconstructed models exceeds this α threshold, the current total reconstruction process is aborted.
   Model Selection. If more than one model candidate passes the reconstruction process, it must be decided which of these competing models will be integrated into the knowledge base. In order to identify the learn block representation that is least dependent on specific methods, these models are ranked in descending order using Krippendorff's α. Since the maximum α value implies the maximum degree of intersubjectivity, the model with the maximum α value can be interpreted as the clearest model in the sense of Heinrich Hertz (Hertz 1894, p. 2f). If two or more models within the sequence have an identical α value, the model with the smallest original image space is selected from this new subset. This can be interpreted as the choice of the simplest model.
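The intersubjectivity check can be illustrated with a small computation of Krippendorff's α for nominal data. This is a self-contained sketch for the simplest case (no missing values, a constant number of raters per item), not the conML implementation:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal data without missing values.

    ratings: list of per-item tuples, one rating per rater (here: one
    prediction per supervised learning algorithm). Returns 1.0 for
    perfect agreement; values <= 0 indicate no agreement beyond chance.
    """
    o = Counter()                  # coincidence matrix o[(c, k)]
    for item in ratings:
        m = len(item)              # number of raters for this item
        for c, k in permutations(item, 2):
            o[(c, k)] += 1.0 / (m - 1)
    n = Counter()                  # marginal frequencies of the values
    for (c, k), count in o.items():
        n[c] += count
    N = sum(n.values())            # total number of pairable values
    disagree_obs = sum(cnt for (c, k), cnt in o.items() if c != k)
    disagree_exp = sum(n[c] * n[k] for c in n for k in n if c != k)
    if disagree_exp == 0:          # only one category observed at all
        return 1.0
    return 1.0 - (N - 1) * disagree_obs / disagree_exp

# Two "raters": predictions of two competing models on four test vectors.
mlp_preds = [1, 1, 2, 2]
rf_preds = [1, 1, 2, 1]
alpha = krippendorff_alpha_nominal(list(zip(mlp_preds, rf_preds)))
```

For model selection, α would be computed for every reconstructed candidate and the candidate with the maximal α retained, as described above.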
[Flow diagram: relationship management (T Σ?, ΣZ?, T ΣZ? checks on old and new model) → model re-training (model fusion → reconstruction* → store new model / replace old model / model disposal / model differentiation) and knowledge abstraction (learn block generation → construction), ending in an update of the knowledge base]
Figure 5: Deconstruction process. For deconstruction, two pragmatically related models are considered pairwise. They can undergo a
re-training procedure, which makes use of the reconstruction process, or be used to abstract knowledge by undergoing a construction process.
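The pairwise decision flow summarized in Figure 5 can be outlined as a simple dispatcher. The relationship labels and helper callables below are hypothetical stand-ins, not the conML API:

```python
def deconstruct(new_model, old_model, relationship,
                fuse, reconstruct, construct):
    """Dispatch a pair of pragmatically related models (sketch only).

    relationship: "TSZ" (complete) or "SZ" triggers model re-training
    via the reconstruction process; "TS" triggers knowledge abstraction
    via a construction process. "TZ" cannot occur under intersubjective
    reconstruction. fuse, reconstruct and construct are hypothetical
    stand-ins for the respective conML processes.
    """
    if relationship in ("TSZ", "SZ"):
        # model re-training: fuse both models, then validate the
        # fusion by re-entering the reconstruction process
        fused = fuse(new_model, old_model)
        return reconstruct(fused)
    if relationship == "TS":
        # knowledge abstraction: the two models form a new learn block
        # one knowledge level higher, which enters a construction process
        return construct([new_model, old_model])
    raise ValueError(f"unsupported relationship: {relationship}")
```

The callables would be bound to the actual reconstruction and construction routines of the framework; here they merely mark where those processes are invoked.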



c) Deconstruction
The aim of the deconstruction process is to combine new and old knowledge in a way that avoids ambiguity and allows knowledge to be abstracted automatically. The key components of this process are relationship management, model re-training and knowledge abstraction (Fig. 5). Prerequisite is that an existing model has been identified from the corresponding knowledge base that exhibits a pragmatic relationship to a newly reconstructed model. In the event that two or more related models are identified for a newly reconstructed model, these can either be deconstructed consecutively or the deconstruction process is aborted as soon as a complete, ΣZ, T Z or T Σ deconstruction was successful.
   Relationship Management. What procedures are carried out during deconstruction depends on the type of relationship (section II) between the two models entering the deconstruction process together. The decision on what measures to undertake is the initial task of the deconstruction process. In case of completely and ΣZ-related models, this relationship is assessed by model re-training, which makes use of the reconstruction process. In case of T Σ-related models, deconstruction is carried out in terms of a knowledge abstraction procedure, which makes use of the construction process. The case of T Z-related models would reflect that models with the same purpose and same temporal validity but differing subjects have been identified, which under a fixed intersubjective reconstruction scheme is not possible; therefore, this relationship is not explicitly handled in the following. If a newly reconstructed model shows a complete relationship to an existing model from the knowledge base, this may introduce error and contradiction into the knowledge base. Therefore, this relationship is handled with high priority.
   Model Re-training. With ΣZ-related models, the aim of deconstruction is to extend or replace the existing model from the knowledge base. In particular, it is assessed whether the temporal validity of the existing model can be expanded according to the temporal validity of the new model. Both models are fused into a new model that is re-trained via the reconstruction process. If successful, the old model is replaced by the fused model, otherwise the new model and the fused model are discarded. For completely related models, re-training is initiated by model fusion as well as by model differentiation. Model differentiation means that it is tested whether the fused model may be split into two submodels of more limited temporal validity.
   In contrast to ΣZ relationships, deconstruction of completely related models can not only extend but also falsify the validity of these models. If the model fusion is falsified in this case, the differentiation of the fused model is executed or, if necessary, one of the contradicting models is discarded. The disposal of models is carried out according to a user-defined regime, which makes a distinction between a conservative (M_old retained, M_new discarded) and an integrative (M_old discarded, M_new added to knowledge base) regime. Alternatively, if M_new is based on a larger set of vector models than M_old, M_new is added to the knowledge
base and M_old is discarded; otherwise, M_old is retained and M_new discarded. This regime is referred to as opportunistic.
   Knowledge Abstraction. A T Σ relationship provides the basis to construct a new model on the next higher level of the knowledge base. In this case, both models share a congruent temporal validity and a common set of model subjects while differing in their model purpose. First, the newly reconstructed model is stored to the knowledge base. The old model from the knowledge base is left unaltered. Using the outputs, or target values respectively, of the T Σ-related models, a new learn block without target values is formed. This learn block is assigned a higher level than the underlying models possess in the knowledge base and transferred to a construction process, from which all further learning processes may be passed. Thereby, repeated abstraction from a single learn block is possible. Knowledge abstraction may be limited by a user-defined maximum of knowledge levels.

[3D plot: Features × Timestamp [10⁹ s] (1.47–1.52) × Knowledge level (0–5); nodes connected by edges across levels]

Figure 6: Three-dimensional visualization of a hierarchical knowledge base acquired by constructivist machine learning. Nodes represent machine models, except for level 0, where nodes indicate which of the 349 features of the input data are used. Gray bars at the x-axis indicate sets of similar features of the input data. Graph edges indicate connections between machine models.

          IV. Conclusions and Future Work
With this work, we have defined implementation principles for a constructivist machine learning framework¹. We have demonstrated that by combining Stachowiak's General Model Theory and constructivist learning theories, machine learning algorithms can be used to create, exploit and maintain hierarchical knowledge bases. In contrast to classical machine learning, this allows for an explicit representation of acquired knowledge. Further, the employed meta data structures support decisions on the applicability of a given model on a given input. High intersubjectivity and low ambiguity can be achieved for learned models and knowledge bases by implementing consent-oriented multi-algorithm supervised learning. The suggested deconstruction mechanisms allow a knowledge base to be updated automatically. Moreover, the deconstruction process defined even facilitates automated knowledge abstraction based on existing models of the knowledge base (Fig. 6). Given these features, constructivist machine learning is an ideal framework for applications in which diverse data sources need to be integrated, knowledge needs to be both assessable and automatically updated, and where ambiguity has to be resolved.
   Based on the presented principles, we will extend our approach of combining machine learning and knowledge engineering in the future. While using generic descriptors for model purposes already allows us to create a generic ontology, e.g., it is in practice desirable to match automatically learned knowledge representations with existing ontologies. By this, it may become possible to transform existing knowledge bases at least partially into automatically managed systems. Further, we need to emphasize that our approach is currently focusing on conceptual knowledge, but is not limited to this domain. Future work will in particular include work on procedural knowledge and on ways to combine conceptual and procedural knowledge in a meta-cognitive domain.

¹ Currently, we are implementing conML for Python and other languages. The project is available online from our Git repository at http://git.informatik.uni-leipzig.de/ml-group

                    Acknowledgements
I would like to thank Dmitrij Denisenko, Florian Große, Dennis Carrer and Michael Hermelschmidt for reviewing and discussing implementation details and working on re-implementations of the original conML prototype.
                        References
Anderson, L. W., and Krathwohl, D. R. 2001. A taxonomy for learning, teaching and assessing: a revision of Bloom's taxonomy of educational objectives. New York: Longman.
Baçao, F.; Lobo, V.; and Painho, M. 2005. Self-organizing maps as substitutes for k-means clustering. In Computational Science – ICCS 2005. Springer. 476–483.
Bloom, B. S. 1956. Taxonomy of educational objectives. Vol. 1: cognitive domain. New York: McKay.
Chavent, M.; Kuentz-Simonet, V.; Liquet, B.; and Saracco, J. 2012. ClustOfVar: an R package for the clustering of variables. Journal of Statistical Software 50(13).
Davis, R.; Shrobe, H.; and Szolovits, P. 1993. What is a knowledge representation? AI Magazine 14(1):17–33.
Drescher, G. L. 1989. Made-up minds: a constructivist approach to artificial intelligence. Ph.D. Dissertation, Massachusetts Institute of Technology.
Falkner, A., and Haselböck, A. 2013. Challenges of knowledge evolution in practice. AI Communications 26(1):3–14.
Fox, R. 2001. Constructivism examined. Oxford Review of Education 27(1):23–35.
Gallant, S. I. 1988. Connectionist expert systems. Commun. ACM 31(2):152–169.
Guisado-Gámez, J.; Dominguez-Sal, D.; and Larriba-Pey, J.-L. 2013. Massive query expansion by exploiting graph knowledge bases. arXiv 1310.5698.
Hayes, A. F., and Krippendorff, K. 2007. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures 1(1):77–89.
Hendler, J. A. 1989. On the need for hybrid systems. Connection Science 1(3):227–229.
Herrmann, C. S. 1995. A hybrid fuzzy-neural expert system for diagnosis. In Proceedings of the 14th International Joint Conference on Artificial Intelligence – Vol. 1, 494–500. San Francisco, CA, USA: Morgan Kaufmann Publishers.
Hertz, H. 1894. Die Prinzipien der Mechanik in neuem Zusammenhange dargestellt. In Hertz, H., ed., Gesammelte Werke, volume 3. Leipzig: Barth.
Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science 313(5786):504–507.
Hudson, D. L.; Cohen, M. E.; and Anderson, M. F. 1991. Use of neural network techniques in a medical expert system. International Journal of Intelligent Systems 6(2):213–223.
Jain, A. K. 2010. Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31(8):651–666.
Karabatak, M., and Ince, M. C. 2009. An expert system for detection of breast cancer based on association rules and neural network. Expert Systems with Applications 36(2, Part 2):3465–3469.
Kidd, A. 2012. Knowledge acquisition for expert systems: A practical handbook. Springer Science & Business Media.
Krippendorff, K. 1970. Bivariate agreement coefficients for reliability of data. In Borgatta, E. F., ed., Sociological Methodology. 139–150.
Krippendorff, K. 2004. Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research 30(3):411–433.
Levey, C. A. 1991. Neural network having expert system functionality. U.S. patent US5398300A.
Lifschitz, V.; Morgenstern, L.; and Plaisted, D. 2008. Knowledge representation and classical logic. In van Harmelen, F.; Lifschitz, V.; and Porter, B., eds., Handbook of Knowledge Representation, volume 3 of Foundations of Artificial Intelligence. Elsevier. chapter 1, 3–88.
Markov, N. T., and Kennedy, H. 2013. The importance of being hierarchical. Current Opinion in Neurobiology 23(2):187–194.
Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P. 2019. Preface. In Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P., eds., Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019).
Martinez-Gil, J. 2015. Automated knowledge base management: A survey. Computer Science Review 18:1–9.
Morik, K.; Kietz, J.-U.; Emde, W.; and Wrobel, S. 1993. Knowledge Acquisition and Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers.
Quartz, S. R. 1993. Neural networks, nativism, and the plausibility of constructivism. Cognition 48(3):223–242.
Reich, K. 2004. Konstruktivistische Didaktik. Lehren und Lernen aus interaktionistischer Sicht. Munich: Luchterhand, 2nd edition.
Richardson, M., and Domingos, P. 2003. Building large knowledge bases by mass collaboration. In Proceedings of the 2nd International Conference on Knowledge Capture, K-CAP '03, 129–137. New York, NY, USA: Association for Computing Machinery.
Schmid, T. 2018. Automatisierte Analyse von Impedanzspektren mittels konstruktivistischen maschinellen Lernens. Ph.D. Dissertation, Leipzig.
Schmid, T. 2019. Deconstructing the final frontier of artificial intelligence: Five theses for a constructivist machine learning. In Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P., eds., Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019).
Skeirik, R. D. 1990. Neural network/expert system process control system and method. U.S. patent US5121467A.
Stachowiak, H. 1973. Allgemeine Modelltheorie. Springer.
Subasic, P.; Yin, H.; and Lin, X. 2019. Building knowledge base through deep learning relation extraction and Wikidata. In Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P., eds., Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019).