=Paper=
{{Paper
|id=Vol-2600/paper11
|storemode=property
|title=Using Learning Algorithms to Create, Exploit and Maintain Knowledge Bases: Principles of Constructivist Machine Learning
|pdfUrl=https://ceur-ws.org/Vol-2600/paper11.pdf
|volume=Vol-2600
|authors=Thomas Schmid
|dblpUrl=https://dblp.org/rec/conf/aaaiss/Schmid20
}}
==Using Learning Algorithms to Create, Exploit and Maintain Knowledge Bases: Principles of Constructivist Machine Learning==
Thomas Schmid
Universität Leipzig, Machine Learning Group
Augustusplatz 10, D-04109 Leipzig, Germany
schmid@informatik.uni-leipzig.de

Abstract

Recently, interest has grown in connecting modern machine learning approaches with traditional expert systems. This can mean, e.g., to identify patterns with neural networks and integrate them with knowledge graphs. While such combined systems offer a variety of advantages, few domain-independent approaches are known to make a hybrid artificial intelligence applicable without human interaction. To this end, we present the implementation of a constructivist machine learning framework (conML). This novel paradigm uses machine learning to manage a knowledge base and thereby allows for both raw data-based and symbolic information processing on the same internal knowledge representation. Based on axioms for a constructivist machine learning, we describe which operations are required to create, exploit and maintain a knowledge base and how these operations may be implemented with machine learning techniques. The major practical obstacle in this approach is to implement an automated deconstruction process that avoids ambiguity, handles continuous learning and allows knowledge abstraction. As we demonstrate, however, these obstacles can be overcome and constructivist machine learning can be put into practice.

Combining machine learning and knowledge engineering is currently considered a potential game-changing advancement in artificial intelligence. Neural networks and other machine learning techniques have proven strength in adapting to highly complex patterns and relationships, but are unable to represent existing knowledge explicitly and in an abstract fashion as expert systems can. Expert systems, on the other hand, operate on human-understandable knowledge representations but are highly domain-specific and, moreover, unable to process real-world data directly as machine learning can. Therefore, it is expected that joining both fields will produce a hybrid artificial intelligence that is “explainable, compliant and grounded in domain knowledge” (Martin et al. 2019). Such systems may, e.g., be able to identify patterns with neural networks and integrate them with knowledge graphs (Subasic, Yin, and Lin 2019).

In fact, the idea of a hybrid artificial intelligence has been discussed for more than 30 years (Gallant 1988; Hendler 1989; Skeirik 1990; Levey 1991; Morik et al. 1993). So far, however, most research in this field focuses on specific knowledge or application domains like medical diagnosis (Hudson, Cohen, and Anderson 1991; Karabatak and Ince 2009; Herrmann 1995). This is to a large extent due to the fact that knowledge bases are typically created manually, which is a highly time-consuming task that requires detailed knowledge of the domain (Kidd 2012). No less time-consuming are exploitation and maintenance of knowledge bases, which are typical follow-up phases within the life cycle of a knowledge base. While some progress has been made in employing algorithms for these tasks, several major challenges for an automated management of knowledge bases are still considered unresolved (Martinez-Gil 2015).

Considering recent performance advancements in machine learning, manually managed knowledge bases obviously constitute a serious bottleneck in creating efficient hybrid systems. For truly automated systems, however, an implementable semantic interface between inductive machine learning and deductive expert systems is required. To this end, we have introduced a constructivist machine learning paradigm (Schmid 2019) based on the concept of learnable models and their storage in a knowledge base. While machine learning is currently dominated by neuro-inspired approaches, constructivist theories root in educational research (Fox 2001) and, so far, few actual implementations have been proposed for a constructivist machine learning (Drescher 1989; Quartz 1993). A central challenge for putting this into practice is the implementation of an automated deconstruction process, which to the best of our knowledge has only once been addressed successfully (Schmid 2018).

Based on this paradigm, we designed a prototype for a constructivist machine learning that employs a meta data-based knowledge base. Here, we present the underlying operationalizations and concepts required to put constructivist machine learning into practice. The rest of the paper is organized as follows: In section I, we lay out guidelines for automated knowledge base management. In section II, we define Stachowiak-like models as building blocks for knowledge representations. In section III, we introduce principles for constructivist machine learning processes. In section IV, we summarize our approach and point out future goals.

Copyright 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020.

[Diagram: a data set or stream is split into blocks of raw data and meta data; a selected block is learned into an ML model, which is then integrated into, or modifies, the knowledge base.]

Figure 1: Transformation of real-world data into an abstract knowledge base.
Using a constructivist machine learning approach, real-world data is processed block-wise by learning algorithms with the aim of identifying an optimal representation for a given block. Each representation is then integrated into an existing knowledge base consisting of previously identified representations.

I. Knowledge Management

In the context of knowledge engineering, a knowledge representation is typically a mathematical formalization like a logic, rule, frame or semantic net related to real-world aspects (Davis, Shrobe, and Szolovits 1993). We have recently argued that any such formalization should be regarded as a model in the sense of Stachowiak’s General Model Theory (Schmid 2019). This implies that a formalization is not only a representation and an abstraction, but also limited to certain temporal constraints, certain subjects and a certain purpose (Stachowiak 1973). Here, we represent and use these three-dimensional limitations explicitly by employing meta data acquired together with raw data (Fig. 1).

Automated Knowledge Base Management. Managing knowledge bases may be described by typical life cycle phases. Following Martinez-Gil (2015), a creation phase is characterized by acquisition, representation, storage and manipulation of knowledge, while an exploitation phase focusses on knowledge reasoning, retrieval and sharing; the maintenance phase is concerned with integration, validation and meta-modeling of knowledge. Issues arising within these phases have been recognized and discussed (Richardson and Domingos 2003; Guisado-Gámez, Dominguez-Sal, and Larriba-Pey 2013; Falkner and Haselböck 2013). Most work on operating knowledge bases uses a semi-automated approach, leaving much space for more effective and efficient automated management strategies (Martinez-Gil 2015). Important issues to be addressed include, on one hand, automatic generation of large knowledge bases as well as automatic selection, combination and/or tuning of maintenance strategies. On the other hand, efficiency and explainability of knowledge exploitation should be improved, too.

Employment of Machine Learning. Here, machine learning techniques will be used for automatic generation of knowledge bases as well as for automatic maintenance. In creation phases, machine learning algorithms are employed to identify and/or manipulate optimal knowledge representations. In maintenance phases, machine learning algorithms are used for validating such knowledge representations and for supporting their integration into the knowledge base. To this end, a major objective of maintenance is to keep the knowledge base ambiguity-free. For knowledge exploitation, machine learning-based models of such a knowledge base may be applied on new input data. This is due to the design aspect that each model is represented by a supervised learning algorithm, i.e. a classifier or regressor. Consequently, the underlying classifier or regressor may be used on new data after training. Matching and mismatching new data to a model can be achieved by the respective meta data. In particular, application of the knowledge base can be rejected if no knowledge is available for a given input.

Hierarchical Knowledge. By a basic definition, a knowledge base can be simply viewed as “a set of formulas” (Lifschitz, Morgenstern, and Plaisted 2008). In the present work, we use an extended definition and regard a set of meta data-enriched models as a knowledge base (Schmid 2019). We further assume a meta data-based hierarchical ordering of this set, as human knowledge is from an educational perspective assumed to be organized in distinct levels (Bloom 1956). Findings from neurobiology also indicate a hierarchical organization for cognitive brain areas (Markov and Kennedy 2013). Using machine learning-based models as knowledge representations, we reflect a hierarchical ordering by using the output of other such models as input.

Knowledge Domains. A revision of Bloom’s taxonomy suggests that apart from levels also domains of human cognition should be distinguished (Anderson and Krathwohl 2001). Conceptual knowledge, e.g., may be described as knowledge about classifications, categories and structures. Procedural knowledge, in contrast, may be described as knowledge about subject-specific abilities, algorithms or selection criteria. We suggest to use individual knowledge bases for individual knowledge domains, i.e., factual, conceptual, procedural and metacognitive models. In the present work we will focus on the conceptual knowledge domain and the mechanisms involved with this type of knowledge.

II. Models as Knowledge Representations

In the following, models will be used to represent acquired knowledge. A model here is understood to be a pragmatic model in the sense of Stachowiak’s General Model Theory (Stachowiak 1973). This includes mathematical functions as well as their representation or approximation by machine learning techniques. More importantly, however, Stachowiak-like models feature meta data about the validity of the model regarding subject, purpose and time. The author, user or subject σ of a model may in natural sciences be a sensor or a measuring device, and in observational studies or content analyses typically a human evaluator.
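As an illustration of how such pragmatic meta data might be operationalized, the following Python sketch gates model application on temporal validity and purpose, so that application of the knowledge base is rejected when no matching knowledge is available (section I). All names here (`PragmaticModel`, `apply_knowledge_base`, the purpose descriptor string) are hypothetical illustrations, not the conML API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical sketch: a model is only applied if the pragmatic meta data
# (time, subject, purpose) of the request matches; otherwise application
# of the knowledge base is rejected.

@dataclass
class PragmaticModel:
    predictor: Callable[[Sequence[float]], object]  # trained classifier/regressor
    t_min: float          # temporal validity [t_min, t_max]
    t_max: float
    subjects: frozenset   # Sigma_M: learning algorithms the model is valid for
    purposes: frozenset   # Z_M: generic purpose descriptors

    def matches(self, timestamp: float, purpose: str) -> bool:
        return self.t_min <= timestamp <= self.t_max and purpose in self.purposes

def apply_knowledge_base(kb: List[PragmaticModel], x: Sequence[float],
                         timestamp: float, purpose: str) -> Optional[object]:
    """Return a prediction, or None if no knowledge is available."""
    for model in kb:
        if model.matches(timestamp, purpose):
            return model.predictor(x)
    return None  # rejected: no matching knowledge

kb = [PragmaticModel(lambda x: sum(x) > 1.0, 0.0, 10.0,
                     frozenset({"random_forest"}), frozenset({"conceptual/1/binary"}))]
print(apply_knowledge_base(kb, [0.7, 0.8], 5.0, "conceptual/1/binary"))  # True
print(apply_knowledge_base(kb, [0.7, 0.8], 99.0, "conceptual/1/binary"))  # None
```

The second call is rejected because the timestamp lies outside the model's temporal validity, mirroring the matching and mismatching of new data by meta data described above.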
The set of all model subjects σi for which a given model M is valid is called ΣM and defined as the subset of the (infinite) set Σ of all possible subjects:

ΣM ⊂ Σ (1)

The target parameter of a model is referred to as purpose ζ. The set of all purposes ζi for which a given model M is valid is called ZM and defined as subset of the (infinite) set Z of all possible model purposes:

ZM ⊂ Z (2)

The temporal validity of a given model M is in general represented by a time span TM or a minimum limit τmin and a maximum limit τmax, respectively:

TM = [τmin, τmax] (3)

In contrast to Stachowiak’s model concept, we limit our approach to two types of models: to vector models on the one hand and to algorithmically generated machine models on the other. For both, a distinction is made between models with and without explicitly defined pragmatic properties.

a) Vector Models

In supervised machine learning, a training vector consists of an m-dimensional input vector I = (i0, ..., im−1) and an n-dimensional output vector O = (o0, ..., on−1). Moreover, a mapping between I and O is implicitly assumed. Such vectors are referred to as (complete) vector model V:

V = (I, O) (4)
  = (i0, ..., im−1, o0, ..., on−1) (5)

If a given I is assigned an empty output vector, O = ∅, the corresponding V = (I, ∅) is termed an incomplete vector model. Typical incomplete vector models are training vectors used for an unsupervised machine learning process.

If the pragmatic properties T, Σ and Z are explicitly defined for a complete vector model V, the resulting representation is called a pragmatically defined vector model V∗:

V∗ = (V, TV, ΣV, ZV) (6)

Note that the time span TV, within which V is valid, is defined by the time of data collection. In the following, we assume that error tolerances during data collection are negligible and that minimum and maximum borders are identical:

TV = τmin = τmax (7)

b) Machine Models

If a finite set of j complete vector models is approximated by a machine learning algorithm, the resulting approximation is referred to as a machine model M:

M ∼ {V0, ..., Vj−1} (8)

A machine model M with given TM, ΣM and ZM is called pragmatically defined machine model M∗:

M∗ = (M, TM, ΣM, ZM) (9)

The temporal validity TM of a machine model M can only be assumed to be hypothetical and defined by means of hypothetical interval limits. These interval limits are derived from the underlying n vector models V∗ (Schmid 2018), which were used to train the machine learning algorithm:

TM = [min(TV0∗, ..., TVn−1∗), max(TV0∗, ..., TVn−1∗)] (10)

ΣM defines the machine learning algorithms involved in creating and applying M. In order to allow for automated model creation, we use generic descriptors. For a standard machine model, ΣM will be a set containing only one element. If |ΣM∗| > 1 holds true for a given M∗, i.e., if the model is valid for more than one machine learning algorithm, M∗ is called an intersubjective machine model.

ZM defines the target parameters of a machine model M. In most cases, ZM will be a set containing only one element. In order to allow for automated model creation, we use generic descriptors that are a combination of the corresponding knowledge domain, knowledge level and type of task (e.g. binary classification). If M∗ is abstracted from machine models M0, ..., Mn−1, a higher level of knowledge is defined for M∗ than for M0, ..., Mn−1.

c) Model Relationships

The pragmatic features T, Σ, Z of Stachowiak-like models may be employed to match and discriminate models automatically. With vector models, e.g., this allows to identify sets of pragmatically related vector models and define appropriate learning strategies for each relationship. The degree of relationship between two given Stachowiak-like models Ma and Mb is termed

1. complete (TΣZ), if TMa = TMb, ΣMa = ΣMb, ZMa = ZMb;
2. subjective-intentional (ΣZ), if TMa ≠ TMb, ΣMa = ΣMb, ZMa = ZMb;
3. temporal-intentional (TZ), if TMa = TMb, ΣMa ≠ ΣMb, ZMa = ZMb;
4. temporal-subjective (TΣ), if TMa = TMb, ΣMa = ΣMb, ZMa ≠ ZMb.

Such matching and discriminating is also a prerequisite for automating a deconstruction process for machine models. Depending on the underlying pragmatic relationship, procedures for a ΣZ, TZ, TΣ or complete deconstruction can be defined (section III).
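The four relationship degrees above can be checked mechanically from the pragmatic features. The following sketch assumes each model is given as a plain dictionary with keys "T", "S" and "Z"; this representation is an illustrative assumption, not the paper's implementation.

```python
# Sketch of classifying the pragmatic relationship between two models
# according to definitions 1-4: "TSZ" stands for the complete relationship,
# "SZ" for subjective-intentional, "TZ" for temporal-intentional and
# "TS" for temporal-subjective.

def relationship(a, b):
    """Return 'TSZ', 'SZ', 'TZ', 'TS' or None for two models given as
    dicts with keys 'T' (time span), 'S' (subjects), 'Z' (purposes)."""
    same_t = a["T"] == b["T"]
    same_s = a["S"] == b["S"]
    same_z = a["Z"] == b["Z"]
    if same_t and same_s and same_z:
        return "TSZ"   # complete
    if not same_t and same_s and same_z:
        return "SZ"    # subjective-intentional
    if same_t and not same_s and same_z:
        return "TZ"    # temporal-intentional
    if same_t and same_s and not same_z:
        return "TS"    # temporal-subjective
    return None        # models differ in more than one pragmatic feature

ma = {"T": (0, 10), "S": frozenset({"mlp"}), "Z": frozenset({"p1"})}
mb = {"T": (5, 20), "S": frozenset({"mlp"}), "Z": frozenset({"p1"})}
print(relationship(ma, mb))  # SZ
```

Models that differ in more than one pragmatic feature are unrelated here, which matches the fact that only the four listed relationships trigger dedicated learning strategies.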
When applying existing machine models, pragmatic features also indicate applicability for a given task or input.

[Diagram: decision flow across three stages — data management (select a learn block), representation learning (construction if no targets are known, then reconstruction) and knowledge integration (deconstruction of ΣZ-, TΣ- or TΣZ-related models, update of the knowledge base) — repeated while further blocks exist.]

Figure 2: Principle of Constructivist Machine Learning. A given block of data and meta data is used to select and learn an optimal representation. This representation is then integrated into a knowledge base (for simplicity not depicted here) and/or modified accordingly.

III. Constructivist Machine Learning

According to modern educational concepts, human learning takes place through construction, reconstruction or deconstruction of models. Following this paradigm, we develop concepts to implement such learning processes. To put construction, reconstruction and deconstruction into practice, we require a corresponding knowledge base consisting of Stachowiak-like models (section II) and employ a data management process in order to organize for efficient learning.

Data Management. As Fig. 2 depicts, the starting point for a constructivist machine learning procedure is an arbitrary set of pragmatically defined vector models (called block). From these samples, subsets of pragmatically related vector models are identified and re-grouped into learn blocks. Depending on the sample relationship, ΣZ-, TΣ-, TZ- or completely related learn blocks may be found. The size of these learn blocks determines the following learning. Not all forms of relationship, however, are equally suitable for a model construction. Especially constructions based on completely and TZ-related learn blocks offer little added value. Learn blocks of completely related vector models that are not redundant but divergent even represent a serious source of error. Learn blocks of TZ-related models basically allow the generation of new models, which then, however, does not represent a construction process but an intersubjective reconstruction process. Therefore, for constructions learn blocks of ΣZ-related vector models are preferred.

If at least one learn block exceeds a user-defined minimum number of samples, the largest learn block is selected to undergo one or more learning processes. All other learn blocks are discarded. After the learning processes for this learn block have terminated, the knowledge base is updated according to the results of the learning processes. This may imply storing a newly reconstructed model as well as modifying or deleting existing models from the knowledge base. As long as further blocks exist, this sequence of selecting and processing data is repeated.

Figure 3: Construction process for conceptual knowledge. A given learn block is analyzed by k-Means and self-organizing maps (SOM) for cluster numbers 2, ..., k. The resulting clusterings (s02, ..., s0k, k02, ..., k0k) are filtered on a user-defined basis.

Representation Learning. Various combinations of learning processes are possible for a given learn block. In the most simple case, e.g., if the knowledge base contains no models yet and target values are defined for the learn block, only a reconstruction is carried out and the resulting machine model is stored in the knowledge base. In an educational context, reconstruction implies in general application, repetition or imitation, in particular the search for order, patterns or models (Reich 2004, p. 145). Similarly, the reconstruction of a machine model is here understood as supervised learning from given examples. In contrast to classical supervised learning, however, competing machine models are generated and evaluated with regard to their intersubjective validity.

If no target values are defined for the learn block, such targets are produced in a construction process, before the resulting model candidates enter the reconstruction process. In an educational context, construction is in general associated with creativity, innovation and production, and in particular with the search for new variations, combinations or transfers (Reich 2004, p. 145). For machine models this is interpreted as an unsupervised learning that identifies or defines alternative n-dimensional outputs to a set of incomplete vector models. Thereby, competing model candidates are created
In an rithm requires to define in advance the number k of clus- educational context, deconstruction in general means the in- ters to be identified, all possible clusterings between 2 and vestigation of an already existing construct for incomplete- k are being tested. This maximum number of clusters is ness, for the unforeseen and the unconscious, and in partic- called maximum categorical complexity κk in the following. ular the search for possible omissions, simplifications, ad- With each clustering method κk −1 machine models with ditions and criticism (Reich 2004, p. 145). In constructivist k = {2, ..., κk } clusters or categories are generated. machine learning, deconstruction is in particular associated Candidate Filtering. Prerequisite for many clustering with automated re-training of models and creating abstracted methods is the prior definition of a number of clusters to models. Deconstruction may result in modifying or discard- be determined. Usually, several runs with different cluster ing models of the knowledge base. numbers are carried out with the same procedure and the clusterings obtained are evaluated with an external proce- a) Construction dure (Jain 2010). Here, optimal clustering is determined by The aim of the construction process is to provide alternative reconstructing model candidates. Before entering the recon- interpretations, or model candidates, with alternative model struction process, however, clusterings are filtered by user- purposes for a given learn block. In particular, more than defined settings for minimal cluster size and minimal clus- one model candidate is created for the same data during tering error (e.g. minimal intra cluster error, if applicable). construction and sent to a following reconstruction process. The key components of the construction process are unsu- b) Reconstruction pervised learning and candidate filtering (Fig. 3). The aim of the reconstruction process is to validate model Unsupervised Learning. 
Depending on the knowledge candidates, assign model subjects and guarantee intersubjec- domain under consideration, different types of unsupervised tivity. The key components of this process are preprocessing, algorithms are employed. For conceptual knowledge in the supvervised learning and intersubjectivity evaluation (Fig. sense of Bloom’s taxonomy (section I), or knowledge about 4). If more than one model candidate enters the reconstruc- classifications, categories and structures, respectively, clus- tion process from the construction process, only one model tering algorithms are employed. The purpose of clustering is selected as optimal learn block representation and trans- in this case is to identify distinguishable categories within ferred into the subsequent deconstruction process. All other a learn block. In order to create diverse model candidates, reconstructed models are discarded. Neural Network Z complexity no Krippendorffs level yes Z s02 M Σ T >κ? α ok? T Random Forest Model yes no Intersubjective Candidate Model ... Feature Selection Discard Model PREPROCESSING SUPERVISED LEARNING INTERSUBJECTIVITY ASSESSMENT Figure 4: Reconstruction process. After initial preprocessing, supervised learning algorithms are applied on a given model candidate. The predictions of these algorithms for test data is then used to evalute intersubjetivity of the model candidate. Preprocessing. For a given learn block or set of vector statistically motivated learning procedure are applied in par- models, respectively, first the number of input variables is allel. Considering these criteria as well as the availability assessed. If this number is greater than a user-defined max- of suitable implementations, the methods used for the re- imum model complexity κ, an algorithmic feature selection construction process are multi-layer perceptrons and random is carried out in order to reduce the complexity of the model forests. 
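The construction step described above — generating κk − 1 competing clusterings for k = 2, ..., κk and filtering them by minimal cluster size — can be sketched as follows. The minimal one-dimensional k-Means with deterministic initialization from the first k points is a simplification for illustration only, not the clustering used by conML.

```python
# Sketch of the construction step: generate competing clusterings for
# k = 2..kappa_k with a tiny Lloyd-style k-Means, then filter candidates
# by a user-defined minimal cluster size (candidate filtering).

def kmeans(points, k, iters=20):
    centers = points[:k]                     # simplistic deterministic init
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def construct_candidates(points, kappa_k, min_cluster_size):
    candidates = []
    for k in range(2, kappa_k + 1):          # kappa_k - 1 competing clusterings
        labels = kmeans(points, k)
        sizes = [labels.count(c) for c in range(k)]
        if min(sizes) >= min_cluster_size:   # user-defined candidate filtering
            candidates.append(labels)
    return candidates

points = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
cands = construct_candidates(points, kappa_k=3, min_cluster_size=2)
print(len(cands))  # 1: the 3-cluster candidate is rejected by the size filter
```

Each surviving candidate labeling would then serve as a set of alternative target values entering the reconstruction process.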
Preprocessing. For a given learn block or set of vector models, respectively, first the number of input variables is assessed. If this number is greater than a user-defined maximum model complexity κ, an algorithmic feature selection is carried out in order to reduce the complexity of the model to ≤ κ. In principle, filter as well as wrapper and embedded methods could be used for this. The most convenient decision criterion for this is the amount of data to be processed or the selection processes to be performed. For large amounts of data, filter methods are preferable due to their efficiency and computation effort, but they do not always provide optimal feature subsets. Embedded feature selection methods, on the other hand, promise a particularly careful selection of features, but require considerably more computational effort than filters for large amounts of data.

For this reason, a hybrid approach is used here. As a basic principle, if an input data set consists of a smaller set of low-dimensional vector models, an embedded method is applied. Conversely, a filter is used if an input data set consists of a particularly large set of particularly high-dimensional vector models. Whether an embedded method is used or not is decided by means of user-defined auxiliary parameters for the maximum allowed number of input dimensions and the maximum allowed number of vector models. By default, Correlation-based Feature Selection (CFS) is used as filter, and a Random Forest as embedded method.

Supervised Learning. Preprocessing is followed by application of at least two alternative supervised learning algorithms that are carried out in parallel. It is important to note that these algorithms act independently, but aim to achieve agreement. In order to enable broad application, these methods should be able to solve both classification and regression tasks. In order to facilitate automated configuration, they should also be non-parametric, i.e. they should not require a priori assumptions about the density distribution of the data. Furthermore, a fundamental diversity of the procedures is desirable in the sense of different procedural approaches. For this purpose, a biologically inspired and a statistically motivated learning procedure are applied in parallel. Considering these criteria as well as the availability of suitable implementations, the methods used for the reconstruction process are multi-layer perceptrons and random forests. In addition, further methods like support vector machines could be used to increase the diversity of methods.

Intersubjectivity. Each supervised learning yields individual target values for each input vector. Consequently, the question arises to what extent these competing methods agree. Analogously to empirical studies, this is quantified and evaluated with the interrater reliability coefficient Krippendorff’s α. This coefficient can be calculated for both nominal and metric scales and can therefore be used for both classification and regression (Krippendorff 1970). In contrast to other reliability coefficients, it can also be applied to any number of raters. An α value of 1 indicates optimal reliability, while a value less than or equal to 0 implies that there is no match between scores. Krippendorff’s α was repeatedly proposed as a standard measure for quantifying interrater reliability (Krippendorff 2004; Hayes and Krippendorff 2007). Those reconstructed models that have been trained successfully, but whose α value does not exceed a user-defined threshold value, are discarded. If none of the reconstructed models exceeds this α threshold, the current total reconstruction process is aborted.

Model Selection. If more than one model candidate passes the reconstruction process, it must be decided which of these competing models will be integrated into the knowledge base. In order to identify the learn block representation that is least dependent on specific methods, these models are ranked in descending order using Krippendorff’s α. Since the maximum α value implies the maximum degree of intersubjectivity, the model with the maximum α value can be interpreted as the clearest model in the sense of Heinrich Hertz (Hertz 1894, p. 2f). If two or more models within the sequence have an identical α value, the model with the smallest original image space is selected from this new subset. This can be interpreted as the choice of the simplest model.
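The intersubjectivity assessment can be illustrated with a small, self-contained computation of Krippendorff's α for nominal data. This sketch assumes fully crossed ratings (every learning algorithm labels every unit, no missing data) and is an illustration, not the conML implementation.

```python
from collections import Counter
from itertools import permutations

# Sketch of the intersubjectivity check: nominal Krippendorff's alpha,
# computed via a coincidence matrix; alpha = 1 - D_o / D_e.

def krippendorff_alpha_nominal(ratings):
    """ratings: list of per-unit lists, e.g. one label per learning algorithm."""
    o = Counter()                        # coincidence matrix o[(c, k)]
    for unit in ratings:
        m = len(unit)
        for c, k in permutations(unit, 2):
            o[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()                      # marginal totals per category
    for (c, k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    d_o = sum(w for (c, k), w in o.items() if c != k)          # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e

print(round(krippendorff_alpha_nominal([[1, 1], [0, 0], [1, 1]]), 6))  # 1.0
print(round(krippendorff_alpha_nominal([[1, 0], [0, 1]]), 6))          # -0.5
```

Perfect agreement yields α = 1, while systematic disagreement drives α below 0 — exactly the range used above for discarding models that fall under the user-defined α threshold.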
Figure 5: Deconstruction process. For deconstruction, two pragmatically related models are considered pairwise. They can undergo a re-training procedure, which makes use of the reconstruction process, or be used to abstract knowledge by undergoing a construction process.

c) Deconstruction

The aim of the deconstruction process is to combine new and old knowledge in a way that avoids ambiguity and allows to abstract knowledge automatically. The key components of this process are relationship management, model re-training and knowledge abstraction (Fig. 5). Prerequisite is that an existing model has been identified from the corresponding knowledge base that exhibits a pragmatic relationship to a newly reconstructed model. In the event that two or more related models are identified for a newly reconstructed model, these can either be deconstructed consecutively or the deconstruction process is aborted as soon as a complete, ΣZ, TZ or TΣ deconstruction was successful.

Relationship Management. What procedures are carried out during deconstruction depends on the type of relationship (section II) between the two models entering the deconstruction process together. The decision on what measures to undertake is the initial task of the deconstruction process. In case of completely and ΣZ-related models, this relationship is assessed by model re-training, which makes use of the reconstruction process. In case of TΣ-related models, deconstruction is carried out in terms of a knowledge abstraction procedure, which makes use of the construction process. The case of TZ-related models would reflect that models with the same purpose and same temporal validity but differing subjects have been identified, which under a fixed intersubjective reconstruction scheme is not possible; therefore, this relationship is not explicitly handled in the following. If a newly reconstructed model shows a complete relationship to an existing model from the knowledge base, this may introduce error and contradiction into the knowledge base. Therefore, this relationship is handled with high priority.

Model Re-training. With ΣZ-related models, the aim of deconstruction is to extend or replace the existing model from the knowledge base. In particular, it is assessed whether the temporal validity of the existing model can be expanded according to the temporal validity of the new model. Both models are fused into a new model that is re-trained via the reconstruction process. If successful, the old model is replaced by the fused model, otherwise the new model and the fused model are discarded. For completely related models, re-training is initiated by model fusion as well as by model differentiation. Model differentiation means that it is tested whether the fused model may be split into two submodels of more limited temporal validity.

In contrast to ΣZ relationships, deconstruction of completely related models can not only extend but also falsify the validity of these models. If the model fusion is falsified in this case, the differentiation of the fused model is executed or, if necessary, one of the contradicting models is discarded. The disposal of models is carried out according to a user-defined regime, which makes a distinction between a conservative (Mold retained, Mnew discarded) and an integrative (Mold discarded, Mnew added to the knowledge base) regime. Alternatively, if Mnew is based on a larger set of vector models than Mold, Mnew is added to the knowledge base and Mold is discarded; otherwise, Mold is retained and Mnew is discarded. This regime is referred to as opportunistic.
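The three disposal regimes can be summarized in a small decision function. Representing each model merely by the size of its underlying vector-model set is an illustrative simplification; the function name and signature are assumptions, not the conML API.

```python
# Sketch of the user-defined disposal regimes for contradicting models:
# conservative keeps the old model, integrative keeps the new one, and
# opportunistic keeps whichever is based on more vector models.

def dispose(m_old_size, m_new_size, regime):
    """Return which model is kept: 'old' or 'new'."""
    if regime == "conservative":
        return "old"                       # M_old retained, M_new discarded
    if regime == "integrative":
        return "new"                       # M_old discarded, M_new added
    if regime == "opportunistic":          # keep the model with more evidence
        return "new" if m_new_size > m_old_size else "old"
    raise ValueError("unknown regime: " + regime)

print(dispose(100, 150, "opportunistic"))  # new
print(dispose(100, 150, "conservative"))   # old
```

Note that under the opportunistic regime ties fall back to retaining the old model, matching the "otherwise, Mold is retained" rule above.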
Knowledge Abstraction. A TΣ relationship provides the basis to construct a new model on the next higher level of the knowledge base. In this case, both models share a congruent temporal validity and a common set of model subjectives while differing in their model purpose. First, the newly reconstructed model is stored in the knowledge base; the old model from the knowledge base is left unaltered. Using the outputs, or target values respectively, of the TΣ-related models, a new learn block without target values is formed. This learn block is assigned a higher level than the underlying models possess in the knowledge base and is transferred to a construction process, from which all further learning processes may be passed. Thereby, repeated abstraction from a single learn block is possible. Knowledge abstraction may be limited by a user-defined maximum number of knowledge levels.

Figure 6: Three-dimensional visualization of a hierarchical knowledge base acquired by constructivist machine learning (axes: features × timestamp [10⁹ s] × knowledge level). Nodes represent machine models, except for level 0, where nodes indicate which of the 349 features of the input data are used. Gray bars at the x-axis indicate sets of similar features of the input data. Graph edges indicate connections between machine models.

IV. Conclusions and Future Work

With this work, we have defined implementation principles for a constructivist machine learning framework¹. We have demonstrated that, by combining Stachowiak's General Model Theory and constructivist learning theories, machine learning algorithms can be used to create, exploit and maintain hierarchical knowledge bases. In contrast to classical machine learning, this allows for an explicit representation of acquired knowledge. Further, the employed metadata structures support decisions on the applicability of a given model to a given input. High intersubjectivity and low ambiguity can be achieved for learned models and knowledge bases by implementing consent-oriented multi-algorithm supervised learning. The suggested deconstruction mechanisms allow a knowledge base to be updated automatically. Moreover, the deconstruction process defined here even facilitates automated knowledge abstraction based on existing models of the knowledge base (Fig. 6). Given these features, constructivist machine learning is an ideal framework for applications in which diverse data sources need to be integrated, knowledge needs to be both assessable and automatically updated, and ambiguity has to be resolved.

Based on the presented principles, we will extend our approach of combining machine learning and knowledge engineering in the future. While using generic descriptors for model purposes already allows, e.g., a generic ontology to be created, in practice it is desirable to match automatically learned knowledge representations with existing ontologies. By this, it may become possible to transform existing knowledge bases at least partially into automatically managed systems. Further, we emphasize that our approach currently focuses on conceptual knowledge, but it is not limited to this domain. Future work will in particular include work on procedural knowledge and on ways to combine conceptual and procedural knowledge in a meta-cognitive domain.

Acknowledgements

I would like to thank Dmitrij Denisenko, Florian Große, Dennis Carrer and Michael Hermelschmidt for reviewing and discussing implementation details and for working on re-implementations of the original conML prototype.

¹ Currently, we are implementing conML for Python and other languages.
The project is available online from our Git repository at http://git.informatik.uni-leipzig.de/ml-group

References

Anderson, L. W., and Krathwohl, D. R. 2001. A taxonomy for learning, teaching and assessing: a revision of Bloom's taxonomy of educational objectives. New York: Longman.

Baçao, F.; Lobo, V.; and Painho, M. 2005. Self-organizing maps as substitutes for k-means clustering. In Computational Science – ICCS 2005. Springer. 476–483.

Bloom, B. S. 1956. Taxonomy of educational objectives. Vol. 1: cognitive domain. New York: McKay.

Chavent, M.; Kuentz-Simonet, V.; Liquet, B.; and Saracco, J. 2012. ClustOfVar: an R package for the clustering of variables. Journal of Statistical Software 50(13).

Davis, R.; Shrobe, H.; and Szolovits, P. 1993. What is a knowledge representation? AI Magazine 14(1):17–33.

Drescher, G. L. 1989. Made-up minds: a constructivist approach to artificial intelligence. Ph.D. Dissertation, Massachusetts Institute of Technology.

Falkner, A., and Haselböck, A. 2013. Challenges of knowledge evolution in practice. AI Communications 26(1):3–14.

Fox, R. 2001. Constructivism examined. Oxford Review of Education 27(1):23–35.

Gallant, S. I. 1988. Connectionist expert systems. Commun. ACM 31(2):152–169.

Guisado-Gámez, J.; Dominguez-Sal, D.; and Larriba-Pey, J.-L. 2013. Massive query expansion by exploiting graph knowledge bases. arXiv:1310.5698.

Hayes, A. F., and Krippendorff, K. 2007. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures 1(1):77–89.

Hendler, J. A. 1989. On the need for hybrid systems. Connection Science 1(3):227–229.

Herrmann, C. S. 1995. A hybrid fuzzy-neural expert system for diagnosis. In Proceedings of the 14th International Joint Conference on Artificial Intelligence – Vol. 1, 494–500. San Francisco, CA, USA: Morgan Kaufmann Publishers.

Hertz, H. 1894. Die Prinzipien der Mechanik in neuem Zusammenhange dargestellt. In Hertz, H., ed., Gesammelte Werke, volume 3. Leipzig: Barth.

Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science 313(5786):504–507.

Hudson, D. L.; Cohen, M. E.; and Anderson, M. F. 1991. Use of neural network techniques in a medical expert system. International Journal of Intelligent Systems 6(2):213–223.

Jain, A. K. 2010. Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31(8):651–666.

Karabatak, M., and Ince, M. C. 2009. An expert system for detection of breast cancer based on association rules and neural network. Expert Systems with Applications 36(2, Part 2):3465–3469.

Kidd, A. 2012. Knowledge acquisition for expert systems: A practical handbook. Springer Science & Business Media.

Krippendorff, K. 1970. Bivariate agreement coefficients for reliability of data. In Borgatta, E. F., ed., Sociological Methodology. 139–150.

Krippendorff, K. 2004. Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research 30(3):411–433.

Levey, C. A. 1991. Neural network having expert system functionality. U.S. patent US5398300A.

Lifschitz, V.; Morgenstern, L.; and Plaisted, D. 2008. Knowledge representation and classical logic. In van Harmelen, F.; Lifschitz, V.; and Porter, B., eds., Handbook of Knowledge Representation, volume 3 of Foundations of Artificial Intelligence. Elsevier. chapter 1, 3–88.

Markov, N. T., and Kennedy, H. 2013. The importance of being hierarchical. Current Opinion in Neurobiology 23(2):187–194.

Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P. 2019. Preface. In Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P., eds., Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019).

Martinez-Gil, J. 2015. Automated knowledge base management: A survey. Computer Science Review 18:1–9.

Morik, K.; Kietz, B.-E.; Emde, W.; and Wrobel, S. 1993. Knowledge Acquisition and Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers.

Quartz, S. R. 1993. Neural networks, nativism, and the plausibility of constructivism. Cognition 48(3):223–242.

Reich, K. 2004. Konstruktivistische Didaktik. Lehren und Lernen aus interaktionistischer Sicht. Munich: Luchterhand, 2nd edition.

Richardson, M., and Domingos, P. 2003. Building large knowledge bases by mass collaboration. In Proceedings of the 2nd International Conference on Knowledge Capture, K-CAP '03, 129–137. New York, NY, USA: Association for Computing Machinery.

Schmid, T. 2018. Automatisierte Analyse von Impedanzspektren mittels konstruktivistischen maschinellen Lernens. Ph.D. Dissertation, Leipzig.

Schmid, T. 2019. Deconstructing the final frontier of artificial intelligence: Five theses for a constructivist machine learning. In Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P., eds., Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019).

Skeirik, R. D. 1990. Neural network/expert system process control system and method. U.S. patent US5121467A.

Stachowiak, H. 1973. Allgemeine Modelltheorie. Springer.

Subasic, P.; Yin, H.; and Lin, X. 2019. Building knowledge base through deep learning relation extraction and Wikidata. In Martin, A.; Hinkelmann, K.; Gerber, A.; Lenat, D.; van Harmelen, F.; and Clark, P., eds., Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019).