ESC-Rules: Explainable, Semantically Constrained Rule Sets

Martin Glauer1,*, Robert West2, Susan Michie3 and Janna Hastings1,3,*

1 Institute for Intelligent Interacting Systems, Otto-von-Guericke University Magdeburg, Germany
2 Department of Behavioural Science and Health, University College London, UK
3 Department of Clinical, Educational and Health Psychology, University College London, UK

NeSy 22: 16th International Workshop on Neural-Symbolic Learning and Reasoning, 28-30 September, London, UK
* Corresponding author: martin.glauer@ovgu.de (M. Glauer); j.hastings@ucl.ac.uk (J. Hastings); https://jannahastings.github.io/ (J. Hastings)
ORCID: 0000-0001-6772-1943 (M. Glauer); 0000-0001-6398-0921 (R. West); 0000-0003-0063-6378 (S. Michie); 0000-0002-3469-4923 (J. Hastings)

Abstract
We describe a novel approach to explainable prediction of a continuous variable based on learning fuzzy weighted rules. Our model trains a set of weighted rules to maximise prediction accuracy and to minimise an ontology-based ‘semantic loss’ function that includes user-specified constraints on the rules to be learned, so as to maximise the explainability of the resulting rule set from a user perspective. This system fuses quantitative sub-symbolic learning with symbolic learning and constraints based on domain knowledge. We illustrate our system on a case study in predicting the outcomes of behavioural interventions for smoking cessation, and show that it outperforms other interpretable approaches, achieving performance close to that of a deep learning model while offering the transparent explainability that is an essential requirement for decision-makers in the health domain.

Keywords
explainable machine learning, rules learning, semantic loss, evidence synthesis, behaviour change

1. Introduction

The rate at which evidence is generated outstrips the rate at which it can be synthesised, necessitating automated approaches [1]. The Human Behaviour-Change Project (HBCP) is an interdisciplinary collaboration between behavioural scientists, computer scientists and information systems architects that aims to build an end-to-end automated system for evidence synthesis in behavioural science [2]. For this objective, explanations of the predictions are as important as the accuracy of the system: the intended users are practitioners and policy-makers who will use the insights gained from the evidence to make recommendations, and who therefore require suitable transparency and accountability [3].

Deep neural networks typically operate as black boxes without giving intrinsic insights into why specific predictions have been made, although methods for generating explanations for such networks are advancing [4, 5]. Thus, there is a need for “glass-box” explainable machine learning frameworks for making predictions and recommendations that can transparently provide complete explanations in a form that matches the semantic expectations of the users [6, 7].

We aimed to develop an explainable system for the prediction of behaviour change intervention outcomes, based on a corpus of annotated literature together with features from an ontology - the Behaviour Change Intervention Ontology [8] - and their logical relationships.
Straightforward application of semantic approaches is not well suited to the quantitative task of predicting intervention outcomes; moreover, traditional symbolic learning approaches such as rule or decision-tree induction lead to explanations that are overly complex and not ranked by their quantitative impact on the outcome variable. A deep neural network approach had better quantitative performance, but was not acceptable to our users due to its lack of transparency and explainability. We therefore aimed to develop a ‘best of both worlds’ hybrid predictive approach that combines aspects of the symbolic and neural approaches.

Rule-based systems are inherently explainable because the features appear transparently in the rules. Our approach builds on systems that are able to learn rules from data. One of the earliest rule-learning neural network systems was the Knowledge-Based Artificial Neural Network (KBANN) [9]. This approach translates domain knowledge into rules which are encoded into the structure of a neural network, for which weights are then learned. The approach that we developed is furthermore inspired by traditional ‘neuro-fuzzy’ systems; in particular, it is based on the principles of Takagi-Sugeno controllers [10]. These controllers were developed to account for the fact that expert knowledge, albeit valuable, is often too vague to be turned into rigid logical rules, thus necessitating fuzzy and weighted approaches to rule learning.

2. Explainable Semantically Constrained Rule Sets

We describe our approach in terms of feature preparation, architecture and optimisation, and semantic penalties. The system is implemented in Python using PyTorch. Source code is available at https://github.com/HumanBehaviourChangeProject/semantic-prediction.

2.1. Feature preparation

A precondition for our rules-based approach is that all input data features are binarized. Thus, categorical variables in the dataset are exploded into separate columns per value, and continuous (quantitative) variables are binarized by selecting ranges using one of several different approaches, depending on the meanings of the values:

• Separation into meaningful semantic categories, e.g. for our case study we transform mean age values into child, young adult, older adult, and elderly, delineated with a fuzzy membership operator since the boundaries between categories are not rigid.
• Fixed-width categories, delineated with a fuzzy membership operator. For example, we divide the number of times tobacco smoked into groups of width 5 corresponding to <5, <10, <15, ..., <50. Note that this formulation creates an ordering: if the value is, e.g., 6, then all of <10, <15, ..., <50 are set on, whereas if the value is 46, only <50 is set on (see the sketch after this list).
• Categories selected based on quantiles in the dataset (i.e. a fixed proportion of the available data in each grouping, rather than a fixed range of values), again using the fuzzy membership operator.
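To make the fixed-width fuzzy binarization concrete, the following is a minimal sketch of how a continuous feature such as the number of times tobacco smoked could be exploded into ordered ‘< threshold’ columns with soft boundaries. It is illustrative only: the function names, the sigmoid membership function and the steepness parameter are assumptions made for this sketch, not necessarily the choices in the released implementation linked above.

import torch


def fuzzy_less_than(values: torch.Tensor, threshold: float, steepness: float = 2.0) -> torch.Tensor:
    """Soft membership in the category 'value < threshold' (illustrative).

    Returns values close to 1 well below the threshold, close to 0 well above
    it, and intermediate membership degrees near the non-rigid boundary.
    """
    return torch.sigmoid(steepness * (threshold - values))


def binarize_fixed_width(values: torch.Tensor, width: float = 5.0, upper: float = 50.0) -> torch.Tensor:
    """Explode a continuous column into ordered fuzzy '<5', '<10', ..., '<50' columns.

    Because the bins are cumulative, a value of 6 switches on (approximately)
    all of '<10', '<15', ..., '<50', while a value of 46 only switches on '<50'.
    """
    thresholds = torch.arange(width, upper + width, width)  # 5, 10, ..., 50
    return torch.stack([fuzzy_less_than(values, float(t)) for t in thresholds], dim=1)


if __name__ == "__main__":
    times_smoked = torch.tensor([6.0, 46.0])
    memberships = binarize_fixed_width(times_smoked)
    print(memberships)  # one row per study, one column per '<t' bin

In this sketch the steepness parameter controls how soft the category boundaries are; the same pattern could be reused for the semantic age categories and the quantile-based bins simply by substituting the appropriate thresholds.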