=Paper= {{Paper |id=Vol-2491/abstract63 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2491/abstract63.pdf |volume=Vol-2491 |dblpUrl=https://dblp.org/rec/conf/bnaic/HarmelenT19 }} ==None== https://ceur-ws.org/Vol-2491/abstract63.pdf
          A Boxology of Design Patterns for
        Hybrid Learning and Reasoning Systems

                      Frank van Harmelen, Annette ten Teije

                           Vrije Universiteit Amsterdam
                  {frank.van.harmelen,annette.ten.teije}@vu.nl




        Abstract. We propose a set of design patterns to describe a large vari-
        ety of systems that combine statistical techniques from machine learning
        with symbolic techniques from knowledge representation. As in other
        areas of computer science (knowledge engineering, software engineering,
        process mining), such design patterns help to systematize the literature,
        clarify which combinations of techniques serve which purposes, and en-
        courage re-use of software components. We have validated our composi-
        tional design patterns against a large body of recent literature.1


    Recent years have seen a strong increase in interest in combining Machine
Learning methods with Knowledge Representation methods, fuelled by the com-
plementary functionalities of both types of methods, and by their complementary
strengths and weaknesses. This increasing interest has resulted in a large volume
of diverse papers in a variety of venues, and from a variety of communities. This
paper is an attempt to create structure in this large, diverse and rapidly grow-
ing literature. We present a conceptual framework, in the form of a set of design
patterns, that can be used to categorize techniques for combining learning and
reasoning. In the full paper, we have validated our design patterns against more
than 50 papers from the research literature from the last decade. Our claim is
that each of the systems that we encountered in those references is captured by
one of our design patterns. Broadly recognized advantages of such design pat-
terns are: they distill previous experience in a reusable form for future design
activities, they encourage re-use of code, they allow composition of such patterns
into more complex systems, they provide a common language in a community,
and they are a useful didactic device.
    Our design patterns are expressed in a graphical notation.2 We use ovals to
denote algorithmic components that perform some computation, and boxes to
denote their input and output. We distinguish two types of algorithmic compo-
nents (ovals): those that perform some form of deductive inference (labelled as
    Copyright © 2019 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

1
     Full version in J. of Web Engineering 18(1-3), 97-124, 2019.
2
    boxology: "A representation of an organized structure as a graph of labelled nodes
    and connections between them", https://www.definitions.net/definition/boxology

the "KR" components) and those that perform some form of inductive inference
(the "ML" components). We also use two kinds of input and output boxes:
symbolic structures (labelled "sym") and other data (labelled "data").
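
The notation described above can be sketched in code. The following is a minimal illustration only, not part of the paper: the names `Step`, `is_well_formed` and `render` are hypothetical, and side inputs are recorded but not drawn. The two chains it encodes are the first two example patterns presented in this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

COMPONENTS = {"KR", "ML"}   # ovals: deductive (KR) or inductive (ML) inference
BOXES = {"sym", "data"}     # boxes: symbolic structures or other data


@dataclass(frozen=True)
class Step:
    """One oval together with the boxes it reads and the box it writes."""
    component: str
    inputs: Tuple[str, ...]
    output: str

    def __post_init__(self):
        if self.component not in COMPONENTS:
            raise ValueError(f"unknown component: {self.component}")
        if not set(self.inputs) <= BOXES or self.output not in BOXES:
            raise ValueError("boxes must be 'sym' or 'data'")


def is_well_formed(pattern: List[Step]) -> bool:
    """True if every step reads the kind of box the previous step wrote."""
    return all(a.output in b.inputs for a, b in zip(pattern, pattern[1:]))


def render(pattern: List[Step]) -> str:
    """Draw the main chain; side inputs (such as an ontology) are omitted."""
    parts = [pattern[0].inputs[0]]
    for step in pattern:
        parts += [step.component, step.output]
    return " -> ".join(parts)


# Hypothetical encodings of two of the example patterns.
embedding = [Step("ML", ("sym",), "data"), Step("ML", ("data",), "sym")]
abstraction = [Step("KR", ("data", "sym"), "sym"), Step("ML", ("sym",), "sym")]

print(render(embedding))    # sym -> ML -> data -> ML -> sym
print(render(abstraction))  # data -> KR -> sym -> ML -> sym
```

Checking `is_well_formed` before composing two patterns is one simple way to make the compositional reading of the notation explicit.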
    We now present some example patterns for hybrid systems that perform
reasoning and learning. The full paper presents a larger set of 15 patterns.
    From symbols to data and back again. A recent class of "graph completion"
systems [3] applies inductive techniques to a knowledge graph to predict
additional edges. Almost all graph completion algorithms first translate the
knowledge graph to a representation in a high-dimensional vector space (a
process called "embedding"), described by the following pattern:
                    sym  ->  ML  ->  data  ->  ML  ->  sym
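
To make the shape of this pattern concrete, here is a toy sketch. It is an assumption-laden illustration, not one of the algorithms surveyed in [3]: hand-built count vectors stand in for a learned embedding, a cosine-similarity rule stands in for the second ML step, and all entity and relation names are invented.

```python
from math import sqrt


def embed(triples):
    """sym -> data: one count vector per subject over (relation, object) features."""
    features = sorted({(r, o) for _, r, o in triples})
    index = {f: i for i, f in enumerate(features)}
    vectors = {}
    for s, r, o in triples:
        vec = vectors.setdefault(s, [0.0] * len(features))
        vec[index[(r, o)]] += 1.0
    return vectors, features


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_edges(vectors, features, threshold=0.7):
    """data -> sym: propose (s, r, o) if a similar subject has it and s does not."""
    proposals = set()
    for s, u in vectors.items():
        for t, v in vectors.items():
            if s != t and cosine(u, v) >= threshold:
                for i, (r, o) in enumerate(features):
                    if v[i] > 0 and u[i] == 0:
                        proposals.add((s, r, o))
    return proposals


# Invented toy knowledge graph.
triples = [
    ("amsterdam", "type", "city"),
    ("amsterdam", "locatedIn", "netherlands"),
    ("amsterdam", "partOf", "europe"),
    ("utrecht", "type", "city"),
    ("utrecht", "locatedIn", "netherlands"),
    ("paris", "type", "city"),
    ("paris", "locatedIn", "france"),
]

vectors, features = embed(triples)
print(predict_edges(vectors, features))  # {('utrecht', 'partOf', 'europe')}
```

Only the data flow matches the pattern: symbolic triples go in, vectors sit in the middle, and symbolic triples come out. Real graph completion systems typically learn dense embeddings by optimization rather than counting features.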

    Deriving an intermediate abstraction for reasoning. In [2], a raw data
stream is first abstracted into a stream of symbols with the help of a symbolic
ontology, and this stream of symbols is then fed into a classifier (which performs
better on the symbolic data than on the original raw data).
                    data  ->  KR  ->  sym  ->  ML  ->  sym
                              ^
                              |
                             sym
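
A toy sketch of this pattern follows. It does not reproduce the pipeline of [2]: the ontology, the raw codes, the labels and the vote-counting classifier are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical ontology: raw code or concept -> parent concept.
IS_A = {"a1": "drug_x", "a2": "drug_x", "b1": "symptom_y", "b2": "symptom_y"}


def abstract(stream, is_a):
    """data + sym -> KR -> sym: replace raw codes by all their ancestors."""
    symbols = set()
    for term in stream:
        while term in is_a:          # follow is-a links upwards
            term = is_a[term]
            symbols.add(term)
    return frozenset(symbols)


def train(examples):
    """sym -> ML: count how often each symbol occurs with each label."""
    counts = defaultdict(Counter)
    for symbols, label in examples:
        for s in symbols:
            counts[s][label] += 1
    return counts


def classify(symbols, counts):
    """Predict the label with the most votes from the observed symbols."""
    votes = Counter()
    for s in symbols:
        votes.update(counts.get(s, {}))
    return votes.most_common(1)[0][0] if votes else None


model = train([
    (abstract(["a1", "b1"], IS_A), "positive"),
    (abstract(["a2"], IS_A), "negative"),
    (abstract(["b2"], IS_A), "positive"),
])

print(classify(abstract(["b1"], IS_A), model))  # positive
```

The classifier never sees the raw codes. Two streams with different codes but the same ontology concepts become identical inputs, which is one plausible reading of why learning on the abstracted stream can work better.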

   Learning with symbolic information as a prior. The following design
pattern describes machine learning systems that use prior knowledge:
                              data  ->  ML  ->  data
                                        ^
                                        |
                                       sym

An example is the Logic Tensor Networks approach of [1], where the authors show
that encoding prior knowledge in symbolic form allows for better learning results
with less training data, as well as more robustness against noise.
   Concluding comments. Each design pattern abstracts from the mathematical
and algorithmic details of its components, and considers only the functional
behaviour of the pattern and the functional dependencies between the
components. This makes our descriptions of hybrid systems as design patterns
abstract and general.

References
1. Donadello, I., Serafini, L., d’Avila Garcez, A.S.: Logic tensor networks for semantic
   image interpretation. In: IJCAI. pp. 1596–1602 (2017)
2. Kop, R., et al.: Predictive modeling of colorectal cancer using a dedicated pre-
   processing pipeline on routine electronic medical records. Comp. in Bio. and Med.
   76, 30–38 (2016)
3. Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation
   methods. Semantic Web 8(3), 489–508 (2017)