<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an Algebraic Approach to Theory and Concept Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pietro Galliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>I introduce a theoretical framework for reasoning about the values of theories (understood as sets of elements of random conceptual algebras) and about the values of concepts with respect to theories. Then I de ne theory formation games and argue that they provide an adequate theoretical framework for the evaluation of strategies and algorithms for computational creativity; and, nally, I brie y discuss two possible practical applications of this framework, namely the automatic search for Natural Deduction rule systems for logics and the construction of falling-rule-list-plus-de nitions classi cation models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        One problem common to computational models of creativity, in mathematics
or in other areas, is that it is often less than clear how such models, or their
products, are to be evaluated [
        <xref ref-type="bibr" rid="ref11 ref15 ref18 ref19 ref20">17, 19, 14, 18, 10</xref>
        ]. There exist models of concept
formation that have been shown to be, in principle, capable of reproducing
nontrivial conceptual leaps, of the kind that { if produced spontaneously by a human
being { would surely be regarded as creative: for instance, in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] it is shown how
the notion of prime ideal can arise from the blending of two conceptual spaces
representing commutative rings with unity and integer numbers respectively.
However, the same systems are equally able to generate, by the same techniques,
concepts that a human would instantly recognize as spurious and of little value.
It is not currently clear which criteria humans use for such an evaluation: while
some conditions that, let us say, a conceptual blending operation [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ] should satisfy
in order for its result to be meaningful have been proposed (e.g. [
        <xref ref-type="bibr" rid="ref7 ref8">7, 6</xref>
        ]), it is fair
to say that, at the moment, models of creativity and concept formation are
relatively well suited for the formal explanation of creative concept formation
but not as useable for tools for the assisted (let alone autonomous) creation of
novel and useful concepts in mathematics or in other disciplines.
      </p>
      <p>
        In this work, I present a simple, abstract, algebraic formalization of the
notion of the value of a theory T (or of a concept c given the theory T). This
formalization does not involve the modelling of researchers as individual agents
with utilities and possible actions, as it is often done in the area of social
simulation (see e.g. [
        <xref ref-type="bibr" rid="ref1 ref12 ref22 ref6">1, 11, 21</xref>
        ]): while such approaches can often be the source of
very valuable insights, the high number of possible choices involved in their
speci cation poses serious di culties to the analysis of their implications. Finally,
Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.
      </p>
      <p>
        I brie y discuss two potential applications of this framework to the problems
of building Natural Deduction systems and of improving the interpretability of
falling rule list classi cation models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Basic de nitions</title>
      <p>Concepts, mathematical or non-mathematical, do not live in isolation; and, to
a large degree, creative activity can be understood as the generation of novel
concepts through the combination of known ones. However, as remarked in the
introduction, not all such combinations are feasible, let alone fruitful. This
justi es the following de nition:
De nition 1 (Conceptual Algebra). A conceptual algebra A is a pair (A; ),
where
{ A is a set of concepts;
{ is a partial operation over A.1</p>
      <p>In the above de nition, the algebra A is not required to be commutative. In
this way, we can distinguish between two possibly distinct roles (active/passive)
of concepts in a concept combination: a concept can either be applied to another
concept to modify it, or be modi ed by the application of another concept. To
make an example, if we take the concept of \prime ideal" as a combination of
the concepts \prime number" and \ideal", it is clear that it is the notion of
primality that is being applied to the notion of ideal and not vice versa; and
indeed, the resulting concept belongs to the conceptual domain of ideals rather
than to the one of numbers. Quite arbitrarily, in a concept combination a b the
left operand a is assumed to take the active role and the right one b is assumed
to take the passive one.</p>
      <p>A theory T is then de ned as a set of concepts in a conceptual algebra.
It is important to emphasize here, for the following analysis to be meaningful,
that a theory is not understood as a set of axioms : instead, it is taken to be
the set of all concepts - be them statements, rules, heuristics, or proofs - that
are part of the \received knowledge" about a subject. For instance, the Riesz
Representation Theorem is certainly part of the theory of modern functional
analysis, even though it cannot meaningfully be considered one of its axioms.2
De nition 2 (Theory). A theory over a conceptual algebra A = (A; ) is a
nite subset T A.
1 That is, it is a subset A of A A A such that (a; b; c); (a; b; c0) 2 A implies
c = c0. As usual, we write a b = c for (a; b; c) 2 A.
2 In fact, it is questionable whether a notion such as \the axioms of functional analysis"
has a meaningful and unique denotation to begin with: axiomatic systems can be
very useful tools to systematize mathematical knowledge, but they are not necessarily
epistemically prior to it.</p>
      <p>How valuable would such a theory be? In our framework, we do not have any
access whatsoever to properties of individual concepts such as their \simplicity"
or \elegance": \internal" properties a ecting the values of individual concepts
could be added to our framework easily enough, but for now it is preferable to
avoid such complications and treat concepts as individually indistinguishable.
The only di erence between concepts in this framework, thus, lies in which
further concepts can be obtained (or \derived") from them through applications of
the concept combination operator .</p>
      <p>The following analysis of theory value is based on three guiding principles:
Guiding Principle 1: Storing and using theories involves costs (e.g. paper,
disk space, e ort required for researchers to achieve pro ciency with them...).
These costs grow approximately linearly with the number of concepts
contained in the theory.</p>
      <p>Guiding Principle 2: The more concepts can be derived from a theory, the
more valuable it is.</p>
      <p>Guiding Principle 3: Concepts that can be derived from a theory in few steps
contribute more to its value than concepts that can be derived from it only
in many steps.</p>
      <p>In accordance with the above principles, I now propose a de nition for the
value v(T) of a theory T in a conceptual algebra A. First, however, I need a
couple of auxiliary concepts:
De nition 3 (n-th level span and edge of a theory). Let A = (A; ) be a
conceptual space and let T A be a theory in it. Then for all n 2 N, its n-th
level span hTin and its n-th level edge [T]n are de ned recursively as follows:
{ hTi0 = T;
{ For all n 2 N, [T]n = fc 2 AnhTin : 9a 2 hTii; b 2 hTij s.t. a b =
c and i + j = ng;
{ For all n 2 N, hTin+1 = hTin [ [T]n.</p>
      <p>In other words, hTin is the set of all concepts that can be derived from those in
T within n concept combination steps,3, while [T]n is the set of all those that
cannot, but could be derived in one extra step.</p>
      <p>
        As per Guiding Principle 2, all concepts in hTi = T[Sk1=0[T]k contribute
to the value v(T) of T. Furthermore, as per Guiding Principle 3, the concepts
in T |which are already in the theory as given| contribute individually more to
v(T) than those in [T]0, which require one concept formation step to be obtained;
and, in general, the individual contributions of concepts in [T]i are greater than
those of concepts in [T]j whenever i &lt; j. As no further information regarding
3 It may be worth pointing out, once more, that these are not meant to be deduction
steps in a speci c formal system. The concepts contained in T may correspond to
statements; but they may also correspond to de nitions, heuristics or so on.
the contributions of concepts to the value of theory is available, a conservative
choice is to let |by choice of unit| the concepts in T individually provide a
unit contribution to the value of the theory,4 and to let the concepts in [T]i
contribute for vi 2 [0; 1] each, where 1 &gt; v0 &gt; v1 : : : &gt; vn : : : Additionally, it is
reasonable to require vn to tend to zero for n ! 1: indeed, the contribution of a
concept to the value of theory becomes negligible as the length of the derivation
necessary to obtain it from concepts in the theory grows extremely large. Thus,
and by analogy with the notion of discounted reward in reinforcement learning
[
        <xref ref-type="bibr" rid="ref23">22</xref>
        ], a simple (and by no means unique, but reasonable enough until and unless
evidence to the contrary is found) choice would be to let vk = k+1 for some
xed discount rate 2 [0; 1].
      </p>
      <p>Finally, as per Guiding Principle 1, working with a theory T carries costs
which grow linearly with its size jTj: thus, v(T) also contains a term CjTj,
where C 2 R 0.</p>
      <p>We thus obtained the following de nition, which is the central de nition of
this work:
De nition 4 (Value of a theory in a conceptual algebra). Let C 2 R 0
be the cost per concept and let 2 [0; 1] be the concept derivation discount rate.
Then for every conceptual algebra A = (A; ) and every theory T A, the value
vC; (T) is given by
vC; (cjT) = vC; (T [ fcg)
vC; (Tnfcg):</p>
      <p>Note that this de nition makes sense regardless of whether the concept c is
or is not part of T already; and that, moreover, it follows easily from it that if
T = fc1 : : : cng then vC; (T) = Pn</p>
      <p>t=1 vC; (ctjfci : i &lt; tg).
3</p>
    </sec>
    <sec id="sec-3">
      <title>Towards a Formal Model of Creativity</title>
      <p>The simple model presented above can be used as a component of a formal,
abstract framework through which to evaluate the creativity of an algorithm.
4 In other words, the unit of value corresponds to the amount of value that a single
concept would contribute to the theory if added directly to it, before considering
costs and possible derivations.</p>
      <p>The idea, in brief, is that a procedure is creative insomuch as it can generate
valuable theories in an unknown (but { at a cost { explorable) conceptual algebra.
In what follows, I present a decision-theoretical formalization of this intuition.
De nition 6 (Theory Formation Games). Let A = (A; ) be a conceptual
algebra. Furthermore, let 0 2 R 0 be a budget, let ; 2 R 0 be the costs
associated to combination and random exploration respectively, and { as usual
{ let C and be the memorization cost and the concept derivation discount rate
respectively. Then a theory formation game position is a tuple p = (K; F ; ),
where</p>
      <sec id="sec-3-1">
        <title>1. K A is a set of known concepts;</title>
        <p>2. F is a set of known facts, of the form a b = c or of the form a b =# for
a, b and (if applicable) c in K;
3. 2 R 0 is the remaining budget.</p>
        <p>The initial position of the game is (;; ;; 0),5 and the possible moves given a
given position (K; F ; ) are de ned as follows:
{ If then, for all a; b 2 K, a b = ? is a possible move. If a b is de ned
in A and is some c 2 A, then the successor position is (K [ fcg; F [ fa b =
cg; ); otherwise, it is (K; F [ fa b =#g; ).
{ If then ?? is a possible move. The successor positions are then of the
form (K [ fcg; F ; ), where the concept c is an arbitrary concept in A.
{ For all T K, Choose(T) is always a possible move which ends the game.</p>
        <p>T is then the output theory.</p>
        <p>A strategy for such a game is a function from positions (K; F ; ) to legal
moves. A complete play of such a strategy is then a sequence p0p1 : : : pn, where
p0 is the starting position (;; ;; ), for each i = 1 : : : n it is the case that pi+1
is a possible successor of position pi under the move (pi), and (pn) is of the
form Choose(T). The value of such a play is then simply the value vC; (T) of
the output theory T.6</p>
        <p>The value of a strategy is then de ned simply as the expectation of the
value of its complete plays, when the elements c added to K during ?? moves
are chosen randomly from the uniform distribution7 over the set A of all possible
concepts.
5 Of course, this can be changed if one wishes to consider games starting from
nonempty sets of known concepts and facts.
6 It is easy to see that no in nite-length plays are possible in theory formation games,
as a b = ? and ?? moves decrease the available budget until only play-ending
moves Choose(T) remain available.
7 Or from another suitable distribution. Note that the selected c may be in K already:
this re ects the fact that, if most of the concepts relevant to a certain subject are
known already, it is rarely productive to go shing for the others in a random,
undirected way.</p>
        <p>In the above de nition, theory formation games are speci ed with respect
to a xed conceptual algebra A. However, it is not usually the case that the
structure of a conceptual algebra is known in advance; and indeed, there would
not be any reason for the positions of concept formation games to contain a set
F of known facts if only one conceptual algebra was considered possible. But as
we will now see, it is easy to extend theory formation games to distributions of
algebras:
De nition 7 (Theory Formation Games: Values of Strategies with
Distributions). Let P(A) be a probability distribution over conceptual algebras
(A; ) all having the same domain A,8. Moreover, let , , , C and be as
before and let be a strategy. Then the value of with respect to the distribution
P(A) is the expectation of the value of with respect to a random algebra A
chosen according to the distribution P(A).</p>
        <p>
          In brief, the value of a strategy represents its ability to nd valuable theories
in an unknown conceptual algebra, when exploring said algebras { either by
computing the combination of two concepts through a b = ? moves or by
attempting to nd possibly novel concepts through ?? moves { carries a cost.
This framework is rather reminiscent of bandit algorithms [
          <xref ref-type="bibr" rid="ref24">23</xref>
          ]: but whereas the
main problem that bandit algorithms need to solve is to nd the optimal balance
between the exploitation of known options and the exploration of novel ones, in
the case of theory formation games the chief di culties that a strategy must
address lie rst in balancing the search for novel concepts with the investigation
of the properties of the known ones, and then in selecting { on the basis of the
information gathered { a set of concepts which is expected to be of optimal value.
        </p>
        <p>I leave to future work the discussion and comparison of various possible {
and computationally feasible { strategies in concept formation games; here, I
merely observe that the problems that such strategies must overcome to be
successful appear to mirror very closely the ones that humans face when involved
in activities such as mathematical research, and that therefore this framework is
a promising abstract testbed for the study of creatively advantageous strategies.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Two Examples</title>
      <p>Until now, the framework introduced in this work has been discussed from a very
abstract perspective. In this section, I will instead show two practical examples of
it, related to the problem of axiomatizing a logic and to the problem of learning
falling rule list and de nitions from a given dataset.</p>
      <p>Let L be a logic with Modus Ponens and Contraction { for instance,
propositional logic { and let the (in nite) algebra A be the set of all Natural Deduction
rules</p>
      <p>P1 : : : Pn</p>
      <p>Q
which are valid in L, plus the following other types of objects:
8 For simplicity, we can assume that A = f1 : : : N g for some integer N .
{ For every i 2 N, a premise selection operator Seli;
{ For every i 2 N and any rule r 2 A with at least i premises, an object
Seli(r) =</p>
      <p>P1 : : : Pbi : : : Pn</p>
      <p>Q
which di ers from r only in that its i-th premise is highlighted;
{ For every i; j 2 N with i &lt; j, an object Eqi;j ;
{ For every atomic expression A and (possibly complex) expression S, a
substitution operator Sub(A; S);</p>
      <sec id="sec-4-1">
        <title>The concept combination rule is de ned as follows:</title>
        <p>{ For all i 2 N, if r 2 A has at least i premises then Seli r = Seli(r);
{ For all i 2 N, if r; Seli(s) 2 A and the conclusion Q of r is the same as
the highlighted premise Pbi of Seli(s) then r Seli(s) is the rule obtained by
replacing its marked premise with r. In other words,</p>
        <p>S1 : : : Sk</p>
        <p>Pi</p>
        <p>P1 : : : Pbi : : : Pn = P1 : : : Pn 1S1 : : : SkPi+1 : : : Pn</p>
        <p>Q Q
;
{ For all i; j 2 N with i &lt; j and rules r 2 A with at least j premises and such
that its i-th and j-th premises are equal, then Eqi;j r is the rule obtained
from r by removing its j-th premise. In other words,</p>
        <p>Eqi;j</p>
        <p>P1 : : : Pi : : : Pj : : : Pn = P1 : : : Pi : : : Pj 1Pj+1 : : : Pn</p>
        <p>Q Q
whenever Pi = Pj ;
{ For all substitution operators Sub(A; S) and all rules r, Sub(A; S) r is the
result of replacing all occurrences of A in the premises and conclusion of r
with the expression S;
{ For all a; b 2 A, if none of the above cases are applicable then a b is unde ned.</p>
        <p>Now let T be a theory in our algebra { that is to say, a nite set of rules
r, premise highlighting operators Seli, rules with highlighted premises Seli(r),
premise merging operators Eqi;j and substitution operators Sub(A; S). Then, for
a xed choice of cost C and discount rate , the value
describes the tradeo between the cost of memorizing such a set of rules and
operators and their usefulness as tools to generate further rules; and the
corresponding theory formation game describes the process of seeking such a set of
rules when performing deductions or sampling random rules carries a cost.</p>
        <p>This framework can be easily modi ed in various ways: for instance, one
could easily modify the de nition of [T]n by making the operators Seli, Eqi;j
and Sub(A; S) always available as left operands, regardless if they are in any
[T]i, and furthermore require T to consist only of rules r. Then, it would also
be advisable to replace the expressions j[T]nj in the de nition of v(T) with a
ner-grained notion of the contribution of [T]n to the value of a theory, one
which would likewise count only rules and { for instance { count them only up
to substitution.</p>
        <p>
          A more complex variation on this theme could be to adapt the framework
for theory evaluation described in this work to falling rule lists ([
          <xref ref-type="bibr" rid="ref25">24</xref>
          ]).
        </p>
        <p>
          Falling rule lists are classi cation models which can be represented as
cascading sequences of if ...then ...else ... rules, to be applied in order to
a given input until a match is found. Despite their apparent simplicity, in many
cases their performances are comparable to these of less easily interpretable
methods, and it was recently shown in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] that certi ably optimal falling rule
lists for categorical data9 can be derived e ciently.
        </p>
        <p>if (age = 23{25) ^ (priors = 2{3) then YES
else if (age = 18{20) then YES
else if (sex = male) ^ (age = 21{22) then YES
else if (priors &gt; 3) then YES
else NO
end if</p>
        <p>One of the main advantages of falling rule lists lies in their ease of
interpretability: given a falling rule list and an input, it is possible not only to extract
a prediction but also to straightforwardly present a justi cation for it, that is,
to produce a list of reasons why a given rule is applicable and no previous one
is.</p>
        <p>However, the interpretability of a falling rule list degrades rapidly with the
increase of the number of features: while the falling rule list of Figure 1 is
perfectly readable, an analogous falling rule list involving even just fty di erent
input features { a comparatively modest input dimensionality, in many modern
applications { would be markedly less so.</p>
        <p>A natural way to surpass this di culty could consist in introducing complex
concepts, de ned in terms of combinations of input concepts which occur more
than once in the left hand side of rules, and rewriting the falling rule list in
order to make use of them, as in Figure 2: in many cases, this would be expected
not only to improve the readability and interpretability of the model, but also
to provide us with de nitions for complex concepts which are relevant to the
classi cation problem in exam.
9 That is, data in which input features and outputs take values in nite, xed domains.
if (A ^ B ^ C ^ E ^ F ^ G) then YES
else if (A ^ C ^ D ^ G) then YES
else if (A ^ C ^ G ^ H) then YES
else if (A ^ F ^ H) then YES
else if (A ^ G) then YES
else NO
end if
if (T1 ^ B ^ E ^ F ) then YES
else if (T1 ^ D) then YES
else if (T1 ^ H) then YES
else if (A ^ F ^ H) then YES
else if (T2) then YES
else NO
end if</p>
        <p>When generating de nitions and falling rule lists, one must consider tradeo s
between the total size of the model, its predictive power, and the number of
steps required to classify instances.10 This is reminiscent of the tradeo between
representation size and derivation length which has been previously discussed in
this work; and, as it will now be sketched, it can be reduced to it.</p>
        <p>Very brie y, the elements of the algebra A in this case are de nitions X
Condition, dataset instances d with their attributes, and falling rule lists
indexed by dataset instances d : R. The combination operator can be used to
combine two de nitions, applying the former inside of the latter; or to combine
a concept de nition with a dataset instance which satis es it, adding the de ned
concept to the instance; or to combine a dataset instance d with a rule list d : R,
causing the rule list to yield a success d&gt; (if its rst rule is applicable to d and
its output is correct), a failure d? (if its rst rule is applicable to d but its
output is incorrect), or returning the rule obtained by removing its rst premise (if
its rst rule is not applicable to d). Figure 3 contains examples illustrating the
behaviour of these rules.</p>
        <p>It is useful to remark that, according to the above mentioned rules, the
generation of a success (or a failure) for a dataset instance corresponds in a
natural way to the explanation of its classi cation in terms of the given rule list
and de nitions. For instance, consider the rule set and de nitions of Figure 2
right, and the dataset instance d : A; C; D; G; H; YES. Then a possible way to
generate d&gt; could be given by rst applying d to the rule list, thus removing the
rst rule as inapplicable; then applying the two de nitions to d, thus obtaining
d : A; C; D; G; H; T2; T1; YES; and nally applying the rst rule of the derived
rule list (that is, the second rule of the original rule list) if (T1 ^ D) then YES.
10 The motivation for attempting to minimize this number does not rise from
computational costs as much as from explainability issues: the more steps { that is, de nition
applications or discardings of inapplicable premises { are necessary to classify an
instance, the less readable the explanation of its classi cation will be.
T2</p>
        <p>A ^ G
A ^ G
d : A; G; H
=
=
d : A; F; H; YES
d :
= d :
if (A ^ F ^ H) then</p>
        <p>YES
else if (T2) then</p>
        <p>YES
else NO
end if
if (T2 ^ C ^ H) then</p>
        <p>YES
else if (A^F ^H) then</p>
        <p>YES
else if (T2) then</p>
        <p>YES
else NO
end if
if (A ^ F ^ H) then</p>
        <p>YES
else if (T2) then</p>
        <p>YES
else NO
end if
if (A ^ F ^ H) then</p>
        <p>YES
else if (T2) then</p>
        <p>YES
else NO
end if
= d&gt;
= d?
d : A; F; H; YES</p>
        <p>d :
d : A; F; H; NO</p>
        <p>d :
Fig. 3. Concept combination in the algebra of falling rule lists, de nitions and dataset
instances. From top to bottom: combining two de nitions, combining a de nition and
a dataset instance, combining a dataset instance with a rule list whose rst rule is not
applicable, combining a dataset instance with a rule list whose rst rule is
applicable (correct output), combining a dataset instance with a rule list whose rst rule is
applicable (incorrect output).</p>
        <p>The length of such a derivation, therefore, can be used as a measure for the
complexity of the explanation.</p>
        <p>In general, a theory T can be de ned as a rule list, indexed by all available
dataset instances, and a set of de nitions.11 In order to measure its cost more
accurately, we may weigh every rule list or de nition in proportion to its size;
and then, we may modify the computation of the [T]n by making all dataset
instances available as left operands when computing it (even if they are not in T).
Finally, we change the contribution j[T]nj of [T]n to the value of T by counting
the number of successes and failures in [T]n (and weighing them appropriately),
instead of counting the number of elements in [T]n (in other words, we assign
positive value to the successes, negative to the failures, and zero to the de nitions
and rule lists in [T]n). Multiplying the contribution of [T]n to the value of
T by the factor n+1 then corresponds to valuing more highly successes (and
penalizing more highly failures) if they can be obtained from the initial rule
list and de nitions in a small number of steps; thus, maximizing the value of a
falling rule list plus de nitions does not simply correspond to searching for a rule
list which classi es accurately as many dataset instances as possible, but rather
to seeking one such that, additionally, the instances which can be successfully
classi ed admit as short an explanation (which is to say, in the above framework,
a derivation) as possible. The choice of the cost and discount rate parameters, as
well as the choice of the value of successes d&gt; and failures d?, de nes the tradeo
between the objective of classifying correctly as many instances as possible in as
few steps as possible with the objective of having an acceptably compact set of
initial rule lists and de nitions.</p>
        <p>This is only a brief sketch, and more detailed presentation and analysis of this
adaptation of the framework to the case of falling rule lists is left for further work.
Here, it serves to illustrate a possible practical application of the framework
presented here, one which in my opinion might be worth exploring further.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ahrweiler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Modelling theory communities in science</article-title>
          .
          <source>Journal of Arti cial Societies and Social Simulation</source>
          <volume>14</volume>
          (
          <issue>4</issue>
          ),
          <volume>8</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Angelino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larus-Stone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alabi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seltzer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Learning certi - ably optimal rule lists for categorical data</article-title>
          .
          <source>stat 1050</source>
          ,
          <issue>6</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bou</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schorlemmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corneli</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Ram rez</surname>
          </string-name>
          , D.,
          <string-name>
            <surname>Maclean</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smaill</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pease</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The role of blending in mathematical invention</article-title>
          .
          <source>In: Proceedings of the Sixth International Conference on Computational Creativity June</source>
          . vol.
          <volume>55</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Colton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walsh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Agent based cooperative theory formation in pure mathematics</article-title>
          .
          <source>In: Proceedings of AISB</source>
          <year>2000</year>
          <article-title>symposium on creative and cultural aspects and applications of AI and cognitive science</article-title>
          . pp.
          <volume>11</volume>
          {
          <issue>18</issue>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Colton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bundy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walsh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>On the notion of interestingness in automated mathematical discovery</article-title>
          .
          <source>International Journal of Human-Computer Studies</source>
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <volume>351</volume>
          {
          <fpage>375</fpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>11 Nothing prevents us, in principle, from allowing multiple rule lists inside T; but for simplicity, that possibility is not considered here</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Confalonieri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutz</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troquard</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galliani</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Porello</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Pen~aloza, R.,
          <string-name>
            <surname>Schorlemmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Coherence, similarity, and concept generalisation (2017), under review</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Confalonieri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schorlemmer</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A process model for concept invention</article-title>
          .
          <source>In: Proc. of the 7th International Conference on Computational Creativity</source>
          ,
          <source>ICCC16</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dyson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A meeting with Enrico Fermi</article-title>
          .
          <source>Nature</source>
          <volume>427</volume>
          (
          <issue>6972</issue>
          ),
          <volume>297</volume>
          {
          <fpage>297</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Fauconnier</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turner</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The way we think: Conceptual blending and the mind's hidden complexities</article-title>
          .
          <source>Basic Books</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Galanter</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Computational aesthetic evaluation: Past and future</article-title>
          .
          <source>In: Computers and Creativity</source>
          , pp.
          <volume>255</volume>
          {
          <fpage>293</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>N.:</given-names>
          </string-name>
          <article-title>A simulation of the structure of academic science</article-title>
          .
          <source>Sociological Research Online</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ) (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Inglis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberdein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Beauty is not simplicity: an analysis of mathematicians' proof appraisals</article-title>
          .
          <source>Philosophia Mathematica</source>
          <volume>23</volume>
          (
          <issue>1</issue>
          ),
          <volume>87</volume>
          {
          <fpage>109</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Inglis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberdein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Diversity in proof appraisal</article-title>
          .
          <source>In: Mathematical Cultures</source>
          , pp.
          <volume>163</volume>
          {
          <fpage>179</fpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Jordanous</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Evaluating evaluation: Assessing progress in computational creativity research</article-title>
          .
          <source>In: Proceedings of the second international conference on computational creativity (ICCC-11)</source>
          . Mexico City, Mexico. pp.
          <volume>102</volume>
          {
          <issue>107</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pease</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mathematical practice, crowdsourcing, and social machines</article-title>
          .
          <source>In: International Conference on Intelligent Computer Mathematics</source>
          . pp.
          <volume>98</volume>
          {
          <fpage>119</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Mayer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khairy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
          </string-name>
          , J.:
          <article-title>Drawing an elephant with four complex parameters</article-title>
          .
          <source>American Journal of Physics</source>
          <volume>78</volume>
          (
          <issue>6</issue>
          ),
          <volume>648</volume>
          {
          <fpage>649</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <surname>McCormack</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Open problems in evolutionary music and art</article-title>
          .
          <source>In: Workshops on Applications of Evolutionary Computation</source>
          . pp.
          <volume>428</volume>
          {
          <fpage>436</fpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <surname>McCormack</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Inverno</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Computers and creativity: The road ahead</article-title>
          .
          <source>In: Computers and creativity</source>
          , pp.
          <volume>421</volume>
          {
          <fpage>424</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Pease</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winterstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colton</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Evaluating machine creativity</article-title>
          .
          <source>In: Workshop on creative systems, 4th international conference on case based reasoning</source>
          . pp.
          <volume>129</volume>
          {
          <issue>137</issue>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Raman-Sundstro</surname>
            m,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The notion of t as a mathematical value</article-title>
          .
          <source>In: Mathematical Cultures</source>
          , pp.
          <volume>271</volume>
          {
          <fpage>285</fpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Righi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takacs</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The miracle of peer review and development in science: an agent-based model</article-title>
          . Scientometrics pp.
          <volume>1</volume>
          {
          <issue>21</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barto</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Introduction to reinforcement learning</article-title>
          , vol.
          <volume>135</volume>
          . MIT Press Cambridge (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Vermorel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohri</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Multi-armed bandit algorithms and empirical evaluation</article-title>
          .
          <source>In: European conference on machine learning</source>
          . pp.
          <volume>437</volume>
          {
          <fpage>448</fpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Falling rule lists</article-title>
          .
          <source>In: Arti cial Intelligence and Statistics</source>
          . pp.
          <volume>1013</volume>
          {
          <issue>1022</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>