<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Solving Abstract Reasoning Tasks with Grammatical Evolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raphael Fischer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Jakobs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sascha Mücke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katharina Morik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Dortmund, AI Group</institution>
          ,
          <addr-line>Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The Abstraction and Reasoning Corpus (ARC), comprising image-based logical reasoning tasks, is intended to serve as a benchmark for measuring intelligence. Solving these tasks is very difficult for off-the-shelf ML methods due to their diversity and the low amount of training data. We here present our approach, which solves tasks via grammatical evolution on a domain-specific language for image transformations. With this approach, we successfully participated in an online challenge, scoring among the top 4% out of 900 participants.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Reasoning</kwd>
        <kwd>Grammatical Evolution</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>The ARC challenge requires participants to develop a model that is able to solve
tasks D_i from the set D = {D_1, ..., D_M}. Each D_i comprises an image-based
reasoning task with m_i training pairs {(x_k, y_k)} and n_i test pairs {(x̂_ℓ, ŷ_ℓ)},
where m_i &gt; 0 is typically around 3 and n_i &gt; 0 is mostly 1. The images feature
up to 10 discrete colors C = {c_1, ..., c_10}, and image sizes range between 1
and 30 pixels per dimension. The implied function f that maps inputs to the
expected output images is vastly different for every task, often based on abstract
features (e.g. connected shapes) or pattern continuation (see Figure 1). For each
task D_i, the method should derive f̃_i from {(x_k, y_k)} ∈ D_i which correctly maps
all input images to the expected output: ∀(x̂_ℓ, ŷ_ℓ) ∈ D_i : f̃_i(x̂_ℓ) = ŷ_ℓ.</p>
      <p>[Figure 1: example ARC tasks — (a) Training task 48, (b) Training task 100, (c) Training task 397]</p>
      <p>Copyright © 2020 by the paper’s authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
1 https://www.kaggle.com/c/abstraction-and-reasoning-challenge/
2 ls8-arc team on the official ARC challenge leaderboard.</p>
<p>The challenge participants have access to M = 400 training tasks, which
show some logic concepts and come with labels ŷ_ℓ. The final scoring however is
based on an undisclosed test set, whose tasks are seen only by the model.</p>
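ARC tasks are distributed as JSON files with "train" and "test" lists of input/output grids; the following minimal sketch reads one such task into the (x_k, y_k) / (x̂_ℓ, ŷ_ℓ) pairs used above. The inline example task is a hypothetical toy, not an actual ARC task.

```python
import json

# Hypothetical toy task in the ARC JSON format: "train" and "test" lists of
# {"input", "output"} grids, where each grid is a list of rows of colors 0..9.
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 2], [0, 2]], "output": [[0, 0], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [3, 3]], "output": [[0, 3], [0, 0]]}
  ]
}
"""

task = json.loads(task_json)
train_pairs = [(ex["input"], ex["output"]) for ex in task["train"]]  # m_i pairs
test_pairs = [(ex["input"], ex["output"]) for ex in task["test"]]    # n_i pairs

print(len(train_pairs))  # m_i = 2
print(len(test_pairs))   # n_i = 1
```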
    </sec>
    <sec id="sec-2">
      <title>Reasoning Approach</title>
      <p>For our approach, we assume that f can be broken down into a sequence of
basic image transformations. We developed a custom domain-specific language
(DSL) specifying such sequences, whose space of expressions is explored using
an evolutionary algorithm (EA). We first explain our language in more detail
and then show how we use an EA to create ARC task solvers from our DSL.
Domain-Specific Language In a first step, we manually implemented solvers
for approximately 20 random training tasks and identified recurring image
operations, which became the basis of our DSL.</p>
<p>
        Let X = ⋃_{u,v ≥ 1} C^{u×v} denote the set of rectangular images with colors in C,
and X* ordered lists of images, which we call layers. Our DSL is a context-free
grammar in Backus-Naur Form (BNF) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; the non-terminal symbols represent
function sets whose members (i) modify images (T = X^X), (ii) decompose an
image into layers (S = (X*)^X), (iii) combine layers into a single image (J = X^(X*)),
or (iv) modify a layer object (L = (X*)^(X*)). The terminal productions of these
symbols are either concrete functions of the corresponding type or more complex
function compositions. Figure 2 depicts a visualization of our DSL function types.
      </p>
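The four function types above can be written down as plain type aliases; this is a minimal sketch of ours (names and the composition helper are illustrative, not the paper's implementation):

```python
from typing import Callable, List

# Images are grids of color indices, layers are ordered lists of images.
Image = List[List[int]]
Layers = List[Image]

T = Callable[[Image], Image]    # image  -> image   (T = X^X)
S = Callable[[Image], Layers]   # image  -> layers  (S = (X*)^X)
J = Callable[[Layers], Image]   # layers -> image   (J = X^(X*))
L = Callable[[Layers], Layers]  # layers -> layers  (L = (X*)^(X*))

# Any grammar expression composes to an overall T-type function,
# e.g. split an image into layers, modify them, join them back:
def compose_t(split: S, modify: L, join: J) -> T:
    return lambda img: join(modify(split(img)))

# Trivial instance: single-layer split, identity modify, take first layer.
identity: T = compose_t(lambda img: [img], lambda ls: ls, lambda ls: ls[0])
print(identity([[1, 2], [3, 4]]))  # [[1, 2], [3, 4]]
```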
<p>Our atomic functions comprise basic operations such as translation, rotation
and cropping, as well as layer-specific operations like extracting a layer, sorting
by different criteria, splitting images into layers, and merging them together. For
additional flexibility, we also allow higher-order functions like map, which lifts
the T-type to an L-type function by applying it to each layer. The grammar
structure ensures that all possible expressions have an overall T-type logic, i.e.
they transform a single input image into an output image. This allows finding
solvers for some tasks; an example is given in Figure 3.</p>
      <p>[Figure 2: the DSL function types — T: crop, rotate, …; L: sort, filter, …; S: split_colors, duplicate, …; J: union, top, …]</p>
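The higher-order map described above can be sketched as follows; the two atomic T-type functions are our own minimal examples, not the paper's exact implementations:

```python
from typing import Callable, List

Image = List[List[int]]
Layers = List[Image]

# Two example atomic T-type (image -> image) functions:
def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_h(img: Image) -> Image:
    """Mirror an image horizontally."""
    return [row[::-1] for row in img]

# `map` lifts a T-type function to an L-type (layers -> layers) function
# by applying it to every layer independently:
def lift(t: Callable[[Image], Image]) -> Callable[[Layers], Layers]:
    return lambda layers: [t(layer) for layer in layers]

layers = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(lift(rotate90)(layers))  # [[[3, 1], [4, 2]], [[7, 5], [8, 6]]]
```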
<p>
        Grammatical Evolution We use Grammatical Evolution (GE) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to generate
expressions in our DSL, and ultimately find solvers for a given task. We use the
standard modulo-based mapping from codons to syntax trees of our DSL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to
obtain image functions f̃. Uniform mutation and 1-point crossover are used to
produce offspring [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To prevent running out of codons, we limit the maximum
tree depth by preemptively excluding rules at every tree node. We choose the
next generation’s parents via tournament selection on the combined parent and
offspring population. For this we assess the loss value of each function, given by
the distance of its outputs f̃(x) to the ground-truth output images y, averaged
over all m_i training pairs,
      </p>
<p>L(f̃ | D_i) = (1/m_i) Σ_{k=1}^{m_i} d_img(f̃(x_k), y_k)   (1)</p>
      <p>Here we define d_img as (i) the proportion of incorrectly colored pixels (if images
have equal size) or (ii) the Euclidean distance between the color histograms:</p>
      <p>d_img(x, y) = Σ_i 1{x_i ≠ y_i} / (u_x · v_x) if u_x = u_y and v_x = v_y,
and d_img(x, y) = 1 + ‖Ψ(x) − Ψ(y)‖_2 else,   (2)</p>
      <p>where u, v denote the image width and height, 1{P} = 1 if P holds and 0 otherwise, and
Ψ(x) = (Σ_{x_i ∈ x} 1{x_i = c})_{c ∈ C} is the (unnormalized) color histogram of x. f̃ is an
optimal solution if (and only if) all predictions are equal to the expected output,
i.e. L(f̃ | D_i) = 0, thus we can also use Equation 1 as an early-stopping criterion.</p>
      <p>[Figure 3: example solver expression — strip_c1 → split_colors → sort_{Area,Desc} → top → crop]</p>
      <p>Following the evaluation procedure of the challenge, our accuracy is the
proportion of correctly solved tasks: ACC = (1/M) Σ_{i=1}^{M} Π_{ℓ=1}^{n_i} 1{f_i(x̂_ℓ) = ŷ_ℓ}. Here, f_i is
the solution produced by our method after training on task D_i. A grid search
was performed to obtain good hyperparameters for population size, mutation
rate and mutation strength. As GE is randomized, we ran the experiments with
40 different seeds and discuss the averaged results with standard deviation.</p>
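The loss from Equations 1 and 2 can be sketched in a few lines; this is our reconstruction under the stated definitions, with helper names of our own choosing:

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Image = List[List[int]]
COLORS = range(10)  # the 10 discrete ARC colors

def histogram(img: Image) -> List[int]:
    """Unnormalized color histogram Psi(x) over the 10 colors."""
    counts = Counter(c for row in img for c in row)
    return [counts[c] for c in COLORS]

def d_img(x: Image, y: Image) -> float:
    """Equation 2: pixel error rate for equal-size images,
    1 + Euclidean histogram distance otherwise."""
    if len(x) == len(y) and len(x[0]) == len(y[0]):
        wrong = sum(xi != yi for rx, ry in zip(x, y) for xi, yi in zip(rx, ry))
        return wrong / (len(x) * len(x[0]))
    hx, hy = histogram(x), histogram(y)
    return 1.0 + math.sqrt(sum((a - b) ** 2 for a, b in zip(hx, hy)))

def loss(f: Callable[[Image], Image], pairs: List[Tuple[Image, Image]]) -> float:
    """Equation 1: average distance of f's outputs to the ground truth."""
    return sum(d_img(f(x), y) for x, y in pairs) / len(pairs)

# A perfect solver reaches loss 0, the early-stopping criterion:
identity_pairs = [([[1, 1], [2, 2]], [[1, 1], [2, 2]])]
print(loss(lambda x: x, identity_pairs))  # 0.0
```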
<p>We first evaluate our method on the 400 training tasks of ARC, from which
we solved 7.68 (±0.61)%. The challenge leaderboard evaluation however is based
on a secret data set of 100 tasks with slightly higher logic complexity. Here, we
were able to correctly solve ACC = 3% of the tasks. Despite this seemingly low
value, we scored among the top 30 of over 900 participants, which illustrates
the challenge’s non-triviality. The small number of tasks makes it hard to assess
the usefulness of atomic functions in the DSL. Moreover, there is no information
about the overlap of required logic operations between training and test tasks.</p>
<p>Our results also made us question the effectiveness of GE considering the
tremendous search space complexity. Small mutations to the current best
individual, such as swapping out a single atomic function, can significantly change
its behavior. As a result, the algorithm operates in a highly non-convex search
space. We therefore compared our EA approach to simply generating random
individuals from our DSL. This random search baseline solves an average of
6.17 (±0.13)% of the training tasks, indicating that the EA is able to traverse
the search space at least somewhat more efficiently. This is even more significant
on the test tasks, where random search is not able to solve even a single task.</p>
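For illustration, the standard modulo-based codon-to-expression mapping used in our GE setup can be sketched with a hypothetical toy grammar in place of the full DSL (the depth-limit convention of forcing the last rule is a simplification of the rule exclusion described above):

```python
# Hypothetical toy grammar: each non-terminal maps to a list of productions.
GRAMMAR = {
    "<expr>": [["<func>", "(", "<expr>", ")"], ["x"]],
    "<func>": [["crop"], ["rotate"], ["flip"]],
}

def derive(codons, symbol="<expr>", max_depth=8):
    """Leftmost derivation: consume one codon per rule choice, picking the
    production via codon modulo the number of productions."""
    if symbol not in GRAMMAR:           # terminal symbol: emit as-is
        return symbol
    choices = GRAMMAR[symbol]
    if max_depth == 0 or not codons:    # depth limit / out of codons:
        rule = choices[-1]              # fall back to the last (terminating) rule
    else:
        rule = choices[codons.pop(0) % len(choices)]
    return "".join(derive(codons, s, max_depth - 1) for s in rule)

print(derive([0, 1, 0, 2, 1]))  # rotate(flip(x))
```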
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>We showed that our DSL+GE approach is viable for solving reasoning tasks.
By designing a language for image-based logic and learning corresponding
expressions from the training instances, we were able to score well in the ARC
challenge. Ensuring a high expressiveness of the DSL, while at the same time
limiting the complexity to allow tractable function evolution, appears to be the
key problem. Finding the optimal set of functional atoms for the given domain
may thus be subject to further research.</p>
<p>The ARC challenge’s difficulty illustrates the long way still to go until ML
methods reach abstract reasoning capabilities comparable to the human mind’s.
Acknowledgment This research has been funded by the Federal Ministry of
Education and Research of Germany as part of the competence center for
machine learning ML2R (01|18038A).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The measure of intelligence</article-title>
          . arXiv preprint arXiv:1911.01547 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gulwani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Automating string processing in spreadsheets using input-output examples</article-title>
          .
          <source>ACM Sigplan Notices</source>
          <volume>46</volume>
          (
          <issue>1</issue>
          ),
          <fpage>317</fpage>
          -
          <lpage>330</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hernández-Orallo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Plumed</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siebers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dowe</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Computer models solving intelligence test problems: Progress and implications</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>230</volume>
          ,
          <fpage>74</fpage>
          -
          <lpage>107</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Koza</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Genetic programming as a means for programming computers by natural selection</article-title>
          .
          <source>Statistics and Computing</source>
          <volume>4</volume>
          ,
          <fpage>87</fpage>
          --
          <lpage>112</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Legg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A collection of definitions of intelligence</article-title>
          .
          <source>Advances in Artificial General Intelligence: Concepts</source>
          ,
          <source>Architectures and Algorithms</source>
          <volume>157</volume>
          (07
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>O'Neill</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Grammatical evolution</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ),
          <fpage>349</fpage>
          -
          <lpage>358</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ryan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Neill</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Grammatical evolution: Evolving programs for an arbitrary language</article-title>
          .
          <source>In: European Conference on Genetic Programming</source>
          . pp.
          <fpage>83</fpage>
          -
          <lpage>96</lpage>
          . Springer (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>