J. Yaghob (Ed.): ITAT 2015 pp. 127–134
Charles University in Prague, Prague, 2015



                        Platform for Rapid Prototyping of AI Architectures

                                  Peter Hroššo, Jan Knopp, Jaroslav Vítků, and Dušan Fedorčák

                                                        GoodAI, Czech Republic
                                             contact author: peter.hrosso@keenswh.com

Abstract: Researching artificial intelligence (AI) is a big
endeavour. It calls for agile collaboration among research
teams, fast sharing of work between developers, and the
easy testing of new hypotheses. Our primary contribution
is a novel simulation platform for the prototyping of new
algorithms with a variety of tools for visualization and de-
bugging. The advantages of this platform are presented
within the scope of three AI research problems: (1) mo-
tion execution in a complex 3D world; (2) learning how
to play a computer game based on reward and punish-
ment; and (3) learning hierarchies of goals. Although there
are no theoretical novelties in (1,2,3), our goal is to show
with these experiments that the proposed platform is not just another ANN simulator. This framework instead aims to provide the ability to test proactive and heterogeneous modular systems in a closed loop with the environment. Furthermore, it enables the rapid prototyping, testing, and sharing of new AI architectures, or their parts.

Figure 1: Screenshot of our simulation platform.

1 Introduction and Related Work

The recent boom in the field of artificial intelligence (AI) was brought on by advances in so-called narrow AI, represented by highly specialized and optimized algorithms designed for solving specific tasks. Such programs can sometimes even surpass human performance when solving the single problem for which they were created. But these narrow AI programs lack one feature which has so far been widely omitted, partly due to its overwhelming difficulty: generality.

In order to compensate for this deficiency, the field of artificial general intelligence (AGI) is bringing the focus back to broadening the range of solvable tasks. The ultimate goal of AGI is therefore the creation of an agent which can perform well (at human level or better) at any task solvable by a human. For a more detailed description of AI/AGI, see e.g. [23].

Pursuing such a goal is a hard task. According to the scientific method – the only guideline we have – we need to come up with new theories, design experiments for testing them, and evaluate their results. Such a cycle needs to be repeated often, because it can be expected to reach more dead ends than breakthroughs. We don't know how to increase the rate of coming up with new ideas, but what can be improved is the efficiency of research. What we need are better tools which will simplify the implementation of new theories, speed up experiments, and help us understand the results better by visualizing the obtained data.

In this article, we present our attempt to create such a tool. We introduce a platform which allows:

• Easy prototyping of new models and fast sharing of existing ones (Sec. 4.1)

• Control of an agent in an environment on top of classic data processing (Sec. 3.3)

• A modular approach – seamless connection of models inside a greater architecture (Sec. 2.1)

• Various tools for the visualization of data (Fig. 2)

• Simplified debugging (Sec. 2.2)

• User-friendly GPU programming (Sec. 2.1)

• Scalability due to GPU parallel computation

• Support of several scenarios such as an agent in an environment, classification, tic-tac-toe, etc.

Our platform also includes a variety of visualization tools and enables easy access to diverse data sets, not only for tasks such as image classification or recognition, but also for scenarios where an agent interacts with its environment. Last but not least, it is open source and freely available under a non-commercial license¹. The primary goal of this tool is easy collaboration among both specialists and laymen for developing novel algorithms, especially in the field of AI.

¹ The platform is available as Brain Simulator at http://www.goodai.com/brainsimulator

There are tools, languages, and libraries that are good in particular areas. In the research community, widely
used are Matlab [9] and Python [17] for prototyping by code; platforms that aim at high-level graphical modeling (Simulink [20], Software Architect [8]), data analysis (Azure [10], RapidMiner [18]), advanced visualization (ParaView [15]), rich graphical user interfaces (Blender [2], Maya [1]), or modular computation (ROS [19]); and specific libraries for sharing [7] and parallel computation [3]. Each of these instruments is important in its specific domain, but none of them covers under one roof the most prominent features of all of those mentioned. Our platform is an attempt to fill this niche, and it offers both high-level graphical coding and possibly, but not necessarily, also low-level (i.e. CUDA [3]) programming.

There are several tools for simulating neural networks (NN). Nengo [11] and OpenNN [14] focus on experimenting with all possible modifications of NNs. Unfortunately, they either lack visualization or focus on over-specific design approaches. Moreover, usage of these tools often requires extensive programming knowledge and the installation of extension packages [4, 13]. In contrast, other tools that provide rich visualization focus purely on the functions of the brain. For example, PSICS [16] uses 3D synapse visualization to show data flow in parts of the brain, DigiCortex [6] nicely visualizes spike activations of the whole brain in time, and Cx3D [5] simulates the growth of the cortex in 3D. An extensive comparison of various neural network simulators, including our platform (Brain Simulator), can be found in [12].

It is worth noting that the proposed platform is not limited to the design of neural networks only. Any algorithm useful for AI, machine learning, or control can be incorporated (various mathematical transformations, filters, a PID controller, image segmentation, hashing functions, a dictionary, etc. are already included). The heterogeneous character of the platform is its main advantage.

The rest of this work is organized as follows: we describe our platform in Sec. 2. In Sec. 3, the tasks on which we show the advantages of the platform are introduced. In Sec. 4, we discuss the experience with our tool and its advantages and weaknesses in the testing scenarios. The paper is concluded in Sec. 5.

Figure 2: Visualization of data. Left: a memory block holding a 1 × 3 matrix can be visualized as (a) values, (b) a color-map, (c) the value of each element in time, or (d) the values as a color-map in time. Right: more complicated task-specific visualizations can be implemented too, e.g. growing neural gas [26].

2 Simulation Platform

Our modus operandi reflects our goals – we are aiming for a modular cognitive architecture, so we needed an environment which would efficiently support the whole life-cycle of experiments, starting with the testing of already existing algorithms, going through the design of a new algorithm, and ending with a results evaluation. We developed a platform where various algorithms from machine learning and narrow AI are available. It is easy to pick some, connect them, and start experimenting effortlessly. Agile development requires frequent testing of new hypotheses, which is facilitated by an easy way of prototyping new modules for the platform as well as their fast training and evaluation (accelerated on the GPU) on data. After an experiment is running, it often happens that its outcome is not what was expected. Such a simulation platform could not be imagined without a tool for runtime analysis of algorithms. For this purpose various data observers can be displayed, so the experimenter can visualize the computed data, evaluate the performance of the model, and change its parameters during runtime if needed.

Our platform is tailored to suit two different points of view of the architecture development process:

• A user who desires quick architecture modeling and needs fast access to already existing state-of-the-art modules (such as PCA, NN, image pre-processing, etc.) to experiment with. This perspective requires no coding and is served through graphical modeling. Furthermore, it is often crucial to have good insight into the running model, and thus a large set of visualization tools is available (Fig. 2).

• On the other hand, a researcher/developer often requires the creation of a new module or the import of an already existing library. Our API provides an easy way for such a module to be created and added to the inner shared repository. Moreover, the API offers an opportunity to hook the code to the GUI and bring the needed interactivity. Finally, the API defines a rigid interface, ensuring that the new module will be compatible with other modules.

The platform was designed to meet both needs. It is important to distinguish between them, as a user can be a person interested in machine learning but less experienced in programming. Our platform can be a good starting point, and the learning curve should therefore be smooth enough to bring the person in effortlessly.

From the experienced researcher/developer point of view, the platform should provide a convenient set of tools that can help with the development of novel algorithms and/or be able to envelope existing work into a module that can be easily shared within a team.

Finally, the community-driven approach renders itself very powerful and we believe that it can speed up the research vastly. For this reason, we are planning to add
a “module market” to allow for the sharing of state-of-the-art research results between many co-working teams.

2.1 Platform Meta Model

There are three basic concepts defined in the meta model: a node, a task, and a memory block. The node encapsulates a functional block or algorithm that can “live” on its own (e.g. matrix operations, data transformations, various machine learning models, etc.). A node needs memory for its function. The memory is organized into a set of memory blocks that are aggregated inside the node. Some of these memory blocks can be designated as output blocks and others as input blocks. The connection between input and output memory blocks is provided by the user.

From the functional point of view, the node behaviour can usually be divided into a set of tasks, where each task is a part of the realized algorithm. Both nodes and tasks can define a set of parameters. Usually, node parameters describe structural properties (e.g. the size of memory blocks) whereas task parameters affect behavior. At present, the memory model is constant during the simulation, and therefore structural properties are editable only at design time. On the other hand, it is useful to change task parameters during the simulation and observe the resulting changes in the behavior of the algorithm/node.

Memory blocks are located in GPU (device) memory and every task can be seen as a collection of kernel calls (methods executed on the GPU). If two nodes are connected in the GUI, it means that they have a pointer to the same memory block (input in one node, output in the other).

If one requires dynamically allocated memory, one can either define a memory block that is large enough, or implement the node for the CPU only (which is more flexible than the GPU) using all data structures supported by C#. The only mandatory requirement is the usage of input/output memory blocks.

All concepts described above can be easily implemented through the rich API that is provided. The actual implementation relies heavily on annotated code describing various aspects of the model (UI interactivity, constraints, persistence, etc.). It allows the user to be extremely efficient in creating model prototypes. Sometimes this can lead to unreadable, over-annotated code which is hard to maintain, but this can be eliminated by applying standard software design patterns like MVC when needed.
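To make these concepts concrete, the following is a minimal sketch of the meta model in Python; the real platform is written in C#/CUDA and its API differs, so all names here (MemoryBlock, Task, Node, connect) are illustrative only:

    import numpy as np

    class MemoryBlock:
        """A named, fixed-size block of data; on the real platform it lives in GPU memory."""
        def __init__(self, name, size):
            self.name = name
            self.data = np.zeros(size, dtype=np.float32)

    class Task:
        """One step of a node's algorithm; its parameters may change during simulation."""
        def __init__(self, node, **params):
            self.node = node
            self.params = params
        def execute(self):
            raise NotImplementedError

    class ScaleTask(Task):
        def execute(self):
            # 'gain' is a behavioral (task) parameter, editable at runtime
            self.node.output.data[:] = self.params["gain"] * self.node.input.data

    class Node:
        """Encapsulates the memory blocks and the tasks that operate on them."""
        def __init__(self, size):
            # 'size' is a structural (node) parameter, fixed at design time
            self.input = MemoryBlock("input", size)
            self.output = MemoryBlock("output", size)
            self.tasks = [ScaleTask(self, gain=2.0)]

    def connect(src, dst):
        # connecting two nodes means sharing one memory block, no copying
        dst.input = src.output

Sharing a single block between an output and an input mirrors how a GUI connection works in the platform: both nodes hold a pointer to the same device memory.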
2.2 Computation

As described above, the prototyped model forms an oriented graph with nodes and data connection edges. As the connections between nodes can be any of M → N and recurrent connections are also possible, the resulting graph can be very complex. Moreover, the usual model is connected to the world node, from which “perception” inputs are taken and to which control outputs are passed, forming the main loop of the simulation.

Before running the simulation, the order of node execution needs to be evaluated. There are other aspects that level the problem up (e.g. inner cycles, clustering and balancing of the model in an HPC environment), but it usually boils down to various forms of dependency ordering, cycle detection, or the job shop problem [34]. Solving these tasks is automated and user/developer assistance is usually discouraged, but there are use cases where user aid is necessary or can simplify the problem substantially.
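As an illustration of the dependency-ordering step, here is a minimal sketch using Kahn's algorithm; the platform's actual scheduler is more involved (inner cycles, clustering, balancing), and the node names in the usage comment are made up:

    from collections import deque

    def execution_order(nodes, edges):
        """Topologically order nodes and detect cycles.

        nodes: iterable of node ids; edges: list of (src, dst) data connections.
        Returns a valid execution order, or raises if a (recurrent) cycle remains.
        """
        indegree = {n: 0 for n in nodes}
        successors = {n: [] for n in nodes}
        for src, dst in edges:
            successors[src].append(dst)
            indegree[dst] += 1
        ready = deque(n for n in nodes if indegree[n] == 0)
        order = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for m in successors[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    ready.append(m)
        if len(order) != len(nodes):
            # the remaining nodes form a cycle; a recurrent edge must be broken
            # (e.g. by reading the previous step's value) before ordering
            ordered = set(order)
            raise ValueError("cycle detected among: %s"
                             % [n for n in nodes if n not in ordered])
        return order

    # e.g. execution_order(["world", "som", "rnn"],
    #                      [("world", "som"), ("som", "rnn")])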
There is also another view of the problem of execution order when it is faced in the area of machine learning. It turns out that many ML methods are surprisingly noise resistant (e.g. neural nets). Therefore, if approached with caution, one can run the model asynchronously and let the inner parts of the model deal with occasionally temporally inconsistent data. We made some experiments and the preliminary results show that relatively complex models can be run completely without synchronization.

Another aspect of model execution is GPU-enhanced computation, which can speed up the simulation substantially. The main purpose of our simulation platform is fast prototyping and testing of hypotheses. With increasing generality the efficiency usually decreases, so one should not expect top execution speed from our simulation platform. The devised practice is to design, test, and analyze new architectures; once the final model is tested and working, it can be replaced by a specialized, highly optimized implementation, still within the platform environment. Finally, it can be argued that the overall time necessary to get from an idea to the final product is much shorter compared to the classic approach of writing a specific program from scratch for each new experiment.

An important part of the development process is the easy visualization of what is happening in each part of the designed system. This is especially important for debugging, as the most frequent problem is the difference between what the programmer thinks the program should do and what it does in reality. In addition to the variety of observers that have already been discussed (Fig. 2), the platform contains its own debugger, where one can walk through the execution of all components used in the model.

3 Testing Scenarios

Whether the ultimate goal of AGI (a general autonomous machine) is achievable or not [35], researchers focus on its sub-goals, such as learning how to play games [37]. One of the prominent building blocks for these sub-goals are neural networks in the form of deep learning and CNNs, which have recently made big progress in speech recognition [40], computer vision [33], medical analyses [43], and language translation [28]. Le and colleagues [36] used a deep network to learn, in an unsupervised manner, what an ordinary “cat” looks like, only by watching YouTube videos. While NNs can also learn how to play simple games [32, 37], they usually fail in structured
problems which demand learning hierarchies or chains of goals. From this perspective, it seems promising to focus on machines which can control another machine, such as the neural network of Graves et al. [27], which learns a procedure for controlling a Turing machine to sort numbers.

As the goal of this paper is to provide a tool that shortcuts the research path to an autonomous machine, we will show how it performs on three selected AI tasks solved by our team: learning motion control, playing the Atari game Breakout, and learning hierarchies of goals. Our solutions are highly inspired by the current machine learning literature with a stress on the usage of neural networks, which are one of the basic building blocks for bigger architectures.

The first experiment (Sec. 3.1) will demonstrate how our platform can be connected to an external source of input data and how various modules of narrow AI can be combined to form a functioning system which can drive a robot in a virtual world with simulated physics.

The second experiment (Sec. 3.2) will be situated in a much simpler simulation environment – an Atari [38] game called Breakout. In this experiment a more advanced adaptive system will be showcased. The system works directly on raw image input. It takes advantage of the semantic pointer architecture [25] for representing its perceptions and for converting them into long-term memories such as goals. This knowledge is then used for learning the actions necessary for playing the game.

In the third experiment (Sec. 3.3), we move a bit higher in the level of abstraction. The presented problem consists of an agent in a simple 2D environment which needs to satisfy a chain of preconditions before reaching a reward: if the agent wants to turn on a light, it needs to press a switch, but to get to the switch it also needs to overcome an obstacle (a door controlled by another switch). The task is solved by hierarchical reinforcement learning [30].

To clarify why we selected these experiments, one could imagine the three systems as parts of a future higher-order cognitive architecture where they will work together. The system from the first experiment could be thought of as a basic motoric and sensory system driven by reflexes and higher-level commands. These would come from the system used in the second experiment, which would allow the agent to learn how to reach a specific goal. And finally, the third system should discover the hierarchy of goals and preconditions, and thus could resemble a simplified version of the agent's central executive. Such a connection of the systems remains for our future work.

3.1 SE Robot

We took advantage of a sandbox game called Space Engineers [21], which provides a physically realistic 3D environment where various structures can be built. We built a six-legged robot within the game and connected it bidirectionally with our simulation platform. In one direction, the game sends visual data from the robot's view and a description of the state of the robot's body. In the other direction, motoric commands from our control module inside the simulation platform are sent to the robot, which executes them in the game.

Figure 3: Overall architecture. Raw visual signals are processed into symbols, which are then added to the working memory. States corresponding to reward and punishment are accumulated and later used as teaching signals for training the action selection network.

The control module was trained to associate visual input with motor commands in a supervised way. The associative memory was implemented with a Self-Organizing Map [31], which found the most similar representative of the received input in the visual memory and returned the associated high-level motoric command (turn left/right, move forward/backward). These high-level commands were then unrolled into sequences of body states, consisting of the joint angles of all of the robot's limbs, using a recurrent neural network (RNN) [39]. These body states were afterwards used as waypoints for a control RNN, which was trained to act as an inverse dynamics model of the robot's body. In order to reach a specified waypoint, the control network generated full motoric inputs to the robot – the desired angular velocities of the joints.
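The recall step of this associative memory is essentially a nearest-prototype lookup. A minimal sketch of that SOM-style recall, assuming the codebook has already been trained and each prototype stores one high-level command (all data here is toy):

    import numpy as np

    def som_recall(visual_input, prototypes, commands):
        """Return the command associated with the best-matching visual prototype.

        visual_input: 1-D feature vector; prototypes: (n, d) codebook learned
        by the SOM; commands: length-n list of high-level motor commands.
        """
        distances = np.linalg.norm(prototypes - visual_input, axis=1)
        best = int(np.argmin(distances))  # best matching unit
        return commands[best]

    # toy usage with a three-prototype codebook
    protos = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    cmds = ["turn_left", "turn_right", "move_forward"]
    print(som_recall(np.array([0.9, 0.1]), protos, cmds))  # -> 'turn_right'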
The training phase consisted of a mentor leading the robot from various starting locations towards a goal location in the environment, which was identified by an easily distinguishable 3D symbol located at that position. The mentor was implemented by a hard-coded navigation system. In this way the hexapod was trained to look for the goal symbol and, when it appeared in the robot's field of view, to navigate successfully towards the destination through the environment.

3.2 Atari Game

The Breakout game was chosen as our second testing scenario. The game consists of a ball, a paddle, and
bricks. The ball bounces off walls, can destroy bricks, and can fall to the ground, for which the player is penalized by losing a life. After losing 4 lives, the game is over. When all bricks are destroyed, the player successfully finishes the level and enters the next one, consisting of a different arrangement of bricks. The player has three actions available, which accelerate the paddle to the right, accelerate it to the left, or decelerate it. Even though our modular approach uses purely unstructured data input (a raw image, as in [37]), it later extracts the structure, so we can understand the inner workings of the model, as opposed to the cited work. The architecture of the system consists of four main parts: image processing, working memory, accumulators of reward and penalty, and an action selection network (Fig. 3).

Figure 4: Top: image processing. First, an input image is segmented into super-pixels (SP) using SLIC [22]. Second, each SP is connected with its neighbors and nearby SPs are assigned the same object id. Third, the attention score (sA) is estimated for each object. Fourth, features are estimated for the object with the highest sA by a hierarchy of convolutional and fully-connected layers. Fifth, the object features are clustered into visual words [41] to constitute a “working memory”. Bottom: the corresponding implementation in our platform.

Relevant information about the objects is extracted from the raw bitmap in the Vision System (Fig. 4).
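As an aside, the segmentation stage of this pipeline can be reproduced with the reference SLIC implementation from scikit-image; our platform uses its own node for [22], so the library call and the sample image here are only stand-ins:

    import numpy as np
    from skimage import data
    from skimage.segmentation import slic

    frame = data.astronaut()  # stand-in for a raw game frame
    # SLIC super-pixels; more segments give a finer segmentation but, as
    # noted in Appendix I, slow down the graph-optimization step (Fig. 7)
    labels = slic(frame, n_segments=200, compactness=10.0)

    # toy follow-up: mean color per super-pixel, the cue used for merging
    # neighboring super-pixels into object proposals
    mean_colors = np.array([frame[labels == s].mean(axis=0)
                            for s in np.unique(labels)])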
Working memory (WM) is the agent's internal representation of the environment. It contains all of the objects detected by vision. The WM is kept up to date by adding new objects which haven't been seen yet and by updating those already seen. The identity of objects is detected through a comparison of visual features. The contents of the working memory are transformed into a symbolic representation and passed to the goals memory and the action selection network.

The goals memory is trained by accumulating states associated with reward and punishment in their respective semantic pointers, goal+ and goal−. These are then used for evaluating the quality of game states, which is necessary for training the action selection network.

Details of the vision system and the semantic pointer architecture are described in Appendices I and II.

Figure 5: Multiple goals. (a) The agent's current goal is to reach the light switch and turn on the lights. (b) Objects of the environment.

3.3 2D World with Hierarchical Goals

In the previous testing scenarios we wanted to test whether our system was able to coordinate complex motoric commands in a 3D environment, learn simple goals, and act towards maximizing the received reward. Our goal for the third scenario is to increase the generality of the designed system to enable the identification and satisfaction of chained preconditions before the final goal can be reached.

We present a task consisting of a simple 2D world where a single source of reward is located – a light bulb, which starts in the “off” position and should be turned on by the agent. This can be achieved by pressing a switch, but the switch is hidden behind a locked door. The door can be unlocked through a switch, but this switch is hidden behind another locked door. It would be possible to chain the preconditions further in this manner, but without loss of generality we use only two locked doors with two matching switches. The setup can be seen in Fig. 5.

We approached this problem by employing HARM (Hierarchical Action Reinforcement Motivation system) [30]. It is an approach based on a combination of a hierarchical Q-learning algorithm [42] and a motivation model. The system is able to learn and compose different strategies in order to achieve a more complex goal.

Q-learning is able to “spread information about the reward” received in a specific state (e.g. the agent reaching a position on the map) to the surrounding space, so the brain can later take proper actions by climbing the steepest gradient of the Q function. However, if the goal state is far away from the current state, it might take a long time to build a strategy that will lead to that goal state. Also, a high number of variables in the environment can lead to extremely long routes through the state space, rendering the problem almost unsolvable.
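For reference, here is a minimal sketch of the tabular Q-learning building block that HARM instantiates once per controllable variable; the learning-rate and discount values are illustrative defaults, not the values used in the experiment:

    from collections import defaultdict

    class QModule:
        """One tabular Q-learner; HARM creates one such module per
        environment variable the agent discovers it can change."""
        def __init__(self, actions, alpha=0.1, gamma=0.9):
            self.q = defaultdict(float)  # (state, action) -> value
            self.actions = actions
            self.alpha, self.gamma = alpha, gamma

        def update(self, s, a, reward, s_next):
            # one-step Q-learning: move Q(s, a) towards r + gamma * max Q(s', .)
            best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
            td_error = reward + self.gamma * best_next - self.q[(s, a)]
            self.q[(s, a)] += self.alpha * td_error

        def best_action(self, s):
            # greedy action: climb the steepest gradient of the Q function
            return max(self.actions, key=lambda a: self.q[(s, a)])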
There are several ideas that can improve the overall performance of the algorithm. First, the agent rewards itself for any successful change to the environment. A motivation value can be assigned to each variable change, so the agent is constantly motivated to change its surroundings.
Second, for each variable that the agent is able to change, it creates a Q-learning module assigned to that variable (e.g. changing the state of a door). Therefore, it can learn an underlying strategy defining how this change can be made again. In such a system, a whole network of Q-learning modules can be created, where each module learns a different strategy.

Third, in order to lower the complexity of each sub-problem (strategy), the brain can analyze its “experience buffer” from the past and eventually drop variables that are not affected by its actions or are not necessary for the current goal (i.e. the strategy to fulfill the goal).

A mixture of these improvements creates a hierarchical decision model that is built online (first, the agent is left to (semi-)randomly explore the environment). After a sufficient amount of knowledge is gathered, we can “order” the agent to fulfill a goal by manually raising the motivation that corresponds to the variable that we want to change. The agent will then execute the learned abstract action (strategy) by traversing the network of Q-learning modules and unrolling it into a chain of primitive actions that lie at the bottom.

Figure 6: Learnt strategy. Visualization of the agent's knowledge for a particular task, which changes the state of the lights. It tells the agent what to do in order to change the state of the lights in all known states of the world. The heat map corresponds to the expected utility (“usefulness”) of the best action learned in a given state. A graphical representation of the best action is shown at each position on the map.

4 Discussion

Throughout the work on the testing scenarios (Sec. 3) we have observed several advantages and weaknesses of the platform. In this section, our experience with the usage of the platform is discussed. The discussion focuses especially on the end-user experience, i.e. the experience of a person who did not develop the platform but wants to use it for solving her problem.

First, a list of identified features is presented, and then further experience is described.

+ Fast and easy online observation of and interaction with the simulation.

+ Created modules can be easily understood and shared with collaborators due to the common interface. Moreover, the persistence capabilities allow easy sharing of whole models (projects) and merging them together.

+ It is easy to replace an existing module with an improved version, as the architecture is separated from the implementation. As backward compatibility becomes crucial at this point, an inner versioning system was implemented.

+ The provided interface drives users to follow design patterns when developing low-level optimized modules.

− The current version runs on MS Windows only, but a port to MacOS and Linux is planned for the future.

− The user has the option to either develop optimized modules in code (CUDA [3] or C#) or use the graphical interface for connecting existing modules into bigger architectures. There is no middle layer which would support scripting.

4.1 Experience of Newcomers

The expertise of people who have started to use our platform varies from C++ experts to Matlab-only users. We found that users with very short training can connect existing modules into simple architectures (like a neural network MNIST image recognizer), as the graphical modeling is quite natural and easy to understand. The smooth learning curve of newcomers is supported by a video tutorial as well as several examples of how to implement simple and more advanced tasks².

² Documentation available at http://docs.goodai.com/brainsimulator/

For the development of new modules, it is necessary to understand a programming language (C#, C++, CUDA) at least at a basic level. Once the definitions of inputs, outputs, tasks, and kernels (four lines of code each) are understood, developers soon start creating their own nodes. Their learning curve then equals learning how to use a new library.

4.2 Our Observations on the Testing Scenarios

In the first scenario (Sec. 3.1), we have shown that our platform successfully connects with the open source game Space Engineers [21]. Modules created in the platform controlled the hexapod in the world of the game. It was the understanding of the game's communication module that took the most time in this case. Otherwise, the development of the controller did not raise any challenges for the platform.

The second scenario (Sec. 3.2) consisted of several modules that were developed independently. We found it extremely useful that each module communicates with the others using only the pre-defined interface (memory blocks) that corresponds to a sketched diagram (i.e. Fig. 4). The modules were merged into one big architecture right before the deadline without any complications. As the final model was quite large and performance-demanding,
we were forced to profile, find bottlenecks, and optimize in the process. It was extremely useful to visualize the data flowing between (and inside) the modules.

In the third scenario (Sec. 3.3), HARM constituted a single module with complex insides. Therefore this scenario presented an ideal example for designing a number of task-specific visualization tools (for example, the agent's knowledge in Fig. 6).

5 Conclusion

We have presented a platform for prototyping AI architectures. The platform is tailored both for users with no mathematical/programming background but with a strong desire to experiment with AI modules, and for researchers/developers who want to improve and experiment with their existing state-of-the-art techniques.

To show the usage of our platform we have presented three development scenarios: linkage with a 3D game world and controlling an agent there; playing an Atari game using raw bitmap input processed by computer vision techniques, an attention model, and the semantic pointer architecture; and learning a complex hierarchy of goals.

The proposed platform opens up possibilities to share ideas not only within the community but also with non-experts, who can boost the research via rapid testing or utilize fresh, out-of-the-box solutions. There is also the prospect of support from the open source community – if not directly in the development, then at least in assessing missing features, so we can incorporate them and thus provide a tool that can be used at many levels of expertise.

We believe that by providing an open platform for AI and ML experiments along with a smooth learning curve, we can bring together many enthusiasts across different fields of interest, potentially leading to unexpected advancements in research.

Acknowledgement. This material is based upon work supported by GoodAI and Keen Software House.

References

[1] Autodesk Maya. Available at http://www.autodesk.com/products/maya/overview.
[2] Blender. Available at https://www.blender.org/.
[3] CUDA. Available at https://developer.nvidia.com/cuda-zone.
[4] CVX: Software for disciplined convex programming. Available at http://cvxr.com/.
[5] Cx3D: Cortex simulation in 3D. Available at http://www.ini.uzh.ch/~amw/seco/cx3d/.
[6] DigiCortex: Biological neural network simulator. Available at http://www.dimkovic.com/node/1.
[7] GitHub. Available at https://github.com/.
[8] IBM Rational Software Architect. Available at http://www.ibm.com/developerworks/downloads/r/architect/index.html.
[9] MATLAB: The language of technical computing. Available at http://www.mathworks.com/products/matlab/.
[10] Microsoft Azure. Available at http://azure.microsoft.com/en-us/.
[11] The Nengo neural simulator. Available at http://nengo.ca/.
[12] Neural network simulators. Available at https://goo.gl/hRf4KA.
[13] OpenCV: Open source computer vision. Available at http://opencv.org/.
[14] OpenNN: Open neural networks library. Available at http://www.intelnics.com/opennn/.
[15] ParaView. Available at http://www.paraview.org/.
[16] PSICS: The parallel stochastic ion channel simulator. Available at http://www.psics.org/.
[17] Python. Available at https://www.python.org/.
[18] RapidMiner. Available at https://rapidminer.com/.
[19] ROS. Available at http://www.ros.org/.
[20] Simulink: Simulation and model-based design. Available at http://www.mathworks.com/products/simulink/.
[21] Space Engineers, open source code. Available at https://github.com/KeenSoftwareHouse/SpaceEngineers.
[22] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. PAMI (2012), 2274–2282
[23] Goertzel, B.: Artificial general intelligence: concept, state of the art, and future prospects. Journal of Artificial General Intelligence 5 (2014), 1–48
[24] Bishop, C. M.: Pattern recognition and machine learning. Springer, 2006
[25] Eliasmith, C.: How to build a brain: a neural architecture for biological cognition (Oxford Series on Cognitive Models and Architectures). Oxford University Press, 2013
[26] Fritzke, B.: A growing neural gas network learns topologies. In: NIPS, 1995
[27] Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR, 2014
[28] He, X., Gao, J., Deng, L.: Deep learning for natural language processing and related applications (tutorial at ICASSP). ICASSP, 2014
[29] Huang, F. J., Boureau, Y.-L., LeCun, Y.: Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: CVPR, 2007
[30] Kadlecek, D., Nahodil, P.: Adopting animal concepts in hierarchical reinforcement learning and control of intelligent agents. In: Proc. 2nd IEEE RAS & EMBS BioRob, 2008
[31] Kohonen, T., Schroeder, M. R., Huang, T. S.: Self-organizing maps. 3rd edition, 2001
[32] Koutník, J., Cuccu, G., Schmidhuber, J., Gomez, F.: Evolving large-scale neural networks for vision-based reinforcement learning. In: GECCO, 2013
[33] Krizhevsky, A., Sutskever, I., Hinton, G. E.: ImageNet classification with deep convolutional neural networks. In: NIPS, Curran Associates, Inc., 2012
[34] Tsai, M.-J., Chang, H.-Y., Huang, K.-C., Huang, T.-C., Tung, Y.-H.: Moldable job scheduling for HPC as a service with application speedup model and execution time information. Journal of Convergence, 2013
[35] Kurzweil, R.: The singularity is near: when humans transcend biology, 2006
[36] Le, Q., Ranzato, M.’A., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., Ng, A.: Building high-level features using large scale unsupervised learning. In: ICML, 2012
[37] Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518 (2015), 529–533
[38] Naddaf, Y.: Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010
[39] Rojas, R.: Neural networks: a systematic introduction. Springer-Verlag New York, Inc., New York, NY, USA, 1996
[40] Sak, H., Vinyals, O., Heigold, G., Senior, A., McDermott, E., Monga, R., Mao, M.: Sequence discriminative distributed training of long short-term memory recurrent neural networks. In: Interspeech, 2014
[41] Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: ICCV 2 (2003), 1470–1477
[42] Sutton, R. S., Barto, A. G.: Introduction to reinforcement learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998
[43] Xu, Y., Mo, T., Feng, Q., Zhong, P., Lai, M., Chang, E. I.-C.: Deep learning of feature representation with multiple instance learning for medical image analysis. ICASSP, 2014
Appendix I: Details of Image Processing

Unlike the mainstream of computer vision, our approach has to be unsupervised, without any training data. Thus, we have no prior knowledge and the system has to learn everything on the fly. Our system is a pipeline visualized and implemented in Fig. 4. It consists of the following parts: the input image is first segmented into a set of super-pixels (SP) [22]. Then, each SP is connected to its vicinity, constituting a graph where nodes are SPs and edges connect neighboring SPs. SPs with similar color are merged into connected components. Note that while more SPs speed up the segmentation, they slow down the graph optimization algorithm, see Fig. 7. Once we have object proposals, we estimate an attention score sA for each object, sA(oi) = ψtime(oi) + ψmove(oi), where ψtime(oi) is the time since we last focused on object oi and ψmove(oi) is the object's movement. The object with the highest sA is selected³ and its position together with its size defines an image patch. The image patch is processed into CNN features [29], and this feature representation is then clustered into a “working memory” (WM). The WM stores the feature id together with the object position for the last 10 seen objects.

Figure 7: Performance of SLIC (blue) and SLIC+graph optimization (red) w.r.t. the number of segments.

³ Once an object is selected, its ψtime is decreased, so it won't be selected again in the next time step.
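A minimal sketch of this attention stage follows; the concrete definitions of ψtime and ψmove below (step counting and displacement) are plausible stand-ins, as the exact weighting is not spelled out in the text:

    import numpy as np

    def attention_scores(objects, step):
        """Score each object proposal; sA = psi_time + psi_move, as above.

        Each object is a dict with 'last_focused' (simulation step), and
        'position' / 'prev_position' (2-D numpy arrays).
        """
        scores = []
        for o in objects:
            psi_time = step - o["last_focused"]  # time since last focus
            psi_move = np.linalg.norm(o["position"] - o["prev_position"])
            scores.append(psi_time + psi_move)
        return np.array(scores)

    def select_object(objects, step):
        # pick the object with the highest attention score; resetting its
        # psi_time (footnote 3) keeps it from winning again immediately
        best = int(np.argmax(attention_scores(objects, step)))
        objects[best]["last_focused"] = step
        return objects[best]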
For the CNN features, we used two convolutional layers of 8 and 5 neurons with patch sizes 5 × 5, followed by a fully-connected layer of 16 neurons. Learning converged in 6K iterations. We observed no performance improvement with bigger networks. The WM was implemented as simple K-means [24].

Appendix II: Semantic Pointer Architecture

As was already mentioned in Sec. 3.2, one of the main features of our method is the semantic pointer architecture (SPA), which merges the symbolic and connectionist approaches. Artificial neural networks are very powerful adaptive tools, but their usage usually comes at the expense of losing detailed insight into how exactly the task is solved. Such a drawback can be mitigated by using the SPA and its variable binding. It introduces composite symbols of the form X ⊗ x, where X is the name of the variable and x is its value. It is then possible to train a network to perform complicated transforms such as:

    V ⊗ (X ⊗ x + Y ⊗ y) → C ⊗ (Y ⊗ x)                                  (1)

which could be interpreted as an action selection rule for the pong game:

    Visual ⊗ (Ball ⊗ x1 + Paddle ⊗ x2) → Move ⊗ (Paddle ⊗ x1)          (2)

If the ball is seen at position x1 and the paddle at x2, execute the command “move the paddle to position x1”. Without the SPA it would be much harder to maintain an understanding of the transformed symbols, if not entirely impossible.
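In the SPA [25], the binding operator ⊗ is commonly realized as circular convolution of high-dimensional vectors, with unbinding performed by convolving with an approximate inverse. The following is a toy numpy sketch of the mechanism behind Eq. (2), not our platform's implementation:

    import numpy as np

    def bind(a, b):
        # circular convolution via FFT: the SPA binding operator
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def inverse(a):
        # approximate inverse for circular convolution (index reversal)
        return np.concatenate(([a[0]], a[:0:-1]))

    rng = np.random.default_rng(0)
    d = 512
    ball, paddle, x1, x2 = (rng.standard_normal(d) / np.sqrt(d)
                            for _ in range(4))

    # compose the visual scene of Eq. (2): Ball ⊗ x1 + Paddle ⊗ x2
    scene = bind(ball, x1) + bind(paddle, x2)

    # unbind: recover a noisy version of the ball's position from the scene
    x1_hat = bind(scene, inverse(ball))
    print(np.dot(x1_hat, x1) > np.dot(x1_hat, x2))  # x1 matches better -> True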
Figure 8: Left: goal+ and goal−. States associated with received reward and punishment are accumulated in the semantic pointers goal+ and goal−, respectively. Right: action learning. Action selection is learnt in a supervised way, with a fitness computed from goal+ and goal− as the teaching signal.

Goals Memory. The accumulated states g+ and g− are used for calculating the quality q of a state x using the dot product ‘·’: q = g+ · x − g− · x, see Fig. 8 left.

Action Learning. Training of the action selection network (Fig. 8 right) is delayed by one simulation step to allow the system to observe the results of its actions. Actions leading to higher-quality states are labeled as correct, otherwise as incorrect.
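Continuing in the same toy setting, the goals memory and the delayed action labeling described above could be sketched as follows; the class and function names are illustrative, and only the quality formula and the one-step delay come from the text:

    import numpy as np

    class GoalsMemory:
        def __init__(self, dim):
            self.g_plus = np.zeros(dim)   # accumulates rewarded states
            self.g_minus = np.zeros(dim)  # accumulates punished states

        def accumulate(self, state, reward):
            if reward > 0:
                self.g_plus += state
            elif reward < 0:
                self.g_minus += state

        def quality(self, state):
            # q = g+ . x - g- . x  (Appendix II, Goals Memory)
            return self.g_plus @ state - self.g_minus @ state

    def label_action(memory, prev_state, next_state):
        """Delayed supervision: one simulation step later, the executed action
        is labeled correct iff it led to a higher-quality state."""
        return memory.quality(next_state) > memory.quality(prev_state)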