J. Yaghob (Ed.): ITAT 2015, pp. 127–134, Charles University in Prague, Prague, 2015

Platform for Rapid Prototyping of AI Architectures

Peter Hroššo, Jan Knopp, Jaroslav Vítků, and Dušan Fedorčák
GoodAI, Czech Republic
contact author: peter.hrosso@keenswh.com

Abstract: Researching artificial intelligence (AI) is a big endeavour. It calls for agile collaboration among research teams, fast sharing of work between developers, and the easy testing of new hypotheses. Our primary contribution is a novel simulation platform for the prototyping of new algorithms with a variety of tools for visualization and debugging. The advantages of this platform are presented within the scope of three AI research problems: (1) motion execution in a complex 3D world; (2) learning how to play a computer game based on reward and punishment; and (3) learning hierarchies of goals. Although there are no theoretical novelties in (1)–(3), our goal is to show with these experiments that the proposed platform is not just another ANN simulator. This framework instead aims to provide the ability to test proactive and heterogeneous modular systems in a closed loop with the environment. Furthermore, it enables the rapid prototyping, testing, and sharing of new AI architectures, or their parts.

Figure 1: Screenshot of our simulation platform.

1 Introduction and Related Work

The recent boom in the field of artificial intelligence (AI) was brought on by advances in so-called narrow AI, represented by highly specialized and optimized algorithms designed for solving specific tasks. Such programs can sometimes even surpass human performance when solving the single problem for which they were created. But these narrow AI programs lack one feature which has so far been widely omitted, partly due to its overwhelming difficulty: generality.

In order to compensate for this deficiency, the field of artificial general intelligence (AGI) is bringing the focus back to broadening the range of solvable tasks. The ultimate goal of AGI is therefore the creation of an agent which can perform well (at human level or better) at any task solvable by a human. For a more detailed description of AI/AGI, see e.g. [23].

Pursuing such a goal is a hard task. According to the scientific method – the only guideline we have – we need to come up with new theories, design experiments for testing them, and evaluate their results. Such a cycle needs to be repeated often, because it can be expected to reach more dead ends than breakthroughs. We don't know how to increase the rate of coming up with new ideas, but what can be improved is the efficiency of research. What we need is better tools which will simplify the implementation of new theories, speed up experiments, and help us understand the results better by visualizing the obtained data.

In this article, we would like to present our attempt to create such a tool. We introduce a platform which allows:

• Easy prototyping of new models and fast sharing of existing ones (Sec. 4.1)
• Control of an agent in an environment on top of classic data processing (Sec. 3.3)
• A modular approach – seamless connecting of models inside a greater architecture (Sec. 2.1)
• Various tools for the visualization of data (Fig. 2)
• Simplified debugging (Sec. 2.2)
• User-friendly GPU programming (Sec. 2.1)
• Scalability due to GPU parallel computation
• Support of several scenarios such as an agent in an environment, classification, tic-tac-toe, etc.

Our platform also includes a variety of visualization tools and enables easy access to diverse data sets, not only for tasks such as image classification or recognition, but also for scenarios where an agent interacts with its environment. Last but not least, it is open source and freely available under a non-commercial license¹.

¹ The platform is available as Brain Simulator at http://www.goodai.com/brainsimulator
The primary goal of this tool is easy collaboration among both specialists and laymen for developing novel algorithms, especially in the field of AI.

There are tools, languages, and libraries that are good in particular areas. Widely used in the research community are Matlab [9] and Python [17] for prototyping by code, platforms that aim at high-level graphical modeling (Simulink [20], Software Architect [8]), data analysis (Azure [10], RapidMiner [18]), advanced visualization (ParaView [15]), rich graphical user interfaces (Blender [2], Maya [1]), modular computation (ROS [19]), or specific libraries for sharing [7] and parallel computation [3]. Each of these instruments is important in its specific domain, but none of them covers under one roof the most prominent features of all of those mentioned. Our platform is an attempt to fill this niche, and offers both high-level graphical coding and possibly, but not necessarily, also low-level (e.g. CUDA [3]) programming.

There are several tools for simulating neural networks (NN). Nengo [11] or OpenNN [14] focus on experimenting with all possible modifications of NN. Unfortunately, they either lack in visualization or focus on over-specific design approaches. Moreover, usage of these tools often requires extensive programming knowledge and the installation of extension packages [4, 13]. In contrast, other tools that provide rich visualization focus purely on the functions of our brain. For example, PSICS [16] uses 3D synapse visualization to show data flow in parts of the brain, DigiCortex [6] nicely visualizes spike activations of the whole brain in time, and Cx3D [5] simulates growth of the cortex in 3D. An extensive comparison of various neural network simulators, including our platform (Brain Simulator), can be found in [12].

Figure 2: Visualization of data. Left: a memory block of a 1 × 3 matrix can be visualized as (a) a value, (b) a color map, (c) the value of each element in time, or (d) the value as a color map in time. Right: more complicated task-specific visualizations can be implemented too, e.g. growing neural gas [26].

It is worth noting that the proposed platform is not limited to the design of neural networks only. Any algorithm useful for AI, machine learning, or control can be incorporated (various mathematical transformations, filters, a PID controller, image segmentation, hashing functions, a dictionary, etc., are already included). The heterogeneous character of the platform is its main advantage.

The rest of this work is organized as follows. We describe our platform in Sec. 2. In Sec. 3, the tasks on which we show the advantages of the platform are introduced. In Sec. 4, we discuss our experience with the tool and its advantages and weaknesses in the testing scenarios. The paper is concluded in Sec. 5.
2 Simulation Platform

Our modus operandi reflects our goals – we are aiming for a modular cognitive architecture, so we needed an environment which would efficiently support the whole life-cycle of experiments, starting with the testing of already existing algorithms, going through the design of a new algorithm, and ending with the evaluation of results. We developed a platform where various algorithms from machine learning and narrow AI are available. It is easy to pick some, connect them, and start experimenting effortlessly. Agile development requires frequent testing of new hypotheses, which is facilitated by an easy way of prototyping new modules for the platform as well as their fast training and evaluation (accelerated on GPU) on data. Once an experiment is running, it often happens that its outcome is not what was expected. Such a simulation platform could not be imagined without a tool for runtime analysis of algorithms. For this purpose, various data observers can be displayed so that the experimenter can visualize the computed data, evaluate the performance of the model, and change its parameters during runtime if needed.

Our platform is tailored to suit two different points of view of the architecture development process:

• A user who desires quick architecture modeling and needs fast access to already existing state-of-the-art modules (such as PCA, NN, image pre-processing, etc.) to experiment with. This perspective requires no coding and is served through graphical modeling. Furthermore, it is often crucial to have good insight into the running model, and thus a large set of visualization tools is available (Fig. 2).

• On the other hand, a researcher/developer often requires the creation of a new module or the import of an already existing library. Our API provides an easy way for such a module to be created and added to the inner shared repository. Moreover, the API offers an opportunity to hook the code to the GUI and bring the needed interactivity. Finally, the API defines a rigid interface, ensuring that the new module will be compatible with other modules.

The platform was designed to meet both needs. It is important to distinguish between them, as a user can be a person interested in machine learning but less experienced in programming. Our platform can be a good starting point, and the learning curve should therefore be smooth enough to bring the person in effortlessly.

From the experienced researcher/developer point of view, the platform should provide a convenient set of tools that can help with the development of novel algorithms and/or be able to wrap existing work into a module that can be easily shared within a team.

Finally, the community-driven approach renders itself very powerful and we believe that it can speed up research vastly. For this reason, we are planning to build in a "module market" to allow for the sharing of state-of-the-art research results between many co-working teams.

2.1 Platform Meta Model

There are three basic concepts defined in the meta model: a node, a task, and a memory block. The node encapsulates a functional block or algorithm that can "live" on its own (e.g. matrix operations, data transformations, various machine learning models, etc.). A node needs memory for its function. The memory is organized into a set of memory blocks that are aggregated inside the node. Some of these memory blocks can be designated as output blocks and others as input blocks. The connection between input and output memory blocks is provided by the user.

From the functional point of view, the node behaviour can usually be divided into a set of tasks, where each task is a part of the realized algorithm. Both nodes and tasks can define a set of parameters. Usually, node parameters describe structural properties (e.g. the size of memory blocks) whereas task parameters affect behavior. At present, the memory model is constant during the simulation, and therefore structural properties are editable only at design time. On the other hand, it is useful to change task parameters during the simulation and observe changes in the behavior of the algorithm/node.

Memory blocks are located in GPU (device) memory and every task can be seen as a collection of kernel calls (methods executed on GPU). If two nodes are connected in the GUI, it means that they have a pointer to the same memory block (input in one node, output in the other). If one requires dynamically allocated memory, one can either define a memory block that is large enough, or implement the node only for the CPU (which is more flexible than the GPU) using all data structures supported by C#. The only mandatory requirement is the usage of input/output memory blocks.
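To make these three concepts concrete, the following C# sketch shows a minimal node with one input block, one output block, a structural parameter, and a single task. The types and names here (MemoryBlock, Task, LowPassFilterNode) are hypothetical stand-ins for illustration; they do not reproduce Brain Simulator's actual API.

    // Minimal, hypothetical stand-ins for the three meta-model concepts.
    class MemoryBlock
    {
        public float[] Data;
        public MemoryBlock(int n) { Data = new float[n]; }
    }

    abstract class Task { public abstract void Execute(); }

    class LowPassFilterNode
    {
        // Designated input/output blocks; "connecting two nodes" in the GUI
        // means both nodes share a pointer to the same block.
        public MemoryBlock Input;
        public MemoryBlock Output;

        // Structural node parameter: fixed at design time, since the memory
        // model is constant during the simulation.
        public readonly int Size;

        public LowPassFilterNode(int size) { Size = size; Output = new MemoryBlock(size); }

        // One task = one part of the realized algorithm; on the GPU it would
        // be a collection of kernel calls over the memory blocks.
        public class FilterTask : Task
        {
            public LowPassFilterNode Owner;
            public float Alpha = 0.1f;   // task parameter: tunable during the simulation

            public override void Execute()
            {
                for (int i = 0; i < Owner.Size; i++)
                    Owner.Output.Data[i] += Alpha * (Owner.Input.Data[i] - Owner.Output.Data[i]);
            }
        }
    }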
All the concepts described above can be easily implemented through the rich API that is provided. The actual implementation relies heavily on annotated code describing various aspects of the model (UI interactivity, constraints, persistence, etc.). It allows the user to be extremely efficient in creating model prototypes. Sometimes this can lead to unreadable, over-annotated code which is hard to maintain, but this can be eliminated by applying standard software design patterns like MVC when needed.

2.2 Computation

As described above, the prototyped model forms an oriented graph with nodes and data connection edges. As the connections between nodes can be many-to-many (M → N) and recurrent connections are also possible, the resulting graph can be very complex. Moreover, the usual model is connected to the world node from which "perception" inputs are taken and to which control outputs are passed, forming the main loop of the simulation.

Before running the simulation, the order of node execution needs to be evaluated. There are other aspects that complicate the problem further (e.g. inner cycles, clustering and balancing of the model in an HPC environment), but it usually boils down to various forms of dependency ordering, cycle detection, or the job shop problem [34]. Solving these tasks is automated and user/developer assistance is usually discouraged, but there are use cases where user aid is necessary or can simplify the problem substantially. A minimal sketch of such dependency ordering is given at the end of this section.

There is also another view of the problem of execution order when it is faced in the area of machine learning. It turns out that many ML methods are surprisingly noise resistant (e.g. neural nets). Therefore, if approached with caution, one can run the model asynchronously and let the inner parts of the model deal with occasionally temporally inconsistent data. We made some experiments and the preliminary results show that relatively complex models can be run completely without synchronization.

Another aspect of model execution is GPU-enhanced computation, which can speed up the simulation substantially. The main purpose of our simulation platform is fast prototyping and testing of hypotheses. With increasing generality the efficiency usually decreases, so one should not expect top execution speed from our simulation platform. The devised practice is to design, test, and analyze new architectures, and once the final model is tested and working, it can be replaced by a specialized, highly optimized implementation still within the platform environment. Finally, it can be argued that the overall time necessary to get from an idea to the final product is much shorter compared to the classic approach of writing a specific program from scratch for each new experiment.

An important part of the development process is the easy visualization of what is happening in each part of the designed system. This is especially important for debugging, as the most frequent problem is the difference between what the programmer thinks the program should do and what it does in reality. In addition to the variety of observers that have already been discussed (Fig. 2), the platform contains its own debugger, where one can walk through the execution of all components used in the model.
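As the promised sketch of the dependency-ordering step, the following C# routine applies Kahn's topological sort to the node graph (with recurrent edges assumed to be excluded, since their values are read from the previous simulation step). This illustrates the general technique only, not the platform's actual scheduler.

    using System;
    using System.Collections.Generic;

    static class Scheduler
    {
        // Kahn's algorithm: nodes with no unresolved inputs are executed first.
        public static List<int> ExecutionOrder(int nodeCount, List<(int from, int to)> edges)
        {
            var inDegree = new int[nodeCount];
            var outgoing = new List<int>[nodeCount];
            for (int i = 0; i < nodeCount; i++) outgoing[i] = new List<int>();
            foreach (var (from, to) in edges) { outgoing[from].Add(to); inDegree[to]++; }

            var ready = new Queue<int>();
            for (int i = 0; i < nodeCount; i++) if (inDegree[i] == 0) ready.Enqueue(i);

            var order = new List<int>();
            while (ready.Count > 0)
            {
                int n = ready.Dequeue();
                order.Add(n);
                foreach (int m in outgoing[n])
                    if (--inDegree[m] == 0) ready.Enqueue(m);
            }

            // A leftover node means a cycle survived: the case where user
            // assistance is needed.
            if (order.Count != nodeCount)
                throw new InvalidOperationException("Cycle detected: user assistance needed.");
            return order;
        }
    }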
3 Testing Scenarios

Whether the ultimate goal of AGI (a general autonomous machine) is achievable or not [35], researchers focus on its sub-goals, such as learning how to play games [37]. One of the prominent building blocks for these sub-goals are neural networks in the form of deep learning and CNNs, which have recently made big progress in speech recognition [40], computer vision [33], medical analyses [43], and language translation [28]. Le and colleagues [36] used a deep network to learn in an unsupervised manner what an ordinary "cat" looks like, only by watching YouTube videos. While NNs can also learn how to play simple games [32, 37], they usually fail in structured problems which demand learning hierarchies or chains of goals. From this perspective, it seems promising to focus on machines which can control another machine, such as an NN that learns how to control a Turing Machine [27]. The authors designed a neural network which learns a procedure to control a Turing Machine to sort numbers.

As the goal of this paper is to provide a tool that shortcuts the research path to an autonomous machine, we will show how it performs on three selected AI tasks solved by our team: learning motion control, playing the Atari game Breakout, and learning hierarchies of goals. Our solutions are highly inspired by the current machine learning literature, with a stress on the usage of neural networks, which are one of the basic building blocks for bigger architectures.

The first experiment (Sec. 3.1) will demonstrate how our platform can be connected to an external source of input data and how various modules of narrow AI can be combined together to form a functioning system which can drive a robot in a virtual world with simulated physics.

The second experiment (Sec. 3.2) will be situated in a much simpler simulation environment – an Atari [38] game called Breakout. In this experiment, a more advanced adaptive system will be showcased. The system works directly on raw image input. It takes advantage of the semantic pointer architecture [25] for representing its perceptions and for converting them into long-term memories such as goals. This knowledge is then used for learning the actions necessary for playing the game.

In the third experiment (Sec. 3.3), we move a bit higher in the level of abstraction. The presented problem consists of an agent in a simple 2D environment which needs to satisfy a chain of preconditions before reaching a reward: if the agent wants to turn on a light, it needs to press a switch, but to get to the switch it also needs to overcome an obstacle (a door controlled by another switch). The task is solved by hierarchical reinforcement learning [30].

To clarify why we selected these experiments, one could imagine the three systems as parts of a future higher-order cognitive architecture where they will work together. The system from the first experiment could be thought of as a basic motoric and sensory system driven by reflexes and higher-level commands. These would come from the system used in the second experiment, which would allow the agent to learn how to reach a specific goal. And finally, the third system should discover the hierarchy of goals and preconditions, and thus could resemble a simplified version of the agent's central executive. Such a connection of the systems remains for our future work.
3.1 SE Robot

We took advantage of a sandbox game called Space Engineers [21], which provides a physically realistic 3D environment where various structures can be built. We built a six-legged robot within the game and connected it bidirectionally with our simulation platform. In one direction, the game sends visual data from the robot's view and a description of the state of the robot's body. In the other direction, motoric commands from our control module inside the simulation platform are sent to the robot, which executes them in the game.

The control module was trained to associate visual input with motor commands in a supervised way. The associative memory was implemented with a Self-Organizing Map [31], which found the most similar representative of the received input in the visual memory and returned the associated high-level motoric command (turn left/right, move forward/backward); a sketch of this recall step is given at the end of this subsection. These high-level commands were then unrolled into sequences of body states, consisting of the joint angles of all of the robot's limbs, using a recurrent neural network (RNN) [39]. These body states were afterwards used as waypoints for a control RNN which was trained to act as an inverse dynamics model of the robot's body. In order to reach a specified waypoint, the control network generated full motoric inputs to the robot – the desired angular velocities of the joints.

The training phase consisted of a mentor leading the robot from various starting locations towards a goal location in the environment, which was identified by an easily distinguishable 3D symbol located at that position. The mentor was implemented by a hard-coded navigation system. In this way, the hexapod was trained to look for the goal symbol and, when it appeared in the robot's field of view, to navigate successfully towards the destination through the environment.
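For illustration, the recall step of such an associative memory reduces to a best-matching-unit search over the map's trained weight vectors. The C# sketch below is hypothetical (training of the map is omitted); the returned index would select the associated high-level command.

    static class SomRecall
    {
        // Find the unit whose weight vector is most similar to the input
        // (squared Euclidean distance), i.e. the stored representative.
        public static int BestMatchingUnit(float[][] weights, float[] input)
        {
            int best = 0;
            float bestDist = float.MaxValue;
            for (int i = 0; i < weights.Length; i++)
            {
                float d = 0f;
                for (int j = 0; j < input.Length; j++)
                {
                    float diff = weights[i][j] - input[j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return best;   // index of the most similar representative
        }
    }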
3.2 Atari Game

The Breakout game was chosen as our second testing scenario. The game consists of a ball, a paddle, and bricks. The ball bounces off walls, can destroy bricks, and can fall to the ground, for which the player is penalized by losing a life. After losing 4 lives, the game is over. When all bricks are destroyed, the player successfully finishes the level and enters the next one, consisting of a different arrangement of bricks. The player has three actions available, which accelerate the paddle to the right, accelerate it to the left, or decelerate it. Even though our modular approach uses purely unstructured data input (a raw image, as in [37]), it later extracts the structure, so we can understand the inner workings of the model, as opposed to the cited work. The architecture of the system consists of four main parts: image processing, working memory, accumulators of reward and penalty, and an action selection network (Fig. 3).

Figure 3: Overall architecture. Raw visual signals are processed into symbols, which are then added to the working memory. States corresponding to reward and punishment are accumulated and later used as teaching signals for training the action selection network.

Relevant information about the objects is extracted from the raw bitmap in the Vision System (Fig. 4).

Figure 4: Top: Image processing (segmentation, graph optimization, attention score, CNN feature/patch extraction, working memory). First, an input image is segmented into super-pixels (SP) using SLIC [22]. Second, each SP is connected with its neighbors and nearby SPs are assigned the same object id. Third, the attention score (sA) is estimated for each object. Fourth, features are estimated for the object with the highest sA by a hierarchy of convolutional and fully-connected layers. Fifth, the object features are clustered into Visual Words [41] to constitute a "Working Memory". Bottom: Corresponding implementation in our platform.

Working memory (WM) is the agent's internal representation of the environment. It contains all of the objects detected by vision. The WM is kept up to date by adding new objects which haven't been seen yet, and by updating those already seen. The identity of objects is detected through a comparison of visual features. Contents of the working memory are transformed into a symbolic representation and passed to the goals memory and the action selection network.

The goals memory is trained by accumulating states associated with reward and punishment in their respective semantic pointers, goal+ and goal−. These are then used for evaluating the quality of game states, which is necessary for training the action selection network. Details of the vision system and the semantic pointer architecture are described in Appendices I and II.

3.3 2D World with Hierarchical Goals

In the previous testing scenarios, we wanted to test whether our system was able to coordinate complex motoric commands in a 3D environment, learn simple goals, and act towards maximizing the received reward. Our goal for the third scenario is to increase the generality of the designed system to enable the identification and satisfaction of chained preconditions before the final goal can be reached.

We present a task consisting of a simple 2D world where a single source of reward is located – a light bulb, which starts in the "off" position and should be turned on by the agent. This can be achieved by pressing a switch, but the switch is hidden behind a locked door. The door can be unlocked through a switch, but this switch is hidden behind another locked door. It would be possible to chain the preconditions further in this manner, but without loss of generality we use only two locked doors with two matching switches. The setup can be seen in Fig. 5.

Figure 5: Multiple goals. (a) The goal: the agent's current goal is to reach the light switch and turn on the lights. (b) Legend: the objects of the environment.

We approached this problem by employing HARM (Hierarchical Action Reinforcement Motivation system) [30]. It is an approach based on a combination of a hierarchical Q-learning algorithm [42] and a motivation model. The system is able to learn and compose different strategies in order to achieve a more complex goal.

Q-learning is able to "spread information about the reward" received in a specific state (e.g. the agent reaching a position on the map) to the surrounding space, so the brain can later take a proper action by climbing the steepest gradient of the Q function. However, if the goal state is far away from the current state, it might take a long time to build a strategy that will lead to that goal state. Also, a high number of variables in the environment can lead to extremely long routes through the state space, rendering the problem almost unsolvable.
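For reference, the underlying mechanism is the standard one-step Q-learning update [42]; the C# sketch below is generic and does not reproduce HARM's exact implementation.

    using System;

    static class QLearning
    {
        // Move Q(s,a) toward the reward plus the discounted best future value;
        // repeated updates "spread" the reward backwards through the state space.
        public static void Update(float[,] q, int s, int a, float reward, int sNext,
                                  float alpha = 0.1f, float gamma = 0.9f)
        {
            float maxNext = float.MinValue;
            for (int aNext = 0; aNext < q.GetLength(1); aNext++)
                maxNext = Math.Max(maxNext, q[sNext, aNext]);

            q[s, a] += alpha * (reward + gamma * maxNext - q[s, a]);
        }
    }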
There are several ideas that can improve the overall performance of the algorithm. First, the agent rewards itself for any successful change to the environment. A motivation value can be assigned to each variable change, so the agent is constantly motivated to change its surroundings.

Second, for each variable that the agent is able to change, it creates a Q-learning module assigned to that variable (e.g. changing the state of a door). Therefore, it can learn an underlying strategy defining how this change can be made again. In such a system, a whole network of Q-learning modules can be created, where each module learns a different strategy.

Third, in order to lower the complexity of each sub-problem (strategy), the brain can analyze its "experience buffer" from the past and eventually drop variables that are not affected by its actions or are not necessary for the current goal (i.e. the strategy to fulfill the goal).

A mixture of these improvements creates a hierarchical decision model that is built online (first, the agent is left to (semi-)randomly explore the environment). After a sufficient amount of knowledge is gathered, we can "order" the agent to fulfill a goal by manually raising the motivation that corresponds to a variable that we want to change. The agent will then execute the learned abstract action (strategy) by traversing the network of Q-learning modules and unrolling it into a chain of primitive actions that lie at the bottom.
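A rough C# sketch of this unrolling follows, assuming each strategy's greedy action is either a primitive action or a pointer to a child strategy whose variable must be changed first. All names here are hypothetical, and HARM's actual traversal (motivation values, online pruning of variables) is more involved.

    using System;

    abstract class HarmAction { }
    class Primitive : HarmAction { public string Name; }
    class Strategy : HarmAction
    {
        // Greedy choice from this module's Q function in the current state.
        public Func<HarmAction> BestAction;

        // Unroll one decision: either emit a primitive action, or descend
        // into the child strategy that satisfies a precondition first.
        public void Execute(Action<string> actuate)
        {
            HarmAction a = BestAction();
            if (a is Primitive p) actuate(p.Name);           // bottom of the hierarchy
            else if (a is Strategy sub) sub.Execute(actuate); // satisfy precondition first
        }
    }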
Figure 6: Learnt strategy. Visualization of the agent's knowledge for a particular task, which changes the state of the lights. It tells the agent what to do in order to change the state of the lights in all known states of the world. The heat map corresponds to the expected utility ("usefulness") of the best action learned in a given state. A graphical representation of the best action is shown at each position on the map.

4 Discussion

Throughout the work on the testing scenarios (Sec. 3) we have observed several advantages and weaknesses of the platform. In this section, our experience with the usage of the platform is discussed. The discussion is focused especially on the end-user experience, i.e. the experience of a person who did not develop the platform but wants to use it for solving her problem. First, a list of identified features is presented, and then further experience is described.

+ Fast and easy online observation of and interaction with the simulation.
+ Created modules can be easily understood and shared with collaborators due to the common interface. Moreover, the persistence capabilities allow easy sharing of whole models (projects) and merging of them together.
+ It is easy to replace an existing module with its improved version, as the architecture is separated from the implementation. As backward compatibility becomes crucial at this point, an inner versioning system was implemented.
+ The provided interface drives users to follow design patterns when developing low-level optimized modules.
− The current version runs on MS Windows only, but a port to MacOS and Linux is planned for the future.
− The user has the option to either develop optimized modules in code (CUDA [3] or C#) or use the graphical interface for connecting existing modules into bigger architectures. There is no middle layer which would support scripting.

4.1 Experience of Newcomers

The expertise of the people who have started to use our platform varies from C++ experts to Matlab-only users. We found that users with very short training can connect existing modules into simple architectures (like a neural network MNIST image recognizer), as the graphical modeling is quite natural and easy to understand. The steady learning curve of newcomers is supported by a video tutorial as well as several examples of how to implement simple and more advanced tasks².

For the development of new modules, it is necessary to understand a programming language (C#, C++, CUDA) at least at a basic level. Once the definitions of inputs, outputs, tasks, and kernels (four lines of code each) are understood, developers soon start creating their own nodes. Their learning curve then equals that of learning how to use a new library.

² Documentation available at http://docs.goodai.com/brainsimulator/

4.2 Our Observations on the Testing Scenarios

In the first scenario (Sec. 3.1), we have shown that our platform successfully connects with the open source game Space Engineers [21]. Modules created in the platform controlled the hexapod in the world of the game. It was understanding the game's communication module that took the most time in this case. Otherwise, the development of the controller did not raise any challenges for the platform.

The second scenario (Sec. 3.2) consisted of several modules that were developed independently. We found it extremely useful that each module communicates with others using only the pre-defined interface (memory blocks) that corresponds to a sketched diagram (cf. Fig. 4). The modules were merged into one big architecture right before the deadline without any complications. As the final model was quite large and performance-demanding, we were forced to profile, find bottlenecks, and optimize in the process. It was extremely useful to visualize the data flowing between (and inside) the modules.

In the third scenario (Sec. 3.3), HARM constituted a single module with complex insides. Therefore, this scenario presented an ideal example for designing a number of task-specific visualization tools (for example, the agent's knowledge in Fig. 6).
5 Conclusion

We have presented a platform for prototyping AI architectures. The platform is tailored both for users with no mathematical/programming background but with a high desire to experiment with AI modules, and for researchers/developers who want to improve and experiment with their existing state-of-the-art techniques.

To show the usage of our platform, we have presented three development scenarios: linkage with a 3D game world and controlling an agent there; playing an Atari game using raw bitmap input processed by computer vision techniques, an attention model, and the semantic pointer architecture; and learning a complex hierarchy of goals.

The proposed platform opens up possibilities to share ideas not only within the community but also with non-experts who can boost the research via rapid testing, or utilize fresh, out-of-the-box solutions. There is also the prospect of support from the open source community – if not directly in the development, then at least in assessing missing features, so we can incorporate them and thus provide a tool that can be used at many levels of expertise.

We believe that by providing an open platform for AI and ML experiments along with a smooth learning curve, we can bring together many enthusiasts across different fields of interest, potentially leading to unexpected advancements in research.

Acknowledgement. This material is based upon work supported by GoodAI and Keen Software House.

References

[1] Autodesk Maya. Available at http://www.autodesk.com/products/maya/overview.
[2] Blender. Available at https://www.blender.org/.
[3] CUDA. Available at https://developer.nvidia.com/cuda-zone.
[4] CVX: Software for disciplined convex programming. Available at http://cvxr.com/.
[5] Cx3D: Cortex simulation in 3D. Available at http://www.ini.uzh.ch/~amw/seco/cx3d/.
[6] DigiCortex: Biological neural network simulator. Available at http://www.dimkovic.com/node/1.
[7] GitHub. Available at https://github.com/.
[8] IBM Rational Software Architect. Available at http://www.ibm.com/developerworks/downloads/r/architect/index.html.
[9] MATLAB: The language of technical computing. Available at http://www.mathworks.com/products/matlab/.
[10] Microsoft Azure. Available at http://azure.microsoft.com/en-us/.
[11] The Nengo neural simulator. Available at http://nengo.ca/.
[12] Neural network simulators. Available at https://goo.gl/hRf4KA.
[13] OpenCV: Open source computer vision. Available at http://opencv.org/.
[14] OpenNN: Open neural networks library. Available at http://www.intelnics.com/opennn/.
[15] ParaView. Available at http://www.paraview.org/.
[16] PSICS: The parallel stochastic ion channel simulator. Available at http://www.psics.org/.
[17] Python. Available at https://www.python.org/.
[18] RapidMiner. Available at https://rapidminer.com/.
[19] ROS. Available at http://www.ros.org/.
[20] Simulink: Simulation and model-based design. Available at http://www.mathworks.com/products/simulink/.
[21] Space Engineers, open source code. Available at https://github.com/KeenSoftwareHouse/SpaceEngineers.
[22] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. PAMI (2012), 2274–2282
[23] Goertzel, B.: Artificial general intelligence: concept, state of the art, and future prospects. Journal of Artificial General Intelligence 5 (2014), 1–48
[24] Bishop, C. M.: Pattern recognition and machine learning. Springer, 2006
[25] Eliasmith, C.: How to build a brain: a neural architecture for biological cognition (Oxford Series on Cognitive Models and Architectures). Oxford University Press, 2013
[26] Fritzke, B.: A growing neural gas network learns topologies. In: NIPS, 1995
[27] Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR, 2014
[28] He, X., Gao, J., Deng, L.: Deep learning for natural language processing and related applications (tutorial at ICASSP). ICASSP, 2014
[29] Huang, F. J., Boureau, Y.-L., LeCun, Y.: Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: CVPR, 2007
[30] Kadlecek, D., Nahodil, P.: Adopting animal concepts in hierarchical reinforcement learning and control of intelligent agents. In: Proc. 2nd IEEE RAS & EMBS BioRob, 2008
[31] Kohonen, T., Schroeder, M. R., Huang, T. S.: Self-organizing maps. 3rd edition, 2001
[32] Koutník, J., Cuccu, G., Schmidhuber, J., Gomez, F.: Evolving large-scale neural networks for vision-based reinforcement learning. In: GECCO, 2013
[33] Krizhevsky, A., Sutskever, I., Hinton, G. E.: ImageNet classification with deep convolutional neural networks. In: NIPS, Curran Associates, Inc., 2012
[34] Tsai, M.-J., Chang, H.-Y., Huang, K.-C., Huang, T.-C., Tung, Y.-H.: Moldable job scheduling for HPC as a service with application speedup model and execution time information. Journal of Convergence, 2013
[35] Kurzweil, R.: The singularity is near: when humans transcend biology, 2006
[36] Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., Ng, A.: Building high-level features using large scale unsupervised learning. In: ICML, 2012
[37] Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518 (2015), 529–533
[38] Naddaf, Y.: Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010
[39] Rojas, R.: Neural networks: a systematic introduction. Springer-Verlag New York, Inc., New York, NY, USA, 1996
[40] Sak, H., Vinyals, O., Heigold, G., Senior, A., McDermott, E., Monga, R., Mao, M.: Sequence discriminative distributed training of long short-term memory recurrent neural networks. In: Interspeech, 2014
[41] Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: ICCV 2 (2003), 1470–1477
[42] Sutton, R. S., Barto, A. G.: Reinforcement learning: an introduction. MIT Press, Cambridge, MA, USA, 1st edition, 1998
[43] Xu, Y., Mo, T., Feng, Q., Zhong, P., Lai, M., Chang, E. I.-C.: Deep learning of feature representation with multiple instance learning for medical image analysis. ICASSP, 2014

Appendix I: Details of Image Processing

Unlike the mainstream of Computer Vision, our approach has to be unsupervised, without any training data. Thus, we have no prior knowledge and the system has to learn everything on the fly. Our system is a pipeline visualized and implemented in Fig. 4. It consists of the following parts:

The input image is first segmented into a set of super-pixels (SP) [22]. Then, each SP is connected to its vicinity, constituting a graph where nodes are SPs and edges connect neighboring SPs. SPs with similar color are merged into connected components. Note that while more SPs speed up the segmentation, they slow down the graph optimization algorithm, see Fig. 7.

Figure 7: Performance of SLIC (blue) and SLIC + graph optimization (red) w.r.t. the number of segments.

Once we have object proposals, we estimate an attention score (sA) for each object, sA(oi) = ψtime(oi) + ψmove(oi), where ψtime(oi) is the time since we last focused on the object oi, and ψmove(oi) is the object's movement. The object with the highest sA is selected³, and its position together with its size defines an image patch. The image patch is processed into CNN features [29], and this feature representation is then clustered into a "Working Memory" (WM). The WM stores the feature id together with the object position for the last 10 seen objects.

For the CNN features, we used two convolutional layers of 8 and 5 neurons with patch sizes of 5 × 5, followed by a fully-connected layer of 16 neurons. Learning converged in 6K iterations. We observed no performance improvement with bigger networks. The WM was implemented as a simple K-means [24].

³ Once an object is selected, its ψtime is decreased, so it won't be selected in the next time step.
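A minimal C# sketch of this attention mechanism follows; the field names are hypothetical and the two terms are summed unweighted, as in the formula above.

    class TrackedObject
    {
        public float StepsSinceFocused;   // psi_time: grows until the object is attended
        public float Movement;            // psi_move: magnitude of the object's motion
    }

    static class Attention
    {
        public static int MostSalient(TrackedObject[] objects)
        {
            int best = 0;
            float bestScore = float.MinValue;
            for (int i = 0; i < objects.Length; i++)
            {
                float s = objects[i].StepsSinceFocused + objects[i].Movement;
                if (s > bestScore) { bestScore = s; best = i; }
            }
            // Footnote 3: the winner's psi_time is decreased so the same object
            // is not re-selected on the very next step.
            objects[best].StepsSinceFocused = 0f;
            return best;
        }
    }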
Appendix II: Semantic Pointer Architecture

As was already mentioned in Sec. 3.2, one of the main features of our method is the semantic pointer architecture (SPA) [25], which merges the symbolic and connectionist approaches. Artificial neural networks are very powerful adaptive tools, but their usage usually comes at the expense of losing detailed insight into how exactly the task is solved. Such a drawback can be mitigated by using the SPA and its variable binding. It introduces composite symbols of the form X ⊗ x, where X is the name of the variable and x is its value. It is then possible to train a network to perform complicated transforms such as:

    V ⊗ (X ⊗ x + Y ⊗ y) → C ⊗ (Y ⊗ x)    (1)

which could be interpreted as an action selection rule for the Pong game:

    Visual ⊗ (Ball ⊗ x1 + Paddle ⊗ x2) → Move ⊗ (Paddle ⊗ x1)    (2)

If the ball is seen at position x1 and the paddle at x2, execute the command "move the paddle to position x1". Without the SPA, it would be much harder to maintain an understanding of the transformed symbols, if not entirely impossible.

Figure 8: Left: Goal+ and goal−. States associated with received reward and punishment are accumulated in the semantic pointers goal+ and goal−, respectively. Right: Action learning. Action selection is learnt in a supervised way, with fitness computed from goal+ and goal− as a teaching signal.

Goals Memory. The accumulated states g+ and g− are used for the calculation of the quality q of the state x, using the dot product '·': q = g+ · x − g− · x, see Fig. 8 left.

Action Learning. Training of the action selection (Fig. 8 right) is delayed by one simulation step to allow the system to observe the results of its actions. Actions leading to higher-quality states are labeled as correct, otherwise as incorrect.
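For illustration, the following C# sketch implements the two vector operations used above, assuming the SPA binding operator ⊗ is realized as circular convolution, as in [25]; an FFT-based implementation would be used in practice, the O(n²) form is shown for clarity.

    static class Spa
    {
        // Bind a variable-name vector with a value vector: (X ⊗ x),
        // via circular convolution c[i] = sum_j a[j] * b[(i - j) mod n].
        public static float[] Bind(float[] a, float[] b)
        {
            int n = a.Length;
            var c = new float[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    c[i] += a[j] * b[((i - j) % n + n) % n];
            return c;
        }

        // Quality of a state x against the accumulated goal pointers:
        // q = g+ . x - g- . x  (dot products, as in the Goals Memory above).
        public static float Quality(float[] goalPlus, float[] goalMinus, float[] x)
        {
            float q = 0f;
            for (int i = 0; i < x.Length; i++)
                q += (goalPlus[i] - goalMinus[i]) * x[i];
            return q;
        }
    }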