=Paper= {{Paper |id=Vol-1510/paper8 |storemode=property |title=A Network-based Communication Platform for a Cognitive Computer |pdfUrl=https://ceur-ws.org/Vol-1510/paper8.pdf |volume=Vol-1510 |dblpUrl=https://dblp.org/rec/conf/aic/NumanFPL15 }} ==A Network-based Communication Platform for a Cognitive Computer== https://ceur-ws.org/Vol-1510/paper8.pdf

A Network-based Communication Platform for a Cognitive Computer Mostafa W. Numan, Jesse Frost, Braden J. Phillips, and Michael Liebelt Centre for High Performance Integrated Technologies and Systems (CHiPTec) School of Electrical and Electronic Engineering The University of Adelaide Adelaide, Australia {mostafa.numan,jesse.frost,braden.phillips,michael.liebelt}@adelaide.edu.au Abstract. Street is a reconfigurable parallel computer architecture. It executes a production language directly in hardware with the aim of re- alising advanced cognitive agents in a more energy efficient manner than conventional computers. Street requires frequent communication between many processing elements and to make this communication more energy efficient, a network-based communication platform, StreetNet, is pro- posed in this paper. It maps the processing elements onto a 2D mesh ar- chitecture optimized according to the data dependencies between them. A deadlock-free deterministic routing function is considered for this plat- form along with the concept of sleep period, analogous to human sleeping, to reorganize the placements of processing elements based on runtime traffic statistics. These mechanisms serve to reduce total network traffic and hence minimise energy consumption. Keywords: Cognitive computer, computer architecture, networks-on- chip, mapping 1 Introduction Street is a reconfigurable, flat, parallel architecture designed for symbolic cog- nitive workloads [6]. The goal of Street is to find a new computer architecture that can take advantage of the huge number of transistors in modern integrated circuits to achieve advanced cognitive computation in real time, but with much lower power consumption than current computers. It is designed to use in real time embedded implementations of artificial general intelligence, exemplified by the plethora of potential autonomous robotics applications. The new machine is very different from conventional computers, consisting of many simple process- ing elements executing and communicating in parallel. A bus-based interconnect performs well in production systems with a small number of processing elements, or when groups of dependent productions are mapped to the same processor [1], however it does not scale well for more frequently interacting processing ele- ments. For chips with a large number of processing elements, network based communication provides better scalability, and is seen as the most efficient solu- tion [13]. In this paper, a network-based communication platform, StreetNet, is proposed for efficient communication among the processing elements of Street. 2 Street Street executes a parallel production language directly in hardware. This lan- guage, which we call Street Language, is inspired by Forgy’s OPS5 [5] and the languages used in the Soar [10] and ACT-R [3] cognitive architectures. However Street Language is different from all of these. Street is asynchronous, with no global match-select-act cycle as found in traditional production systems. This asynchronous model provides the best opportunity to parallelise traditional pro- duction systems in application level [2]. 2.1 Street Language An intelligent system is implemented using a set of production rules written in Street Language [6]. Each production rule is an if-then statement: if a specified pattern exists in working memory, then the rule makes some changes to working memory. Working memory is a set of tuples called working memory elements (WMEs). Each WME has one or more elements called attributes. For instance, the WME (ID17 source ID2) has 3 attributes: ID17, source, and ID2. Here is a simple example of working memory of just 3 WMEs: {(ID17 name Torrens), (ID17 source ID2), (isCounted ID5)} Each production rule consists of a left hand side (LHS) of one or more con- dition elements (CEs), and a right hand side (RHS) of one or more actions. In the example in Fig. 1, (

type dog) is a CE and (

isOld) is an ac- tion. A complex cognitive agent would consist of thousands of production rules operating on symbolic and numeric data in working memory. st {oldDogs (

type dog) // condition elements (

age ( > 7)) --> (

isOld) // actions } Fig. 1. A Street Language production rule A subset of working memory that satisfies all of the CEs in a production rule with consistent variable assignments is called an instantiation of the rule. So instantiations of the rule above will be pairs of WMEs in working memory such as: {(pet1 type dog), (pet1 age 8)}. Note that the first attributes must be Procedural Cognitive architecture Episodic & Memory production rules Semantic Memory Cognitive Architecture Street language Kernel Street language Productors Network on Chip I/O Working memory distributed among local memories in productors Street Engine Fig. 2. Hardware/software stack on Street (adopted from [6]) the same as they were specified by the same variable

. The WMEs must join on any shared variables. The actions of a production rule are performed for each new instantiation of the rule. An action such as (

isOld) adds a WME to working memory. Actions can also remove WMEs from working memory. This change in working memory may cause other production rules to instantiate, and all new instantiations are executed. 2.2 Street Architecture The Street architecture consists of a large number of identical and simple mi- crocoded processing elements (PEs) with a single production rule assigned to each. A PE contains a controller, a block of content-addressable memories (CAMs) and an Arithmetic Logic Unit (ALU). The PEs communicate using tokens to no- tify each other of changes to working memory. The local memory of a PE stores just the subset of working memory that may lead to instantiations of its rule. Each PE matches the associated production rule against its own local memory with an algorithm similar to TREAT [12]. For each incoming token, a PE up- dates the contents of its local memory, finds new instantiations (match), and outputs tokens corresponding to the rule’s actions (act). The controller coordi- nates the PE’s match-act cycle. Fig. 2 shows Street architecture executing an agent using a symbolic cognitive architecture. 3 StreetNet: The Communication Platform When a PE produces tokens these are transmitted to other PEs and may cause new rule instantiations and yet more tokens. There may be a large amount of token traffic between PEs or small clusters of PEs so efficient data interconnect is Routing Table Switch Dest. Table Productor Fig. 3. A 3 × 3 StreetNet structure required. For a device with a large number of PEs a network based interconnect, named the StreetNet, is proposed in this paper. 3.1 Network Architecture A regular tile-based 2D network architecture is considered for StreetNet. Each tile contains a single PE and router, however adjacent tiles may be linked to share memory resources (discussed below in 3.6). Every PE has a destination table generated from the dependency graph. It lists the desired destinations and corresponding tokens to those destinations. Each router is connected to its local PE and four neighbouring tiles. Each router also has a routing table that is checked for destination. The tokens are broken into packets and forwarded to the neighboring tile towards the destination. A crossbar switch is used as the switching fabric in the router. Fig. 3 shows the structure of a StreetNet. 3.2 Dependency Graph The mapping of PEs onto network architecture is based on a dependency graph. A dependency graph is a directed graph, where each vertex pi represents a PE. Directed arcs represent non-zero communication paths between two PEs and are assigned a weight characterising the communication rate between the PEs. Fig. 4 shows an example of a dependency graph of nine PEs. This graph is used to map the PEs onto the network so that the most dependent PEs are placed close together in the expectation this will reduce communication latency and power consumption. The PEs are sorted by total incoming and outgoing traffic that was recorded during runtime, and mapped in this order. This is useful since the positions of the PEs with high traffic requirement have higher impact on the overall energy consumption. This dependency graph is updated during a sleep period (described in subsection 3.5) depending on runtime traffic statistics. P3 0.0391742 0.0728398 P2 0.0256243 P4 0.0612313 0.027465 0.0130523 P1 0.0468928 P6 0.0448455 0.0322667 0.00800304 P5 P7 0.0285391 P8 P0 0.0256137 Fig. 4. An example of dependency graph with nine PEs 3.3 Mapping Techniques In [8], an energy aware mapping technique is proposed for networks-on-chip (NoC) with a regular architecture. This technique is adopted in the StreetNet. The average energy consumed in sending one bit of data from a tile to a neigh- boring tile is calculated as E = ES + EB + EL (1) where ES , EB and EL are the energy consumed at the switch, buffer and link. Since the energy consumed for buffering is negligible compared to EL [8], (1) becomes E = ES + EL (2) Now, if the bit traverses n hops to reach tile tj from tile ti , the average energy consumption is Eti ,tj = n × ES + (n − 1) × EL (3) For a system that involves a large number of processing elements, it is impor- tant to adopt efficient mapping and routing techniques so that total energy consumption is minimized and communication traffic does not exceed available bandwidth. Two different mapping techniques have been considered in this work: one is based on Simulated Annealing (SA) and the second is based on Branch- and-bound (BB) technique. Simulated Annealing based Mapping Simulated Annealing (SA) [9] is a well known technique for solving optimization problems. It effectively optimizes solutions over large state spaces by making iterative improvement. It has a con- cept of temperature which is initially very high and keeps reducing in every step until it reaches the minimum temperature. For each temperature, it starts with current_temp=MAX_TEMPERATURE; previous_cost=INITIAL_COST; current_mapping=randomMapping(); current_cost=cost(current_mapping); do{ while(attempts