=Paper=
{{Paper
|id=Vol-1510/paper8
|storemode=property
|title=A Network-based Communication Platform for a Cognitive Computer
|pdfUrl=https://ceur-ws.org/Vol-1510/paper8.pdf
|volume=Vol-1510
|dblpUrl=https://dblp.org/rec/conf/aic/NumanFPL15
}}
==A Network-based Communication Platform for a Cognitive Computer==
https://ceur-ws.org/Vol-1510/paper8.pdf
A Network-based Communication Platform for a
Cognitive Computer
Mostafa W. Numan, Jesse Frost, Braden J. Phillips, and Michael Liebelt
Centre for High Performance Integrated Technologies and Systems (CHiPTec)
School of Electrical and Electronic Engineering
The University of Adelaide
Adelaide, Australia
{mostafa.numan,jesse.frost,braden.phillips,michael.liebelt}@adelaide.edu.au
Abstract. Street is a reconfigurable parallel computer architecture. It
executes a production language directly in hardware with the aim of re-
alising advanced cognitive agents in a more energy efficient manner than
conventional computers. Street requires frequent communication between
many processing elements and to make this communication more energy
efficient, a network-based communication platform, StreetNet, is pro-
posed in this paper. It maps the processing elements onto a 2D mesh ar-
chitecture optimized according to the data dependencies between them.
A deadlock-free deterministic routing function is considered for this plat-
form along with the concept of sleep period, analogous to human sleeping,
to reorganize the placements of processing elements based on runtime
traffic statistics. These mechanisms serve to reduce total network traffic
and hence minimise energy consumption.
Keywords: Cognitive computer, computer architecture, networks-on-
chip, mapping
1 Introduction
Street is a reconfigurable, flat, parallel architecture designed for symbolic cog-
nitive workloads [6]. The goal of Street is to find a new computer architecture
that can take advantage of the huge number of transistors in modern integrated
circuits to achieve advanced cognitive computation in real time, but with much
lower power consumption than current computers. It is designed to use in real
time embedded implementations of artificial general intelligence, exemplified by
the plethora of potential autonomous robotics applications. The new machine is
very different from conventional computers, consisting of many simple process-
ing elements executing and communicating in parallel. A bus-based interconnect
performs well in production systems with a small number of processing elements,
or when groups of dependent productions are mapped to the same processor [1],
however it does not scale well for more frequently interacting processing ele-
ments. For chips with a large number of processing elements, network based
communication provides better scalability, and is seen as the most efficient solu-
tion [13]. In this paper, a network-based communication platform, StreetNet, is
proposed for efficient communication among the processing elements of Street.
2 Street
Street executes a parallel production language directly in hardware. This lan-
guage, which we call Street Language, is inspired by Forgy’s OPS5 [5] and the
languages used in the Soar [10] and ACT-R [3] cognitive architectures. However
Street Language is different from all of these. Street is asynchronous, with no
global match-select-act cycle as found in traditional production systems. This
asynchronous model provides the best opportunity to parallelise traditional pro-
duction systems in application level [2].
2.1 Street Language
An intelligent system is implemented using a set of production rules written in
Street Language [6]. Each production rule is an if-then statement: if a specified
pattern exists in working memory, then the rule makes some changes to working
memory. Working memory is a set of tuples called working memory elements
(WMEs). Each WME has one or more elements called attributes. For instance,
the WME (ID17 source ID2) has 3 attributes: ID17, source, and ID2. Here is
a simple example of working memory of just 3 WMEs:
{(ID17 name Torrens), (ID17 source ID2), (isCounted ID5)}
Each production rule consists of a left hand side (LHS) of one or more con-
dition elements (CEs), and a right hand side (RHS) of one or more actions.
In the example in Fig. 1, ( type dog) is a CE and (
isOld) is an ac-
tion. A complex cognitive agent would consist of thousands of production rules
operating on symbolic and numeric data in working memory.
st {oldDogs
(
type dog) // condition elements
(
age ( > 7))
-->
( isOld) // actions
}
Fig. 1. A Street Language production rule
A subset of working memory that satisfies all of the CEs in a production rule
with consistent variable assignments is called an instantiation of the rule. So
instantiations of the rule above will be pairs of WMEs in working memory such
as: {(pet1 type dog), (pet1 age 8)}. Note that the first attributes must be
Procedural Cognitive architecture Episodic &
Memory production rules Semantic
Memory
Cognitive Architecture Street language
Kernel
Street language
Productors
Network on Chip
I/O
Working memory distributed among
local memories in productors
Street Engine
Fig. 2. Hardware/software stack on Street (adopted from [6])
the same as they were specified by the same variable
. The WMEs must join
on any shared variables. The actions of a production rule are performed for each
new instantiation of the rule. An action such as (
isOld) adds a WME to
working memory. Actions can also remove WMEs from working memory. This
change in working memory may cause other production rules to instantiate, and
all new instantiations are executed.
2.2 Street Architecture
The Street architecture consists of a large number of identical and simple mi-
crocoded processing elements (PEs) with a single production rule assigned to
each. A PE contains a controller, a block of content-addressable memories (CAMs)
and an Arithmetic Logic Unit (ALU). The PEs communicate using tokens to no-
tify each other of changes to working memory. The local memory of a PE stores
just the subset of working memory that may lead to instantiations of its rule.
Each PE matches the associated production rule against its own local memory
with an algorithm similar to TREAT [12]. For each incoming token, a PE up-
dates the contents of its local memory, finds new instantiations (match), and
outputs tokens corresponding to the rule’s actions (act). The controller coordi-
nates the PE’s match-act cycle. Fig. 2 shows Street architecture executing an
agent using a symbolic cognitive architecture.
3 StreetNet: The Communication Platform
When a PE produces tokens these are transmitted to other PEs and may cause
new rule instantiations and yet more tokens. There may be a large amount of
token traffic between PEs or small clusters of PEs so efficient data interconnect is
Routing
Table
Switch
Dest.
Table
Productor
Fig. 3. A 3 × 3 StreetNet structure
required. For a device with a large number of PEs a network based interconnect,
named the StreetNet, is proposed in this paper.
3.1 Network Architecture
A regular tile-based 2D network architecture is considered for StreetNet. Each
tile contains a single PE and router, however adjacent tiles may be linked to
share memory resources (discussed below in 3.6). Every PE has a destination
table generated from the dependency graph. It lists the desired destinations and
corresponding tokens to those destinations. Each router is connected to its local
PE and four neighbouring tiles. Each router also has a routing table that is
checked for destination. The tokens are broken into packets and forwarded to
the neighboring tile towards the destination. A crossbar switch is used as the
switching fabric in the router. Fig. 3 shows the structure of a StreetNet.
3.2 Dependency Graph
The mapping of PEs onto network architecture is based on a dependency graph.
A dependency graph is a directed graph, where each vertex pi represents a PE.
Directed arcs represent non-zero communication paths between two PEs and
are assigned a weight characterising the communication rate between the PEs.
Fig. 4 shows an example of a dependency graph of nine PEs. This graph is used
to map the PEs onto the network so that the most dependent PEs are placed
close together in the expectation this will reduce communication latency and
power consumption. The PEs are sorted by total incoming and outgoing traffic
that was recorded during runtime, and mapped in this order. This is useful since
the positions of the PEs with high traffic requirement have higher impact on the
overall energy consumption. This dependency graph is updated during a sleep
period (described in subsection 3.5) depending on runtime traffic statistics.
P3
0.0391742 0.0728398
P2 0.0256243 P4
0.0612313
0.027465
0.0130523
P1 0.0468928 P6
0.0448455 0.0322667
0.00800304
P5 P7
0.0285391
P8
P0 0.0256137
Fig. 4. An example of dependency graph with nine PEs
3.3 Mapping Techniques
In [8], an energy aware mapping technique is proposed for networks-on-chip
(NoC) with a regular architecture. This technique is adopted in the StreetNet.
The average energy consumed in sending one bit of data from a tile to a neigh-
boring tile is calculated as
E = ES + EB + EL (1)
where ES , EB and EL are the energy consumed at the switch, buffer and link.
Since the energy consumed for buffering is negligible compared to EL [8], (1)
becomes
E = ES + EL (2)
Now, if the bit traverses n hops to reach tile tj from tile ti , the average energy
consumption is
Eti ,tj = n × ES + (n − 1) × EL (3)
For a system that involves a large number of processing elements, it is impor-
tant to adopt efficient mapping and routing techniques so that total energy
consumption is minimized and communication traffic does not exceed available
bandwidth. Two different mapping techniques have been considered in this work:
one is based on Simulated Annealing (SA) and the second is based on Branch-
and-bound (BB) technique.
Simulated Annealing based Mapping Simulated Annealing (SA) [9] is a
well known technique for solving optimization problems. It effectively optimizes
solutions over large state spaces by making iterative improvement. It has a con-
cept of temperature which is initially very high and keeps reducing in every step
until it reaches the minimum temperature. For each temperature, it starts with
current_temp=MAX_TEMPERATURE;
previous_cost=INITIAL_COST;
current_mapping=randomMapping();
current_cost=cost(current_mapping);
do{
while(attempts