<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performance Analysis of a Simple Runtime System for Actor Programming in C++</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S.V. Vostokin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E.G. Skoryupina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>34 Moskovskoe Shosse, 443086, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>62</fpage>
      <lpage>67</lpage>
      <abstract>
        <p>In this paper, we propose the Templet - a runtime system for actor programming of high performance computing in C++. We provide a compact source code of the runtime system, which uses standard library of C++ 11 only. We demonstrate how it differs from the classic implementations of the actor model. The practical significance of the Templet was examined by comparative study on the performance of three applications: the reference code in C ++, managed by the OpenMP; the actor code in C ++, managed by the Templet; the actor code in Java, managed by the Akka. As a test problem we used a numerical algorithm for solving the heat equation.</p>
      </abstract>
      <kwd-group>
        <kwd>performance analysis</kwd>
        <kwd>message-oriented middleware</kwd>
        <kwd>actor framework</kwd>
        <kwd>high performance computing</kwd>
        <kwd>C++ language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. The Method of Efficiency Evaluation: Heat Equation Test</title>
      <sec id="sec-2-1">
        <title>1 void op(int i)</title>
        <p>2 {
3 for (int j=1; j&lt;W-1; j++)
field[i][j]=(field[i][j-1]+field[i][j+1]+
field[i-1][j]+field[i+1][j])*0.25;
6 }</p>
      </sec>
      <sec id="sec-2-2">
        <title>Listing 1. Elementary operation of the heat equation benchmark.</title>
      </sec>
      <sec id="sec-2-3">
        <title>High-Performance Computing / S.V. Vostokin, E.G. Skoryupina</title>
        <p>1 double seq_alg()</p>
        <p>2 {
3 for (int t=1;t&lt;=T;t++)
4 for (int i=1;i&lt;H-1;i++) op(i);
5 }</p>
      </sec>
      <sec id="sec-2-4">
        <title>Listing 2. Sequential algorithm for the heat equation benchmark.</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The Templet Runtime System</title>
      <p>The Templet actor computing system consists of three main parts: two primitive operations (send and access) and a function
for worker threads that process messages (tfunc). In the following listings we illustrate the mechanism for parallel executio n of
actors in the shared memory using the C ++ standard library threads.</p>
      <p>A message sending operation source code is shown in Listing 3. To send a message is to place the message in a shared queue
and notify a thread, which may expect the message in the empty queue (line 6). The queue is protected by a mutex. It is captured
in line 5. The message contains a reference to the actor-destination and the sign of being sent. They are initialized in line 4. Line
3 presents a guard of re-sending the message. Resending the message is an emergency situation, indicating that an error occurred
in the application code.</p>
      <p>1 inline void send(engine*e, message*m, actor*a)</p>
      <p>2 {
3 if (m-&gt;sending) return;
4 m-&gt;sending = true; m-&gt;a = a;
5 std::unique_lock&lt;std::mutex&gt; lck(e-&gt;mtx);
6 e-&gt;ready.push(m); e-&gt;cv.notify_one();</p>
      <p>7 }</p>
      <sec id="sec-3-1">
        <title>Listing 3. Primitive operation ‘send’.</title>
        <p>The access to the message object during the actor processing procedure is allowed if the function access (see Listing 4)
returns "true". The access is granted if (1) the message refers to the actor, which calls the function; (2) the message is no t on
delivery (line 3). This condition is an invariant during the processing of a message in the context of a particular actor, otherwise
the message will not be sent by the send operation. Sending messages is allowed only if the actor has access to the message (see
Listing 4).</p>
      </sec>
      <sec id="sec-3-2">
        <title>1 inline bool access(message*m, actor*a) 2 { 3 return m-&gt;a == a &amp;&amp; !m-&gt;sending; 4 }</title>
      </sec>
      <sec id="sec-3-3">
        <title>Listing 4. Primitive operation ‘access’.</title>
        <p>
          The source code of the worker thread is shown in Listing 5. It implements a thread pool pattern [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The task in terms of the
pattern is a message which is in the state of delivery. The sending field value of the message is true.
        </p>
        <p>The worker thread polls the task from the queue (line 16) and starts the actor’s, message processing procedure (recv). The
recv procedure is prepared by several steps: (1) determining the message’s destination actor (line 19); (2) setting the lock on the
actor (line 21); (3) changing the delivery sign of message to sending = false (line 22); activating the recv procedure to pro cess
the message (line 23).</p>
        <p>7
11
8
High-Performance Computing / S.V. Vostokin, E.G. Skoryupina
17 e-&gt;ready.pop();</p>
        <p>18 }
19 a = m-&gt;a;</p>
        <p>20 {
21 std::unique_lock&lt;std::mutex&gt; lck(a-&gt;mtx);
22 m-&gt;sending = false;
23 a-&gt;recv(m, a);</p>
        <p>}</p>
      </sec>
      <sec id="sec-3-4">
        <title>Listing 5. Worker thread’s function and ‘recv’ callback invocation.</title>
        <p>Note that the captured locks are released implicitly when the thread leaves the syntactic scope of the object lock lck. The actor
system computations are stopped when there are no active working threads.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Parallel Algorithms for the Heat Equation Test</title>
      <p>We implemented three parallel versions of the code in Listings 1, 2. All these versions are driven by the following rules of
parallelization: t is allowed to start iteration t along the time axis and i along the space axis (t, i), if (1) the iteration (t-1, i + 1)
and (t, i-1) have been completed; or (2), if t = 1 and iteration (t, i-1) has been completed. We assume that if an iteration has no
i+1 or i-1 neighboring iterations, the neighboring iteration is completed. The first iteration of the calculation (1,1) is performed
disregarding these conditions. The algorithm stops when T iterations are performed for each coordinate. The considered
computing algorithm can be implemented on the basis of OpenMP, as shown in Listing 6. The idea of parallelization is as
follows: either even or odd iterations i can be calculated simultaneously on each count t. A strict compliance with the rules of
calculation is guaranteed by the additional check in lines 5, 10 and 15 in Listing 6.</p>
      <p>The actor implementations of the algorithm in listing 1, 2 enable using the rules of parallelism explicitly. For this reason,
each space coordinate i is matched by an actor. There are N = H-2 actors used in both actor algorithms.</p>
      <p>In the Templet implementation, the rules of parallelization presented in lines 5-7 of Listing 7. In lines 11 and 12, the actor
informs its’ neighbors i-1 and i+1 (if any) of the completion of the iteration (t, i) by sending messages.</p>
      <sec id="sec-4-1">
        <title>High-Performance Computing / S.V. Vostokin, E.G. Skoryupina</title>
        <p>11 if (id != 0) send(&amp;e, &amp;ms[id - 1], &amp;as[id - 1]);
12 if (id != N - 1) send(&amp;e, &amp;ms[id], &amp;as[id + 1]);
13 }
14 }</p>
        <p>Listing 7. Actor based parallel algorithm for the heat equation benchmark, Templet runtime.</p>
        <p>In the Akka implementation, the rules of the parallelization are declared in lines 7-9 of Listing 8. In lines 13-20 the actor
informs its neighbors i-1 and i+1 (if any) that the iteration is completed (t, i) by sending messages. Note that the message
handling code in Listings 7 and 8 is implemented identically for the convenience of comparison. The code block in lines 22-24
is stops the computations.</p>
        <p>22
1 public void onReceive(Object message) {</p>
        <p>2 if (((Integer) message) == id - 1)
3 access_ms_id_minus_1 = true;
4 if (((Integer) message) == id)
5 access_ms_id = true;</p>
        <p>6
7 if ((id == 0 || access_ms_id_minus_1) &amp;&amp;
8 (id == N - 1 || access_ms_id) &amp;&amp;
9 (Main.time[id] &lt;= Main.T)) {</p>
        <p>10
11 Main.op(id + 1); Main.ts[id]++;</p>
        <p>12
13 if (id != 0) {</p>
        <p>Main.actors[id - 1].tell(id - 1, getSelf());
15 access_ms_id_minus_1 = false;</p>
        <p>16 }
17 if (id != Main.N - 1) {
18 Main.actors[id + 1].tell(id, getSelf());
19 access_ms_id = false;
20 }
21 }
if (Main.time[id] == Main.T + 1 &amp;&amp; id == Main.N - 1) {
23 Main.system.terminate();
24 }</p>
        <p>24 }</p>
        <p>Listing 8. Actor based parallel algorithm for the heat equation benchmark, Akka.</p>
        <p>Both actor algorithms have an initialization code, which is not considered in the paper. Complete code of the Actor Templet
library and the test cases are available at https://github.com/Templet-language/newtemplet/ .</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Computational experiments were performed on a computer with an Intel (R) Core (TM) i3-3220T RAM 4GB, Windows 10
x64. C ++ programs compiled in Microsoft Visual 2015. For Java programs we used the JDK version 1.8 and the Akka library
version 2.4.17 deployed on the same computer.</p>
      <p>The complexity of the problem may be denoted by H. There are two space-time domain parameters of the calculation:
W=H*2, T=H*2. Both depend on H. Note that H also determines the granularity of computing. The bigger the H parameter is,
the bigger chunks of data are processed sequentially.</p>
      <p>Columns of Table 1 indicate the duration time of the algorithm in seconds: T1JAVA - a sequential Java implementation;
T1NATIVE - a sequential C++ implementation; TpAKKA - a parallel Java implementation based on Akka; TpTEMPLET - a parallel C ++
implementation based on the Templet; TpOPENMP - a parallel C ++ implementation based on OpenMP.</p>
      <p>To account for temporary fluctuations, the data presented in Table 1 has been statistically pre-processed. Each value in Table
1 is calculated by series of 19 experiments. The value includes only the significant digits, guaranteeing them from getting into
the interval [min, max] with confidence factor of 90% (min - minimum, max - maximum time in a series of 19 experiments).</p>
      <p>Table 2 shows the efficiency of the test implementation based on the proposed runtime system by the example of the
implementations based on Akka and OpenMP. The EAKKA and EOPENMP values show the percentage of the Templet acceleration
of reference implementations based on Akka and OpenMP.</p>
      <sec id="sec-5-1">
        <title>High-Performance Computing / S.V. Vostokin, E.G. Skoryupina Table 1. Experimental computation time of the heat equation benchmarks.</title>
        <p>T1JAVA, s</p>
        <sec id="sec-5-1-1">
          <title>T1NATIVE, s</title>
        </sec>
        <sec id="sec-5-1-2">
          <title>TpAKKA, s</title>
        </sec>
        <sec id="sec-5-1-3">
          <title>TpTEMPLET, s</title>
        </sec>
        <sec id="sec-5-1-4">
          <title>TpOPENMP, s</title>
          <p>H
400
500
600
700
800
900
1000
H
400
500
600
700
800
900
1000</p>
          <p>Correctness of the parallelization was checked by piecemeal test for equality of the temperature field values calculated by
sequential and parallel method. We used equal random initial field values for parallel and sequential method. The physical
interpretation of the calculation results was not carried out, since it is beyond the scope of this study.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The experiments confirmed the high efficiency of the proposed simple implementation of the Templet runtime system for
actor calculations. The Templet system has only a slight performance gap in tests performed using OpenMP, and for the small H
parameter values (400..600) it is not far behind the Akka, or even exceeds it.</p>
      <p>The advantage of actor algorithms for the Templet and Akka is the simplicity of implementation and debugging. The
parallelism of the system is described in terms of a simple behavior of each individual actor. Using the OpenMP requires the
understanding of the global state of computing at each time, resulting in complex boundary conditions of the algorithm cycles in
Listing 6.</p>
      <p>
        Our algorithm is not inferior to the implementation of OpenMP. This result is obtained despite the fact that we used the
expressive possibilities of C ++ Standard Library 11 to simplify the code, neglecting the efficiency. If necessary, it can be
optimized by using the primitive compare-and-swap, as proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and by work stealing algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The source of the simplicity of our actor model implementation is a departure from the classical approach proposed by Agha
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and implemented in the famous actor frameworks and languages, for example, Erlang [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Scala [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], CAF [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and others.
      </p>
      <p>Agha’s approach assumes that the messages are some values that are passed between the actors. They are accumulated in the
mailbox - a special system structure associated with the actor. The actor has an access to the message values.</p>
      <p>In our implementation, messages are treated as variables that store values. A programmer is not bounded to syntactic rules of
access to the message from any actor at any time. However, an access is meaningful and does not lead to violations of logic
provided that the function access to the message-actor pair has returned a true value. This approach does not require the
implementation of complex logic of copying values between the mailbox and the actor call frame from the runtime system.</p>
      <p>The test also showed that despite the fact the native implementation of the test is superior to the Java implementation in terms
of performance, the parallel implementation using Akka is the best for the dimensions of the test problem when the H parameter
value is 600 or more. This can be attributed to the fact that the scheduling algorithm offered by Akka is more sophisticated than
the one offered by the
OpenMP.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <sec id="sec-7-1">
        <title>High-Performance Computing / S.V. Vostokin, E.G. Skoryupina</title>
        <p>Templet system, as well as the scheduling algorithm selected to test the implementation based on</p>
        <p>We propose a simple implementation of the actor computation model in C ++ 11, and the possibility of its usage in high
performance computing. The test example of the heat equation illustrates the high effectiveness of the proposed implementation.
It approximates to the effectiveness of OpenMP, and in some cases is superior to Akka. Apart from that, it reduces the
complexity of the coding.</p>
        <p>
          The runtime library is used in the object-oriented Templet language [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for the implementation of parallel computing
patterns. These patterns are used in solving problems of nonlinear dynamics in the design of spacecraft [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>This work is partially supported by the Russian Foundation for Basic Research (RFBR#15-08-05934-A), and by the Ministry
of education and science of the Russian Federation in the framework of the State Assignments program (№ 9.1616.2017/ПЧ).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Hewitt</surname>
            <given-names>C.</given-names>
          </string-name>
          <article-title>A universal modular ACTOR formalism for artificial intelligence</article-title>
          .
          <source>Proceedings of the 3rd IJCAI. SanFrancisco</source>
          , CA, USA: Morgan Kaufmann Publishers Inc.
          <year>1973</year>
          :
          <fpage>235</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Charousset</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiesgen</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            <given-names>TC</given-names>
          </string-name>
          .
          <article-title>Revisiting Actor Programming in C++</article-title>
          .
          <source>Computer Languages, Systems &amp; Structures</source>
          <year>2016</year>
          ;
          <volume>56</volume>
          :
          <fpage>105</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Lightbend</given-names>
            <surname>Inc</surname>
          </string-name>
          . Akka. URL: http://akka.io
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Charousset</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dominik</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            <given-names>TC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiesgen</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wählisch</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Native Actors - A Scalable Software</surname>
          </string-name>
          <article-title>Platform for Distributed Heterogeneous Environments</article-title>
          .
          <source>Proceedings of the 4rd ACM SIGPLAN Conference on Systems Programming and Applications (SPLASH '13) Workshop AGERE</source>
          . New York, NY, USA: ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Schmidt</surname>
            <given-names>DC</given-names>
          </string-name>
          .
          <article-title>Pattern-Oriented Software Architecture</article-title>
          .
          <source>Patterns for Concurrent and Networked Objects. John Wiley &amp; Sons</source>
          <year>2013</year>
          ;
          <volume>2</volume>
          : 700 p.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Blumofe</surname>
            <given-names>RD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leiserson</surname>
            <given-names>CD</given-names>
          </string-name>
          .
          <article-title>Scheduling multithreaded computations by works tealing</article-title>
          .
          <source>Proceedings of the 35th annual symposium on foundations of computer science (FOCS)</source>
          <year>1994</year>
          :
          <fpage>356</fpage>
          -
          <lpage>368</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Agha</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mason</surname>
            <given-names>IA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talcott</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>Towards a theory of actor computation</article-title>
          .
          <source>Proceedings of CONCUR. Lecture notes on computer science</source>
          . Heidelberg: Springer-Verlag
          <year>1992</year>
          ;
          <volume>630</volume>
          :
          <fpage>565</fpage>
          -
          <lpage>579</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Armstrong</surname>
            <given-names>J. Erlang -</given-names>
          </string-name>
          <article-title>a survey of the language and its industrial applications</article-title>
          .
          <source>Proceedings of the symposium on industrial applications of Prolog (INAP96)</source>
          .
          <year>Hino1996</year>
          :
          <fpage>16</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Haller</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Odersky</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Scala actors: unifying thread-based and event-based programming</article-title>
          .
          <source>Theor Comput Sci</source>
          <year>2009</year>
          ;
          <volume>410</volume>
          (
          <issue>23</issue>
          ):
          <fpage>202</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Vostokin</surname>
            <given-names>SV</given-names>
          </string-name>
          .
          <article-title>Templet: a markup language for concurrent actor oriented programming</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>2016</year>
          ;
          <volume>1638</volume>
          :
          <fpage>460</fpage>
          -
          <lpage>468</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Doroshin</surname>
            <given-names>AV. Heteroclinic</given-names>
          </string-name>
          <string-name>
            <surname>Chaos</surname>
          </string-name>
          and
          <article-title>Its Local Suppression in Attitude Dynamics of an Asymmetrical Dual-Spin Spacecraft and Gyrostat-Satellites. The Part II - The heteroclinic chaos investigation</article-title>
          ,
          <source>Communications in Nonlinear Science and Numerical Simulation</source>
          <year>2016</year>
          ;
          <volume>31</volume>
          (
          <issue>1-3</issue>
          ):
          <fpage>171</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>