<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <fpage>123</fpage>
      <lpage>126</lpage>
      <abstract>
        <p>Invasive Computing enables a resource-aware programming style, which includes adapting to external resource changes similar to malleability. We introduce asynchronously-malleable applications, which can adapt at any time without synchronizing the whole application, in contrast to speci c synchronization points. We show how master-slave applications meet the requirements for asynchronous malleability and how Invasive Computing supports it.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>1.1
1.2</p>
      <p>
        Malleability
There has been a considerable amount of work on malleable applications in high-performance computing, roughly
starting with the work by Feitelson et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] who coined the word malleable for applications that can adapt their
degree of parallelism at runtime in response to external requests.
      </p>
      <p>
        We demonstrated [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] that our invasive language is expressive enough to write typical iterative numerical
algorithms, in this case a multigrid solver, in a malleable style. Here, we use reinvade to modify the resource
Copyright c by the paper's authors. Copying permitted for private and academic purposes.
val ilet = ( id : I n c a r n a t i o n I D )= &gt;{
for ( job in q u e u e ) {
if ( q u e u e . c h e c k T e r m i n a t i o n ( id )) b r e a k ;
job . do (); } }
val resizeHandler = ( add : List [ PE ] , r e m o v e : List [ PE ])= &gt;{
for ( pe in add ) q u e u e . a d d W o r k e r ( pe , ilet );
q u e u e . a d a p t ();
for ( pe in r e m o v e ) q u e u e . s i g n a l T e r m i n a t i o n ( pe ); }
val c o n s t r a i n t s = new P E Q u a n t i t y (4 ,10)
&amp;&amp; new Malleable ( resizeHandler )
&amp;&amp; new S c a l a b i l i t y H i n t ( s p e e d u p C u r v e );
val c l a i m = C l a i m . invade ( c o n s t r a i n t s );
q u e u e . a d a p t T o ( c l a i m );
c l a i m . infect ( ilet );
c l a i m . retreat ();
claim at speci c points during each iteration. We ran multiple instances of the malleable multigrid solver
concurrently and measured an improved throughput compared to concurrent non-malleable multigrid instances.
      </p>
      <p>
        However, algorithms exist that are even more exible regarding recon guration. Flick et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have enhanced
multi-way merge sort with malleability. Multi-way merge sort is a popular parallel sorting approach, which
contains a phase where it partitions the data into work packages that it sorts internally and independently of
each other. They use a central work queue for sorting jobs and a number of worker threads that fetch jobs from
the queue. Workers can be added and removed without interrupting other workers. If the work packages are
small enough, the sorting process as a whole becomes malleable. Moreover, with small work packages, response
to recon guration requests is nearly instantaneous. For distinction, we label this asynchronously-malleable.
      </p>
      <p>We argue that applications following a master-slave pattern always have this option, if creating and destroying
slaves has low cost. Additionally, we propose an asynchronous programming style for handling recon guration.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Asynchronously-Malleable Invasive Application</title>
      <p>In an asynchronously-malleable invasive application, the system can decide at any time to resize a claim. The
system is only required to meet the invasion constraints. For example, requiring an asynchronously-malleable
claim of 4 to 10 PEs, means the system can resize to 5 or 9 PEs at any time. In contrast to normal claims,
an asynchronously-malleable claim must be adaptable even within an active infect phase. The application is
responsible for starting and terminating i -lets with respect to added and removed PEs.</p>
      <p>Figure 1 shows an example application. The resize handler is a function, which takes a list of PEs that were
added and a list of PEs that will get removed. It does not return a value but adapts the application as a side
e ect. When the system calls the handler in the example, it rst starts additional i -lets according to its input.
Then the PEs to remove get signaled. The i -let code checks this signal and just stops executing jobs if needed.</p>
      <p>When the application has adapted to the change order, it noti es the system of the successful adaptation by
returning from the handler function. The system updates its resource state and operating system. From the
application programmer's point of view, the system 1) adds additional resources to the claim, 2) executes the
resize handler, and 3) removes resources from the claim as planned. This means the resize handler can assume
to own the union of the claim resources before and after resizing. This is necessary, because it must terminate
i -lets on PEs to remove and also must start i -lets on new PEs. Also, infect must not nish while resizing. If
infect nishes, the main activity will continue and for example retreat the claim. Meanwhile, there must be
no resize handler running, which starts additional i -lets. Thus, the claim must track how many child activities
are running and the resize handler counts as one of them. So, infect waits for all child activities to terminate
before it returns.</p>
      <p>We considered to let the application decide which PEs to free, but dropped the idea. The system should
handle such resource decisions exclusively, otherwise we end up with duplicated code and con icting decisions.
The system has a global view of resource needs and thus is in the best position for de nitive decisions. It is the
application's responsibility to receive information about the system's decisions and to adapt.</p>
      <p>Our example contains a subtle circular dependency. A claim object requires constraints which in turn require
the resize handler as an argument. This ensures via the type system that the programmer provides a resize
handler whenever he declares malleability. However, the resize handler requires the claim object to start more
i -lets. In Figure 1, we break this cycle via the global queue, which stores the claim it gets via adaptTo and
provides the addWorker method.</p>
      <p>The system shall handle errors in the resize handler in a safe manner. The handler might fail to adapt to the
decided changes, e.g. by not terminating i -lets on PEs to remove. If the system ignores this \sit-in", it could
give the PEs to another application, which nds itself unable to use them. Alternatively, the system could kill
the blocking i -lets, but this might result in a deadlock of the whole application. So, the only sensible reaction
is to kill the whole application, which failed to adapt to the decided changes. For another source of error, the
(X10) resize handler might fail by throwing an exception. The system is not implemented in X10, so handling
the exception is not an option. Like the rst case, the application state is unknown at this point, so recovery is
impossible and the system must kill the application. These error cases show the crucial role of the resize handler.
Reproducing and debugging errors is hard, because a call to the resize handler depends on the behavior of other
applications and is e ectively non-deterministic.</p>
      <p>Our system does not support virtualization (preemption, address spaces, etc) and inter-tile migration of i -lets
so far. Thus, the system cannot remove an application from a tile during reinvade. The application would lose
tile-local data without a chance to copy it elsewhere. So, the system ensures that at least one processing element
per tile remains. However, the resize handler provides a migration chance, so asynchronously-malleable claims
can lose tiles.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Adaptive MPI [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] uses processor virtualization to achieve malleability. Computations are divided into a large
number of virtual MPI processes, which are mapped to the physical resources via a runtime system with object
migration support. In contrast, we assign resources exclusively by default and programs cannot rely on resource
virtualization to hide underutilization.
      </p>
      <p>
        Leopold et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] present a case study of making an existing MPI application malleable. The studied
application is an iterative numerical simulation using a master-slave design. The authors extend this design with
a server process that runs concurrently to the master and keeps track of newly started started slave processes
during runtime; removing slaves is not handled. New slaves become usable to the master starting with the next
iteration.
      </p>
      <p>
        Maghraoui et al. [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] present a general approach for extending existing iterative MPI applications with
malleability features. The authors broaden the de nition of malleable to also vary the number of application
processes, so they should rely not only on dynamic load balancing via resource virtualization.
Implementationwise, they extended the middleware library PCM (Process Checkpointing and Migration) with functions regarding
malleability. In particular, they support splitting and merging processes via collective operations and handle
migration via saving and restoring the current state.
      </p>
      <p>In contrast to this work, we do not constrain ourselves to iterative applications, as shown by the
asynchronously-malleable sorting example, nor do we have to support legacy MPI applications. Furthermore,
we look at malleable applications in the context of many-core architectures on a single chip whereas prior work
focused distributed high-performance computing.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>Our design proposal is improvable. For example, much of the application boilerplate or even the whole queue
implementation should be generalized into the framework.</p>
      <p>The termination signal in our example is setting a ag across a distributed system. It might pay o to use
special (hardware or OS) mechanisms to speed this up. Furthermore, should the resize handler wait for the
termination to succeed? If a PE keeps executing code, while the system assigns it to another application, this
might or might not pose a problem.
This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative
Research Center \Invasive Computing" (SFB/TR 89). Special thanks go to Jochen Speck, Sebastian Kobbe,
and Martin Schreiber for extensive discussions about this. We also thank the anonymous reviewers for their
suggestions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Buchwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohr</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zwinkau</surname>
          </string-name>
          . Dynamic X10:
          <article-title>Resource-Aware Programming for Higher E ciency</article-title>
          .
          <source>Technical Report 8</source>
          , Karlsruhe Institute of Technology,
          <year>2014</year>
          . X10 '
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.-J.</given-names>
            <surname>Bungartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Riesinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Snelting, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zwinkau</surname>
          </string-name>
          .
          <article-title>Invasive computing in HPC with X10</article-title>
          .
          <source>In Proceedings of the third ACM SIGPLAN X10 Workshop</source>
          , X10 '
          <volume>13</volume>
          , pages
          <fpage>12</fpage>
          {
          <fpage>19</fpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>El Maghraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Desell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Varela</surname>
          </string-name>
          .
          <article-title>Dynamic Malleability in Iterative MPI Applications</article-title>
          .
          <source>In Seventh IEEE International Symposium on Cluster Computing and the Grid</source>
          , pages
          <volume>591</volume>
          {
          <fpage>598</fpage>
          , May
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>El Maghraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Desell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Varela</surname>
          </string-name>
          .
          <source>Malleable Iterative MPI Applications</source>
          . volume
          <volume>21</volume>
          , pages
          <fpage>393</fpage>
          {
          <fpage>413</fpage>
          ,
          <string-name>
            <surname>Mar</surname>
          </string-name>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Feitelson</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          .
          <article-title>Towards convergence in job schedulers for parallel supercomputers</article-title>
          .
          <source>In Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, IPPS '96</source>
          , pages
          <fpage>1</fpage>
          {
          <fpage>26</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Flick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sanders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Speck</surname>
          </string-name>
          .
          <source>Malleable Sorting. IEEE 27th International Symposium on Parallel and Distributed Processing</source>
          , pages
          <volume>418</volume>
          {
          <fpage>426</fpage>
          , May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Heisswolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zwinkau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weichslgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Teich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Henkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Snelting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herkersdorf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Becker</surname>
          </string-name>
          . CAP:
          <article-title>Communication Aware Programming</article-title>
          .
          <source>In Design Automation Conference (DAC)</source>
          ,
          <year>2014</year>
          51th ACM / EDAC / IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lawlor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Kal</surname>
          </string-name>
          .
          <source>Adaptive MPI. In Languages and Compilers for Parallel Computing</source>
          , volume
          <volume>2958</volume>
          of Lecture Notes in Computer Science, pages
          <volume>306</volume>
          {
          <fpage>322</fpage>
          . Springer Berlin Heidelberg,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kale</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Performance Evaluation of Adaptive MPI</article-title>
          .
          <source>In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</source>
          , pages
          <volume>12</volume>
          {
          <fpage>21</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Leopold</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Su , and</article-title>
          <string-name>
            <given-names>J.</given-names>
            <surname>Breitbart</surname>
          </string-name>
          .
          <article-title>Programming for malleability with hybrid MPI-2 and OpenMP: Experiences with a simulation program for global water prognosis</article-title>
          .
          <source>Proceedings of the European Conference on Modelling and Simulation</source>
          , pages
          <volume>665</volume>
          {
          <fpage>670</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Saraswat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Peshansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tardieu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Grove</surname>
          </string-name>
          .
          <article-title>X10 Language Speci cation</article-title>
          .
          <source>Technical report, IBM</source>
          ,
          <year>February 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Teich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Henkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herkersdorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schmitt-Landsiedel</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          <article-title>Schroder-</article-title>
          <string-name>
            <surname>Preikschat</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Snelting. Invasive Computing</surname>
          </string-name>
          :
          <article-title>An Overview</article-title>
          .
          <source>In Multiprocessor System-on-Chip { Hardware Design and Tool Integration</source>
          , pages
          <volume>241</volume>
          {
          <fpage>268</fpage>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zwinkau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Buchwald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Snelting</surname>
          </string-name>
          .
          <source>InvadeX10 Documentation v0.5. Technical Report 7</source>
          , Karlsruhe Institute of Technology,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>