<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Model-Driven Transformations for Mapping Parallel Algorithms on Parallel Computing Platforms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ethem Arkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bedir Tekinerdogan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aselsan MGEO</institution>
          ,
          <addr-line>Ankara</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Bilkent University, Dept. of Computer Engineering</institution>
          ,
          <addr-line>Ankara, Turkey</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Keywords: Model Driven Software Development, Parallel Computing, High Performance Computing, Domain Specific Language, Tool Support</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <abstract>
        <p>One of the important problems in parallel computing is the mapping of the parallel algorithm to the parallel computing platform. Hereby, for each parallel node the corresponding code must be implemented. For platforms with a limited number of processing nodes this can be done manually. However, in case the parallel computing platform consists of hundreds of thousands of processing nodes, the manual coding of the parallel algorithms becomes intractable and error-prone. Moreover, a change of the parallel computing platform requires considerable coding effort and time. In this paper we present a model-driven approach for generating the code of selected parallel algorithms to be mapped on parallel computing platforms. We describe the required platform independent metamodel, and the model-to-model and the model-to-text transformation patterns. We illustrate our approach for the parallel matrix multiplication algorithm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The famous Moore's law, which states that the processing power doubles every
eighteen months, is coming to an end due to the physical limitations of a single
processor [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To keep increasing the processing power, the current trend
is towards applying parallel computing on multiple nodes. Unlike serial computing in which
instructions are executed serially, multiple processing elements are used to execute the
program instructions in parallel. An important challenge in parallel computing is the mapping of
the parallel algorithm to the parallel computing platform. The mapping of the algorithm
requires the analysis of the algorithm, writing the code for the algorithm and deploying it on the
nodes of the parallel computing platform. This mapping can be done
manually in case we are dealing with a limited number of processing nodes. However, the current
trend shows the dramatic increase of the number of processing nodes for parallel computing
platforms with now about hundreds of thousands of nodes providing petascale to exascale
level processing power [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. As a consequence, mapping the parallel algorithm to such computing
platforms has become intractable for the human parallel computing engineer.
      </p>
      <p>Once the mapping has been realized, in due time the parallel computing platform might
need to evolve or change completely. In that case the overall mapping process must be redone
from the beginning, requiring considerable time and effort.</p>
      <p>In this paper we provide a model-driven approach for both the mapping of parallel
algorithms to parallel computing platform, and the evolution of the parallel computing platform. In
essence our approach is based on the model-driven architecture design paradigm that makes a
distinction between platform independent models and platform specific models or code. We
provide a platform independent metamodel for parallel computing platform and define the
model-to-model transformation patterns for realizing the platform specific parallel computing
platforms. Further we provide the model-to-text transformation patterns for realizing the code
from the platform specific models.</p>
      <p>The remainder of the paper is organized as follows. In section 2, we describe the problem
statement. Section 3 presents the implementation approach for mapping the parallel algorithm
to parallel computing platform by the help of model transformations. Section 4 presents the
related work and finally we conclude the paper in section 5.</p>
      <p>
        2 Problem Statement
To define a feasible mapping, the parallel algorithm needs to be analyzed and a proper
configuration of the given parallel computing platform is required to meet the corresponding quality
requirements for power consumption, efficiency and memory usage. To illustrate the problem
we will use the parallel matrix multiplication algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The pseudo code of the
algorithm is shown in Fig. 1a. The matrix multiplication algorithm recursively decomposes the
matrix into subdivisions and multiplies the smaller matrices to be summed up to find the
resulting matrix. The algorithm is actually composed of three different sections. The first serial
section is the multiplication of subdivision matrix elements (line 3), which is followed by a
recursive multiplication call for each subdivision (lines 5-12). The final part of the algorithm
defines the summation of the multiplication results for each subdivision (lines 13-16).
      </p>
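      <p>The decomposition described above can be sketched in plain Python (an illustrative translation of the pseudo code of Fig. 1a, not the authors' implementation; the base case tests the matrix size instead of an explicit recursion depth):</p>

```python
def split(M):
    """Split a matrix into four equally sized quadrants M00, M01, M10, M11."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matrix_multiply(A, B):
    """Recursive block multiplication following the structure of Fig. 1a."""
    if len(A) == 1:                          # serial section: scalar multiply (line 3)
        return [[A[0][0] * B[0][0]]]
    A00, A01, A10, A11 = split(A)
    B00, B01, B10, B11 = split(B)
    # recursive multiplications of the subdivisions (lines 5-12)
    P0 = matrix_multiply(A00, B00); P1 = matrix_multiply(A01, B10)
    P2 = matrix_multiply(A00, B01); P3 = matrix_multiply(A01, B11)
    P4 = matrix_multiply(A10, B00); P5 = matrix_multiply(A11, B10)
    P6 = matrix_multiply(A10, B01); P7 = matrix_multiply(A11, B11)
    # serial summation sections (lines 13-16)
    C00, C01, C10, C11 = add(P0, P1), add(P2, P3), add(P4, P5), add(P6, P7)
    # reassemble the result matrix from the four quadrants
    top = [r0 + r1 for r0, r1 in zip(C00, C01)]
    bottom = [r0 + r1 for r0, r1 in zip(C10, C11)]
    return top + bottom
```

      <p>Each recursive multiplication corresponds to a section that can be allocated to a node; the eight products P0 to P7 are independent of each other and can therefore run in parallel.</p>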
      <p>
        Given a physical parallel computing platform consisting of a set of nodes, we need to
define the mapping of the different sections to the nodes. In this context, the logical
configuration is a view of the physical configuration that defines the logical communication structure
among the physical nodes. Typically, for the same physical configuration we can have many
different logical configurations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. An example of a logical configuration is shown in Fig. 1b.
In this paper we assume that a feasible logical configuration is selected and the mapping of the
code needs to be realized.
      </p>
      <p>1:  procedure MatrixMultiply(A, B, n):
2:    if n = 1 then
3:      C = A * B
4:    endif
5:    P0 = MatrixMultiply(A00, B00, n-1)
6:    P1 = MatrixMultiply(A01, B10, n-1)
7:    P2 = MatrixMultiply(A00, B01, n-1)
8:    P3 = MatrixMultiply(A01, B11, n-1)
9:    P4 = MatrixMultiply(A10, B00, n-1)
10:   P5 = MatrixMultiply(A11, B10, n-1)
11:   P6 = MatrixMultiply(A10, B01, n-1)
12:   P7 = MatrixMultiply(A11, B11, n-1)
13:   C00 = P0 + P1
14:   C01 = P2 + P3
15:   C10 = P4 + P5
16:   C11 = P6 + P7</p>
      <p>Fig. 1. Matrix Multiplication Algorithm (a) to be mapped on (b) logical configuration platform</p>
      <p>In Fig. 2 the code for node 0 is defined, which sends the sub-matrices to the other nodes
(1, 2, 3). Lines 9 to 14 define the code for receiving the matrices in node 1. A similar code is
implemented for the nodes 2 and 3 (not shown in the figure). Line 16 defines a so-called barrier
to let the process wait until all the sub-matrices have been distributed and received by all the
nodes. After the distribution of the sub-matrices to the nodes, each node runs the code as
defined in lines 17-18 and, as such, multiplies the received sub-matrices. Once the multiplication
is finalized, the results are submitted to node 0, which is shown in lines 19-22 for node 1 (the code
for nodes 2 and 3 is not shown). Lines 23 to 25 define the collection of the results in node 0.
Line 27 defines a barrier to complete this process. Finally, in lines 28 to 33 the results are
summed in node 0 to compute the resulting matrix C.</p>
      <p>3 Implementation Approach
Our approach for mapping the parallel algorithm to the parallel computing platform builds on
model-driven development approaches. The overall approach is shown in Fig. 3. In the first
step of the approach the parallel computing algorithm is analyzed to define and characterize
the sections that need to be allocated to the nodes of the parallel computing platform. In the
second step, the plan is defined for allocating the algorithm sections to the corresponding
nodes of the logical computing platform. In the third step the code for each serial section is
manually implemented. The fourth step includes the implementation or reuse of predefined
model transformations to generate the code for parallel sections. The final step includes the
deployment of the code on the physical configuration platform. The details of the steps are
described in the following sub-sections.</p>
      <p>[Fig. 3 diagram steps: 1. Analyze Algorithm; 2. Define the Plan for the Allocation of
the Algorithm Sections; 3. Implement the Serial Code Sections; 4. Implement/Reuse Model
Transformations to Generate Code; 5. Deploy the Code on the Physical Configuration Platform]</p>
      <p>Fig. 3. Approach for Generating/Developing and Deployment of Parallel Algorithm Code</p>
      <p>3.1 Analyze Algorithm
The analysis of the parallel algorithm identifies the separate sections of the algorithm and
characterizes these as serial or parallel sections. Here, a section is defined as a coherent set of
instructions in the algorithm. A serial section defines the part of the algorithm that needs to
run serially on nodes without interacting with other nodes. A parallel section defines the part
of the algorithm that runs on each node and interacts with other nodes. For example, the matrix
multiplication algorithm (Fig. 1a) has four main sections as shown in Table 1.</p>
      <p>Table 1. Analysis of algorithm sections
No. | Algorithm Section | Section Type
1 | Distribute the sub-matrices | PAR
2 | C = A * B | SER
3 | Collect matrix multiply results | PAR
4 | C00 = P0 + P1; C01 = P2 + P3; C10 = P4 + P5; C11 = P6 + P7 | SER
The first section defines the distribution of the sub-matrices to the different nodes. This
section is characterized as a parallel section (PAR). The second section is characterized as serial
(SER) and defines the set of instructions for the multiplication of the sub-matrices. The third
section is a parallel section and defines the collection of the results of the matrix
multiplications. Finally, the fourth section is characterized as serial and defines the summation of the
results to derive the final matrix.</p>
      <p>
        3.2 Define the Plan for the Allocation of the Algorithm Sections
The next step of the implementation approach is to define the plan for mapping the algorithm
sections to logical configurations. Usually many different logical configurations can be
derived for a given parallel algorithm and parallel computing platform. We refer to our earlier
paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in which we define the overall approach for deriving feasible logical configuration
alternatives with respect to speed-up and efficiency metrics. In this paper we assume that a
feasible logical configuration has been selected and elaborate on the generation of the
implementation of the algorithm sections.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Table 2. Plan for allocating sections to nodes</title>
      <p>No. | Algorithm Section | Section Type | Plan
1 | Distribute the sub-matrices | PAR | pattern of four nodes, dominating node (0, 0)
2 | C = A * B | SER | runs on each node
3 | Collect matrix multiply results | PAR | pattern of four nodes, dominating node (0, 0)
4 | C00 = P0 + P1; C01 = P2 + P3; C10 = P4 + P5; C11 = P6 + P7 | SER | runs on each node</p>
      <p>The allocation of the sections to the nodes depends on the type of the sections. The plan for
the matrix multiplication algorithm is shown in the fourth column of Table 2. Here we assume
that each serial section runs on each node (section 2 and 4). The plan for allocating the parallel
sections is defined as a pattern of nodes. The rectangles represent the nodes; the arrows
represent the interactions (distribution or collection) among the nodes. Further, each node is
assigned an id defining the coordinate of the node in the logical configuration. For section 1 the
distribution of the data is represented as a pattern of four nodes in which the dominating node
is the node with coordinate (0, 0). The arrows in the pattern show the distribution of the
submatrices from the dominating node to the other nodes. For section 3 the pattern represents the
collection of the results of the multiplications to provide the final matrix.</p>
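      <p>The two parallel patterns can be illustrated as communication sets over node coordinates (a minimal sketch with tuple coordinates; the function names are assumptions, not the paper's tooling):</p>

```python
def distribution_pattern(dominating, nodes):
    """Section-1 pattern: the dominating node sends a sub-matrix to every other node."""
    return [(dominating, n) for n in nodes if n != dominating]

def collection_pattern(dominating, nodes):
    """Section-3 pattern: every other node sends its result back to the dominating node."""
    return [(n, dominating) for n in nodes if n != dominating]

# 2x2 logical configuration of Table 2, dominated by node (0, 0)
nodes = [(0, 0), (0, 1), (1, 0), (1, 1)]
scatter = distribution_pattern((0, 0), nodes)
gather = collection_pattern((0, 0), nodes)
```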
      <p>
        In the given example we have assumed a logical configuration consisting of four nodes. Of
course for larger configurations defining the allocation plan becomes more difficult. Hereby,
the required plan is not drawn completely but defined as a set of patterns that can be used to
generate the actual logical configuration. For example, scaling the patterns of Table 2 can be
used to generate the logical configuration of Fig. 1b. For more details about the generation of
larger logical configurations from predefined patterns we refer to our earlier paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
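      <p>Generating a larger logical configuration from the four-node pattern can be sketched as tiling the pattern over a node grid (a simplification of the generation approach of [2]; the function and the grouping it returns are assumptions):</p>

```python
def scale_pattern(rows, cols):
    """Tile the 2x2 pattern over a rows x cols grid of nodes; each tile
    instance is a group dominated by its top-left node."""
    groups = []
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            members = [(r + dr, c + dc) for dr in (0, 1) for dc in (0, 1)]
            groups.append((members[0], members))   # dominating node first
    return groups

groups = scale_pattern(4, 4)   # four 2x2 tiles covering a 16-node configuration
```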
      <sec id="sec-2-1">
        <title>3.3 Implement the Serial Code Sections</title>
        <p>Once the plan for allocating the algorithm sections to the logical configuration is defined we
can start the implementation of the algorithm sections. Hereby, the code for the serial sections
is implemented manually.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Table 3. Implementation of the serial sections</title>
      <p>No. | Algorithm Section | Implementation
1 | Distribute the sub-matrices | Will be generated
2 | C = A * B | C0 = A.0 * B.0; C1 = A.1 * B.1
3 | Collect matrix multiply results | Will be generated
4 | C00 = P0 + P1; C01 = P2 + P3; C10 = P4 + P5; C11 = P6 + P7 | C00 = P.0 + P.1; C01 = P.2 + P.3; C10 = P.4 + P.5; C11 = P.6 + P.7
The code for the parallel sections is generated using the model-transformation patterns as
defined in the next sub-section. The third column of Table 3 shows the implementation of the
serial sections of the matrix multiplication algorithm. Note that the implementation is in
alignment with the complete implementation of the algorithm as shown in Fig. 2.</p>
      <sec id="sec-3-1">
        <title>3.4 Model Transformations</title>
        <p>After analyzing the algorithm, implementing the code for serial algorithm sections and
defining the plan for mapping these sections to the logical configuration, the code for the parallel
sections will be generated. To support platform independence this code generation process is
realized in two steps using model-to-model transformation and model-to-text transformation.
These transformation steps are described below.</p>
      </sec>
      <sec id="sec-3-2">
        <title>M odel-to-M odel T r ansfor mation.</title>
        <p>
          For different parallel computing platforms, there are several parallel programming
languages such as MPI, OpenMP, MPL, and CILK [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. According to the characteristics of the
parallel computing platforms, different programming languages can be selected. Later on, in case of
changing requirements, a different platform might need to be selected. To cope with the
platform independence and the platform evolution problem we apply the concepts as defined in
the Model-Driven Architecture (MDA) paradigm [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Accordingly, we make a distinction
between platform independent models (PIM), platform specific models (PSM) and the source
code. The generic model-to-model transformation process is shown in Fig.4.
        </p>
        <p>Fig. 4. Model-to-model transformation.</p>
        <p>Here the transformation process takes as input a platform independent model called the
parallel algorithm mapping model. This model defines the mapping of the algorithm sections to the
logical configuration. The model conforms to the parallel algorithm mapping metamodel
which we will explain later in the section. The output of the transformation process is a
platform specific model, called parallel computing platform specific model. Similarly this model
conforms to its own metamodel, which typically represents the model of the language of the
platform (e.g. MPI metamodel). The platform specific model will be later used to generate the
code using model-to-text transformation patterns.</p>
        <p>[Fig. 5 listing: XText grammar rules for Algorithm, Section, SerialSection,
ParallelSection, LogicalConfiguration, Tile, Core, Pattern, and Communication]</p>
        <p>Fig. 5. Concrete Syntax of the Parallel Algorithm Mapping Metamodel (PAMM)</p>
        <p>The grammar for the parallel algorithm mapping metamodel is defined in XText in the
Eclipse IDE and shown in Fig.5. Here, Algorithm consists of Sections, which can be either a
ParallelSection or SerialSection. Each section can itself have other sections. In the grammar
the serial sections are related to code implementations in the code block. The parallel sections
include the data about the mapping plan that is determined with the logical configuration.
LogicalConfiguration consists of Tile entities, where a Tile is either a single Core (processing
unit) or a Pattern with tiles and communications between these tiles. Together, the logical
configuration assets, i.e. the cores and patterns, compose the plan for mapping the algorithm to the
logical configuration.</p>
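        <p>The structure that the grammar defines can be mirrored roughly with Python dataclasses (the class layout below is our own sketch for illustration, not the XText definition):</p>

```python
from dataclasses import dataclass, field

@dataclass
class Core:                      # a single processing unit with grid indices
    i: int
    j: int

@dataclass
class Pattern:                   # tiles plus the communications among them
    name: str
    tiles: list = field(default_factory=list)
    comms: list = field(default_factory=list)

@dataclass
class SerialSection:             # serial sections carry their code implementation
    name: str
    code: str = ""

@dataclass
class ParallelSection:           # parallel sections carry a mapping pattern
    name: str
    pattern: Pattern = None
    sections: list = field(default_factory=list)   # sections may nest

# the MultiplyBlock of Fig. 6: a parallel scatter (B2S) wrapping a serial multiply
b2s = Pattern("B2S", tiles=[Core(0, 0), Core(0, 1), Core(1, 0), Core(1, 1)])
multiply = ParallelSection("MultiplyBlock", pattern=b2s,
                           sections=[SerialSection("Multiply", code="C = A * B")])
```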
        <p>Fig. 6 shows, for example, the parallel algorithm mapping model for the matrix
multiplication algorithm. In the figure two sections, MultiplyBlock and SumBlock, are defined. In
the MultiplyBlock section the matrices are divided into sub-matrices and scattered by using the
B2S pattern. The B2S pattern is a predefined pattern in the toolset indicating the pattern for
section 1 as defined in the fourth column of Table 2. This multiply block also contains a
Multiply serial section which contains the serial implementation of the multiply operation. In the
SumBlock section, the resulting matrices are gathered by the pattern B2G which is predefined
for section 3 as shown in the fourth column of Table 2. The SumBlock serial section contains
the serial code for summation of the resulting sub-matrices.</p>
        <p>Fig. 6. Parallel Algorithm Mapping Model for the Matrix Multiplication Algorithm</p>
        <p>Once the platform independent parallel algorithm mapping model is defined we can
transform it to the required platform specific model. We assume, for example, that the aim is to
generate an MPI model. Fig. 7 shows the grammar of the MPI metamodel that is again defined
using XText. In the metamodel each MPI model consists of a group of entities, which include
MPISection, Process, Node, and Communication. Each section consists of processes and
communication among these processes. Each Process allocates to a Node. Each
communication defines the destination and target process.</p>
        <p>[Fig. 7 listing: XText grammar rules for MpiModel, MpiGroup, MpiSection, Process, Node, and Communication]</p>
        <p>Fig. 7. Grammar of the MPI Metamodel</p>
        <p>
          The model-driven transformation rules refer to elements of both the PAMM and the
parallel computing platform specific metamodel, in this case the MPI Metamodel. The M2M
transformation rules are implemented using the ATL [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] transformation language. The
transformation rules are shown in Fig.8. As shown in the figure we have implemented four different
rules which define the transformations of mapping patterns to MPI sections, cores to processes
and communications to MPI communications.
        </p>
        <p>The rule Algorithm2MpiModel is defined as the main rule of the transformation. The rule
Pattern2Section transforms the algorithm pattern sections to MpiSection within the MpiGroup.
The rule Core2Process transforms the cores as defined in the patterns to the processes in
MpiSection. Each process is transformed from the core with the data of rank calculated from
the index values of the core. Similarly, Comm2Comm transforms the communications that are
defined in the patterns, to the communications in MPISection.</p>
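        <p>The effect of the Core2Process and Comm2Comm rules can be sketched in Python (an illustration of the described rules, not the ATL source; the row-major rank formula is an assumption consistent with deriving ranks from the index values of the core):</p>

```python
def core_to_rank(i, j, xsize):
    """Core2Process: derive an MPI rank from the core's grid indices (row-major)."""
    return i * xsize + j

def pattern_to_section(pattern, xsize):
    """Pattern2Section: map every core to a process, and every communication
    (Comm2Comm) to a pair of MPI ranks."""
    processes = [core_to_rank(c[0], c[1], xsize) for c in pattern["cores"]]
    comms = [(core_to_rank(*f, xsize), core_to_rank(*t, xsize))
             for f, t in pattern["comms"]]
    return {"processes": processes, "comms": comms}

# scatter pattern of Table 2: node (0, 0) distributes to the three other nodes
b2s = {"cores": [(0, 0), (0, 1), (1, 0), (1, 1)],
       "comms": [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 0), (1, 1))]}
section = pattern_to_section(b2s, xsize=2)
```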
        <p>The MPI model, which is the result of the model-to-model transformation, is shown in Fig. 9.
The MPI model includes the MpiSection with processes that will run on each node,
communications from a destination process to target process and the serial code section
implementation. This MPI model is now ready for model-to-text transformation to generate the final MPI
source code.</p>
        <p>Fig. 9. Part of the MPI model generated by model-to-model transformation</p>
        <p>Model-to-Text Transformation.
The generated PSM includes the mapping of the processes specific to the parallel computing
platform. Subsequently, this PSM is used to generate the source code. The model-to-text
transformation pattern for this is shown in Fig. 10.</p>
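        <p>What the model-to-text step emits per section can be sketched with plain string templates (a simplified stand-in for the XPand template of Fig. 11; the helper names and the abbreviated argument lists are assumptions):</p>

```python
# guarded send/receive templates; the elided MPI arguments are abbreviated in comments
ISEND = 'if (rank == {f}) {{ MPI_Isend({data} /* size, type, dest {t}, ... */); }}'
IRECV = 'if (rank == {t}) {{ MPI_Irecv({data} /* size, type, src {f}, ... */); }}'

def emit_section(comms, serial_code):
    """Emit guarded send/receive code per communication, then the serial
    code of the section, then a barrier that synchronizes the section."""
    lines = []
    for c in comms:
        lines.append(ISEND.format(f=c["frm"], t=c["to"], data=c["data"]))
        lines.append(IRECV.format(f=c["frm"], t=c["to"], data=c["data"]))
    lines.append(serial_code)
    lines.append("MPI_Barrier(MPI_COMM_WORLD);")
    return "\n".join(lines)

code = emit_section([{"frm": 0, "to": 1, "data": "A1"}], "C1 = A1 * B1;")
```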
        <p>[Fig. 10 diagram: the MPI model, conforming to the MPI metamodel, is transformed by an
M2T transformation into MPI source code]</p>
        <p>
          Fig. 10. Example model transformation chain of MPI model
The model-to-text transformation template is defined in the XPand [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] transformation language. To map the sections to the parallel computing platform,
for each section the communication operations for the data are generated for the target and
destination process ranks (lines 6 to 11). Subsequently, the serial code implementation is imported to
the source code in line 13. For each section, a barrier code is implemented to synchronize the
section processes (line 14). The resulting code of the transformation is the code as defined in Fig. 2.
1:  «DEFINE main FOR MpiModel»
2:  «MPI initialization code»
3:  «FOREACH sections AS section»
4:  «FOREACH section.processes AS process»
5:  «FOREACH process.communications AS comm»
6:  if (rank == «comm.from.rank») {
7:    MPI_Isend(«comm.fromData.name», «comm.fromData.size», MPI_«comm.fromData.type»,
8:          «comm.to.rank», «comm.from.rank», MPI_COMM_WORLD, &amp;request); }
9:  if (rank == «comm.to.rank») {
10:   MPI_Irecv(«comm.toData.name», «comm.fromData.size», MPI_«comm.toData.type»,
11:         «comm.from.rank», MPI_ANY_TAG, MPI_COMM_WORLD, &amp;request); }
12: «ENDFOREACH»
13: «section.code»
14: MPI_Barrier(MPI_COMM_WORLD);
15: «ENDFOREACH»«ENDFOREACH»
16: «Final code»
        </p>
        <p>Fig. 11. Transformation template from MPI metamodel to MPI source code</p>
        <p>
          3.5 Deploy Code on Physical Configuration
The resulting code of the previous steps needs to be deployed on the physical configuration.
The deployment can be done manually or using tool support in case of large configurations. In
the literature various tools can be found which concern the automatic deployment of the code
to the nodes of a parallel computing platform. We refer to, for example, [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ][
          <xref ref-type="bibr" rid="ref15">15</xref>
          ][
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for further
details.
        </p>
        <sec id="sec-3-2-1">
          <title>4 Related Work</title>
          <p>
            Several papers have been published in the domain of model-transformations for parallel
computing. Palyart et al. [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] propose an approach for using model-driven engineering in high
performance computing. They focus on automated support for the design of a high
performance computing application based on the distinction of different domain expertise such as
physical configuration, numerical computing, and application architecture.
          </p>
          <p>
            Bigot and Perez [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] adopt HLCM, a hierarchical and generic component model with
connectors, originally designed for high performance applications. The authors report on their
experience with metamodeling and model transformation to implement HLCM. Gamatié et al.
[
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] introduced the GASPARD design framework that uses model transformations for
massively parallel embedded systems. They refined MARTE models based on the Model
Driven Engineering paradigm. They provide tool support to automatically generate code from
high-level specifications. Taillard et al. [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] implemented a graphical framework for
integrating new metamodels into the GASPARD framework. They used the MDE paradigm to generate
OpenMP, Fortran, or C code.
          </p>
          <p>Similar to our approach, the above studies generate source code for high performance
computing. The main difference of our approach is its focus on the mapping of algorithm sections to
parallel computing platforms.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>5 Conclusion</title>
          <p>In this paper we have described the model transformations needed to implement the mapping
of a parallel algorithm to a parallel computing platform. In alignment with the MDA paradigm
the approach is based on separating the platform independent parallel computing model from
the platform specific parallel computing model and the source code. The model
transformations not only help the parallel programming engineer to generate code but also
provide support for easier portability in case of platform evolution. We have illustrated the
approach for the MPI platform but the approach is generic. In our future work we will elaborate
on the application of model-driven approaches to parallel computing platform and focus on
optimizing the values for metrics which are important for mapping parallel algorithms to
parallel computing platforms.
References</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. ATL:
          <article-title>ATL Transformation Language</article-title>
          . http://www.eclipse.org/atl/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arkin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tekinerdogan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imre</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Model-Driven Approach for Supporting the Mapping of Parallel Algorithms to Parallel Computing Platforms</article-title>
          .
          <source>Proc. of the ACM/IEEE 16th International Conference on Model Driven Engineering Languages and Systems</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bigot</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>On Model-Driven Engineering to implement a Component Assembly Compiler for High Performance Computing</article-title>
          .
          <source>Journées sur l'Ingénierie Dirigée par les Modèles (IDM)</source>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cumberland</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herban</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irvine</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shuey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Luisier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Rapid parallel systems deployment: techniques for overnight clustering</article-title>
          .
          <source>In Proceedings of the 22nd conference on Large installation system administration conference (LISA'08)</source>
          .
          <source>USENIX Association</source>
          , Berkeley, CA, USA,
          <fpage>49</fpage>
          -
          <lpage>57</lpage>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Foster</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <article-title>Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering</article-title>
          .
          <publisher-name>Addison-Wesley Longman Publishing Co., Inc.</publisher-name>
          , Boston, MA, USA. (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          <article-title>The physical limits of computing</article-title>
          .
          <source>Computing in Science &amp; Engineering</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>26</lpage>
          , May-June. (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gamatié</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Beux</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piel</surname>
            ,
            <given-names>É.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Atitallah</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etien</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marquet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dekeyser</surname>
            ,
            <given-names>J.-L.</given-names>
          </string-name>
          <article-title>A Model-Driven Design Framework for Massively Parallel Embedded Systems</article-title>
          .
          <source>ACM Trans. Embed. Comput. Syst</source>
          .
          <volume>10</volume>
          ,
          <issue>4</issue>
          , Article 39. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hoffmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neubauer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Deployment and configuration of distributed systems</article-title>
          .
          <source>In Proceedings of the 4th international SDL and MSC conference on System Analysis and Modeling (SAM'04)</source>
          , Daniel Amyot and Alan W. Williams (Eds.). Springer-Verlag, Berlin, Heidelberg,
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kogge</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borkar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carlson</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dally</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denneau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franzon</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrod</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keckler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richards</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scarpelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snavely</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sterling</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yelick</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          .
          <article-title>Exascale Computing Study: Technology Challenges in Achieving Exascale Systems</article-title>
          . DARPA. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Scalable parallel matrix multiplication on distributed memory parallel computers</article-title>
          .
          <source>In Proc. of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000)</source>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>314</lpage>
          . (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          <article-title>Cramming More Components Onto Integrated Circuits</article-title>
          .
          <source>Proceedings of the IEEE</source>
          , vol.
          <volume>86</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>85</lpage>
          . (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          MPI: A Message-Passing Interface Standard, version 1.1. http://www.mpi-forum.org/docs/mpi-11-html/mpi-report.html
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Object Management Group (OMG).
          <source>Model Driven Architecture (MDA)</source>
          , ormsc/2001-07-01.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Palyart</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lugato</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ober</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bruel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>MDE4HPC: an approach for using model-driven engineering in high-performance computing</article-title>
          .
          <source>In Proceedings of the 15th international conference on Integrating System and Software Modeling (SDL'11)</source>
          , Iulian Ober and Ileana Ober (Eds.).
          <source>Springer-Verlag</source>
          , Berlin, Heidelberg,
          <fpage>247</fpage>
          -
          <lpage>261</lpage>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stawinska</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurzyniec</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stawinski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sunderam</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>Automated Deployment Support for Parallel Distributed Computing</article-title>
          .
          <source>In Proc. of the 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP '07)</source>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>146</lpage>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Taillard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyomarc'h</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dekeyser</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>A Graphical Framework for High Performance Computing Using An MDE Approach</article-title>
          .
          <source>In Proc. of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '08)</source>
          . IEEE Computer Society, Washington, DC, USA,
          <fpage>165</fpage>
          -
          <lpage>173</lpage>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Talia</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Models and Trends in Parallel Programming</article-title>
          .
          <source>Parallel Algorithms and Applications</source>
          , vol.
          <volume>16</volume>
          , no.
          <issue>2</issue>
          :
          <fpage>145</fpage>
          -
          <lpage>180</lpage>
          . (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          Xpand, openArchitectureWare. http://wiki.eclipse.org/Xpand.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kakulapati</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kale</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          <article-title>BigSim: a parallel simulator for performance prediction of extremely large parallel machines</article-title>
          .
          <source>In Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004)</source>
          , p.
          <fpage>78</fpage>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>