=Paper=
{{Paper
|id=Vol-1536/paper16
|storemode=property
|title=
Parallel Discrete Event Simulation as a Paradigm for Large Scale Modeling Experiments
|pdfUrl=https://ceur-ws.org/Vol-1536/paper16.pdf
|volume=Vol-1536
|dblpUrl=https://dblp.org/rec/conf/rcdl/ShchurS15
}}
==
Parallel Discrete Event Simulation as a Paradigm for Large Scale Modeling Experiments
==
Parallel Discrete Event Simulation as a Paradigm for
Large Scale Modeling Experiments
© Lev Shchur © Liudmila Shchur
Science Center in Chernogolovka,
142432 Russia
shchur@chg.ru , lvs@chg.ru
Abstract assigns a logical process (LP) to each subsystem.
Therefore, each LP is just a sequential event-driven
Parallel Discrete Event Simulation (PDES) is a simulator. Sending an event to a LP is equivalent to
method introduced to conduct simulation of a complex scheduling that event in the receiving list of LP pending
system, which consists of a large number of objects. The events.
PDES has been known for about 30 years and we The PDES is a space-parallel approach with an
discuss why further analysis of the method is important execution of a single discrete event simulation program
nowadays. We briefly introduce the main features of the on a parallel computer or on a cluster of computers [1].
method and discuss the current state of PDES. We It is widely used in economics and logistics, and
present a survey of the method as well as promising sometimes applied in physics and computer science. A
applications. system that should be simulated is divided into disjoint
subsystems, which are mapped onto the programming
1 Introduction objects [2]. Subsystems are not isolated during the
simulations, and dependencies between objects should
Methods of parallel algorithms developed in the
be resolved properly. There are three main essentials of
middle of the last century are weakly applicable on the
PDES. First, it is assumed that changes of possible, or
modern supercomputers. During the last two decades
potential, dependencies occur at some particular
the high-performance computing evolved from the big
moments of time. It is supposed that these moments of
mainframes with a complicated central processing unit
time are spread on a scale that is large enough compared
(CPU) into the parallel processing of large numbers of
to an elementary unit of simulation time. Therefore,
CPUs and GPGPUs (general-purpose graphics
changes are considered to be discrete (although random)
processing units). Running hundreds of thousands or
in time, and are called discrete events. The objects
even millions of CPUs in a single task is one of the
evolve independently in time, as soon as there are no
challenging problems of high-performance computing
dependencies generated. We need to have some protocol
(HPC). We will focus our discussion on how the parallel
in order to process dependencies correctly, in other
discrete event simulation (PDES) method may be used
words we would like to keep causality. Using general
in large-scale simulations.
language we can say that we need some protocol for
There are two ways to parallelize a simulation model:
synchronization of discrete events.
time-parallel and space-parallel [1]. The time-parallel
The main idea of PDES is to use an asynchronous
approach partitions the simulated time axis into intervals
mechanism and to implement the concept of Virtual
[T1,T2], [T2,T3], …, [Ti,Ti+1], … and assigns each Time [2]. A local virtual time variable Wi is associated
interval to a separate process. A process i computes a with each object i. As soon as a new time event is
sample path for the interval [Ti,Ti+1]. The final state of generated by an object i, that event is stamped with the
the system in the sample path [Ti-1,Ti] must be the local virtual time (LVT) Wi. A message containing
initial state in the sample path for [Ti,Ti+1]. It is clear information on the event is generated and is sent to
that the time-parallel simulations are restricted to special other objects. A message sending mechanism is the
models that fulfil that requirement. second essential feature of PDES. The third essential
The space-parallel approach partitions the system feature of PDES - there is no information exchange
being modelled into a collection of subsystems, and between subsystems through common variables and
there is no access of objects to a shared memory.
Instead, all the information is distributed via messages.
Proceedings of the XVII International Conference
The ensemble of LVT Wi generates a profile of local
«Data Analytics and Management in Data Intensive
virtual times. A minimal value of the profile defines
Domains» (DAMDID/RCDL’2015), Obninsk, Russia,
time horizon, and an evaluation of this value defines
October 13 - 16, 2015
Global Virtual Time (GVT) of simulations. An evolution
107
of LVT profile and of GVT in time depends on a case of local algorithms it is possible to prove existence
particular scheme of managing causality, and the of a lower bound of that speed [10].
analysis of time profile properties (which is related to We use an extension [11] of Virtual Time concept on
the problem of synchronization!) is one of subjects of the current multi-core and multi-thread architecture of
the present paper. hardware and software. In our discussion, the LVT is
Originally, the PDES was designed to deal with associated with logical processes (LP) and not with
complex and heavy problems. Suppose, one has to processing elements, as it was the case in “before-core
simulate a complex system with the condition that era” of hardware [9]. A processing element (PE) is
simulation is resource consuming (huge processor time, associated with hardware, and a LP is more flexible
big RAM, and big data storage, large number of because it is the virtual. Nowadays this extension is
sensors) and could not be fit in a single computer. An even more important with GPGPU in which threads
appropriate simulation method should be able to handle may run LP in parallel and concurrently.
any problem. The main requirement was not in the Let us define an abstract model of simulation in
effective use of CPU time, or storage, or other hardware PDES. The whole physical system is decomposed into
loading level. The main requirement was just to have N subsystems. Each physical subsystem is associated
some possibility to simulate. It is not a usual with a logical process. Each logical process develops in
requirement of scalability, which is traditionally time, changing its internal state at some moments of
associated with the effective use of hardware. Contrary, local simulation time. Time of simulation is measured in
scalability in the PDES is just a possibility to evolve a the Global Virtual Time [2], which is a minimal time of
system regardless of how large and complex this system the LVT profile. We call a change in the internal state of
is. During the last years, the PDES method attracts simulated subsystem i as an event that happens at logical
interest of a computational physics community, which is process i, it is denoted LP(i). We suppose that intervals
looking for methods to simulate huge systems that can of time between events are distributed according to a
be simulated on computers with millions of nodes Poisson law. A logical process generates a message with
[4,5,6]. A discussion of that issue is another subject of information on the changes in the internal state, marked
the paper. up (stamped) by the value of LVT. Sequence of LVT
The third subject of the paper is a discussion of the times Wi is not decreasing. Each logical process LP(i)
possibility to use the PDES method on a lower level, on receives a message and may include the received
a level of system software that will link together billions information in simulation, or ignore information if it is
of nodes (computational units, CPUs, cores, sensors, not necessary for simulation. In the case when LPs
etc.) within one task. This was mentioned in the White perform simulations in parallel, the following problem
Paper “Performance Technologies for Peta-Scale may occur. It may happen that LVT Wj of LP(j) is higher
Systems” prepared by a group of researchers from US than Wi in the received message from LP(i), and LP(j)
National Laboratories and associated Universities [7]. does depend on the information contained in the
The paper is organized as follows. In Section 2 we message generated by LP(i). Therefore, causality is
describe briefly the PDES method. In Section 3 we violated! Fujimoto classified PDES algorithms into two
review the research on the evolution of LVT time groups [1,9]. They are called optimistic algorithms and
profiles. In Section 4 we discuss an application of PDES conservative algorithms reflecting the way in which
for in physics in informatics. In Section 5 we discuss a synchronization is performed. FaS algorithm is
possible application of the PDES method in system introduced and described in the paper [10], it may be
software. In Section 6 we summarize our discussion and viewed as representatives of the third group of PDES.
point out possible future research.
2.1 Conservative PDES algorithm
2 Time evolutions in Parallel Discrete Event
In the case of a conservative algorithm, a subsystem
Simulation method
waits for all events to happen in every subsystem from
We discuss an evolution of the local time profile and each it depends on, before proceed in time. Clearly it is
of the GVT in PDES algorithms [8,9] using simplified possible only for the simulations in which we know in
models of that evolution. In a sense, it is a modelling of advance all dependencies between subsystems in the
the possible PDES algorithms. As we already physical system (named as arcs on graphs of LPs).
emphasized, the evolution of the LVT and the GVT is Therefore, in a conservative algorithm LVT Wi of LP(i)
associated with synchronization problems. The should not be greater than LVT Wk of all LPs from which
importance of the model-of-model approach becomes LP(i) depends on [1].
clear if we take into account that using that approach we In realizations, one has to define matrix of
may prove, for example, that deadlocks are never dependencies M(i,j) with elements equal to 0 or 1. If an
happens in conservative algorithms. In terms of LVT element M(i,j) equals to 1, than LP(j) does may depend
profile time evolution, it means that the speed of the on events generated by LP(i). This matrix may be
time profile is always positive because there exists at viewed as a matrix of sender-receiver for message
least one object with the LVT lower than LVTs of other sending. For each LP(i) the corresponding column of a
objects, therefore that object may proceed in time. In the matrix is defined, and each time event happens in
108
simulations of LP(i), it sends messages to all LP(j) for with the logical processes (LP) rather than with the
which element of matrix M(i,j) is equal to one. processing elements (PE). Second, it is assumed that it
Accordingly, LP(j) will never go forward in simulation is possible to handle several LPs within one PE, and that
with LVT W j larger than minimal W k in its queue of communication within PE may be efficient using
messages stamped with the LVTs. conservative algorithm. In that case we can use
In subsection 3.1 we will show that a conservative correspondence of time profile evolution in the
algorithm never leads to a deadlock and will conservative PDES [10] with the evolution of surface
demonstrate how simulation speed of GVT behaves for growth on substrate [12], described by Kardar-Parisi-
some simple cases Zhang partial differential equation, named KPZ
equation. There is one-to-one correspondence between
2.2 Optimistic PDES algorithm classification of boundary conditions in KPZ equation
In an optimistic algorithm, all LPs are evaluated in and classification of PDES algorithms [11]. In KPZ
time with assumption that causality is fulfilled. Clearly, equation, we may divide space into domains, and
LVT of subsystems increased independently, and it may classify possible boundary conditions (BC) between
happen that LP(j) will receive message with time stamp domains. There are three possibilities of BC:
W i smaller than W l in the last processed message. continuous, free, and fixed. Mapping of KPZ onto
Causality is broken! LP(j) proceeds using the wrong PDES is as follows. Domain corresponds to PE, and BC
information, while ignoring an event which happened between domains corresponds to the communication
protocol. Therefore, with many LPs running within one
earlier in simulation time, W i < W l. An optimistic
PE and updated using conservative algorithm, there are
algorithm introduces a protocol of rollback in time that
three possible algorithms for communication protocol
resolves the problem of causality. This is done again
between PEs (associated with corresponding BC):
using a message sending procedure. The most well
conservative, optimistic, and FaS. FaS is abbreviation of
known optimistic protocol is Time Warp paradigm [2].
the Freeze-and-Shift algorithm.
The event causing rollback is called a straggler. A
Let us consider the simplest parallel simulation task,
recovery is organized by undoing effects of all events
in which LPs are placed along the horizontal lines (see
that have been processed prematurely by the LP
Figure 1), and send time-stamped messages only to the
receiving the straggler.
left and right LP. Let us assume we distribute l logical
In addition to the Input Message Queue and Output
processes on each of N processor elements.
Message Queue, each LP keeps State Queue, which is a
list of the recently processed events. It gives a
possibility to restore an old state vector on rollback. LP
sends anti-message which annihilates the original one
when it reaches its destination LP. So, rollback
generates an anti-message (negative), which annihilates
while meeting the corresponding primary (positive)
message. This happens like annihilation in particle
physics, with an electron and a positron. If LP(i)
receives an anti-message that corresponds to a positive Fig. 1. Possible realization of the first two “steps” of the
message, which it has already processed, then the time profile evolution in the conservative algorithm.
process must be rolled back itself to undo the effect of Number of LPs is equal to the number of PEs. Each LP
processing that positive message. In the case if both an sends message only to the neighboring LP at the left and
anti-message and a positive message are in LP(i) queue right side.
they just annihilate within queue and do not cause any
rollback by LP(i). Recursive repetition of this procedure Figure 1 shows time profile of simulation with 12 LPs
allows cancelling all erroneous calculations. This on the 12 PEs. We start with all LVT W i = 0. The system
recursive procedure may lead to the avalanche in the evaluates to the second raw of black LPs. At that
rollback, but the length of the avalanche is limited by “moment” GVT is defined by the process with the
GVT. No event with time-stamp smaller than GVT will minimal LVT time, it is process number 3. At the
ever be rolled back, so LPs with GVT may discard that “next” step of simulation only LPs with the minimal
event from the State Queue. local virtual time (number 1, 3, 6, 8, 10, and 12) are
In subsection 3.2 we will describe a simple model of allowed to proceed to the possible time marked by pink,
evolution of LVT profile, and demonstrate an analogy according with the conservative algorithm. The third
with a process known in physics as an unrestricted “step” consists in possible evaluation of those processes,
surface growth. which are sitting in local minima of LVT profile; it is
processes with number 4, 7, 9, and 11.
2.3 Freeze-and-Shift (FaS) algorithm
To make the classification of the PDES algorithms
possible, two steps are performed. First, concept of
Virtual Time [2] was generalized, as LVT is associated
109
by group of Korniss, mainly on the conservative
algorithm. Next, we discuss results of our research with
Mark Novotny, mainly for the optimistic algorithm.
3.1 Conservative PDES and Kardar-Parisi-Zhang
equation
Let us consider the simplest case of application of
conservative algorithm to the chain of LPs
Fig. 2. Possible realization of the first two “steps” of the
communicating with neighbors only as already
time profile evolution in the FaS algorithm. Number of
considered in Subsection 2.3 and depicted schematically
LP=4 in each of PE=3. Each LP sends message only to
in the Figure 1. LVT Wi is associated with each LP(i). We
the neighboring LPs at the left and right side: it is
conservative algorithm between LP in each PE, and start with all Wi = 0 (LVT time profile is flat at t = 0,
boundary LPs are frozen. where t is artificial time, and the discrete one for
simplicity1). Evolution of LVT profile may be written
Figure 2 demonstrate FaS algorithm applied to the chain as iterative process: W i (t+1) = W i (t)+K i(t) if W i (t) d min
of LPs communicating only with the neighbors. 12 LP {Wi-1 (t),W i+1 (t)}, and W i (t+1) = W i (t), otherwise. Here K i(t)
distributed over 3 PE, with 4 LP in each PE. Boundary are random variables. When LP(i) advanced in time it
LP (number 1,4,5,8,9, and 12) is not allowed to proceed sends messages to the right LP(i+1) and left LP(i-1)
after step 1 - they are fixed. On the “next step” only LP stamped with the LVT W i (t+1). In this process causality
in the local minima and not on the border of PE are is never violated.
allowed to proceeds simulation, it is processes 3, 6, and This algorithm is free of deadlock. This can be seen
10, marked by pink. Compare Figure 2 and Figure 1 – qualitatively from the fact that statistically there are four
the difference only on the border of PE. We have to possible local configurations for LVT of LP(i), and only
note, that no LP could proceed after that step – there are one of them with the local minima. Therefore, average
no local minima of LVT time profile. The next step of speed of GVT (time horizon) should be 1/4. In fact,
FaS algorithm is to perform shift phase, i.e. perform numerical simulation of Korniss et al model gives
exchange of LPs between PEs. For the case depicted in average speed of the time horizon is 0.246 410(7) [10]
Figure 2, we have to “shift” LPs cyclically by two for the Poisson distribution of Ki(t), 0.267(4) [11] for the
positions. So, PE number 1 will keep LPs 3,4,5, and 6; uniformly distributed Ki(t), and 0.258(5) [11] for the
PE number 2 – LPs 7,8,9, and 10; PE number 3 – LPs Gaussian noise. So, average utilization of CPU time in
11,12,1, and 2. So, from that boundary LPs (the fixed this simplified local version of conservative algorithm is
ones) will be 3, 6, 7, 10, 11, and 2, this is the freeze close to 25 per cent.
phase of FaS algorithm. After that, processes 4, 8, 12, Korniss et al argued that in the continuum limit (the
and 1 could proceed. increment of the artificial time t is infinitesimal)
The forward process depends on the number l of iteration process for the LVT time profile obeys the
logical processes on each of N processor elements, and equation introduced by Kardar-Parisi-Zhang (KPZ) for
time to stop can be estimated as l2/4 for the linear chain the surface growth on the solid substrate [10,11]. This is
of LPs discussed for simplicity. very important observation because it provides mapping
FaS allows for an effective realization on the parallel of conservative PDES algorithm with local message
computers, clusters of computers, and in grid exchange onto KPZ problem. In turn, KPZ problem is
computing. It effectively decouples the computation well investigated [12], and solution demonstrates
phase and communication phase for these parallel universal properties of the profile fluctuations
computations. This allows, for example, the depending on the time and system size.
programmer to utilize fast block-memory-transfer First, from the fluctuations of surface profile it is
commands. Furthermore, it should allow the efficient possible to estimate the width growth of the profile.
simultaneous execution of both computation part and Mapping this result from KPZ onto the properties of
communication part when they are performed by LVT time profile in PDES shows us that width of LVT
independent hardware. grows with time as t3/2. This means that LPs are
unsynchronized with simulation time growth. This is a
3 Evolution of time profile in PDES bad news, but from KPZ analysis it follows, that width
growth is saturated at some moment of time. Saturation
In this section we will discuss evolution of LVT time level of LVT width is proportional to the square root
profile in PDES. We do not discuss case studies in from the number of LPs. This is a good news.
which particular realization of PDES are investigated
empirically. Instead, we will discuss properties of 3.2 Optimistic PDES and unrestricted surface
models which mimics essential features of PDES and
which can be investigated using methods of applied 1
mathematics and mathematical physics. First, we One should not confuse artificial time t we introduce for the sake of
simplicity with the time at which events happens and time-stamped
discuss results of the research done in series of papers messages are generated.
110
growth papers that may be of general interest.
SPPARKS is the project running by Sandia National
Model for the evolution of time profile, which mimics
Laboratory and developing kinetic Monte Carlo
essential features of optimistic algorithm, was
simulator [16]. SPPARKS runs on single processors or
introduced first in [13]. Let us consider again for the
in parallel using message-passing techniques and a
sake of simplicity the linear chain of LPs. The first step
spatial-decomposition of the simulation domain. The
is the optimistic unrestricted evolution of LVT at which
code is designed to be easy to modify or extend with
simulation evolves forward in time and the second is the
new functionality. Main idea is in partitioning of
rollback (or backward) algorithm of sending anti-
simulation space into the computational domains, and
messages. For this purpose one may introduce two
performing simulations in parallel within some time
parameters, F and B associated with two steps of
window. In some sense, this idea is similar to FaS
optimistic scheme. Namely, at the first step one
algorithm. It implements several KMC solvers whose
evaluates the time horizon by updating time of the
serial computational complexity ranges from O(N) to
randomly chosen LPs with value F following the
O(NlogN) to O(1) in the number of events N owned by a
Poisson distribution, and every LP can be chosen with
processor. In a generic sense the solvers are catalog a
the non-zero probability. Then one have to relax
list of "events", each with an associated probability,
(rollback!) LVT profile B times: for all LP(i) value of
choose a single event to perform, and advance time by
LVT time changed to the value of LVT time of the
the correct amount. Events may be chosen individually
nearest left LP(i-1) or right LP(i+1). Value of B satisfies
at random, or by sweeping over sites in a more ordered
the Poisson distribution. The average value of the speed
fashion. Problems they addressed are magnetic spin
profile u(t) evolves proportional to (q-qc)a, with
models, surface growth on substrate, etc. One of the
q=1/(1+B), with a value of a=1.74 close to the critical
simulations is connected with the microstructural
exponent of directed percolation and describing
evolution during sintering [17]. The model is a grain
roughening transition in one-dimensional unrestricted
growth with physical effects of particle diffusion,
growth process [14]. The value of qc=0.23(3)
vacancy diffusion, and vacancy annihilation. This
corresponds to the value of the parameter B=3.3. It
sintering model is able to capture all the necessary
should be mentioned that the width of the LVT profile
mechanism to simulate simple solid-state sintering
acts better for values q far away from the critical value
correctly. It was demonstrated by comparing it to the
qc. When close to qc the system practically does not
three--dimensional, in-situ images taken in a high-
evolve in time. The growth of width provides an
energy synchrotron during the sintering of Cu particles.
analogy with the roughening transition, which is in the
Similar project is running by Lawrence Livermore
same universality class as a directed percolation [14].
National Laboratory, also for the parallel kinetic Monte
Practically this means that if the length of avalanche
Carlo [18,19]. It is based on ideas of virtual time and
(see subsection 2.2) is larger than B=3.3 in average,
random order of simulating computational domains
optimistic algorithm will not proceed in time but will be
within time window. It is tested on the model of billion
rather stock. It is quite important to perform case studies
atoms. The approach was tested with the Ising model
of optimistic algorithm in order to understand that
(the simplest ferromagnetic model) and demonstrated
prediction in details.
that scalability with the number of nodes is close to the
ideal one.
4 Using PDES in physics and informatics A number of projects connected with simulation of
The most important feature of PDES for using in communication networks are using PDES. One of such
physics is that PDES algorithms are scaled very projects is running by INRIA, and it is connected with
naturally with the physical system size (number of the simulation of Border Gateway Protocol using
objects, or logical processes) and with the hardware size optimistic PDES [20] with number of nodes from 10 to
(number of nodes, cores, threads) with the only 100 thousands. In the last case simulation lasts up to 36
requirement that the time for message distribution is hours on 4-core Intel Xeon W5580 3.2Ghz with 64GB
smaller than computation time. This is valid for the of RAM running 64-bit Fedora 12 on a Linux kernel
simulation in which time to simulate object is much 2.6.32.
larger than time to send message. It is clear that more
complex and hard simulations the more gain can be 5 PDES and system software
reached.
The goal of high end computing (HEC) initiative in
Last years a number of papers with the analysis of
US and Europe is to deploy large-scale computing
implementation of PDES ideas in the large-scale
platforms with hundreds thousands of nodes. In the
computing of models in physics have been published.
White Paper “Performance Technologies for Peta-Scale
Most of discussions were done for the analysis of
Systems” the group of US researchers emphasized
application of PDES algorithm to the Ising model,
importance of the research on the program tools and
which is the simplest model of ferromagnetism, and
software facilities, and in particular in system
which is standard model to test algorithms in Monte
simulation [7]. They see that as “an open-source
Carlo simulations [15]. Here we review some of the
architectural simulation framework and API that enables
111
plug-and-play between separately-developed simulators 6 Summary and discussion
for different architectural features… and would also
enable zoom-out and zoom-in between statistically- Parallel Discrete Event Simulation paradigm is one of
based and cycle-accurate simulation techniques”. the paradigms for the large-scale high performance
Solution seen by them as “the simulations will be computing. It is under extensive development by many
decomposed into logical processes, and will be groups of researches. PDES was successfully tested on
synchronized by either conservative or optimistic the new architectures, from Blue Gene and conventional
methods … as developed in PDES community”. The clusters and CPUs to the Virtual Machine/Cloud
reason is that by such approach the high degree of approach. Efficiency of PDES does depend on the
computational parallelism in the simulation will problem and it is promising to investigate properties of
perfectly match the high degree of real parallelism of synchronization, scalability and efficiency.
HEC systems.
There are several simulators available as open source 7 Acknowledgments
software.
The most famous realization of Time Warp Russian Science Foundation supported this work
synchronization algorithm is Rensselaer's Optimistic under the grant number 14-21-00158.
Simulation System, named as ROSS simulator [21]. It is
public domain software, which can be installed on the References
cluster. Recently group of researchers from Lawrence
Livermore National Laboratory, Rensselaer Polytechnic [1] R.M. Fujimoto. Parallel Discrete Event
Institute and University of Illinois at Urbana- Simulation. Comm. of the ACM, 33(10), p. 30-53,
Champaign tested ROSS on the Blue Gene architecture 1990.
with successful use of almost 2 billions of cores [22] [2] A.M. Law and W.D. Kelton. Simulation modeling
yielding speed of 504 billion events per second. and analysis. McGraw-Hill. Third edition. 2000.
Modification of optimistic algorithm named shock- [3] D.R. Jefferson. Virtual Time. ACM Trans.
resistant Time WARP (SRTW) designed by group in Programming Languages and Systems, 7(3), p.
Westminster University and analysis was performed 404-425, 1985.
using the Grid’5000 platform [23]. [4] S. Miller and S. Luding. Event-Driven Molecular
Group from Bologna University developed PDES Dynamics in Parallel. J. Comp. Phys. 193(1), p.
software named ErlangTW based on Time Warp 306-316, 2004.
algorithm. Work was presented at 1st ACM SIGPLAN
[5] D.C. Richardson, K.J. Walsh, N. Murdoch, and P.
workshop on Functional high-performance computing
Michel P. Numerical simulations of granular
(FHPC '12) [24], where some preliminary performance
dynamics: I. Hard-sphere discrete element method
results on multicore and distributed architectures using
and tests. Icarus 212, p. 427-437, 2011.
the PHOLD benchmark.
There are at least two attempts to implement PDES in [6] J.P. Nilmeier, J. Marian J. A Rigorous Sequential
Cloud architecture. Fujimoto group introduced TW- Update Strategy for Parallel Kinetic Monte Carlo
SMIP (the Time Warp Straggler Message Identification Simulation. (accepted to Communications in
Protocol) protocol resigned for environment with shared Computer Physics. 2014) arXiv link:
hardware resources for which it is known that traditional http://arxiv.org/pdf/1403.4674.pdf
Time Warp (TW) algorithm shows poor performance [7] D.H. Bailey et al. Performance Technologies for
[25]. TW-SMIP protocol defines dynamic Peta-Scale Systems: A White Paper Prepared by
synchronization points for individual LPs based on the Performance Evaluation Research Center and
straggler messages (a new event with time stamp Collaborators. May 14 2003.
smaller than other events it has already processed), [8] R. Fujimoto. Parallel and Distributed Simulation
which improves efficiency of simulation. Systems. Wiley Interscience, 2000.
Second example is realization of PDES simulator on [9] G. Korniss, Z. Toroczkai, M.A. Novotny, and
Cloud/Virtual machine platforms developed in Oak P.A. Rikvold. From Massively Parallel
Ridge National Laboratory [26]. The scalability of Algorithms and Fluctuating Time Horizons to
virtual time scheduler has been tested on 128 virtual Nonequilibrium Surface Growth. Phys. Rev. Lett.,
machines multiplexed on 32 cores, showing 84, p. 1351-1354, 2000.
improvement in the runtime relative to the default [10] L.N. Shchur and M.A. Novotny. Evolution of time
Cloud/VM scheduler. Authors believe that algorithmic horizons in parallel and grid simulations. Phys.
design, observations, and results of their work are Rev. E, 70, 026703, 2004.
timely for emerging cloud/VM-based installations,
[11] M. Kardar, G. Parisi, and Y.C. Zhang. Dynamic
highlighting the need for PDES-specific support in high
Scaling of Growing Interfaces. Phys. Rev. Lett.,
performance PDES on the cloud/VM platforms.
56, p. 889-892, 1986.
[12] L.N. Shchur, M.A. Novotny. Evolution of time
horizon in parallel discrete event simulations.
112
Proceedings of Conference “Scientific services in [20] ROSS - Rensselaer's Optimistic Simulation
Internet: multicore computer world”, Abrau- System. http://www.cs.rpi.edu/~chrisc/ross.html
Dyurso, p. 197-200, 2007 (in Russian). [21] P.D. Barnes Jr, D.R. Jefferson, M. Schordan, D.
[13] U. Alon, M.R. Evans, H. Hinrinchsen, and D. Quinlan, C.D. Carothers, L.V. Kale. Extreme scale
Mukamel. Roughening Transition in a One- optimistic parallel discrete event simulation with
Dimensional Growth Process. Phys. Rev. Lett., dynamic load balancing. Proceedings of the 2014
76, p. 2746-2749, 1996. Winter Simulation Conference. 2014.
[14] D.P. Landau and K. Binder. A Guide to Monte [22] G. Krafft and V. Getov. Transaction-oriented
Carlo Simulations in Statistical Physics. 3d simulation in Ad Hoc Grids: design and
edition. Cambridge: Cambridge University Press. experience. Procedings of HPCS, pp. 38-44, 2008.
2013. [23] L. Toscano, G. D’Angelo, and M. Marzolla.
[15] SPPARKS Kinetic Monte Carlo Simulator Parallel discrete event simulation with Erlang.
[16] . Sandia National Laboratory. FHPC '12 Proceedings of the 1st ACM SIGPLAN
http://spparks.sandia.gov/index.html workshop on Functional high-performance
[17] C. G. Cardona, V. Tikare, S. J. Plimpton. Parallel computing. p. 83-92, 2012.
Simulation of 3D Sintering. Int J Comp Materials [24] A.W. Malik, A.J. Park, R.M. Fujimoto. An
Science and Surface Engineering, 4, p. 37-54, Optimistic Parallel Simulation Protocoal for Cloud
2011. Computing Environments. SCS M&S Magazin, 4,
[18] J.P. Nimeier and J. Marian. A rigorous sequential p. 1-9, 2010.
update strategy for parallel kinetic Monte Carlo [25] S.B. Yoginati and K.S. Permulla. Efficient
simulation. Computer Physics Communications, Parallel Discrete Event Simulation on
185, p. 2479-2486, 2014. Cloud/Virtual Machine platform. ACM
[19] D. Coudert, L. Hogie, A. Lancin, D. Transactions on Modeling and Computer
Papadimitrou, S. Perennes, and Isaam Tahiri. Simulation. 2014.
Feasibility study on distributed simulation of BGP.
Preprint arxiv: 1304.4750.
113