<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian de Schryver</string-name>
          <email>schryver@eit.uni-kl.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henning Marxen</string-name>
          <email>marxen@mathematik.uni-kl.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Schmidt</string-name>
          <email>schmidt@eit.uni-kl.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Micrelectronic Systems Design Department, University of Kaiserslautern Erwin-Schroedinger-Strasse</institution>
          ,
          <addr-line>67663 Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Stochastics and Financial Mathematics Department, University of Kaiserslautern Erwin-Schroedinger-Strasse</institution>
          ,
          <addr-line>67663 Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>-Modern financial mathematics consume more and more computational power and energy. Finding efficient algorithms and implementations to accelerate calculations is therefore a very active area of research. We show why interdisciplinary cooperation such as (CM)2 are key in order to build optimal designs. For option pricing based on the state-of-the-art Heston model, no implementation on dedicated hardware is known, yet. We are currently designing a highly parallel architecture for field programmable gate arrays based on the multi-level Monte Carlo method. It is optimized for high throughput and low energy consumption, compared to GPGPUs. In order to be able to evaluate different algorithms and their implementations, we present a benchmark set for this application. We will show a very promising outlook on future work, including dedicated ASIPs, fixed-point research and real-time applications. Index Terms-finance, benchmarking, hardware acceleration</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
    </sec>
    <sec id="sec-2">
      <title>Nowadays, financial markets are as vivid as never before. In</title>
      <p>modern electronic markets, stock prices may change several
times within a few milliseconds. Participating traders (that can
also be computers) have to evaluate the prices and react very
quickly in order to get the highest profit, which requires a lot
of computational effort.</p>
      <p>
        In general, running these computations on servers or clusters
with standard CPUs is not feasible due to either long run times
or high energy consumption. Using general purpose graphics
processing units (GPGPUs) as accelerators helps to increase
the speed, but still requires a lot of energy. Besides, at the
moment energy efficiency becomes more and more crucial for
the reason of high energy costs and - even more critical - a
limited supply of energy that can be provided. For example,
in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] it is stated that the City of London (with its new
financial center Canary Wharf where a lot of leading institutes
are located) does not provide additional energy until after
the Olympic winter games in 2012, that have higher priority.
Financial institutes are currently outsourcing all computing
systems not used for pricing computations (such as storage
or backup) out of the critical area.
      </p>
      <p>This leads to the dilemma of needing faster computations on
the one hand and limited energy resources on the other hand.
PC</p>
      <sec id="sec-2-1">
        <title>GPGPU</title>
      </sec>
      <sec id="sec-2-2">
        <title>FPGA</title>
        <p>
          Fig. 1. PC vs. GPGPU vs. FPGA
Moving away from GPGPUs to dedicated hardware
accelerators can help to drastically reduce the power consumption
at the same or even higher throughput. For different
application domains, some comparisons between CPU, GPGPU and
programmable hardware units (field programmable gate arrays,
FPGAs) have already been shown in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], highlighting
the enormous potential of energy savings for FPGAs. Figure 1
shows that standard software implementations require the
least effort for implementation and can provide the highest
flexibility, while dedicated hardware solutions on FPGAs are
hard to design and - once finished - not easy to be changed
again. From a different view, FPGAs can save up to about
99% of energy compared to a software implementation on a
standard PC and allow a much higher throughput. GPGPUs
are located between standard PCs and FPGAs. Between each
neighboring architectures, one can expect a difference of about
one order of magnitude on average for power consumption and
throughput [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Although most financial institutes are relying
on GPGPUs at the moment for the reason of standardized
software development toolkits and their flexibility, FPGAs
are an interesting alternative because of their higher energy
efficiency.
        </p>
        <p>A big challenge is the complexity of many models used
to estimate the future price behavior of financial products. In
many cases no mathematical closed-form solution exists so
that approximation methods like Monte Carlo simulations or
the finite difference method must be employed. Though, it is
necessary to precisely specify a solution right at the beginning
of the design process. Re-designing a nearly finished hardware
implementation can require a very high amount of effort. The
Center of Mathematical and Computational Modeling (CM)2
of the University of Kaiserslautern is a perfect forum for an
interdisciplinary cooperation to tackle this issue.</p>
        <p>For this project, we have developed a design methodology
that helps to select a feasible parameter set for a hardware
accelerator in question that we present in Section III. In order to
make implementations transparently comparable, we propose
to use standardized benchmark sets - we elaborate on this in
Section IV. By applying this methodology and our benchmark,
we have developed some reference implementation designs
that we show in Section V, together with the status quo of our
research and our contributions up to now. In Section VI we
give an overview of open issues and what we plan to examine
in the future. Section VII concludes the paper.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>II. STATE-OF-THE-ART AND RELATED WORK</title>
    </sec>
    <sec id="sec-4">
      <title>Mathematical finance basically has two different directions.</title>
      <p>One is concerned with the evaluation of optimal investment
strategies under certain market conditions and the other
direction is the pricing of derivatives. The basic idea of pricing
options is to assume some sort of model for the underlying
price process and take the discounted expected value - under
a certain measure - of an option as the option price.</p>
      <p>A very common problem treated is the calculation of
option prices based on the Black-Scholes model from 1973.
This model relies on one stochastic differential equation and
describes the price development of an option over the time,
depending on market parameters such as riskless interest rate,
long term drift and a constant volatility.</p>
      <p>
        Accelerator design for financial mathematics is a very active
research area, and several FPGA implementations have been
published in the past. At the FPL 2008 Woods and VanCourt
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] presented a hardware accelerator for multiple,
quasirandom, standard Brownian motions suitable for the
acceleration of quasi-Monte Carlo simulation of financial derivatives.
For credit risk modelling, Thomas and Luk could gain a
speed-up of more than 90 times compared to a 2.4 GHz
Pentium-4 Core2 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. An accelerator for Monte Carlo based
credit derivative pricing was developed by Kaganov, Chow
and Lakhany [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] in 2008 and showed to be 63 times faster
than their software model. Wynnyk and Magdon-Ismail [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
presented an FPGA accelerator for American option pricing
based on the Black-Scholes model in 2009 and could achieve
a speedup of eleven up to 73 times compared to a software
implementation running on a standard PC.
      </p>
      <p>
        However, nowadays the Black-Scholes model is no longer
up to date and does not provide an accurate reflection of
modern financial market behaviors, mostly because of the
volatility not being constant in reality. Furthermore,
closedform solutions for the Black-Scholes model exist and it only
has demonstration purposes to apply stochastic solution
methods such as Monte Carlo simulations or the finite difference
method [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nevertheless, it is still very common to publish
accelerator implementations based on that model, at least in
the electrical engineering community.
      </p>
      <p>
        In 1993, Steven L. Heston presented a more accurate model
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] that extends the model from Black and Scholes by a
second stochastic differential equation for stochastic volatility
variations. This significantly increases the complexity of the
calculation and of the implementation thereof. Nevertheless,
the Heston model reflects the real behavior of current stock
markets much better and is nowadays widely accepted in the
financial mathematics community. But - to the best of our
knowledge - no hardware accelerator for that model has been
published up to now. For GPGPUs, the first implementations
have been presented in the last year. Bernemann, Schreyer
and Spanderen from the german bank WestLB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed
that they could achieve a speeup of 50 times over CPU by
using GPGPUs for simulating the Heston model. Zhang and
Oosterlee published a technical report [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] in March 2010
where they even showed speedups of more than 100 times.
      </p>
      <p>The presented speed-ups look very impressive. However,
unfortunately we were not able to fairly decide which solution
seems to be the most promising for further research and
refinements. We will go a bit more into the details of that
problem in Section IV.</p>
    </sec>
    <sec id="sec-5">
      <title>III. HOW TO CHOOSE THE RIGHT DESIGN</title>
    </sec>
    <sec id="sec-6">
      <title>For many fields of applications, finding the most efficient</title>
      <p>design under certain constraints is a difficult job. The main
reason for this is a large design space. The design space is
made up of all possible parameter choices for the design, that
means all possible implementation instances.</p>
      <p>Most parameters are not adjustable independently, since
they are mutually linked. For example, fixing the target
architecture to FPGAs one the one hand has a large impact on
the selection of suitable algorithms and number systems, and
on the other hand affects many performance metrics such as
energy consumption, throughput and numerical precision.</p>
      <p>Furthermore, the parameters within the design space are in
many cases not limited to a single domain of expertise, but
require interdisciplinary know-how and decisions. This makes
not only the choice of the right values a challenge, but also
the evaluation and comparison of different implementations.
Besides speedup, more characteristics such as energy
efficiency, convergence rate or numerical precision may be very
important. This especially holds true for financial mathematics
accelerators.</p>
      <p>During our research we have seen a lot of papers that
show elaborate implementations of a specific algorithm (see
Section II) that is not questioned in the papers anymore.
However, we claim that the algorithm itself is in fact not the
most important selection. An accelerator should be designed to
solve a specific problem - it does not matter which algorithm is
used, as long as the result is calculated correctly. We therefore
propose to distinguish clearly between three terms:
the problem that is tackled (what to solve)
the employed model (how to solve)
the solution (how to build)</p>
      <p>To clarify the situation, we use the problem “calculate the
price of an option with two barriers for a given duration” as
an example. European knock-out barrier options pay a certain
amount of money at a fixed maturity time depending on the
value of the underlying asset. This amount is only paid when
the barrier is not crossed up to the maturity time. If one of the
barriers is hit, the option becomes worthless. Thus it needs to
be checked whether the barrier was ever hit or not. Figure 2
illustrates the typical random behavior of different realizations
of an asset price over the time.</p>
      <p>It is obvious that the problem description itself does not
yet give any suggestions to the solution. Since the price of an
option is tightly coupled to the price of a certain stock at the
market, we need a model that provides the stock price behavior.
For our chosen example, suitable models are for example
the Black-Scholes model (outdated nowadays) or the Heston
model. The model in general gives a formal and abstract view
of (a certain aspect of) the problem.</p>
      <p>The solution finally is a dedicated approach for solving a
(modeled) problem. It is characterized by a specific algorithm
and its implementation. For evaluating the Heston model,
for example, finite difference methods or stochastic Monte
Carlo simulations can be used. They may be implemented for
example on standard PC clusters, GPGPUs or on FPGAs.</p>
      <p>The parameters of the design space can basically be divided
into two groups: the algorithmic parameters that are mostly
selected by mathematicians, and the implementation parameters
determined by the hardware designers. However, as mentioned
before correlations exist between several parameters, so that
the selection should be optimally made by having a generative
exchange between experts of both groups.</p>
      <p>
        For the rest of the paper, we will use the
problem-modelsolution triple “calculate the price of an option with two
barriers for a given duration with the Heston model by using
Monte Carlo methods” as a showcase. Even for that specific
selection, the design space is still very large. An extract of
the related design space is shown in Figure 31. We cannot
explain every parameter here (for details see [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]), but
in a nutshell it is obvious that even for a very specific task
a huge amount of possible accelerator implementations may
exist. In Figure 3 we see two different views of the same
tree. These trees are symbolic for the design space. The left
one is the view of a mathematician that has mostly to do
with algorithmic and numerical aspects. The right view is the
1http://www.sxc.hu/photo/285734 We thank sxc user vxdigital for sharing this image of the oak tree and allowing the
use of it in this paper. He holds all copyrights to this image.
one of an electrical engineer who may not understand the
mathematician’s concerns in detail, but is wondering what the
best decisions with respect to hardware efficiency might be.
      </p>
    </sec>
    <sec id="sec-7">
      <title>IV. BENCHMARKING - FAIRLY COMPARING IMPLEMENTATIONS</title>
    </sec>
    <sec id="sec-8">
      <title>Comparing different implementations is a non-trivial task.</title>
      <p>Many attributes can be considered, including speed, accuracy
and energy consumption. This becomes even more difficult
when it is not clear which algorithm was used. Furthermore,
in many cases it is not possible to distinct whether a presented
algorithm or implementation has the displayed behavior only
with the employed example or in a more general setting.
Nevertheless, it is important to be able to compare various
algorithms and different implementations, also over various
target architectures.</p>
      <p>
        Therefore the need for a benchmark set arises. This set
should be independent of the algorithm and implementation
used. For option pricing in financial mathematics, this need
has already been claimed by Morris and Aubury in 2007 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
We are not aware of any progress made since that paper was
published. We therefore have decided to develop a completely
new benchmark that will enable us to fairly compare different
algorithms, e.g. multi-level and single-level Monte Carlo, on
different hardware. Thus we propose a benchmark based on
the problem/model combination. In our case it is the pricing
of double barrier options in the Heston model. It is clear that
independently of the used algorithm and implementation the
result must be the same. Therefore the final prices of the
different options in the benchmark set have to be provided.
      </p>
      <p>With the benchmark set it is possible to use different
metrics, like speed (that is now the real time until the results
are available), accuracy and power consumption, for the
calculations leading to the right (or approximate) result without
actually looking at the implementation and the algorithm itself.
This allows a fair and publicly traceable comparison of the
solution part of the problem/model/solution triplet.</p>
      <p>
        The benchmark itself consists of different combinations
of parameters for the Heston model and for barrier options,
including the prices. The data for the Heston parameters is
taken from different recent publications ( [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) and
are enlarged by an extremer case. The benchmark parameters
span a wide range of possible combinations used in this field.
For some options of the benchmark closed form solutions
exist that allow to obtain the exact results. This is important
to verify that simulations converge to the correct values and
makes it easier to compare the results. For the other cases the
exact prices are not known and are therefore provided as close
approximations.
      </p>
      <p>For further publications we not only encourage the authors
to use the presented benchmark but also give details of the
algorithm and the implementation used. Thus it is possible
to see where an increase in performance comes from. This
is essential in order to evaluate the contribution of a certain
result and to find ways to improve it even further. To achieve
a higher transparency we will publish the code we used to</p>
      <p>MC Simulation</p>
      <p>Heston Model
(a) Algorithmic parameters</p>
      <p>Path-Serial</p>
      <p>Path-Parallel</p>
      <p>Calculation
(b) Implementation parameters
analyze the algorithms and the one we implemented on the
FPGA.</p>
      <p>The benchmark was directly used when comparing different
Monte Carlo algorithms with the metric of computational
complexity. In our special problem/model combination there
are a lot more adjustments to the algorithms than seen in
figure 3(a). Many of them can be combined, what leads to
a huge design space that now can be handled by applying
the benchmark set, so that we haven been able to choose a
specific algorithm to be efficiently implemented on dedicated
hardware. The multi-level Monte Carlo method provides a
better asymptotic convergence behavior, using our benchmark we
checked whether this method is beneficial for our application.
We will show our first results in the next section.</p>
    </sec>
    <sec id="sec-9">
      <title>V. STATUS QUO AND FIRST RESULTS</title>
      <p>We have started this cooperation within (CM)2 about one
year ago now. Both participating chairs have experience of
more than ten years in their respective field of research, so
that we can profit from a lot of knowledge in the areas of
efficient hardware design respectively stochastics and financial
mathematics.</p>
      <p>
        After evaluating the state-of-the-art, we decided to focus on
accelerators for option pricing based on the Heston model - it
seems to be a very promising topic since no implementations
(either on GPGPUs or FPGAs) have been available one year
ago. In contrast to that, the Heston model is already widely
spread within the financial community. From former research
done in the group of Prof. Korn, multi-level Monte Carlo
methods [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] seemed to provide a better convergence behavior
than standard single-level Monte Carlo or finite difference
methods. Monte Carlo methods also have the advantage of
being very flexible. A barrier that is only relevant on a certain
time interval to evaluate an option price for example can be
easily implemented. Furthermore, multi-dimensional problems
can also be solved. This is needed in the case that an option has
more than one underlying asset. Nevertheless, for our project
we will stick to one asset.
      </p>
      <p>A. A New Random Number Generator for Non-Uniform
Distributions</p>
    </sec>
    <sec id="sec-10">
      <title>Inherently, Monte Carlo simulations always consume a huge</title>
      <p>amount of random numbers. To obtain the maximum hardware
efficiency for our implementation, we have developed a new
random number generator for non-uniform distributions
tailored to our application.</p>
      <p>For our option pricing accelerators, we need two
independent, normally distributed random numbers for each time
step of a single simulated stock price path. In general,
nonuniformly distributed random numbers are generated in two
steps: First, a uniform random number generator creates
uniformly distributed values, and in a second step this number is
transformed into the desired target distribution.</p>
      <p>
        For the uniform random number generation, a lot of research
has already been made leading to efficient and well-proven
implementations, such as the Mersenne Twister MT19937 that
we use. The three main approaches for obtaining non-uniform
distributions are transformation, rejection, and inversion
methods [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>For FPGAs, inversion methods are the usual way to go.</p>
      <p>
        They combine many desireable properties: by applying the
respective inverse cumulative distribution function (ICDF),
they transform every input sample x 2 (0; 1) from a uniform
distribution to one output sample y = icdf (x) of the desired
output distribution by using piecewise polynomial
approximation of the ICDF. The works of Woods and VanCourt [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and
Cheung et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] show FPGA implementations of the inversion
method.
      </p>
      <p>
        However, both implementations use fixed-point number
representations at the input. This leads to a loss of precision
in the tail regions where the probability of a value lying there
is very low. But these extreme events can have a large impact,
for example for options with barriers it is crucial to know if a
barrier was hit or not, since it completely changes the refund values for a higher step width, this means for a lower Monte
conditions. We have therefore developed a new implementa- Carlo simulation level. In the configuration memory, all model
tion based on floating-point representation that provides the parameters are stored.
same precision over the whole ICDF implementation at much Due to the inherent parallelism of Monte Carlo simulations,
lower hardware costs. This work has been presented at the it is not only feasible but self-evident to instantiate as many
2010 International Conference on ReConFigurable Computing of these circuits as possible on an FPGA in order to increase
and FPGAs (ReConFig) in December in Cancu´n, Mexico [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. the simulation throughput.
      </p>
      <p>Our random number converter unit requires only about half The accelerator is implemented on a Xilinx ML-605
evalulaof the area compared to other state-of-the-art implementations tion board equipped with a Xilinx Virtex-6 FPGA. The board
by even higher numerical precision. is connected via Gigabit ethernet to a host PC running the</p>
      <p>To validate our work, it was crucial to develop a new testing user interface and the program that calculates and sets the
methodology, since standardized test suites do not exist for configuration values for the accelerator based on the retrieved
non-uniform distributions. This work has been carried out in simulation results.
cooperation with Elke Korn who has a lot of knowledge in the Synthesis and benchmarking results will be available soon.
field of random numbers. The methodology and the validation We are currently supporting single- and double-precision
results for our implementation are also presented in the paper. floating-point computations and are working on a fixed-point
Our random numbers did not show any noticeable problems implementation as well.
in the stochastic tests and also perfectly passed two different
application simulations. VI. OUTLOOK AND FUTURE WORK</p>
    </sec>
    <sec id="sec-11">
      <title>Also for the intended future work a close cooperation</title>
      <p>B. Fully Parallel Hardware Accelerator between financial mathematics and electrical engineering will</p>
      <p>To the best of our knowlege, no hardware implementations be mandatory, since we are planning to research aspects out
of option price accelerators based on the Heston model exist of both fields.
at the moment. We have therefore started with the first imple- One characteristic of the Monte Carlo method is the inherent
mentation, that is nearly finished now. The hardware is fully- capability to parallelize the calculation. It therefore makes no
parallel, fully-pipelined and designed for high throughput. difference whether several calculations are done in parallel but
slowly or only one calculation is done at high speed, as long
Uniform Random Uniform Random as the number of calculations remains equal altogether. At the
Number Generator Number Generator moment, our parallel implementation allows simulating one</p>
      <p>User Interface time step on many assets simultaneously. The whole procedure</p>
      <p>Floating-Point Floating-Point of calculating one time step is fixed on the FPGA and limited
Gaussian Converter Gaussian Converter to a single algorithm that is set at design time.
price variations volatility One possible way to improve this is to sequentially
comHeston StUepnitGenerator Multi-Level pute the basic calculations needed for one time step on a</p>
      <p>Monte Carlo Application-Specific Instruction-Set Processor (ASIP) within
TeVmapluoerasry CSotaerpse CoMnfeigmuorarytion Controller tchaen FrePdGuAce, tih.ee. rietqiusirerudnatirmeae-apnrodgarallmowmsabtole.caTlchuislatperovcaerdiouures
Memory Unit algorithms since the functionality is defined in a corresponding
program. It is therefore sufficient to load a different program
IEnttheerfranceet ESthtearcnket without changing the hardware. The ASIP will occupy much</p>
      <p>less area than the parallel implementation presented in
SecFPGA Board Host PC tion V-B, therefore many ASIPs can be instantiated in parallel.</p>
      <p>We are currently investigating the necessary instruction set.</p>
      <p>Fig. 4. Fully Parallel Accelerator Structure To increase the speed of the implementation even more,
the floating-point computation can be replaced by fixed point</p>
      <p>Figure 4 shows the structure of one pipelined accelerator computations. In order to do so, errors resulting from the use
circuit. In each clock cycle, our unit consumes two normally of fixed-point calculations have to be approximated. This task
distributed random numbers, one for the stock price variation will also require both theoretical and practical expertise.
and one for the volatility variation. The Heston step generator Besides working on the implementation, the benchmark
unit calculates the price and volatility values for the next is very important to evaluate the designs. It will also allow
time step based on a multi-level Monte Carlo algorithm. The to research fixed-point solutions with respect to necessary
pipeline depth is about 60 stages. In order to maximally precision. Further steps will be to publish the analysis of the
utilize the pipelined hardware, it computates one time step algorithms with the benchmark in a journal. To increase the
for 60 assets in parallel, before moving to the next time transparency even more, we are currently setting up a web
step. The values for the respective time steps are stored in site offering all the program code used to create the analysis.
a memory temporarily. The coarse step memory holds interim It will contain an implementation of the multi-level and the
crude Monte Carlo algorithm with a focus on the calculation
complexity rather than the implementation.</p>
    </sec>
    <sec id="sec-12">
      <title>To provide more benchmarking results, also for different</title>
      <p>architectures, we are working on a GPGPU implementation.
It provides more flexibility and less implementation work than
hardware designs do, but also requires higher energy
consumption. In order to verify this assumption the implementation is
required. The Monte Carlo simulation algorithm for the Heston
model is currently carried out as a student work.</p>
    </sec>
    <sec id="sec-13">
      <title>Moreover, we are currently researching on real-time accel</title>
      <p>eration of financial calculations. This means that hardware or
GPGPU accelerators are linked to real-time data streams. This
approach seems to be very promising for keeping track with
the prices changing quickly in high-frequency trading.</p>
    </sec>
    <sec id="sec-14">
      <title>VII. CONCLUSION</title>
      <p>The financial world is running faster and faster and the
importance of energy consumption increases drastically. To
address this challenge the question of pricing double barrier
options in the Heston setting is faced. As the model is more
complex than the famous Black-Scholes model and these
types of options are path dependent, the algorithms for the
calculations are more distinct and also the implementation
thereof. In order to be able to cope with the strong connection
between the algorithm and the implementation, a combined
mathematical and electrical engineering view is needed. (CM)2
provides a perfect framework to do so.</p>
      <p>To approximate the pricing process Monte Carlo simulations
are used. For a good implementation a fast algorithm with
an adjusted implementation thereof is needed. In order to
distinguish the different algorithms we have created a
benchmark set for double barrier options. This benchmark allows to
fairly analyze and compare the diverse algorithms and designs,
which is a very important issue due to the big differences in
the convergence speed of these algorithms.</p>
    </sec>
    <sec id="sec-15">
      <title>For the target architecture, using FPGAs is the hardware of</title>
      <p>choice if implementation time is not considered. It allows fast
computation with low energy consumption. Nevertheless,
optimal FPGA designs require deep understanding of the FPGA
characteristics and the calculations need to be optimized for it.
One commonality of all the Monte Carlo algorithms is the use
of (pseudo-)random numbers. In the Heston setting standard
normal random numbers are used. There are procedures to
create these running efficiently on GPGPUs and CPUs. A
new</p>
      <p>method was presented which is very efficient for an
implementation on a FPGA.</p>
      <p>The detailed analysis of the diverse algorithms was used to
make an efficient implementation. Therefore the algorithm is
implemented on an FPGA. This should allow a fast
computation with low energy consumption. As far as we know, this
will be the first implementation of a Monte Carlo simulation in
the Heston model on a FPGA. Furthermore it will be the first
implementation of the multi-level Monte Carlo method on this
hardware. Thus this work expands the field of implementations
of financial mathematical problems on dedicated hardware in
several ways as new concepts are taken into consideration.</p>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGEMENTS</title>
    </sec>
    <sec id="sec-17">
      <title>We gratefully acknowledge the partial financial support</title>
      <p>from Center of Mathematical and Computational Modeling
(CM)2 of the University of Kaiserslautern.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Akers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Randall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Young. SciFinance: A Program Synthesis</surname>
          </string-name>
          <article-title>Tool for Financial Modeling</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ):
          <fpage>27</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Andersen</surname>
          </string-name>
          .
          <article-title>Efficient simulation of the heston stochastic volatility model</article-title>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schreyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Spanderen</surname>
          </string-name>
          .
          <article-title>Pricing Structured Equity Products on GPUs</article-title>
          .
          <source>In High Performance Computational Finance (WHPCF)</source>
          ,
          <source>2010 IEEE Workshop on</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          , Nov.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. C. C.</given-names>
            <surname>Cheung</surname>
          </string-name>
          , D.-
          <string-name>
            <given-names>U.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Villasenor</surname>
          </string-name>
          .
          <article-title>Hardware Generation of Arbitrary Random Number Distributions From Uniform Distributions Via the Inversion Method</article-title>
          .
          <source>IEEE Transactions on Very Large Scale Integration (VLSI) Systems</source>
          ,
          <volume>15</volume>
          (
          <issue>8</issue>
          ):
          <fpage>952</fpage>
          -
          <lpage>962</lpage>
          , Aug.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Cope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. Y.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Witt</surname>
          </string-name>
          .
          <article-title>Have GPUs made FPGAs redundant in the field of Video Processing</article-title>
          ? In Field-Programmable
          <string-name>
            <surname>Technology</surname>
          </string-name>
          ,
          <year>2005</year>
          . Proceedings. 2005 IEEE International Conference on, pages
          <fpage>111</fpage>
          -
          <lpage>118</lpage>
          , Dec.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>C. de Schryver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Wehn</surname>
            , E. Korn,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Marxen</surname>
            , and
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Korn</surname>
          </string-name>
          .
          <article-title>A New Hardware Efficient Inversion Based Random Number Generator for Non-Uniform Distributions</article-title>
          .
          <source>In Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig)</source>
          , pages
          <fpage>190</fpage>
          -
          <lpage>195</lpage>
          , Dec.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <article-title>Multilevel Monte Carlo path simulation</article-title>
          .
          <source>Operations Research-Baltimore</source>
          ,
          <volume>56</volume>
          (
          <issue>3</issue>
          ):
          <fpage>607</fpage>
          -
          <lpage>617</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Haastrecht</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pelsser</surname>
          </string-name>
          .
          <article-title>Efficient, almost exact simulation of the Heston stochastic volatility model</article-title>
          .
          <source>International Journal of Theoretical and Applied Finance</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Heston</surname>
          </string-name>
          .
          <article-title>A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options</article-title>
          .
          <source>Review of Financial Studies</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>327</fpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaganov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chow</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A. Lakhany. FPGA</surname>
          </string-name>
          <article-title>Acceleration of MonteCarlo based Credit Derivative Pricing</article-title>
          .
          <source>In Proc. Int. Conf. Field Programmable Logic and Applications FPL</source>
          <year>2008</year>
          , pages
          <fpage>329</fpage>
          -
          <lpage>334</lpage>
          , Sept.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Koekkoek</surname>
          </string-name>
          , and
          <string-name>
            <surname>D. van Dijk.</surname>
          </string-name>
          <article-title>A comparison of biased simulation schemes for stochastic volatility models</article-title>
          .
          <source>Quantitative Finance</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>177</fpage>
          -
          <lpage>194</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Morris</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Aubury</surname>
          </string-name>
          .
          <article-title>Design Space Exploration of the European Option Benchmark using Hyperstreams</article-title>
          .
          <source>In Field Programmable Logic and Applications</source>
          ,
          <year>2007</year>
          . FPL 2007. International Conference on, pages
          <fpage>5</fpage>
          -
          <lpage>10</lpage>
          , Aug.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>D. B. Thomas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Howes</surname>
            , and
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Luk</surname>
          </string-name>
          .
          <article-title>A Comparison of CPUs, GPUs</article-title>
          , FPGAs, and
          <article-title>Massively Parallel Processor Arrays for Random Number Generation</article-title>
          .
          <article-title>In Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays</article-title>
          ,
          <source>FPGA '09</source>
          , pages
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Thomas</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Luk</surname>
          </string-name>
          .
          <article-title>Credit Risk Modelling using Hardware Accelerated Monte-Carlo Simulation</article-title>
          .
          <source>In Proc. 16th Int. Symp. FieldProgrammable Custom Computing Machines FCCM '08</source>
          , pages
          <fpage>229</fpage>
          -
          <lpage>238</lpage>
          , Apr.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>D. B. Thomas</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Luk</surname>
            ,
            <given-names>P. H.</given-names>
          </string-name>
          <string-name>
            <surname>Leong</surname>
            , and
            <given-names>J. D.</given-names>
          </string-name>
          <string-name>
            <surname>Villasenor</surname>
          </string-name>
          .
          <source>Gaussian Random Number Generators. ACM Comput. Surv.</source>
          ,
          <volume>39</volume>
          (
          <issue>4</issue>
          ):
          <fpage>11</fpage>
          ,
          <string-name>
            <surname>Oct</surname>
          </string-name>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Warren</surname>
          </string-name>
          .
          <article-title>City business races the Games for power</article-title>
          .
          <source>The Guardian</source>
          , May
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Woods</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>VanCourt. FPGA</surname>
          </string-name>
          <article-title>Acceleration of Quasi-Monte Carlo in Finance</article-title>
          .
          <source>In Proc. Int. Conf. Field Programmable Logic and Applications FPL</source>
          <year>2008</year>
          , pages
          <fpage>335</fpage>
          -
          <lpage>340</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wynnyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Magdon-Ismail</surname>
          </string-name>
          .
          <article-title>Pricing the American Option using Reconfigurable Hardware</article-title>
          .
          <source>In Computational Science and Engineering</source>
          ,
          <year>2009</year>
          . CSE '09. International Conference on, volume
          <volume>2</volume>
          , pages
          <fpage>532</fpage>
          -
          <lpage>536</lpage>
          , Aug.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Oosterlee</surname>
          </string-name>
          .
          <source>Acceleration of Option Pricing Technique on Graphics Processing Units</source>
          .
          <source>Technical Report 10-03</source>
          , Delft University of Technology, Feb.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Shu</surname>
          </string-name>
          .
          <article-title>Pricing s&amp;p 500 index options with heston's model</article-title>
          .
          <source>In Proc. IEEE Int Computational Intelligence for Financial Engineering Conf</source>
          , pages
          <fpage>85</fpage>
          -
          <lpage>92</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>