=Paper= {{Paper |id=None |storemode=property |title=Hardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking |pdfUrl=https://ceur-ws.org/Vol-750/yrs08.pdf |volume=Vol-750 }} ==Hardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking== https://ceur-ws.org/Vol-750/yrs08.pdf
 Hardware Accelerators for Financial Mathematics -
     Methodology, Results and Benchmarking
                                 Christian de Schryver # , Henning Marxen ∗ , Daniel Schmidt #
                             #
                                 Micrelectronic Systems Design Department, University of Kaiserslautern
                                     Erwin-Schroedinger-Strasse, 67663 Kaiserslautern, Germany
                                                    schryver@eit.uni-kl.de
                                                     schmidt@eit.uni-kl.de
                       ∗
                           Stochastics and Financial Mathematics Department, University of Kaiserslautern
                                    Erwin-Schroedinger-Strasse, 67663 Kaiserslautern, Germany
                                                 marxen@mathematik.uni-kl.de


   Abstract—Modern financial mathematics consume more and
more computational power and energy. Finding efficient algo-
rithms and implementations to accelerate calculations is therefore
a very active area of research. We show why interdisciplinary
cooperation such as (CM)2 are key in order to build optimal
designs.
   For option pricing based on the state-of-the-art Heston model,
no implementation on dedicated hardware is known, yet. We                     PC                 GPGPU                  FPGA
are currently designing a highly parallel architecture for field
programmable gate arrays based on the multi-level Monte Carlo                         Fig. 1.   PC vs. GPGPU vs. FPGA
method. It is optimized for high throughput and low energy
consumption, compared to GPGPUs. In order to be able to              Moving away from GPGPUs to dedicated hardware acceler-
evaluate different algorithms and their implementations, we          ators can help to drastically reduce the power consumption
present a benchmark set for this application. We will show a very    at the same or even higher throughput. For different applica-
promising outlook on future work, including dedicated ASIPs,         tion domains, some comparisons between CPU, GPGPU and
fixed-point research and real-time applications.
                                                                     programmable hardware units (field programmable gate arrays,
   Index Terms—finance, benchmarking, hardware acceleration
                                                                     FPGAs) have already been shown in [13] and [5], highlighting
                                                                     the enormous potential of energy savings for FPGAs. Figure 1
                      I. I NTRODUCTION
                                                                     shows that standard software implementations require the
   Nowadays, financial markets are as vivid as never before. In      least effort for implementation and can provide the highest
modern electronic markets, stock prices may change several           flexibility, while dedicated hardware solutions on FPGAs are
times within a few milliseconds. Participating traders (that can     hard to design and - once finished - not easy to be changed
also be computers) have to evaluate the prices and react very        again. From a different view, FPGAs can save up to about
quickly in order to get the highest profit, which requires a lot     99% of energy compared to a software implementation on a
of computational effort.                                             standard PC and allow a much higher throughput. GPGPUs
   In general, running these computations on servers or clusters     are located between standard PCs and FPGAs. Between each
with standard CPUs is not feasible due to either long run times      neighboring architectures, one can expect a difference of about
or high energy consumption. Using general purpose graphics           one order of magnitude on average for power consumption and
processing units (GPGPUs) as accelerators helps to increase          throughput [13]. Although most financial institutes are relying
the speed, but still requires a lot of energy. Besides, at the       on GPGPUs at the moment for the reason of standardized
moment energy efficiency becomes more and more crucial for           software development toolkits and their flexibility, FPGAs
the reason of high energy costs and - even more critical - a         are an interesting alternative because of their higher energy
limited supply of energy that can be provided. For example,          efficiency.
in [16] it is stated that the City of London (with its new              A big challenge is the complexity of many models used
financial center Canary Wharf where a lot of leading institutes      to estimate the future price behavior of financial products. In
are located) does not provide additional energy until after          many cases no mathematical closed-form solution exists so
the Olympic winter games in 2012, that have higher priority.         that approximation methods like Monte Carlo simulations or
Financial institutes are currently outsourcing all computing         the finite difference method must be employed. Though, it is
systems not used for pricing computations (such as storage           necessary to precisely specify a solution right at the beginning
or backup) out of the critical area.                                 of the design process. Re-designing a nearly finished hardware
   This leads to the dilemma of needing faster computations on       implementation can require a very high amount of effort. The
the one hand and limited energy resources on the other hand.         Center of Mathematical and Computational Modeling (CM)2
of the University of Kaiserslautern is a perfect forum for an       second stochastic differential equation for stochastic volatility
interdisciplinary cooperation to tackle this issue.                 variations. This significantly increases the complexity of the
   For this project, we have developed a design methodology         calculation and of the implementation thereof. Nevertheless,
that helps to select a feasible parameter set for a hardware ac-    the Heston model reflects the real behavior of current stock
celerator in question that we present in Section III. In order to   markets much better and is nowadays widely accepted in the
make implementations transparently comparable, we propose           financial mathematics community. But - to the best of our
to use standardized benchmark sets - we elaborate on this in        knowledge - no hardware accelerator for that model has been
Section IV. By applying this methodology and our benchmark,         published up to now. For GPGPUs, the first implementations
we have developed some reference implementation designs             have been presented in the last year. Bernemann, Schreyer
that we show in Section V, together with the status quo of our      and Spanderen from the german bank WestLB [3] showed
research and our contributions up to now. In Section VI we          that they could achieve a speeup of 50 times over CPU by
give an overview of open issues and what we plan to examine         using GPGPUs for simulating the Heston model. Zhang and
in the future. Section VII concludes the paper.                     Oosterlee published a technical report [19] in March 2010
                                                                    where they even showed speedups of more than 100 times.
       II. S TATE - OF - THE -A RT AND R ELATED W ORK                  The presented speed-ups look very impressive. However,
   Mathematical finance basically has two different directions.     unfortunately we were not able to fairly decide which solution
One is concerned with the evaluation of optimal investment          seems to be the most promising for further research and
strategies under certain market conditions and the other di-        refinements. We will go a bit more into the details of that
rection is the pricing of derivatives. The basic idea of pricing    problem in Section IV.
options is to assume some sort of model for the underlying
                                                                            III. H OW T O C HOOSE T HE R IGHT D ESIGN
price process and take the discounted expected value - under
a certain measure - of an option as the option price.                  For many fields of applications, finding the most efficient
   A very common problem treated is the calculation of              design under certain constraints is a difficult job. The main
option prices based on the Black-Scholes model from 1973.           reason for this is a large design space. The design space is
This model relies on one stochastic differential equation and       made up of all possible parameter choices for the design, that
describes the price development of an option over the time,         means all possible implementation instances.
depending on market parameters such as riskless interest rate,         Most parameters are not adjustable independently, since
long term drift and a constant volatility.                          they are mutually linked. For example, fixing the target ar-
   Accelerator design for financial mathematics is a very active    chitecture to FPGAs one the one hand has a large impact on
research area, and several FPGA implementations have been           the selection of suitable algorithms and number systems, and
published in the past. At the FPL 2008 Woods and VanCourt           on the other hand affects many performance metrics such as
[17] presented a hardware accelerator for multiple, quasi-          energy consumption, throughput and numerical precision.
random, standard Brownian motions suitable for the accelera-           Furthermore, the parameters within the design space are in
tion of quasi-Monte Carlo simulation of financial derivatives.      many cases not limited to a single domain of expertise, but
For credit risk modelling, Thomas and Luk could gain a              require interdisciplinary know-how and decisions. This makes
speed-up of more than 90 times compared to a 2.4 GHz                not only the choice of the right values a challenge, but also
Pentium-4 Core2 [14]. An accelerator for Monte Carlo based          the evaluation and comparison of different implementations.
credit derivative pricing was developed by Kaganov, Chow            Besides speedup, more characteristics such as energy effi-
and Lakhany [10] in 2008 and showed to be 63 times faster           ciency, convergence rate or numerical precision may be very
than their software model. Wynnyk and Magdon-Ismail [18]            important. This especially holds true for financial mathematics
presented an FPGA accelerator for American option pricing           accelerators.
based on the Black-Scholes model in 2009 and could achieve             During our research we have seen a lot of papers that
a speedup of eleven up to 73 times compared to a software           show elaborate implementations of a specific algorithm (see
implementation running on a standard PC.                            Section II) that is not questioned in the papers anymore.
   However, nowadays the Black-Scholes model is no longer           However, we claim that the algorithm itself is in fact not the
up to date and does not provide an accurate reflection of           most important selection. An accelerator should be designed to
modern financial market behaviors, mostly because of the            solve a specific problem - it does not matter which algorithm is
volatility not being constant in reality. Furthermore, closed-      used, as long as the result is calculated correctly. We therefore
form solutions for the Black-Scholes model exist and it only        propose to distinguish clearly between three terms:
has demonstration purposes to apply stochastic solution meth-          • the problem that is tackled (what to solve)
ods such as Monte Carlo simulations or the finite difference           • the employed model (how to solve)
method [1]. Nevertheless, it is still very common to publish           • the solution (how to build)
accelerator implementations based on that model, at least in           To clarify the situation, we use the problem “calculate the
the electrical engineering community.                               price of an option with two barriers for a given duration” as
   In 1993, Steven L. Heston presented a more accurate model        an example. European knock-out barrier options pay a certain
[9] that extends the model from Black and Scholes by a              amount of money at a fixed maturity time depending on the
                                                                                                                          one of an electrical engineer who may not understand the
                                                                                                                          mathematician’s concerns in detail, but is wondering what the
                                                                                                                          best decisions with respect to hardware efficiency might be.
                                                                                                                                  IV. B ENCHMARKING - FAIRLY C OMPARING
                                                                                                                                            I MPLEMENTATIONS
                                                                                                                             Comparing different implementations is a non-trivial task.
                                                                                                                          Many attributes can be considered, including speed, accuracy
                                                                                                                          and energy consumption. This becomes even more difficult
                                                                                                                          when it is not clear which algorithm was used. Furthermore,
                                                                                                                          in many cases it is not possible to distinct whether a presented
                                                                                                                          algorithm or implementation has the displayed behavior only
                      Fig. 2.        Barrier testing for a Brownian motion                                                with the employed example or in a more general setting.
                                                                                                                          Nevertheless, it is important to be able to compare various
value of the underlying asset. This amount is only paid when                                                              algorithms and different implementations, also over various
the barrier is not crossed up to the maturity time. If one of the                                                         target architectures.
barriers is hit, the option becomes worthless. Thus it needs to                                                              Therefore the need for a benchmark set arises. This set
be checked whether the barrier was ever hit or not. Figure 2                                                              should be independent of the algorithm and implementation
illustrates the typical random behavior of different realizations                                                         used. For option pricing in financial mathematics, this need
of an asset price over the time.                                                                                          has already been claimed by Morris and Aubury in 2007 [12].
   It is obvious that the problem description itself does not                                                             We are not aware of any progress made since that paper was
yet give any suggestions to the solution. Since the price of an                                                           published. We therefore have decided to develop a completely
option is tightly coupled to the price of a certain stock at the                                                          new benchmark that will enable us to fairly compare different
market, we need a model that provides the stock price behavior.                                                           algorithms, e.g. multi-level and single-level Monte Carlo, on
For our chosen example, suitable models are for example                                                                   different hardware. Thus we propose a benchmark based on
the Black-Scholes model (outdated nowadays) or the Heston                                                                 the problem/model combination. In our case it is the pricing
model. The model in general gives a formal and abstract view                                                              of double barrier options in the Heston model. It is clear that
of (a certain aspect of) the problem.                                                                                     independently of the used algorithm and implementation the
   The solution finally is a dedicated approach for solving a                                                             result must be the same. Therefore the final prices of the
(modeled) problem. It is characterized by a specific algorithm                                                            different options in the benchmark set have to be provided.
and its implementation. For evaluating the Heston model,                                                                     With the benchmark set it is possible to use different
for example, finite difference methods or stochastic Monte                                                                metrics, like speed (that is now the real time until the results
Carlo simulations can be used. They may be implemented for                                                                are available), accuracy and power consumption, for the cal-
example on standard PC clusters, GPGPUs or on FPGAs.                                                                      culations leading to the right (or approximate) result without
   The parameters of the design space can basically be divided                                                            actually looking at the implementation and the algorithm itself.
into two groups: the algorithmic parameters that are mostly se-                                                           This allows a fair and publicly traceable comparison of the
lected by mathematicians, and the implementation parameters                                                               solution part of the problem/model/solution triplet.
determined by the hardware designers. However, as mentioned                                                                  The benchmark itself consists of different combinations
before correlations exist between several parameters, so that                                                             of parameters for the Heston model and for barrier options,
the selection should be optimally made by having a generative                                                             including the prices. The data for the Heston parameters is
exchange between experts of both groups.                                                                                  taken from different recent publications ( [2], [11], [20]) and
   For the rest of the paper, we will use the problem-model-                                                              are enlarged by an extremer case. The benchmark parameters
solution triple “calculate the price of an option with two                                                                span a wide range of possible combinations used in this field.
barriers for a given duration with the Heston model by using                                                              For some options of the benchmark closed form solutions
Monte Carlo methods” as a showcase. Even for that specific                                                                exist that allow to obtain the exact results. This is important
selection, the design space is still very large. An extract of                                                            to verify that simulations converge to the correct values and
the related design space is shown in Figure 31 . We cannot                                                                makes it easier to compare the results. For the other cases the
explain every parameter here (for details see [11] [8]), but                                                              exact prices are not known and are therefore provided as close
in a nutshell it is obvious that even for a very specific task                                                            approximations.
a huge amount of possible accelerator implementations may                                                                    For further publications we not only encourage the authors
exist. In Figure 3 we see two different views of the same                                                                 to use the presented benchmark but also give details of the
tree. These trees are symbolic for the design space. The left                                                             algorithm and the implementation used. Thus it is possible
one is the view of a mathematician that has mostly to do                                                                  to see where an increase in performance comes from. This
with algorithmic and numerical aspects. The right view is the                                                             is essential in order to evaluate the contribution of a certain
   1
                                                                                                                          result and to find ways to improve it even further. To achieve
     http://www.sxc.hu/photo/285734 We thank sxc user vxdigital for sharing this image of the oak tree and allowing the
use of it in this paper. He holds all copyrights to this image.                                                           a higher transparency we will publish the code we used to
                                                                                                                                  Path-Serial
          Quasi
                                                                                                                             Path-Parallel




                                MC Simulation
                                Heston Model                                                        Calculation


                     (a) Algorithmic parameters                                                   (b) Implementation parameters

                                                  Fig. 3.   Two views of the same solution tree

analyze the algorithms and the one we implemented on the                  more than one underlying asset. Nevertheless, for our project
FPGA.                                                                     we will stick to one asset.
   The benchmark was directly used when comparing different               A. A New Random Number Generator for Non-Uniform Dis-
Monte Carlo algorithms with the metric of computational                   tributions
complexity. In our special problem/model combination there
are a lot more adjustments to the algorithms than seen in                    Inherently, Monte Carlo simulations always consume a huge
figure 3(a). Many of them can be combined, what leads to                  amount of random numbers. To obtain the maximum hardware
a huge design space that now can be handled by applying                   efficiency for our implementation, we have developed a new
the benchmark set, so that we haven been able to choose a                 random number generator for non-uniform distributions tai-
specific algorithm to be efficiently implemented on dedicated             lored to our application.
hardware. The multi-level Monte Carlo method provides a bet-                 For our option pricing accelerators, we need two inde-
ter asymptotic convergence behavior, using our benchmark we               pendent, normally distributed random numbers for each time
checked whether this method is beneficial for our application.            step of a single simulated stock price path. In general, non-
We will show our first results in the next section.                       uniformly distributed random numbers are generated in two
                                                                          steps: First, a uniform random number generator creates uni-
           V. S TATUS Q UO AND F IRST R ESULTS                            formly distributed values, and in a second step this number is
                                                                          transformed into the desired target distribution.
   We have started this cooperation within (CM)2 about one                   For the uniform random number generation, a lot of research
year ago now. Both participating chairs have experience of                has already been made leading to efficient and well-proven
more than ten years in their respective field of research, so             implementations, such as the Mersenne Twister MT19937 that
that we can profit from a lot of knowledge in the areas of                we use. The three main approaches for obtaining non-uniform
efficient hardware design respectively stochastics and financial          distributions are transformation, rejection, and inversion meth-
mathematics.                                                              ods [15].
   After evaluating the state-of-the-art, we decided to focus on             For FPGAs, inversion methods are the usual way to go.
accelerators for option pricing based on the Heston model - it            They combine many desireable properties: by applying the
seems to be a very promising topic since no implementations               respective inverse cumulative distribution function (ICDF),
(either on GPGPUs or FPGAs) have been available one year                  they transform every input sample x ∈ (0, 1) from a uniform
ago. In contrast to that, the Heston model is already widely              distribution to one output sample y = icdf (x) of the desired
spread within the financial community. From former research               output distribution by using piecewise polynomial approxima-
done in the group of Prof. Korn, multi-level Monte Carlo                  tion of the ICDF. The works of Woods and VanCourt [17] and
methods [7] seemed to provide a better convergence behavior               Cheung et al. [4] show FPGA implementations of the inversion
than standard single-level Monte Carlo or finite difference               method.
methods. Monte Carlo methods also have the advantage of                      However, both implementations use fixed-point number
being very flexible. A barrier that is only relevant on a certain         representations at the input. This leads to a loss of precision
time interval to evaluate an option price for example can be              in the tail regions where the probability of a value lying there
easily implemented. Furthermore, multi-dimensional problems               is very low. But these extreme events can have a large impact,
can also be solved. This is needed in the case that an option has         for example for options with barriers it is crucial to know if a
barrier was hit or not, since it completely changes the refund              values for a higher step width, this means for a lower Monte
conditions. We have therefore developed a new implementa-                   Carlo simulation level. In the configuration memory, all model
tion based on floating-point representation that provides the               parameters are stored.
same precision over the whole ICDF implementation at much                      Due to the inherent parallelism of Monte Carlo simulations,
lower hardware costs. This work has been presented at the                   it is not only feasible but self-evident to instantiate as many
2010 International Conference on ReConFigurable Computing                   of these circuits as possible on an FPGA in order to increase
and FPGAs (ReConFig) in December in Cancún, Mexico [6].                    the simulation throughput.
Our random number converter unit requires only about half                      The accelerator is implemented on a Xilinx ML-605 evalula-
of the area compared to other state-of-the-art implementations              tion board equipped with a Xilinx Virtex-6 FPGA. The board
by even higher numerical precision.                                         is connected via Gigabit ethernet to a host PC running the
   To validate our work, it was crucial to develop a new testing            user interface and the program that calculates and sets the
methodology, since standardized test suites do not exist for                configuration values for the accelerator based on the retrieved
non-uniform distributions. This work has been carried out in                simulation results.
cooperation with Elke Korn who has a lot of knowledge in the                   Synthesis and benchmarking results will be available soon.
field of random numbers. The methodology and the validation                 We are currently supporting single- and double-precision
results for our implementation are also presented in the paper.             floating-point computations and are working on a fixed-point
Our random numbers did not show any noticeable problems                     implementation as well.
in the stochastic tests and also perfectly passed two different
application simulations.                                                                VI. O UTLOOK AND F UTURE W ORK
                                                                               Also for the intended future work a close cooperation
B. Fully Parallel Hardware Accelerator
                                                                            between financial mathematics and electrical engineering will
   To the best of our knowlege, no hardware implementations                 be mandatory, since we are planning to research aspects out
of option price accelerators based on the Heston model exist                of both fields.
at the moment. We have therefore started with the first imple-                 One characteristic of the Monte Carlo method is the inherent
mentation, that is nearly finished now. The hardware is fully-              capability to parallelize the calculation. It therefore makes no
parallel, fully-pipelined and designed for high throughput.                 difference whether several calculations are done in parallel but
                                                                            slowly or only one calculation is done at high speed, as long
    Uniform Random          Uniform Random                                  as the number of calculations remains equal altogether. At the
   Number Generator        Number Generator                                 moment, our parallel implementation allows simulating one
                                                           User Interface
                                                                            time step on many assets simultaneously. The whole procedure
     Floating-Point          Floating-Point                                 of calculating one time step is fixed on the FPGA and limited
   Gaussian Converter      Gaussian Converter
                                                                            to a single algorithm that is set at design time.
      price        variations         volatility                               One possible way to improve this is to sequentially com-
              Heston Step Generator
                       Unit                                 Multi-Level     pute the basic calculations needed for one time step on a
                                                            Monte Carlo     Application-Specific Instruction-Set Processor (ASIP) within
                                                             Controller
                                Configuration
                                                                            the FPGA, i.e. it is runtime-programmable. This procedure
   Temporary      Coarse          Memory                                    can reduce the required area and allows to calculate various
     Values        Step
    Memory         Unit                                                     algorithms since the functionality is defined in a corresponding
                                                                            program. It is therefore sufficient to load a different program
                                   Ethernet                   Ethernet
                                   Interface                   Stack        without changing the hardware. The ASIP will occupy much
                                                                            less area than the parallel implementation presented in Sec-
                   FPGA Board                                 Host PC       tion V-B, therefore many ASIPs can be instantiated in parallel.
                                                                            We are currently investigating the necessary instruction set.
                Fig. 4.    Fully Parallel Accelerator Structure                To increase the speed of the implementation even more,
                                                                            the floating-point computation can be replaced by fixed point
   Figure 4 shows the structure of one pipelined accelerator                computations. In order to do so, errors resulting from the use
circuit. In each clock cycle, our unit consumes two normally                of fixed-point calculations have to be approximated. This task
distributed random numbers, one for the stock price variation               will also require both theoretical and practical expertise.
and one for the volatility variation. The Heston step generator                Besides working on the implementation, the benchmark
unit calculates the price and volatility values for the next                is very important to evaluate the designs. It will also allow
time step based on a multi-level Monte Carlo algorithm. The                 to research fixed-point solutions with respect to necessary
pipeline depth is about 60 stages. In order to maximally                    precision. Further steps will be to publish the analysis of the
utilize the pipelined hardware, it computates one time step                 algorithms with the benchmark in a journal. To increase the
for 60 assets in parallel, before moving to the next time                   transparency even more, we are currently setting up a web
step. The values for the respective time steps are stored in                site offering all the program code used to create the analysis.
a memory temporarily. The coarse step memory holds interim                  It will contain an implementation of the multi-level and the
crude Monte Carlo algorithm with a focus on the calculation                                 ACKNOWLEDGEMENTS
complexity rather than the implementation.                             We gratefully acknowledge the partial financial support
   To provide more benchmarking results, also for different         from Center of Mathematical and Computational Modeling
architectures, we are working on a GPGPU implementation.            (CM)2 of the University of Kaiserslautern.
It provides more flexibility and less implementation work than
hardware designs do, but also requires higher energy consump-                                     R EFERENCES
tion. In order to verify this assumption the implementation is       [1] R. L. Akers, I. Bica, E. Kant, C. Randall, and R. L. Young. SciFinance: A
required. The Monte Carlo simulation algorithm for the Heston            Program Synthesis Tool for Financial Modeling. AI Magazine, 22(2):27,
                                                                         2001.
model is currently carried out as a student work.                    [2] L. Andersen. Efficient simulation of the heston stochastic volatility
   Moreover, we are currently researching on real-time accel-            model. 2006.
eration of financial calculations. This means that hardware or       [3] A. Bernemann, R. Schreyer, and K. Spanderen. Pricing Structured
                                                                         Equity Products on GPUs. In High Performance Computational Finance
GPGPU accelerators are linked to real-time data streams. This            (WHPCF), 2010 IEEE Workshop on, pages 1 –7, Nov. 2010.
approach seems to be very promising for keeping track with           [4] R. C. C. Cheung, D.-U. Lee, W. Luk, and J. D. Villasenor. Hardware
the prices changing quickly in high-frequency trading.                   Generation of Arbitrary Random Number Distributions From Uniform
                                                                         Distributions Via the Inversion Method. IEEE Transactions on Very
                                                                         Large Scale Integration (VLSI) Systems, 15(8):952–962, Aug. 2007.
                      VII. C ONCLUSION                               [5] B. Cope, P. Y. Cheung, W. Luk, and S. Witt. Have GPUs made FPGAs
   The financial world is running faster and faster and the              redundant in the field of Video Processing? In Field-Programmable
                                                                         Technology, 2005. Proceedings. 2005 IEEE International Conference
importance of energy consumption increases drastically. To               on, pages 111 –118, Dec. 2005.
address this challenge the question of pricing double barrier        [6] C. de Schryver, D. Schmidt, N. Wehn, E. Korn, H. Marxen, and R. Korn.
options in the Heston setting is faced. As the model is more             A New Hardware Efficient Inversion Based Random Number Generator
                                                                         for Non-Uniform Distributions. In Proceedings of the International
complex than the famous Black-Scholes model and these                    Conference on ReConFigurable Computing and FPGAs (ReConFig),
types of options are path dependent, the algorithms for the              pages 190–195, Dec. 2010.
calculations are more distinct and also the implementation           [7] M. B. Giles. Multilevel Monte Carlo path simulation. Operations
                                                                         Research-Baltimore, 56(3):607–617, 2008.
thereof. In order to be able to cope with the strong connection      [8] A. V. Haastrecht and A. Pelsser. Efficient, almost exact simulation of the
between the algorithm and the implementation, a combined                 Heston stochastic volatility model. International Journal of Theoretical
mathematical and electrical engineering view is needed. (CM)2            and Applied Finance, 13(1):1–43, 2010.
                                                                     [9] S. L. Heston. A Closed-Form Solution for Options with Stochastic
provides a perfect framework to do so.                                   Volatility with Applications to Bond and Currency Options. Review of
   To approximate the pricing process Monte Carlo simulations            Financial Studies, 6(2):327, 1993.
are used. For a good implementation a fast algorithm with           [10] A. Kaganov, P. Chow, and A. Lakhany. FPGA Acceleration of Monte-
                                                                         Carlo based Credit Derivative Pricing. In Proc. Int. Conf. Field
an adjusted implementation thereof is needed. In order to                Programmable Logic and Applications FPL 2008, pages 329–334, Sept.
distinguish the different algorithms we have created a bench-            2008.
mark set for double barrier options. This benchmark allows to       [11] R. Lord, R. Koekkoek, and D. van Dijk. A comparison of biased sim-
                                                                         ulation schemes for stochastic volatility models. Quantitative Finance,
fairly analyze and compare the diverse algorithms and designs,           10(2):177–194, 2010.
which is a very important issue due to the big differences in       [12] G. W. Morris and M. Aubury. Design Space Exploration of the European
the convergence speed of these algorithms.                               Option Benchmark using Hyperstreams. In Field Programmable Logic
                                                                         and Applications, 2007. FPL 2007. International Conference on, pages
   For the target architecture, using FPGAs is the hardware of           5 –10, Aug. 2007.
choice if implementation time is not considered. It allows fast     [13] D. B. Thomas, L. Howes, and W. Luk. A Comparison of CPUs, GPUs,
computation with low energy consumption. Nevertheless, op-               FPGAs, and Massively Parallel Processor Arrays for Random Number
                                                                         Generation. In Proceeding of the ACM/SIGDA international symposium
timal FPGA designs require deep understanding of the FPGA                on Field programmable gate arrays, FPGA ’09, pages 63–72, New York,
characteristics and the calculations need to be optimized for it.        NY, USA, 2009. ACM.
One commonality of all the Monte Carlo algorithms is the use        [14] D. B. Thomas and W. Luk. Credit Risk Modelling using Hardware
                                                                         Accelerated Monte-Carlo Simulation. In Proc. 16th Int. Symp. Field-
of (pseudo-)random numbers. In the Heston setting standard               Programmable Custom Computing Machines FCCM ’08, pages 229–
normal random numbers are used. There are procedures to                  238, Apr. 2008.
create these running efficiently on GPGPUs and CPUs. A              [15] D. B. Thomas, W. Luk, P. H. Leong, and J. D. Villasenor. Gaussian
                                                                         Random Number Generators. ACM Comput. Surv., 39(4):11, Oct. 2007.
new method was presented which is very efficient for an             [16] P. Warren. City business races the Games for power. The Guardian,
implementation on a FPGA.                                                May 2008.
   The detailed analysis of the diverse algorithms was used to      [17] N. A. Woods and T. VanCourt. FPGA Acceleration of Quasi-Monte
                                                                         Carlo in Finance. In Proc. Int. Conf. Field Programmable Logic and
make an efficient implementation. Therefore the algorithm is             Applications FPL 2008, pages 335–340, 2008.
implemented on an FPGA. This should allow a fast compu-             [18] C. Wynnyk and M. Magdon-Ismail. Pricing the American Option using
tation with low energy consumption. As far as we know, this              Reconfigurable Hardware. In Computational Science and Engineering,
                                                                         2009. CSE ’09. International Conference on, volume 2, pages 532 –536,
will be the first implementation of a Monte Carlo simulation in          Aug. 2009.
the Heston model on a FPGA. Furthermore it will be the first        [19] B. Zhang and C. W. Oosterlee. Acceleration of Option Pricing Technique
implementation of the multi-level Monte Carlo method on this             on Graphics Processing Units. Technical Report 10-03, Delft University
                                                                         of Technology, Feb. 2010.
hardware. Thus this work expands the field of implementations       [20] J. E. Zhang and J. Shu. Pricing s&p 500 index options with heston’s
of financial mathematical problems on dedicated hardware in              model. In Proc. IEEE Int Computational Intelligence for Financial
several ways as new concepts are taken into consideration.               Engineering Conf, pages 85–92, 2003.