=Paper=
{{Paper
|id=None
|storemode=property
|title=Hardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking
|pdfUrl=https://ceur-ws.org/Vol-750/yrs08.pdf
|volume=Vol-750
}}
==Hardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking==
Hardware Accelerators for Financial Mathematics -
Methodology, Results and Benchmarking
Christian de Schryver # , Henning Marxen ∗ , Daniel Schmidt #
#
Micrelectronic Systems Design Department, University of Kaiserslautern
Erwin-Schroedinger-Strasse, 67663 Kaiserslautern, Germany
schryver@eit.uni-kl.de
schmidt@eit.uni-kl.de
∗
Stochastics and Financial Mathematics Department, University of Kaiserslautern
Erwin-Schroedinger-Strasse, 67663 Kaiserslautern, Germany
marxen@mathematik.uni-kl.de
Abstract—Modern financial mathematics consume more and
more computational power and energy. Finding efficient algo-
rithms and implementations to accelerate calculations is therefore
a very active area of research. We show why interdisciplinary
cooperation such as (CM)2 are key in order to build optimal
designs.
For option pricing based on the state-of-the-art Heston model,
no implementation on dedicated hardware is known, yet. We PC GPGPU FPGA
are currently designing a highly parallel architecture for field
programmable gate arrays based on the multi-level Monte Carlo Fig. 1. PC vs. GPGPU vs. FPGA
method. It is optimized for high throughput and low energy
consumption, compared to GPGPUs. In order to be able to Moving away from GPGPUs to dedicated hardware acceler-
evaluate different algorithms and their implementations, we ators can help to drastically reduce the power consumption
present a benchmark set for this application. We will show a very at the same or even higher throughput. For different applica-
promising outlook on future work, including dedicated ASIPs, tion domains, some comparisons between CPU, GPGPU and
fixed-point research and real-time applications.
programmable hardware units (field programmable gate arrays,
Index Terms—finance, benchmarking, hardware acceleration
FPGAs) have already been shown in [13] and [5], highlighting
the enormous potential of energy savings for FPGAs. Figure 1
I. I NTRODUCTION
shows that standard software implementations require the
Nowadays, financial markets are as vivid as never before. In least effort for implementation and can provide the highest
modern electronic markets, stock prices may change several flexibility, while dedicated hardware solutions on FPGAs are
times within a few milliseconds. Participating traders (that can hard to design and - once finished - not easy to be changed
also be computers) have to evaluate the prices and react very again. From a different view, FPGAs can save up to about
quickly in order to get the highest profit, which requires a lot 99% of energy compared to a software implementation on a
of computational effort. standard PC and allow a much higher throughput. GPGPUs
In general, running these computations on servers or clusters are located between standard PCs and FPGAs. Between each
with standard CPUs is not feasible due to either long run times neighboring architectures, one can expect a difference of about
or high energy consumption. Using general purpose graphics one order of magnitude on average for power consumption and
processing units (GPGPUs) as accelerators helps to increase throughput [13]. Although most financial institutes are relying
the speed, but still requires a lot of energy. Besides, at the on GPGPUs at the moment for the reason of standardized
moment energy efficiency becomes more and more crucial for software development toolkits and their flexibility, FPGAs
the reason of high energy costs and - even more critical - a are an interesting alternative because of their higher energy
limited supply of energy that can be provided. For example, efficiency.
in [16] it is stated that the City of London (with its new A big challenge is the complexity of many models used
financial center Canary Wharf where a lot of leading institutes to estimate the future price behavior of financial products. In
are located) does not provide additional energy until after many cases no mathematical closed-form solution exists so
the Olympic winter games in 2012, that have higher priority. that approximation methods like Monte Carlo simulations or
Financial institutes are currently outsourcing all computing the finite difference method must be employed. Though, it is
systems not used for pricing computations (such as storage necessary to precisely specify a solution right at the beginning
or backup) out of the critical area. of the design process. Re-designing a nearly finished hardware
This leads to the dilemma of needing faster computations on implementation can require a very high amount of effort. The
the one hand and limited energy resources on the other hand. Center of Mathematical and Computational Modeling (CM)2
of the University of Kaiserslautern is a perfect forum for an second stochastic differential equation for stochastic volatility
interdisciplinary cooperation to tackle this issue. variations. This significantly increases the complexity of the
For this project, we have developed a design methodology calculation and of the implementation thereof. Nevertheless,
that helps to select a feasible parameter set for a hardware ac- the Heston model reflects the real behavior of current stock
celerator in question that we present in Section III. In order to markets much better and is nowadays widely accepted in the
make implementations transparently comparable, we propose financial mathematics community. But - to the best of our
to use standardized benchmark sets - we elaborate on this in knowledge - no hardware accelerator for that model has been
Section IV. By applying this methodology and our benchmark, published up to now. For GPGPUs, the first implementations
we have developed some reference implementation designs have been presented in the last year. Bernemann, Schreyer
that we show in Section V, together with the status quo of our and Spanderen from the german bank WestLB [3] showed
research and our contributions up to now. In Section VI we that they could achieve a speeup of 50 times over CPU by
give an overview of open issues and what we plan to examine using GPGPUs for simulating the Heston model. Zhang and
in the future. Section VII concludes the paper. Oosterlee published a technical report [19] in March 2010
where they even showed speedups of more than 100 times.
II. S TATE - OF - THE -A RT AND R ELATED W ORK The presented speed-ups look very impressive. However,
Mathematical finance basically has two different directions. unfortunately we were not able to fairly decide which solution
One is concerned with the evaluation of optimal investment seems to be the most promising for further research and
strategies under certain market conditions and the other di- refinements. We will go a bit more into the details of that
rection is the pricing of derivatives. The basic idea of pricing problem in Section IV.
options is to assume some sort of model for the underlying
III. H OW T O C HOOSE T HE R IGHT D ESIGN
price process and take the discounted expected value - under
a certain measure - of an option as the option price. For many fields of applications, finding the most efficient
A very common problem treated is the calculation of design under certain constraints is a difficult job. The main
option prices based on the Black-Scholes model from 1973. reason for this is a large design space. The design space is
This model relies on one stochastic differential equation and made up of all possible parameter choices for the design, that
describes the price development of an option over the time, means all possible implementation instances.
depending on market parameters such as riskless interest rate, Most parameters are not adjustable independently, since
long term drift and a constant volatility. they are mutually linked. For example, fixing the target ar-
Accelerator design for financial mathematics is a very active chitecture to FPGAs one the one hand has a large impact on
research area, and several FPGA implementations have been the selection of suitable algorithms and number systems, and
published in the past. At the FPL 2008 Woods and VanCourt on the other hand affects many performance metrics such as
[17] presented a hardware accelerator for multiple, quasi- energy consumption, throughput and numerical precision.
random, standard Brownian motions suitable for the accelera- Furthermore, the parameters within the design space are in
tion of quasi-Monte Carlo simulation of financial derivatives. many cases not limited to a single domain of expertise, but
For credit risk modelling, Thomas and Luk could gain a require interdisciplinary know-how and decisions. This makes
speed-up of more than 90 times compared to a 2.4 GHz not only the choice of the right values a challenge, but also
Pentium-4 Core2 [14]. An accelerator for Monte Carlo based the evaluation and comparison of different implementations.
credit derivative pricing was developed by Kaganov, Chow Besides speedup, more characteristics such as energy effi-
and Lakhany [10] in 2008 and showed to be 63 times faster ciency, convergence rate or numerical precision may be very
than their software model. Wynnyk and Magdon-Ismail [18] important. This especially holds true for financial mathematics
presented an FPGA accelerator for American option pricing accelerators.
based on the Black-Scholes model in 2009 and could achieve During our research we have seen a lot of papers that
a speedup of eleven up to 73 times compared to a software show elaborate implementations of a specific algorithm (see
implementation running on a standard PC. Section II) that is not questioned in the papers anymore.
However, nowadays the Black-Scholes model is no longer However, we claim that the algorithm itself is in fact not the
up to date and does not provide an accurate reflection of most important selection. An accelerator should be designed to
modern financial market behaviors, mostly because of the solve a specific problem - it does not matter which algorithm is
volatility not being constant in reality. Furthermore, closed- used, as long as the result is calculated correctly. We therefore
form solutions for the Black-Scholes model exist and it only propose to distinguish clearly between three terms:
has demonstration purposes to apply stochastic solution meth- • the problem that is tackled (what to solve)
ods such as Monte Carlo simulations or the finite difference • the employed model (how to solve)
method [1]. Nevertheless, it is still very common to publish • the solution (how to build)
accelerator implementations based on that model, at least in To clarify the situation, we use the problem “calculate the
the electrical engineering community. price of an option with two barriers for a given duration” as
In 1993, Steven L. Heston presented a more accurate model an example. European knock-out barrier options pay a certain
[9] that extends the model from Black and Scholes by a amount of money at a fixed maturity time depending on the
one of an electrical engineer who may not understand the
mathematician’s concerns in detail, but is wondering what the
best decisions with respect to hardware efficiency might be.
IV. B ENCHMARKING - FAIRLY C OMPARING
I MPLEMENTATIONS
Comparing different implementations is a non-trivial task.
Many attributes can be considered, including speed, accuracy
and energy consumption. This becomes even more difficult
when it is not clear which algorithm was used. Furthermore,
in many cases it is not possible to distinct whether a presented
algorithm or implementation has the displayed behavior only
Fig. 2. Barrier testing for a Brownian motion with the employed example or in a more general setting.
Nevertheless, it is important to be able to compare various
value of the underlying asset. This amount is only paid when algorithms and different implementations, also over various
the barrier is not crossed up to the maturity time. If one of the target architectures.
barriers is hit, the option becomes worthless. Thus it needs to Therefore the need for a benchmark set arises. This set
be checked whether the barrier was ever hit or not. Figure 2 should be independent of the algorithm and implementation
illustrates the typical random behavior of different realizations used. For option pricing in financial mathematics, this need
of an asset price over the time. has already been claimed by Morris and Aubury in 2007 [12].
It is obvious that the problem description itself does not We are not aware of any progress made since that paper was
yet give any suggestions to the solution. Since the price of an published. We therefore have decided to develop a completely
option is tightly coupled to the price of a certain stock at the new benchmark that will enable us to fairly compare different
market, we need a model that provides the stock price behavior. algorithms, e.g. multi-level and single-level Monte Carlo, on
For our chosen example, suitable models are for example different hardware. Thus we propose a benchmark based on
the Black-Scholes model (outdated nowadays) or the Heston the problem/model combination. In our case it is the pricing
model. The model in general gives a formal and abstract view of double barrier options in the Heston model. It is clear that
of (a certain aspect of) the problem. independently of the used algorithm and implementation the
The solution finally is a dedicated approach for solving a result must be the same. Therefore the final prices of the
(modeled) problem. It is characterized by a specific algorithm different options in the benchmark set have to be provided.
and its implementation. For evaluating the Heston model, With the benchmark set it is possible to use different
for example, finite difference methods or stochastic Monte metrics, like speed (that is now the real time until the results
Carlo simulations can be used. They may be implemented for are available), accuracy and power consumption, for the cal-
example on standard PC clusters, GPGPUs or on FPGAs. culations leading to the right (or approximate) result without
The parameters of the design space can basically be divided actually looking at the implementation and the algorithm itself.
into two groups: the algorithmic parameters that are mostly se- This allows a fair and publicly traceable comparison of the
lected by mathematicians, and the implementation parameters solution part of the problem/model/solution triplet.
determined by the hardware designers. However, as mentioned The benchmark itself consists of different combinations
before correlations exist between several parameters, so that of parameters for the Heston model and for barrier options,
the selection should be optimally made by having a generative including the prices. The data for the Heston parameters is
exchange between experts of both groups. taken from different recent publications ( [2], [11], [20]) and
For the rest of the paper, we will use the problem-model- are enlarged by an extremer case. The benchmark parameters
solution triple “calculate the price of an option with two span a wide range of possible combinations used in this field.
barriers for a given duration with the Heston model by using For some options of the benchmark closed form solutions
Monte Carlo methods” as a showcase. Even for that specific exist that allow to obtain the exact results. This is important
selection, the design space is still very large. An extract of to verify that simulations converge to the correct values and
the related design space is shown in Figure 31 . We cannot makes it easier to compare the results. For the other cases the
explain every parameter here (for details see [11] [8]), but exact prices are not known and are therefore provided as close
in a nutshell it is obvious that even for a very specific task approximations.
a huge amount of possible accelerator implementations may For further publications we not only encourage the authors
exist. In Figure 3 we see two different views of the same to use the presented benchmark but also give details of the
tree. These trees are symbolic for the design space. The left algorithm and the implementation used. Thus it is possible
one is the view of a mathematician that has mostly to do to see where an increase in performance comes from. This
with algorithmic and numerical aspects. The right view is the is essential in order to evaluate the contribution of a certain
1
result and to find ways to improve it even further. To achieve
http://www.sxc.hu/photo/285734 We thank sxc user vxdigital for sharing this image of the oak tree and allowing the
use of it in this paper. He holds all copyrights to this image. a higher transparency we will publish the code we used to
Path-Serial
Quasi
Path-Parallel
MC Simulation
Heston Model Calculation
(a) Algorithmic parameters (b) Implementation parameters
Fig. 3. Two views of the same solution tree
analyze the algorithms and the one we implemented on the more than one underlying asset. Nevertheless, for our project
FPGA. we will stick to one asset.
The benchmark was directly used when comparing different A. A New Random Number Generator for Non-Uniform Dis-
Monte Carlo algorithms with the metric of computational tributions
complexity. In our special problem/model combination there
are a lot more adjustments to the algorithms than seen in Inherently, Monte Carlo simulations always consume a huge
figure 3(a). Many of them can be combined, what leads to amount of random numbers. To obtain the maximum hardware
a huge design space that now can be handled by applying efficiency for our implementation, we have developed a new
the benchmark set, so that we haven been able to choose a random number generator for non-uniform distributions tai-
specific algorithm to be efficiently implemented on dedicated lored to our application.
hardware. The multi-level Monte Carlo method provides a bet- For our option pricing accelerators, we need two inde-
ter asymptotic convergence behavior, using our benchmark we pendent, normally distributed random numbers for each time
checked whether this method is beneficial for our application. step of a single simulated stock price path. In general, non-
We will show our first results in the next section. uniformly distributed random numbers are generated in two
steps: First, a uniform random number generator creates uni-
V. S TATUS Q UO AND F IRST R ESULTS formly distributed values, and in a second step this number is
transformed into the desired target distribution.
We have started this cooperation within (CM)2 about one For the uniform random number generation, a lot of research
year ago now. Both participating chairs have experience of has already been made leading to efficient and well-proven
more than ten years in their respective field of research, so implementations, such as the Mersenne Twister MT19937 that
that we can profit from a lot of knowledge in the areas of we use. The three main approaches for obtaining non-uniform
efficient hardware design respectively stochastics and financial distributions are transformation, rejection, and inversion meth-
mathematics. ods [15].
After evaluating the state-of-the-art, we decided to focus on For FPGAs, inversion methods are the usual way to go.
accelerators for option pricing based on the Heston model - it They combine many desireable properties: by applying the
seems to be a very promising topic since no implementations respective inverse cumulative distribution function (ICDF),
(either on GPGPUs or FPGAs) have been available one year they transform every input sample x ∈ (0, 1) from a uniform
ago. In contrast to that, the Heston model is already widely distribution to one output sample y = icdf (x) of the desired
spread within the financial community. From former research output distribution by using piecewise polynomial approxima-
done in the group of Prof. Korn, multi-level Monte Carlo tion of the ICDF. The works of Woods and VanCourt [17] and
methods [7] seemed to provide a better convergence behavior Cheung et al. [4] show FPGA implementations of the inversion
than standard single-level Monte Carlo or finite difference method.
methods. Monte Carlo methods also have the advantage of However, both implementations use fixed-point number
being very flexible. A barrier that is only relevant on a certain representations at the input. This leads to a loss of precision
time interval to evaluate an option price for example can be in the tail regions where the probability of a value lying there
easily implemented. Furthermore, multi-dimensional problems is very low. But these extreme events can have a large impact,
can also be solved. This is needed in the case that an option has for example for options with barriers it is crucial to know if a
barrier was hit or not, since it completely changes the refund values for a higher step width, this means for a lower Monte
conditions. We have therefore developed a new implementa- Carlo simulation level. In the configuration memory, all model
tion based on floating-point representation that provides the parameters are stored.
same precision over the whole ICDF implementation at much Due to the inherent parallelism of Monte Carlo simulations,
lower hardware costs. This work has been presented at the it is not only feasible but self-evident to instantiate as many
2010 International Conference on ReConFigurable Computing of these circuits as possible on an FPGA in order to increase
and FPGAs (ReConFig) in December in Cancún, Mexico [6]. the simulation throughput.
Our random number converter unit requires only about half The accelerator is implemented on a Xilinx ML-605 evalula-
of the area compared to other state-of-the-art implementations tion board equipped with a Xilinx Virtex-6 FPGA. The board
by even higher numerical precision. is connected via Gigabit ethernet to a host PC running the
To validate our work, it was crucial to develop a new testing user interface and the program that calculates and sets the
methodology, since standardized test suites do not exist for configuration values for the accelerator based on the retrieved
non-uniform distributions. This work has been carried out in simulation results.
cooperation with Elke Korn who has a lot of knowledge in the Synthesis and benchmarking results will be available soon.
field of random numbers. The methodology and the validation We are currently supporting single- and double-precision
results for our implementation are also presented in the paper. floating-point computations and are working on a fixed-point
Our random numbers did not show any noticeable problems implementation as well.
in the stochastic tests and also perfectly passed two different
application simulations. VI. O UTLOOK AND F UTURE W ORK
Also for the intended future work a close cooperation
B. Fully Parallel Hardware Accelerator
between financial mathematics and electrical engineering will
To the best of our knowlege, no hardware implementations be mandatory, since we are planning to research aspects out
of option price accelerators based on the Heston model exist of both fields.
at the moment. We have therefore started with the first imple- One characteristic of the Monte Carlo method is the inherent
mentation, that is nearly finished now. The hardware is fully- capability to parallelize the calculation. It therefore makes no
parallel, fully-pipelined and designed for high throughput. difference whether several calculations are done in parallel but
slowly or only one calculation is done at high speed, as long
Uniform Random Uniform Random as the number of calculations remains equal altogether. At the
Number Generator Number Generator moment, our parallel implementation allows simulating one
User Interface
time step on many assets simultaneously. The whole procedure
Floating-Point Floating-Point of calculating one time step is fixed on the FPGA and limited
Gaussian Converter Gaussian Converter
to a single algorithm that is set at design time.
price variations volatility One possible way to improve this is to sequentially com-
Heston Step Generator
Unit Multi-Level pute the basic calculations needed for one time step on a
Monte Carlo Application-Specific Instruction-Set Processor (ASIP) within
Controller
Configuration
the FPGA, i.e. it is runtime-programmable. This procedure
Temporary Coarse Memory can reduce the required area and allows to calculate various
Values Step
Memory Unit algorithms since the functionality is defined in a corresponding
program. It is therefore sufficient to load a different program
Ethernet Ethernet
Interface Stack without changing the hardware. The ASIP will occupy much
less area than the parallel implementation presented in Sec-
FPGA Board Host PC tion V-B, therefore many ASIPs can be instantiated in parallel.
We are currently investigating the necessary instruction set.
Fig. 4. Fully Parallel Accelerator Structure To increase the speed of the implementation even more,
the floating-point computation can be replaced by fixed point
Figure 4 shows the structure of one pipelined accelerator computations. In order to do so, errors resulting from the use
circuit. In each clock cycle, our unit consumes two normally of fixed-point calculations have to be approximated. This task
distributed random numbers, one for the stock price variation will also require both theoretical and practical expertise.
and one for the volatility variation. The Heston step generator Besides working on the implementation, the benchmark
unit calculates the price and volatility values for the next is very important to evaluate the designs. It will also allow
time step based on a multi-level Monte Carlo algorithm. The to research fixed-point solutions with respect to necessary
pipeline depth is about 60 stages. In order to maximally precision. Further steps will be to publish the analysis of the
utilize the pipelined hardware, it computates one time step algorithms with the benchmark in a journal. To increase the
for 60 assets in parallel, before moving to the next time transparency even more, we are currently setting up a web
step. The values for the respective time steps are stored in site offering all the program code used to create the analysis.
a memory temporarily. The coarse step memory holds interim It will contain an implementation of the multi-level and the
crude Monte Carlo algorithm with a focus on the calculation ACKNOWLEDGEMENTS
complexity rather than the implementation. We gratefully acknowledge the partial financial support
To provide more benchmarking results, also for different from Center of Mathematical and Computational Modeling
architectures, we are working on a GPGPU implementation. (CM)2 of the University of Kaiserslautern.
It provides more flexibility and less implementation work than
hardware designs do, but also requires higher energy consump- R EFERENCES
tion. In order to verify this assumption the implementation is [1] R. L. Akers, I. Bica, E. Kant, C. Randall, and R. L. Young. SciFinance: A
required. The Monte Carlo simulation algorithm for the Heston Program Synthesis Tool for Financial Modeling. AI Magazine, 22(2):27,
2001.
model is currently carried out as a student work. [2] L. Andersen. Efficient simulation of the heston stochastic volatility
Moreover, we are currently researching on real-time accel- model. 2006.
eration of financial calculations. This means that hardware or [3] A. Bernemann, R. Schreyer, and K. Spanderen. Pricing Structured
Equity Products on GPUs. In High Performance Computational Finance
GPGPU accelerators are linked to real-time data streams. This (WHPCF), 2010 IEEE Workshop on, pages 1 –7, Nov. 2010.
approach seems to be very promising for keeping track with [4] R. C. C. Cheung, D.-U. Lee, W. Luk, and J. D. Villasenor. Hardware
the prices changing quickly in high-frequency trading. Generation of Arbitrary Random Number Distributions From Uniform
Distributions Via the Inversion Method. IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, 15(8):952–962, Aug. 2007.
VII. C ONCLUSION [5] B. Cope, P. Y. Cheung, W. Luk, and S. Witt. Have GPUs made FPGAs
The financial world is running faster and faster and the redundant in the field of Video Processing? In Field-Programmable
Technology, 2005. Proceedings. 2005 IEEE International Conference
importance of energy consumption increases drastically. To on, pages 111 –118, Dec. 2005.
address this challenge the question of pricing double barrier [6] C. de Schryver, D. Schmidt, N. Wehn, E. Korn, H. Marxen, and R. Korn.
options in the Heston setting is faced. As the model is more A New Hardware Efficient Inversion Based Random Number Generator
for Non-Uniform Distributions. In Proceedings of the International
complex than the famous Black-Scholes model and these Conference on ReConFigurable Computing and FPGAs (ReConFig),
types of options are path dependent, the algorithms for the pages 190–195, Dec. 2010.
calculations are more distinct and also the implementation [7] M. B. Giles. Multilevel Monte Carlo path simulation. Operations
Research-Baltimore, 56(3):607–617, 2008.
thereof. In order to be able to cope with the strong connection [8] A. V. Haastrecht and A. Pelsser. Efficient, almost exact simulation of the
between the algorithm and the implementation, a combined Heston stochastic volatility model. International Journal of Theoretical
mathematical and electrical engineering view is needed. (CM)2 and Applied Finance, 13(1):1–43, 2010.
[9] S. L. Heston. A Closed-Form Solution for Options with Stochastic
provides a perfect framework to do so. Volatility with Applications to Bond and Currency Options. Review of
To approximate the pricing process Monte Carlo simulations Financial Studies, 6(2):327, 1993.
are used. For a good implementation a fast algorithm with [10] A. Kaganov, P. Chow, and A. Lakhany. FPGA Acceleration of Monte-
Carlo based Credit Derivative Pricing. In Proc. Int. Conf. Field
an adjusted implementation thereof is needed. In order to Programmable Logic and Applications FPL 2008, pages 329–334, Sept.
distinguish the different algorithms we have created a bench- 2008.
mark set for double barrier options. This benchmark allows to [11] R. Lord, R. Koekkoek, and D. van Dijk. A comparison of biased sim-
ulation schemes for stochastic volatility models. Quantitative Finance,
fairly analyze and compare the diverse algorithms and designs, 10(2):177–194, 2010.
which is a very important issue due to the big differences in [12] G. W. Morris and M. Aubury. Design Space Exploration of the European
the convergence speed of these algorithms. Option Benchmark using Hyperstreams. In Field Programmable Logic
and Applications, 2007. FPL 2007. International Conference on, pages
For the target architecture, using FPGAs is the hardware of 5 –10, Aug. 2007.
choice if implementation time is not considered. It allows fast [13] D. B. Thomas, L. Howes, and W. Luk. A Comparison of CPUs, GPUs,
computation with low energy consumption. Nevertheless, op- FPGAs, and Massively Parallel Processor Arrays for Random Number
Generation. In Proceeding of the ACM/SIGDA international symposium
timal FPGA designs require deep understanding of the FPGA on Field programmable gate arrays, FPGA ’09, pages 63–72, New York,
characteristics and the calculations need to be optimized for it. NY, USA, 2009. ACM.
One commonality of all the Monte Carlo algorithms is the use [14] D. B. Thomas and W. Luk. Credit Risk Modelling using Hardware
Accelerated Monte-Carlo Simulation. In Proc. 16th Int. Symp. Field-
of (pseudo-)random numbers. In the Heston setting standard Programmable Custom Computing Machines FCCM ’08, pages 229–
normal random numbers are used. There are procedures to 238, Apr. 2008.
create these running efficiently on GPGPUs and CPUs. A [15] D. B. Thomas, W. Luk, P. H. Leong, and J. D. Villasenor. Gaussian
Random Number Generators. ACM Comput. Surv., 39(4):11, Oct. 2007.
new method was presented which is very efficient for an [16] P. Warren. City business races the Games for power. The Guardian,
implementation on a FPGA. May 2008.
The detailed analysis of the diverse algorithms was used to [17] N. A. Woods and T. VanCourt. FPGA Acceleration of Quasi-Monte
Carlo in Finance. In Proc. Int. Conf. Field Programmable Logic and
make an efficient implementation. Therefore the algorithm is Applications FPL 2008, pages 335–340, 2008.
implemented on an FPGA. This should allow a fast compu- [18] C. Wynnyk and M. Magdon-Ismail. Pricing the American Option using
tation with low energy consumption. As far as we know, this Reconfigurable Hardware. In Computational Science and Engineering,
2009. CSE ’09. International Conference on, volume 2, pages 532 –536,
will be the first implementation of a Monte Carlo simulation in Aug. 2009.
the Heston model on a FPGA. Furthermore it will be the first [19] B. Zhang and C. W. Oosterlee. Acceleration of Option Pricing Technique
implementation of the multi-level Monte Carlo method on this on Graphics Processing Units. Technical Report 10-03, Delft University
of Technology, Feb. 2010.
hardware. Thus this work expands the field of implementations [20] J. E. Zhang and J. Shu. Pricing s&p 500 index options with heston’s
of financial mathematical problems on dedicated hardware in model. In Proc. IEEE Int Computational Intelligence for Financial
several ways as new concepts are taken into consideration. Engineering Conf, pages 85–92, 2003.