=Paper= {{Paper |id=Vol-2145/p21 |storemode=property |title=CPU and GPU Implementations for High Frequency Trading in Algorithmic Finance |pdfUrl=https://ceur-ws.org/Vol-2145/p21.pdf |volume=Vol-2145 |authors=Mantas Vaitonis,Saulius Masteika }} ==CPU and GPU Implementations for High Frequency Trading in Algorithmic Finance== https://ceur-ws.org/Vol-2145/p21.pdf
CPU and GPU Implementations for High Frequency Trading in Algorithmic Finance

Mantas Vaitonis
Vilnius University Kaunas Faculty
Muitinės str. 8, LT-44280 Kaunas, Lithuania
mantas.vaitonis@knf.vu.lt

Saulius Masteika
Vilnius University Kaunas Faculty
Muitinės str. 8, LT-44280 Kaunas, Lithuania
Saulius.masteika@knf.vu.lt


Abstract— Today, algorithmic trading and High Frequency Trading (HFT) account for the dominant part of overall trading volume in financial markets. Trade execution times have shrunk from daily trading to microseconds and nanoseconds. A modern GPU allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. The main objective of this research was to test whether GPUs can be used for the calculations of HFT statistical arbitrage algorithms and to quantify the speedup they bring. MATLAB was used for the GPU implementation and computations. The statistical arbitrage (pair trading) algorithm was parallelized in order to adapt it to the GPU. Effectiveness was measured by the time the CPU and the GPU spent processing historical data with the pair trading strategy. The paper presents and discusses the final results of the research. The results show up to a 30% increase in computational speed for the statistical arbitrage algorithm in HFT.

Keywords— high frequency trading; statistical arbitrage; GPU; high performance computing; parallel computing.

Copyright held by the author(s).

I. INTRODUCTION

The computational power requirements have continuously increased in fields such as computational physics and quantitative finance. One example is high-frequency trading (HFT), which focuses on automated trading decisions. All decisions to buy or sell a financial instrument are made by computer algorithms without human interaction. These algorithms analyze the incoming information received from the exchange system. This information may include new transactions with their prices and volumes and, in some systems, also the order submission, order modification and order deletion events of other exchange members. If a trading algorithm decides to submit a buy or sell order, this information is sent within a few milliseconds from the exchange member's system to the central exchange server, which is responsible for matching supply and demand. The exchange server responds with a confirmation message. [6]

Trade execution times have shrunk from daily trading to microseconds and even nanoseconds. As speed has increased, a huge number of orders and order cancellations has become necessary. Profit opportunities for high frequency traders are very time-sensitive, and low trade-execution latency is of primary importance. Thus, HFT firms invest in hardware and high-speed connections and place their trading platforms close to stock market servers via co-location. One of the hardware components they invest in is the GPU. GPU architectures are a cost-effective alternative to traditional parallel processing machines. This change ushers in a new era in computing, allowing any modern personal computer to take advantage of parallel processing capabilities previously available only in specialized systems. [20]

Nowadays, standard computers come with sequential or multicore CPUs, which allow a limited number of processes to be executed in parallel. On the other hand, the importance of graphics in most application domains pushed the industry to produce dedicated Graphical Processing Units (GPUs) that relieve the main CPU of the calculations required for graphics. What is important here is that this hardware is strongly parallel and may operate independently of the main CPU. A modern GPU, like those in most computers today, allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. In particular, GPUs offer hundreds of processing cores, but these can be used simultaneously only for data-parallel computations. Moreover, GPUs usually have no direct access to main memory and do not offer hardware-managed caches, two aspects that make memory management a critical factor to be carefully considered. [7]

With the increasing pervasiveness of parallel architectures such as multi-/many-core CPUs and GPUs, parallel programming has become not an alternative but a necessity for increasing software performance. [2]

Graphics processing units (GPUs) offer a new possibility for speeding up large-scale simulation of long-range interacting systems without sacrificing accuracy. A GPU is a powerful device that can process thousands of threads simultaneously with high memory bandwidth. Compared to a CPU, a GPU devotes more of its transistors to data processing rather than to data caching and flow control. It is therefore suited to the computation-intensive, data-parallel computations needed by time-sensitive high frequency traders. [5]

Multi-threaded parallel CPU implementations are expected to run faster than their single-threaded counterparts, but the overhead of creating, destroying, and synchronizing threads may be very




high. An alternative parallel computing platform is the GPU. Originally, it was developed for graphics applications. Due to their massive parallel processing capabilities, state-of-the-art GPUs are the leading computing devices for highly parallel and computationally intensive applications such as high frequency trading algorithms. [3]

Our study demonstrates how the use of GPUs can bring impressive speedups to a statistical arbitrage trading algorithm, leaving the main CPU free to focus on the remaining aspects of the trading strategy. Several vendors have recently started offering toolkits to leverage the power of GPUs for general-purpose programming. Unfortunately, they introduce a totally new model of computation, which requires algorithms to be fully redesigned. In this research, MATLAB was used for GPU computing, which makes it easier to accelerate an application with GPUs than using C or Fortran. With the MATLAB language it is possible to take advantage of CUDA GPU computing technology without having to learn the intricacies of GPU architectures or low-level GPU computing libraries.

In this paper, we investigate CPU and GPU implementations of a parallel pair trading algorithm. The main aim of this research is to explain the improved designs in detail and to report a performance comparison between the CPU and GPU implementations in terms of speed. The improvements suggested in the paper can be summarized as faster speed due to new memory access patterns for the CPU implementation, and more flexibility due to a more efficient use of processors for the GPU implementation.

In order to take advantage of the CPU and GPU, the calculations must be parallelized. Effectiveness was measured by the time the CPU and the GPU spent processing historical data with the pair trading strategy. The strategy used was first researched by D. Herlemont in his paper on pairs trading [19]. This trading strategy was applied to high frequency data in previous research [24][32], but it had not been used with a GPU. Several functions of this trading algorithm can be parallelized, such as pair selection, trading signal detection, trading, and the profit/loss calculation for each trade. Thus, the algorithm had to be modified and parallelized in order to take advantage of the GPU. Importantly, this research introduces not only the pairs trading strategy but also the method of pairs selection.

The cointegration method was used for selecting trading pairs. The pairs selection algorithm is based on the Augmented Dickey–Fuller test, the Engle–Granger 2-step approach and the Johansen test. [12] Finally, the statistical arbitrage trading strategy is compared when run on the CPU and later with the GPU.

The rest of the paper is organized as follows: theory and the problem statement are presented in Sections 1 and 2; the methodology, including the pairs trading strategy, the pairs selection algorithm and the speedup of the trading algorithm, is presented in Sections 3 and 4. The results and the summary of the research, followed by conclusions, are given in Section 5.

II. TRADING USING HARDWARE ACCELERATION

Hardware acceleration is achieved by using specific hardware to obtain higher computational performance than a general-purpose CPU provides. The main devices intended for intensive calculations include Field-Programmable Gate Arrays (FPGAs), IBM's Cell Broadband Engine Architecture (Cell BE or, simply, Cell) and Graphics Processing Units (GPUs). Until recently, GPUs remained on the fringes of HPC (high performance computing), mostly because of the steep learning curve caused by the fact that low-level graphics languages were the only way to program them. Now, however, NVIDIA has come out with a new line of graphics cards, Tesla. [6]

One of the main features of NVIDIA GPUs is their ease of programmability, made possible by CUDA (Compute Unified Device Architecture). CUDA provides the means to compile and run code for NVIDIA's GPUs. With a low learning curve, CUDA allows developers to tap into the enormous computing power of GPUs, yielding high performance benefits. [8] As mentioned in the introduction, we use CUDA, which allows algorithms to be implemented using MATLAB with CUDA-specific extensions. CUDA issues and manages computations on a GPU, treating it as a data-parallel computing device. The graphics card architecture used in recent GPU generations is built around a scalable array of streaming multiprocessors. [8] When a program using CUDA extensions and running on the CPU invokes a GPU kernel (a synonym for a GPU function), many copies of this kernel, known as threads, are enumerated and distributed to the available multiprocessors, where their execution starts. [6]

Fig. 1. Visualization of a GPU multiprocessor with on-chip shared memory.

As shown in Fig. 1, each multiprocessor of the GPU device contains several local registers per processor and memory that is shared by all scalar processor cores in the multiprocessor. To reduce the number of multiprocessors involved, the slower global memory can be used; it is shared among all multiprocessors and is also accessible by the function running on the CPU. Note that the GPU's global memory is still roughly 10 times faster than the main memory of current personal computers. However, each multiprocessor features only one double-precision processing core, so the theoretical peak performance is significantly reduced for double-precision operations. [8]
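The kernel and thread model described above can be made concrete with a small sketch. The Python code below is a sequential CPU emulation written for illustration only; it is neither GPU code nor the authors' MATLAB implementation. It mimics a one-dimensional CUDA launch, in which every thread of every block runs the same kernel and uses its global index (blockIdx.x * blockDim.x + threadIdx.x in CUDA C terms) to pick the single array element it processes. The names launch and spread_kernel, and the per-element spread y - γx, are hypothetical choices.

```python
import math


def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D CUDA-style launch of grid_dim blocks with
    block_dim threads each; every (block, thread) pair runs one kernel copy."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def spread_kernel(block_idx, block_dim, thread_idx, y, x, gamma, out):
    """Hypothetical kernel: each thread computes one element of y - gamma * x."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):  # guard: the grid may contain more threads than elements
        out[i] = y[i] - gamma * x[i]


y = [10.0, 11.5, 12.0, 11.0, 10.5]
x = [9.0, 10.0, 11.0, 10.5, 10.0]
out = [0.0] * len(y)
block_dim = 4
grid_dim = math.ceil(len(out) / block_dim)  # 2 blocks = 8 threads for 5 elements
launch(spread_kernel, grid_dim, block_dim, y, x, 0.5, out)
print(out)  # [5.5, 6.5, 6.5, 5.75, 5.5]
```

Because each kernel copy writes only its own out[i], the copies are independent and could run in any order, which is what makes such element-wise loops a good fit for a GPU.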




III. STATISTICAL ARBITRAGE

Correlation is a statistical term that comes from linear regression analysis. It defines the strength of a relationship between two variables. The main idea of statistical arbitrage, or pairs trading, is to find a pair of financial instruments that are highly correlated. When such a pair is found, a trader must look for changes in correlation followed by mean reversion to the trend of the instrument pair, thereby creating a profit opportunity. This type of trading needs to identify a relationship between two financial instruments, determine the direction of their relationship, and execute long and short positions based on the statistical data. Selecting a good pair for trading is the most important stage of the mean-reverting, market-neutral statistical arbitrage strategy. [26][34]

A. Pairs Trading Using Cointegration

The cointegration method uses a mathematical model developed by Engle and Granger [17], which has attracted considerable interest from economists over the last two decades. Cointegration states that, in some instances, although two given time series are non-stationary, a specific linear combination of them is actually stationary: the two time series move together in lockstep. Cointegration can be described as follows. Let $x_t$ and $y_t$ be two non-stationary time series. If there exists a parameter $\gamma$ such that

$$z_t = y_t - \gamma x_t \qquad (1)$$

is a stationary process, then $x_t$ and $y_t$ are cointegrated. This path-breaking concept emerged as a powerful tool for investigating common asset trends in multivariate time series. [25]

B. Data

The microsecond data for this research was provided by the company Nanotick. The futures contract data is from CME Group, which consists of NYMEX, COMEX and CBOT. Nanotick provided five different commodity futures contracts: NG (natural gas), BZ (Brent crude oil), CL (crude oil), HO (NY Harbor ULSD) and RB (RBOB Gasoline). The commodity futures data covers the period from 01-08-2015 to 31-08-2015.

After normalization, the microsecond commodity futures data consisted of 24,957,994 records. Once prepared, the data was applied to the statistical arbitrage trading strategy.

TABLE I. MICROSECOND DATA EXAMPLE FOR NGF6 CONTRACT

Receiving Date  Receiving Time      Symbol  Asset  Entry Type  Entry Price
20150809        17:00:00.869053009  NGF6    NG     A           3227
20150809        17:00:00.869053009  NGF6    NG     B           3221
20150809        17:00:00.930168164  NGF6    NG     A           3226
20150809        17:00:00.930168164  NGF6    NG     B           3221
20150809        17:00:01.017456320  NGF6    NG     A           3226
20150809        17:00:01.017456320  NGF6    NG     B           3219
20150809        17:00:01.059840559  NGF6    NG     A           3227
20150809        17:00:01.059840559  NGF6    NG     B           3219
20150809        17:00:01.156791713  NGF6    NG     A           3238
20150809        17:00:01.156791713  NGF6    NG     B           3216
20150809        17:00:01.204683812  NGF6    NG     A           3238
20150809        17:00:01.204683812  NGF6    NG     B           3216
20150809        17:00:01.205605232  NGF6    NG     A           3238
20150809        17:00:01.205605232  NGF6    NG     B           3215
20150809        17:00:01.206755867  NGF6    NG     A           3238
20150809        17:00:01.206755867  NGF6    NG     B           3215
20150809        17:00:01.207350519  NGF6    NG     A           3231
20150809        17:00:01.207350519  NGF6    NG     B           3215
20150809        17:00:01.208805474  NGF6    NG     A           3231
20150809        17:00:01.208805474  NGF6    NG     B           3217
20150809        17:00:01.224604710  NGF6    NG     A           3233
20150809        17:00:01.224604710  NGF6    NG     B           3217

IV. METHODOLOGY

The main purpose of pairs trading is to find two financial instruments that move together. Once such a pair is found, the strategy has to decide when to take long and short positions based on the trading rules. Following previous research, six main steps of the pairs trading strategy were identified:
1. Selection of the size of the trading and data normalization windows;
2. Data normalization;
3. Selection of the correlated pair;
4. Definition of the trading rules;
5. Trading;
6. Assessment of the pairs trading strategy. [16][24][32]

Before the trading and data normalization windows are selected, the strategy has to be trained. Thus, before trading starts, some data must be used for training; this data may be called out-of-sample data. All of the microsecond commodity futures data had to be divided into training and testing datasets. Dividing data into training and testing periods is referred to as the holdout method in statistical classification. [26] When selecting the training (out-of-sample) period, it is important to choose the right window size: if the window is too big, the strategy may overtrain, and if it is too small, the strategy will not be able to notice abnormal behaviour. [30]




Finally, the testing period follows immediately after the training period.

A. Data Normalization

Upon receiving the microsecond data for the commodity futures contracts, the next step was to normalize the data so that it could be used in our test environment. The first task was to bring the time stamps together. For example, if one futures contract has a time stamp of 17:00:00.869053009 and another has a time stamp of 17:00:00.825207610, both time stamps have to appear in both contracts. In our case, every distinct time stamp had to appear in all five futures contracts.

When a contract is filled with a new time stamp, the price of that futures contract at the new time stamp is set to its price at the previous time stamp; that is, the price is assumed not to have changed in the meantime. In this way, all time stamps of the futures contracts are normalized for nanosecond and microsecond data. [24][32]

Once the time stamps for all the futures contracts were obtained, the out-of-sample, normalization and trading periods were defined. All parameters were kept the same throughout this procedure: the out-of-sample period was 5 minutes, and the normalization and trading periods were each 20 seconds for every trading window. One more period, for closing the positions, was also set to 20 seconds.

After these parameters of the trading strategy are set, price normalization follows. For each price P(i,t) of commodity futures contract i, we calculate the empirical mean μ(i,t) and standard deviation σ(i,t) over the selected normalization period and then apply the following equation [30]:

$$p(i,t) = \frac{P(i,t) - \mu(i,t)}{\sigma(i,t)} \qquad (2)$$

The value p(i,t) is the normalized price of commodity futures contract i at time t. [30]

B. Pair Selection

One of the two main parts of this trading methodology is the pairs selection algorithm, which is essentially based on cointegration testing. The cointegration method involves the following steps:
1. Identify futures contract pairs that could potentially be cointegrated;
2. Once the potential pairs are identified, verify the hypothesis that the futures contract pairs are indeed cointegrated, based on historical data;
3. Examine the cointegrated pairs to determine whether they can be traded. [33]

The objective of this phase is to identify pairs whose linear combination exhibits a significant predictable component that is uncorrelated with underlying movements in the market as a whole. With this aim, we first test the spread of pair prices for stationarity. In this research, this is done by checking whether the data series are integrated of the same order using the Augmented Dickey–Fuller (ADF) test, an extended version of the Dickey–Fuller test. [12] For the series that pass the ADF test, cointegration tests are performed on all possible combinations of pairs. To test for cointegration we adopted the Engle–Granger 2-step approach and the Johansen test. This methodology is based on Caldeira and Moura. [12]

The Johansen test determines the number of cointegrating relations and also implements a multivariate extension of the 2-step Engle–Granger procedure. [12]

All of the procedures are implemented in MATLAB. The second part of the algorithm creates trading signals for the detected cointegrating relations based on predefined investment decision rules.

V. EXPERIMENTAL SETUP

The two main criteria for algorithmic trading are speed, that is, the speed with which the same set of computations can be performed on multiple sets of data, and programmability. By these criteria, general-purpose hardware such as an Intel Central Processing Unit (CPU) is not well suited. The CPU is designed to execute commands sequentially, whereas the task at hand benefits most from parallelization, because the same calculations have to be performed on multiple data; this is where parallelization and hardware acceleration come into play.

During our research, the CPU used was an Intel i5-3230M 2.6 GHz with two cores (2 MATLAB workers), and the GPU was a GeForce 710M with 96 CUDA cores. First, we applied the pair trading strategy using only the CPU. Using MATLAB's "parfor" function, which runs loop iterations in parallel on the available CPU workers, we identified the calculations that could be parallelized. During this stage we sped up the strategy to maximize its performance using only the CPU.

For the GPU, we used the gpuArray and arrayfun GPU functions together with parfor, which runs on the CPU. gpuArray creates an array on the GPU, and arrayfun applies a function to each element of an array. Using gpuArray with arrayfun means that the actual evaluation of the function happens on the GPU, not on the CPU: any required data not already on the GPU is moved to GPU memory, the MATLAB function passed in for evaluation is compiled for the GPU, and it is then executed on the GPU. All output arguments are returned as gpuArray objects. [10][11]

In our experiment we parallelized pair detection, buy/sell signal detection, trading and profit calculation. These functions could be parallelized because in every iteration the strategy must perform the same calculations. Instead of waiting for one function to finish, multiple calculations can be performed with multiple functions at the same time.

VI. EXPERIMENTAL RESULTS

The overall performance of the pair trading strategy was measured by the profit it generated. During the experiment, transaction costs were set to zero, and the amount invested in each trade was kept constant at 10. The profit/loss was measured as the percentage change of the overall difference at the end of each trading day. More detailed information is presented in Fig. 2.
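The time-stamp alignment described in Section IV.A (every distinct time stamp appears in every contract, and a contract without a new tick keeps its last price) can be sketched as below. This is an illustrative Python sketch, not the authors' MATLAB code. The input layout (a dictionary mapping each symbol to time-sorted (timestamp, price) pairs with integer nanosecond time stamps) and the use of None before a contract's first tick are assumptions.

```python
def align_timestamps(series):
    """Align several tick series on the union of their time stamps.

    series: dict mapping symbol -> list of (timestamp_ns, price), sorted by time.
    Returns (timestamps, aligned) where aligned[symbol][k] is that contract's
    last known price at timestamps[k] (forward fill), or None before its
    first tick (an assumption; the paper does not cover this case).
    """
    timestamps = sorted({ts for ticks in series.values() for ts, _ in ticks})
    aligned = {}
    for symbol, ticks in series.items():
        prices, j, last = [], 0, None
        for ts in timestamps:
            # Consume every tick of this contract up to and including ts.
            while j < len(ticks) and ticks[j][0] <= ts:
                last = ticks[j][1]
                j += 1
            prices.append(last)
        aligned[symbol] = prices
    return timestamps, aligned


# Nanoseconds after 17:00:00. The first two stamps are the example stamps
# 17:00:00.825207610 and 17:00:00.869053009; the third and all prices are made up.
ts, aligned = align_timestamps({
    "A": [(825_207_610, 100.0), (900_000_000, 101.0)],
    "B": [(869_053_009, 50.0)],
})
print(ts)       # [825207610, 869053009, 900000000]
print(aligned)  # {'A': [100.0, 100.0, 101.0], 'B': [None, 50.0, 50.0]}
```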

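The window parameters from Section IV.A (a 5-minute out-of-sample period, then 20-second normalization, trading and closing periods) can be turned into concrete time boundaries. The paper does not state exactly how consecutive windows are laid out, so this Python sketch assumes non-overlapping cycles that start right after the out-of-sample period, with each new normalization window beginning when the previous closing window ends. The function name trading_windows is hypothetical.

```python
NS_PER_S = 1_000_000_000  # time stamps are integer nanoseconds


def trading_windows(start_ns, end_ns, train_s=300, norm_s=20, trade_s=20, close_s=20):
    """Yield (norm_start, trade_start, close_start, close_end) in nanoseconds.

    Assumes back-to-back, non-overlapping cycles of normalization, trading and
    closing windows that begin right after the out-of-sample (training) period.
    Only complete cycles that end no later than end_ns are produced.
    """
    t = start_ns + train_s * NS_PER_S
    cycle = (norm_s + trade_s + close_s) * NS_PER_S
    while t + cycle <= end_ns:
        trade_start = t + norm_s * NS_PER_S
        close_start = trade_start + trade_s * NS_PER_S
        close_end = close_start + close_s * NS_PER_S
        yield t, trade_start, close_start, close_end
        t = close_end


# Seven minutes of data: 5 min of training leaves room for two 60 s cycles.
for w in trading_windows(0, 420 * NS_PER_S):
    print([b // NS_PER_S for b in w])  # boundaries in seconds
```

With seven minutes of data the loop prints [300, 320, 340, 360] and then [360, 380, 400, 420]; an overlapping (rolling) layout would only change how t advances at the end of each cycle.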

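Equation (2) is a z-score computed with the mean and standard deviation of the normalization window. A minimal Python sketch follows, assuming the population standard deviation (the paper does not say whether the population or the sample estimator is used) and raising an error for a flat window, where σ(i,t) would be zero.

```python
import statistics


def normalize(window_prices, prices):
    """Apply p = (P - mu) / sigma, eq. (2), with mu and sigma taken from the
    normalization window of a single contract."""
    mu = statistics.fmean(window_prices)
    sigma = statistics.pstdev(window_prices)  # population std (assumption)
    if sigma == 0:
        raise ValueError("flat normalization window: sigma is zero")
    return [(p - mu) / sigma for p in prices]


window = [2, 4, 4, 4, 5, 5, 7, 9]    # mu = 5, sigma = 2
print(normalize(window, [5, 9, 1]))  # [0.0, 2.0, -2.0]
```

A flat window is not an exotic case here: after forward-filling time stamps, a quiet contract can keep the same price for a whole 20-second window.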

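The Engle–Granger 2-step approach named in Section IV.B first estimates the cointegrating relation by least squares and then tests the residual spread for a unit root. The pure-Python sketch below is a simplification, not the authors' MATLAB procedure. It adds an intercept α to equation (1), so that z_t = y_t - α - γx_t. It also uses a plain Dickey–Fuller regression without the lagged-difference terms of the augmented (ADF) test and omits the Johansen test. The threshold of about -3.34 is taken here as the asymptotic 5% critical value for a two-variable residual-based test with a constant (MacKinnon's tables) and should be treated as an assumption. The names ols, df_tstat and engle_granger are hypothetical.

```python
import math
import random


def ols(y, x):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def df_tstat(z):
    """Dickey-Fuller t-statistic of rho in dz_t = a + rho*z_{t-1} + e_t
    (constant, no lagged differences)."""
    lagged = z[:-1]
    dz = [cur - prev for prev, cur in zip(z[:-1], z[1:])]
    a, rho = ols(dz, lagged)
    resid = [d - a - rho * zl for d, zl in zip(dz, lagged)]
    n = len(dz)
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    mean_lag = sum(lagged) / n
    se = math.sqrt(s2 / sum((zl - mean_lag) ** 2 for zl in lagged))
    return rho / se


def engle_granger(y, x, crit=-3.34):
    """Step 1: estimate y_t = alpha + gamma*x_t by OLS.
    Step 2: Dickey-Fuller test on z_t = y_t - alpha - gamma*x_t.
    Returns (gamma, t_stat, cointegrated_at_crit)."""
    alpha, gamma = ols(y, x)
    z = [yi - alpha - gamma * xi for xi, yi in zip(x, y)]
    t = df_tstat(z)
    return gamma, t, t < crit


# Synthetic example: x is a random walk, y = 2x + stationary noise.
rng = random.Random(7)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
gamma, t, coint = engle_granger(y, x)
print(f"gamma={gamma:.3f} t={t:.2f} cointegrated={coint}")
```

For production use, a tested library implementation of the ADF and Engle–Granger tests, with proper lag selection and p-values, would be preferable to this hand-rolled statistic.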

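Section V relies on the fact that each pair, signal or trade can be processed independently, which is what MATLAB's parfor exploits. The Python sketch below shows the same structure with the standard-library concurrent.futures module: all 10 pairs of the five contracts are enumerated with itertools.combinations, and a per-pair job is mapped over a pool of two workers, mirroring the two MATLAB workers. The per-pair job (the mean spread between two normalized price series) is a hypothetical placeholder, not the paper's computation. Because of CPython's global interpreter lock, a thread pool only illustrates the structure; CPU-bound pure-Python work would need ProcessPoolExecutor to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def mean_spread(job):
    """Hypothetical per-pair job: mean of p_a - p_b over normalized prices.
    It reads only its own pair, so jobs are independent of one another."""
    (a, b), prices = job
    pa, pb = prices[a], prices[b]
    return (a, b), sum(u - v for u, v in zip(pa, pb)) / len(pa)


def evaluate_all_pairs(prices, workers=2):
    """Run mean_spread for every unordered pair of contracts in parallel."""
    pairs = list(combinations(sorted(prices), 2))  # 5 contracts -> 10 pairs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, like indexed parfor output
        return dict(pool.map(mean_spread, [(pair, prices) for pair in pairs]))


normalized = {  # made-up normalized prices for the five contracts
    "NG": [1.0, -1.0], "BZ": [0.5, 0.5], "CL": [0.0, 0.0],
    "HO": [-0.5, 0.5], "RB": [1.0, 1.0],
}
result = evaluate_all_pairs(normalized)
print(len(result), result[("BZ", "CL")], result[("NG", "RB")])  # 10 0.5 -1.0
```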
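The percentage improvements reported for the CPU and GPU runs can be recomputed from the timings in Table II. The paper does not state which definition its 12% to 36% range uses, so the Python sketch below computes two common ones: the relative reduction in run time, (t_CPU - t_GPU) / t_CPU, and the speedup ratio t_CPU / t_GPU. It covers only the dates 2015-08-03 to 2015-08-07 and 2015-08-26 to 2015-08-31 from Table II.

```python
# (CPU seconds, GPU seconds) for the Table II dates listed above.
TIMINGS = {
    "2015-08-03": (2991.80, 2081.60),
    "2015-08-04": (2208.10, 1400.50),
    "2015-08-05": (2393.70, 1783.10),
    "2015-08-06": (3040.90, 2585.30),
    "2015-08-07": (2650.10, 2027.10),
    "2015-08-26": (3187.60, 2600.40),
    "2015-08-27": (5004.90, 4244.20),
    "2015-08-28": (5287.10, 4413.10),
    "2015-08-31": (5409.70, 4594.10),
}


def time_reduction_pct(cpu_s, gpu_s):
    """Percentage of the CPU run time saved by the GPU version."""
    return 100.0 * (cpu_s - gpu_s) / cpu_s


def speedup(cpu_s, gpu_s):
    """How many times faster the GPU version ran."""
    return cpu_s / gpu_s


for date, (cpu_s, gpu_s) in TIMINGS.items():
    print(f"{date}: {time_reduction_pct(cpu_s, gpu_s):5.1f}% less time, "
          f"{speedup(cpu_s, gpu_s):.2f}x")
```

The two measures diverge as the gain grows: on 2015-08-04 the GPU run saves about 36.6% of the CPU time, which corresponds to a speedup ratio of about 1.58.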
                                                                                      2015-08-26     3187,60           2600,40                 5119660
                                                                                      2015-08-27     5004,90           4244,20                 7963320
                                                                                      2015-08-28     5287,10           4413,10                 7721975
                                                                                      2015-08-31     5409,70           4594,10                 8613445

                                                                                         From table 2 it is shown how much time in seconds did
                                                                                     algorithm spend on each day trading simulation using different
                                                                                     hardware CPU (Intel i5 - 3230M 2,6 GHz,2 cores) and GPU
                                                                                     (GeForce 710m, 96 CUDA Cores) and how many records it had
                                                                                     to process.
                                                                                        The more detailed information is presented in figure below
                                                                                     where the speedup difference in percentage is shown.

Fig. 2. Strategy performance for each day by the profit it did generate

    Figure 2 above shows the daily profits of the HFT trading algorithm, which is based on a statistical arbitrage HFT system. The results are consistent with those reported by the High Frequency Trading market leader Virtu Financial, Inc., which had only one losing trading day out of 1237 [14]. Less profitable days occur because fewer trades are made, owing to fewer trade signals, rather than because of price fluctuations or a series of unprofitable trades. However, the aim of this research was not to measure the profit of the strategy but to improve the speed of the algorithm by using a GPU. The same pairs trading strategy was run first on the CPU alone and later on the CPU working together with the GPU. Table II shows the number of records the pairs trading algorithm had to process each day and how long the simulation took on the CPU and on the GPU.

                TABLE II.        CPU AND GPU COMPARISON

 Date          Intel i5-3230M,       GeForce 710M,         Number of
               2.6 GHz, 2 cores      96 CUDA cores         records
               (s)                   (s)                   processed

 2015-08-03    2991.80               2081.60               6096505
 2015-08-04    2208.10               1400.50               4579465
 2015-08-05    2393.70               1783.10               5793525
 2015-08-06    3040.90               2585.3                5595770
 2015-08-07    2650.10               2027.1                5586360
 2015-08-10    4410.80               3080.70               5732355
 2015-08-11    4980.30               3154.50               6249980
 2015-08-12    2769.20               2151.20               6758875
 2015-08-13    4122.60               3419.00               5666900
 2015-08-14    1325.90               1055.80               4227335
 2015-08-17    1550.00               1171.10               4879990
 2015-08-18    1912.10               1299.50               4364540
 2015-08-19    4002.30               3278.70               5666700
 2015-08-20    4449.00               3119.43               5411145
 2015-08-21    4311.70               3389.10               5946205
 2015-08-24    4809.40               4064.00               7710745
 2015-08-25    3960.20               3466.10               5105175

Fig. 3. Improvement of the algorithm's speed when using the GPU

    As Fig. 3 shows, when the pairs trading algorithm was run on the GPU, the simulation ran considerably faster: its overall running time was roughly 12% to 36% shorter than on the CPU, depending on the day. The speedup differs from day to day because of the different numbers of trades made and trade signals generated. The more parameters that can be parallelized and moved to the GPU, the larger the achievable speedup. It has been shown that a CPU, even with a multi-threaded implementation, is not a feasible option for large dense matrices, and that for GPU implementations the performance impact of global memory access patterns on the GPU board and of memory coalescing is important. In our case, the bigger the matrix of trades and pairs, the more measurable the GPU speedup. The results show the importance of technical advantages in HFT and how important it is to adapt the algorithm to make the most of the hardware it runs on. In our research, the speed of the daily trading simulation on microsecond data improved once the algorithm's calculations were parallelized and moved to the GPU using gpuArray objects and arrayfun in MATLAB, which make it possible to exploit the available GPU.

                                          VII. CONCLUSIONS

    Recent technological advances have made trading in the markets fast and largely carried out by computers and algorithms. Computers have taken over from humans the role of market makers, specialists and liquidity providers, but operate at a much higher speed. The growing number of derivative financial instruments has increased the opportunities for profit arising from pricing inefficiencies or delayed price moves between securities. Trading algorithms now run not only on CPUs but also on GPUs. These factors motivated us to test a pairs trading HFT system and to see how its effectiveness differs on different hardware. In this paper, high frequency algorithmic pairs trading was developed on the basis of the market-neutral statistical arbitrage strategy presented by D. Herlemont. Importantly, all




five commodity futures contracts used for the proposed pairs trading strategy belong to the same exchange group, CME Group, which is the world's largest options and futures exchange platform. The proposed trading strategy used a pairs selection algorithm based on the Augmented Dickey-Fuller test. If the prices of the commodity futures contracts pass the Augmented Dickey-Fuller test, cointegration tests are performed on all possible combinations of pairs. To test for cointegration, the Engle-Granger two-step approach and the Johansen test were adopted. The trading strategy was first run on the CPU (Intel i5-3230M, 2.6 GHz, 2 cores) and later on the GPU (GeForce 710M, 96 CUDA cores). All trading parameters were kept the same throughout the research, in order to measure the effectiveness of the hardware and to check how much the performance of high frequency trading improves when it runs on the GPU rather than on the CPU alone. At the end of the research, when all datasets had been processed by the pairs selection algorithm on the CPU and on the GPU, the results were gathered. As expected, the algorithm performed more efficiently on the GPU, with the daily speedup varying from 12% to 36%. The speedup differs from day to day because of the different numbers of trades made and trade signals generated. The more parameters that can be parallelized and moved to the GPU, the larger the achievable speedup. The gain could be even larger if the algorithm were applied to more financial instruments and generated more trading signals.

                          ACKNOWLEDGMENT

   We would like to express our gratitude to NANOTICK for providing microsecond high frequency data for five commodity futures contracts.

                              REFERENCES

[1] Ahmed M., Chai A., Ding X., Jiang Y., Sun Y. (2009), “Statistical Arbitrage in High Frequency Trading Based on Limit Order Book Dynamics”.
[2] Danelutto M., De Matteis T., Mencagli G., Torquati M. (2015), “Parallelizing High-Frequency Trading Applications by Using C++11 Attributes”, August 2015, IEEE.
[3] Torun M. U., Yılmaz O., Akansu A. N. (2016), “FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis”, Journal of Parallel and Distributed Computing, Vol. 96, pp. 172–180.
[4] Kozikowski G., Papamanousakis G., Yang J. (2015), “Potential future exposure, modelling and accelerating on GPU and FPGA”, WHPCF 2015: Proceedings of the 8th Workshop on High Performance Computational Finance, Article No. 4.
[5] Liang Y., Xing X., Li Y. (2017), “A GPU-based large-scale Monte Carlo simulation method for systems with long-range interactions”, Journal of Computational Physics, Vol. 338, pp. 252–268.
[6] Preis T. (2011), “GPU-computing in econophysics and statistical physics”, The European Physical Journal Special Topics, Vol. 194, pp. 87–119.
[7] Margara A., Cugola G. (2011), “High performance content-based matching using GPUs”, Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems, New York, USA.
[8] NVIDIA Corporation (2008), NVIDIA CUDA Compute Unified Device Architecture.
[9] Napoli C. et al., “A cloud-distributed GPU architecture for pattern identification in segmented detectors big-data surveys”, The Computer Journal, Vol. 59(3), pp. 338–352.
[10] Matlab (2016), se.mathworks.com. [ONLINE] Available at: https://se.mathworks.com/help/distcomp/gpu-computing.html.
[11] Matlab (2015), se.mathworks.com. [ONLINE] Available at: https://se.mathworks.com/discovery/matlab-gpu.html.
[12] Caldeira J. F., Moura G. V. (2013), “Selection of a portfolio of pairs based on cointegration: A statistical arbitrage strategy”, Revista Brasileira de Financas, Vol. 11(1), pp. 49–80.
[13] Bogoev D., Karam A. (2016), “An Empirical detection of High Frequency Trading Strategies”, 6th International Conference of the Financial Engineering and Banking Society, June 10–12, 2016, Malaga.
[14] Cifu D. A. (2014), “FORM S-1, Registration Statement Under The Securities Act Of 1933”, Virtu Financial, Inc.
[15] Dickey D., Fuller W. (1979), “Distribution of the Estimator for Autoregressive Time Series with a Unit Root”, Journal of the American Statistical Association, Vol. 74, pp. 427–431.
[16] Driaunys K., Masteika S., Sakalauskas V., Vaitonis M. (2014), “An algorithm-based statistical arbitrage high frequency trading system to forecast prices of natural gas futures”, Transformations in Business and Economics, Vol. 13(3), pp. 96–109.
[17] Engle R. F., Granger C. W. J. (1987), “Co-integration and error correction: Representation, estimation, and testing”, Econometrica, Vol. 55(2), pp. 251–276.
[18] Fox M. B., Glosten L. R., Rauterberg G. V. (2015), “The New Stock Market: Sense and Nonsense”, 65 Duke L.J. 191.
[19] Herlemont D. (2013), “Pairs Trading, Convergence Trading, Cointegration”, Quantitative Finance, Vol. 12(9).
[20] Kaya O. (2016), “High-frequency trading. Reaching the limits”, Automated Trader Magazine, Vol. 41, pp. 23–27.
[21] Kirchner S. (2015), “High frequency trading: Fact and fiction”, Policy: A Journal of Public Policy and Ideas, Vol. 31(4), pp. 8–20.
[22] Lau C. A., Xie W., Wu Y. (2016), “Multi-Dimensional Pairs Trading Using Copulas”, European Financial Management Association 2016 Annual Meetings, June 29–July 2, 2016, Basel, Switzerland.
[23] Madhavaram G. R. (2013), “Statistical Arbitrage Using Pairs Trading With Support Vector Machine Learning”, Saint Mary's University.
[24] Masteika S., Vaitonis M. (2015), “Quantitative Research in High Frequency Trading for Natural Gas Futures Market”, Business Information Systems Workshops, Vol. 228, pp. 29–35.
[25] Miao G. J. (2014), “High Frequency and Dynamic Pairs Trading Based on Statistical Arbitrage Using a Two-Stage Correlation and Cointegration Approach”, International Journal of Economics and Finance, Vol. 6(3), pp. 96–110.
[26] Miao G. J., Clements M. A. (2002), “Digital Signal Processing and Statistical Classification”, Artech House, ISBN 1580531350.
[27] Miller R. S., Shorter G. (2016), “High Frequency Trading: Overview of Recent Developments”, report, April 4, 2016, Washington D.C.
[28] Mushtaq R. (2011), “Augmented Dickey Fuller Test”. Available at SSRN: https://ssrn.com/abstract=1911068.
[29] O'Hara M. (2015), “High frequency market microstructure”, Journal of Financial Economics, Vol. 116(2), pp. 257–270.
[30] Perlin M. S. (2009), “Evaluation of Pairs-trading strategy at the Brazilian financial market”, Journal of Derivatives & Hedge Funds, Vol. 15(2), pp. 122–136.
[31] Vaitonis M. (2017), “Pairs Trading Using HFT in OMX Baltic Market”, Baltic J. Modern Computing, Vol. 5(1), pp. 37–49.
[32] Vaitonis M., Masteika S. (2016), “Research in High Frequency Trading and Pairs Selection Algorithm with Baltic Region Stocks”, In: Dregvaite G., Damasevicius R., Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, Vol. 639. Springer.
[33] Vidyamurthy G. (2004), “Pairs Trading – Quantitative Methods and Analysis”, John Wiley & Sons, Inc., New Jersey, p. 210.
[34] Zubulake P., Lee S. (2011), “The High frequency game changer: how automated trading strategies have revolutionized the markets”, Aite Group, Wiley Trading.



