CPU and GPU Implementations for High Frequency Trading in Algorithmic Finance

Mantas Vaitonis, Saulius Masteika
Vilnius University Kaunas Faculty, Muitinės street 8, LT-44280 Kaunas, Lithuania
mantas.vaitonis@knf.vu.lt, Saulius.masteika@knf.vu.lt

Copyright held by the author(s).

Abstract: Today algorithmic trading and High Frequency Trading (HFT) account for a dominant part of overall trading volume in financial markets. Trade execution time has shrunk from daily trading to microseconds and nanoseconds. A modern GPU allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. The main objective of this research was to test whether, and to quantify by how much, GPUs can speed up the calculations of HFT statistical arbitrage algorithms. MATLAB software was used for the GPU application and computations. The statistical arbitrage pair trading algorithm was parallelized in order to adapt it to the GPU. Effectiveness was measured by the time the CPU and the GPU spent processing historical data with the pair trading strategy. The final results of the research are presented and discussed. The results show an increase of up to 36% in computational speed when the statistical arbitrage algorithm is applied in HFT.

Keywords: high frequency trading; statistical arbitrage; GPU; high performance computing; parallel computing.

I. INTRODUCTION

Computational power requirements have continuously increased in fields such as computational physics and quantitative finance. One example is high-frequency trading (HFT), which focuses on automatic trading decision making. All decisions to buy or sell a financial instrument are made by computer algorithms without human interaction. These algorithms analyze the incoming information received from the exchange system. This information may include new transactions with their prices and volumes and, in some systems, also the order submission, order modification and order deletion events of other exchange members. If a trading algorithm decides to submit a buy or sell order to the exchange system, then within a few milliseconds this information is sent from the exchange member's system to the central exchange server, which is responsible for matching offer and demand. The exchange server responds with a confirmation message. [6]

Trade execution time has shrunk from daily trading to microseconds and even nanoseconds. With this increase in speed, a huge number of orders and order cancellations is required.

Profit chances for high frequency traders are very time sensitive, and low latency of trade execution is of the highest importance. Thus, HFT firms invest in hardware and high-speed connections and place their trading platforms close to stock market servers via co-location. One kind of hardware they invest in is the GPU. GPU architectures are a cost-effective alternative to traditional parallel processing machines. This change ushers in a new era in computing, which allows any modern personal computer to take advantage of parallel processing capabilities previously available only in specialized systems. [20]

Nowadays, standard computers come with sequential CPUs or with multicore CPUs, which allow a limited number of processes to be executed in parallel. On the other hand, the importance of graphics in most application domains pushed the industry into producing ad hoc Graphical Processing Units (GPUs) to relieve the main CPU from the calculations required for graphics. What is important here is that this hardware is strongly parallel and may operate independently from the main CPU. A modern GPU, like those equipping most computers today, allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. In particular, GPUs offer hundreds of processing cores, but they can be used simultaneously only to perform data-parallel computations. Moreover, GPUs usually have no direct access to the main memory and do not offer hardware-managed caches, two aspects that make memory management a critical factor to be carefully considered. [7]

With the increasing pervasiveness of parallel architectures such as multi-/many-core CPUs and GPUs, parallel programming has become not an alternative but a necessity for increasing software performance. [2]

Graphics processing units offer a new possibility for speeding up large-scale simulation of long-range interacting systems without sacrificing accuracy. A GPU is a powerful device that can process thousands of threads simultaneously with high memory bandwidth. Compared to a CPU, a GPU is designed with more transistors devoted to data processing rather than to data caching and flow control. It is suitable for the computation-intensive and data-parallel computations needed by time-sensitive high frequency traders. [5]

Multi-threaded parallel CPU implementations are expected to run faster than their single-threaded counterparts, but the overhead of creating, destroying, and synchronizing threads may be very high. An alternative parallel computing platform is the GPU. Originally, it was developed for graphics applications. Due to their massive parallel processing capabilities, state-of-the-art GPUs are the leading computing devices for the most parallel and computationally intensive applications, such as high frequency trading algorithms. [3]

Our study demonstrates how the use of GPUs can bring substantial speedups to a statistical arbitrage trading algorithm, leaving the main CPU free to focus on the remaining aspects of the trading strategy. Several vendors have recently started offering toolkits to leverage the power of GPUs for general purpose programming. Unfortunately, they introduce a totally new model of computation, which requires algorithms to be fully redesigned. In this research MATLAB was used for GPU computing, which makes it possible to accelerate an application with GPUs more easily than by using C or Fortran. With the MATLAB language it is possible to take advantage of the CUDA GPU computing technology without having to learn the intricacies of GPU architectures or low-level GPU computing libraries.

In this paper, we investigate CPU and GPU implementations of the parallel pair trading algorithm. The main aim of this research is to explain the improved designs in detail and to report a performance comparison between the CPU and GPU implementations in terms of speed. The improvements suggested in the paper for the CPU and GPU implementations are summarized as faster speed due to new memory access patterns, and more flexibility due to a more efficient use of processors, respectively. In order to take advantage of the CPU and GPU it is necessary to parallelize the calculations. Effectiveness was measured by the time the CPU and the GPU spent processing historical data with the pair trading strategy. The strategy used was first researched by D. Herlemont in his paper about pairs trading [19]. This trading strategy was used on high frequency data in previous research [24][32]; however, it had not been used with a GPU. A number of functions of this trading algorithm can be parallelized, such as pair selection, trading signal detection, trading, and the profit/loss calculation for each trade. Thus, the algorithm had to be modified and parallelized in order to take advantage of the GPU. Importantly, not only the pairs trading strategy but also the method of pairs selection is introduced in this research. The cointegration method was used for trading pairs selection. The pairs selection algorithm is based on the Augmented Dickey Fuller test, the Engle and Granger 2-step approach and the Johansen test. [12] Finally, the statistical arbitrage trading strategy is compared when run on the CPU and later with the GPU.

The rest of the paper is organized as follows: Sections II and III present the theory and the problem statement; Sections IV and V present the methodology, including the pairs trading strategy, the pairs selection algorithm and the speedup of the trading algorithm; Section VI reports the results, followed by conclusions in Section VII.

II. TRADING USING HARDWARE ACCELERATION

Hardware acceleration is achieved by utilizing specific hardware to gain higher computational performance than that provided by a general purpose CPU. Devices intended for intense calculations include the Field-Programmable Gate Array (FPGA), IBM's Cell Broadband Engine Architecture (Cell BE or, simply, Cell) and Graphics Processing Units (GPUs). Until recently the GPU remained on the fringes of HPC (high performance computing), mostly because of the steep learning curve caused by the fact that low-level graphics languages were the only way to program GPUs. Now, however, NVIDIA has come out with a new line of graphics cards, Tesla. [6]

One of the main features of NVIDIA GPUs is ease of programmability, made possible with CUDA (Compute Unified Device Architecture). CUDA provides the means to compile and run code for NVIDIA's GPUs. With a gentle learning curve, CUDA allows developers to tap into the enormous computing power of GPUs, yielding high performance benefits. [8] As mentioned in the introduction, we use CUDA, which allows algorithms to be implemented using MATLAB with CUDA-specific extensions. Thus, CUDA issues and manages computations on a GPU as a data-parallel computing device. The graphics card architecture used in recent GPU generations is built around a scalable array of streaming multiprocessors. [8] When a program using CUDA extensions and running on the CPU invokes a GPU kernel, which is a synonym for a GPU function, many copies of this kernel, known as threads, are enumerated and distributed to the available multiprocessors, where their execution starts. [6]

Fig. 1. Visualization of a GPU multiprocessor with on-chip shared memory.

As shown in Fig. 1, each multiprocessor of the GPU device contains several local registers per processor as well as memory that is shared by all scalar processor cores in the multiprocessor. To reduce the number of multiprocessors involved, the slower global memory can be used, which is shared among all multiprocessors and is also accessible by functions running on the CPU. Note that the GPU's global memory is still roughly 10 times faster than the current main memory of personal computers. However, each multiprocessor features only one double-precision processing core, so the theoretical peak performance is significantly reduced for double-precision operations. [8]

III. STATISTICAL ARBITRAGE

Correlation is a statistical term that comes from linear regression analysis. It defines the strength of a relationship between two variables. The main idea of statistical arbitrage, or pairs trading, is to find a pair of financial instruments that are highly correlated. When a pair is found, a trader must look for changes in correlation followed by mean reversion to the trend of the pair, which creates a profit opportunity. This type of trading needs to identify a relationship between two financial instruments, figure out the direction of their relationship, and execute long and short positions based on the statistical data presented. Selecting a good pair for trading is the most important stage of a mean-reverting, market-neutral statistical arbitrage strategy. [26][34]

A. Pairs Trading Using Cointegration

The cointegration method uses a mathematical model developed by Engle and Granger [17], which has attracted considerable interest from economists over the last two decades. Cointegration states that, in some instances, although two given time series are non-stationary, a specific linear combination of the two is stationary. The two time series then move together in lockstep. Cointegration can be described as follows: let $x_t$ and $y_t$ be two non-stationary time series. If there exists a parameter $\beta$ such that

$$z_t = y_t - \beta x_t \qquad (1)$$

is a stationary process, then $x_t$ and $y_t$ are cointegrated. This path-breaking concept emerged as a powerful tool for investigating common asset trends in multivariate time series. [25]

B. Data

The microsecond data for this research was provided by the Nanotick company. The futures contract data is from CME Group, which consists of NYMEX, COMEX and CBOT. Nanotick provided five different commodity futures contracts: NG (natural gas), BZ (Brent crude oil), CL (crude oil), HO (NY Harbor ULSD) and RB (RBOB Gasoline). The data covers the period from 01-08-2015 to 31-08-2015.

After normalization, the microsecond commodity futures data consisted of 24957994 records. Once prepared, the data was applied to the statistical arbitrage trading strategy.

TABLE I. MICROSECOND DATA EXAMPLE FOR NGF6 CONTRACT

Receiving Date  Receiving Time      Symbol  Asset  Entry Type  Entry Price
20150809        17:00:00.869053009  NGF6    NG     A           3227
20150809        17:00:00.869053009  NGF6    NG     B           3221
20150809        17:00:00.930168164  NGF6    NG     A           3226
20150809        17:00:00.930168164  NGF6    NG     B           3221
20150809        17:00:01.017456320  NGF6    NG     A           3226
20150809        17:00:01.017456320  NGF6    NG     B           3219
20150809        17:00:01.059840559  NGF6    NG     A           3227
20150809        17:00:01.059840559  NGF6    NG     B           3219
20150809        17:00:01.156791713  NGF6    NG     A           3238
20150809        17:00:01.156791713  NGF6    NG     B           3216
20150809        17:00:01.204683812  NGF6    NG     A           3238
20150809        17:00:01.204683812  NGF6    NG     B           3216
20150809        17:00:01.205605232  NGF6    NG     A           3238
20150809        17:00:01.205605232  NGF6    NG     B           3215
20150809        17:00:01.206755867  NGF6    NG     A           3238
20150809        17:00:01.206755867  NGF6    NG     B           3215
20150809        17:00:01.207350519  NGF6    NG     A           3231
20150809        17:00:01.207350519  NGF6    NG     B           3215
20150809        17:00:01.208805474  NGF6    NG     A           3231
20150809        17:00:01.208805474  NGF6    NG     B           3217
20150809        17:00:01.224604710  NGF6    NG     A           3233
20150809        17:00:01.224604710  NGF6    NG     B           3217

IV. METHODOLOGY

The main purpose of pairs trading is to find two financial instruments that move together. Once such a pair is found, the strategy has to decide when to take long and short positions based on the trading rules. Following previous research, six main steps of the pairs trading strategy were identified:

1. Selection of the size of the trading and data normalization window;
2. Data normalization;
3. Selection of the correlated pair;
4. Definition of the trading rules;
5. Trading;
6. Assessment of the pairs trading strategy. [16][24][32]

Before the trading and data normalization window is selected, the strategy has to be trained. Thus, before trading starts, some data must be used for training. This data may be called out of sample data. All microsecond commodity futures data had to be divided into training and testing datasets. This method of dividing data into training and testing periods is referred to as the holdout method in statistical classification. [26] When selecting the training (out of sample) period, it is important to choose the right window size: if the window is too big, the strategy may overtrain, and if it is too small, the strategy will not be able to notice abnormal behaviour. [30] Finally, the testing period follows immediately after the training period.

A. Data Normalization

After the microsecond data for the commodity futures contracts was received, the next step was to normalize it so that it could be used in our test environment. The first task was to bring the time stamp data together. For example, if one contract has a time stamp of 17:00:00.869053009 and another futures contract has a time stamp of 17:00:00.825207610, both time stamps have to appear in both contracts. In our case, every distinct time stamp had to appear in all five futures contracts. When a time stamp is added to a contract, the price for that contract is set to the price at its last preceding time stamp; it is assumed that the price did not change in the meantime. In this way, the time stamps of all futures contracts are normalized for nanosecond and microsecond data. [24][32]

Once all time stamps for all the futures contracts were obtained, the out of sample, normalization and trading periods were defined. During this procedure, all parameters were kept the same: the out of sample period was 5 minutes, and the normalization and trading periods were both 20 seconds for each trading window. One more period was selected for closing the positions, which was 20 seconds as well.

After these parameters of the trading strategy are set, price normalization follows. To normalize each futures contract price $P(i,t)$, we calculate the empirical mean $\mu(i,t)$ and standard deviation $\sigma(i,t)$ over the selected normalization period and then apply the following equation [30]:

$$p(i,t) = \frac{P(i,t) - \mu(i,t)}{\sigma(i,t)} \qquad (2)$$

The value $p(i,t)$ is the normalized price of commodity futures contract $i$ at time $t$. [30]

B. Pair Selection

One of the two main parts of this trading methodology is the pairs selection algorithm, which is essentially based on cointegration testing. The cointegration method involves the following steps:

1. Identify futures contract pairs that could potentially be cointegrated;
2. Once the potential pairs are identified, verify the hypothesis that the futures contract pairs are indeed cointegrated, based on information from historical data;
3. Examine the cointegrated pairs to determine whether they can be traded on. [33]

The objective of this phase is to identify pairs whose linear combination exhibits a significant predictable component that is uncorrelated with underlying movements in the market as a whole. With this aim, we first test the spread of the pair prices for stationarity. In this research, this is done by checking whether the data series are integrated of the same order using the Augmented Dickey Fuller (ADF) test, which is an extended version of the Dickey Fuller test. [12] For the series that pass the ADF test, cointegration tests are performed on all possible combinations of pairs. To test for cointegration we adopted the Engle and Granger 2-step approach and the Johansen test. This methodology is based on Caldeira and Moura. [12]

The Johansen test determines the number of cointegrating relations and also implements a multivariate extension of the 2-step Engle and Granger procedure. [12]

All of the procedures are implemented in MATLAB. The second part of the algorithm creates trading signals for the detected cointegrating relations based on predefined investment decision rules.

V. EXPERIMENTAL SETUP

The two main criteria for algorithmic trading are speed, that is, the speed with which the same set of computations can be performed on multiple sets of data, and programmability. For the first criterion, general-purpose hardware such as an Intel central processing unit (CPU) is not well suited. The CPU is designed to execute commands in a linear fashion, whereas the task at hand benefits most from parallelization, as the same calculations must be performed on multiple data. This is where parallelization and hardware acceleration come into play.

The CPU used in our research was an Intel i5-3230M at 2.6 GHz with two cores (2 MATLAB workers), and the GPU was a GeForce 710M with 96 CUDA cores. First we applied the pair trading strategy to the CPU only. Using MATLAB's "parfor" construct, which executes loop iterations in parallel on the CPU workers, we identified the calculations that could be parallelized. During this stage we sped up the strategy to maximize its performance using the CPU only.

For the GPU we used the gpuArray and arrayfun GPU functions together with parfor, which works on the CPU. gpuArray creates an array on the GPU, and arrayfun applies a function to each element of an array. When arrayfun is used with gpuArray data, the function is evaluated on the GPU, not on the CPU. Any required data not already on the GPU is moved to GPU memory, the MATLAB function passed in for evaluation is compiled for the GPU, and it is then executed on the GPU. All output arguments are returned as gpuArray objects. [10][11]

In our experiment we parallelized pair detection, the detection of buy/sell signals, and the trading and profit calculation. These functions could be parallelized because the strategy performs the same calculations in every iteration. Instead of waiting for one function to finish, multiple calculations can be performed by multiple functions at the same time.

VI. EXPERIMENTAL RESULTS

The overall performance of the pair trading strategy was measured by the profit it generated. During the experiment transaction costs were set to zero, and the amount invested in each trade was kept the same, namely 10. The profit/loss was measured as the percentage change of the overall difference at the end of each trading day. More detailed information is presented in Fig. 2.

Fig. 2. Strategy performance for each day by the profit it did generate

Fig. 2 shows the daily profits of the HFT trading algorithm and is consistent with the results disclosed by the high frequency trading market leader Virtu Financial, Inc., which reported only one losing trading day out of 1237 days [14]. The chart in Fig. 2 illustrates the daily results of an algorithm-based statistical arbitrage HFT system. The less profitable days occur because of fewer trades, due to fewer trade signals, rather than because of fluctuations or a series of unproductive trades. However, the aim of our research was not to measure the profit of the strategy but to improve the speed of the algorithm by using a GPU. The same pair trading strategy was applied to the CPU and later to the CPU working together with the GPU. Table II shows the number of records the pairs trading algorithm had to process and how much time this took using the CPU and the GPU.

TABLE II. CPU AND GPU COMPARISON

Date        Intel i5-3230M 2.6 GHz,  GeForce 710M,         Number of records
            2 cores (in seconds)     96 CUDA cores         processed
                                     (in seconds)
2015-08-03  2991.80                  2081.60               6096505
2015-08-04  2208.10                  1400.50               4579465
2015-08-05  2393.70                  1783.10               5793525
2015-08-06  3040.90                  2585.30               5595770
2015-08-07  2650.10                  2027.10               5586360
2015-08-10  4410.80                  3080.70               5732355
2015-08-11  4980.30                  3154.50               6249980
2015-08-12  2769.20                  2151.20               6758875
2015-08-13  4122.60                  3419.00               5666900
2015-08-14  1325.90                  1055.80               4227335
2015-08-17  1550.00                  1171.10               4879990
2015-08-18  1912.10                  1299.50               4364540
2015-08-19  4002.30                  3278.70               5666700
2015-08-20  4449.00                  3119.43               5411145
2015-08-21  4311.70                  3389.10               5946205
2015-08-24  4809.40                  4064.00               7710745
2015-08-25  3960.20                  3466.10               5105175
2015-08-26  3187.60                  2600.40               5119660
2015-08-27  5004.90                  4244.20               7963320
2015-08-28  5287.10                  4413.10               7721975
2015-08-31  5409.70                  4594.10               8613445

Table II shows how much time, in seconds, the algorithm spent on each day's trading simulation using the different hardware, the CPU (Intel i5-3230M 2.6 GHz, 2 cores) and the GPU (GeForce 710M, 96 CUDA cores), and how many records it had to process. More detailed information is presented in Fig. 3, which shows the speedup difference in percent.

Fig. 3. The improvement of the algorithm when using GPU

As shown in Fig. 3, when the pair trading algorithm was run on the GPU, the speed of the simulation improved considerably, by 12% to 36% overall. The difference in speed between days is due to the different number of trades made and the different number of trade signals. The more parameters can be parallelized and moved to the GPU, the bigger the achievable speedup. It is shown that the CPU, even with a multi-threaded implementation, is not a feasible option for large dense matrices. For the GPU implementation, the performance impact of the global memory access patterns on the GPU board and of memory coalescing is emphasized. In our case, the bigger the matrix of trades and pairs, the more measurable the speedup by the GPU. The results show the importance of technical advantages in HFT and how important it is to improve the algorithm in order to make the most of the hardware it runs on. In our research the possibility of improving the speed of daily trading on microsecond data arose when the algorithm's calculations were parallelized and moved to the GPU using gpuArray and arrayfun in MATLAB, which make it possible to exploit the GPU at hand.

VII. CONCLUSIONS

Recent technological advances have made trading in the markets fast and mostly done by computers and algorithms. Instead of humans, computers replicate the role of market makers, specialists or liquidity providers, but at a much higher speed. The number of derived financial instruments has increased the opportunities for profits arising from pricing inefficiencies or price move delays between securities. Trading algorithms now work not only with the CPU but also with the GPU. These factors have been the driving forces to test a system based on pair trading in HFT and to see how its effectiveness differs when different hardware is used. In this paper, high frequency algorithmic pairs trading was developed on the market-neutral statistical arbitrage strategy presented by D. Herlemont. Importantly, all five commodity futures contracts used for the proposed pairs trading strategy belong to the same CME Group, which is the world's largest options and futures exchange platform. The proposed trading strategy used a pairs selection algorithm based on the Augmented Dickey Fuller test. If the commodity futures prices pass the Augmented Dickey Fuller test, cointegration tests are performed on all possible combinations of pairs. To test for cointegration, the Engle and Granger 2-step approach and the Johansen test were adopted. The trading strategy was first run on the CPU (Intel i5-3230M 2.6 GHz, 2 cores) and later on the GPU (GeForce 710M, 96 CUDA cores). All trading parameters were kept the same during the research. The purpose of this was to measure the effectiveness of the hardware and to check how much high frequency trading performance improves when the algorithm runs on the GPU rather than on the CPU only. At the end of the research, when all datasets had been processed by the pairs selection algorithm working with the CPU and the GPU, the results were gathered. It should be no surprise that the algorithm performed more effectively when run on the GPU. The daily speed improvement varied from 12% to 36%. The difference in speed between days is due to the different number of trades made and the different number of trade signals. The more parameters can be parallelized and moved to the GPU, the bigger the achievable speedup. The increase could be even more pronounced if the algorithm were applied to more financial instruments and more trading signals were created.

ACKNOWLEDGMENT

We would like to thank NANOTICK for providing high frequency data in microseconds for 5 commodity futures contracts.

REFERENCES

[1] Ahmed M., Chai A., Ding X., Jiang Y., Sun Y. (2009), "Statistical Arbitrage in High Frequency Trading Based on Limit Order Book Dynamics".
[2] Danelutto M., De Matteis T., Mencagli G., Torquati M. (2015), "Parallelizing High-Frequency Trading Applications by Using C++11 Attributes", August 2015, IEEE.
[3] Torun M. U., Yılmaz O., Akansu A. N. (2016), "FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis", Journal of Parallel and Distributed Computing, Vol. 96, pp. 172-180.
[4] Kozikowski G., Papamanousakis G., Yang J. (2015), "Potential future exposure, modelling and accelerating on GPU and FPGA", WHPCF 2015 Proceedings of the 8th Workshop on High Performance Computational Finance, Article No. 4.
[5] Liang Y., Xing X., Li Y. (2017), "A GPU-based large-scale Monte Carlo simulation method for systems with long-range interactions", Journal of Computational Physics, Vol. 338, pp. 252-268.
[6] Preis T. (2011), "GPU-computing in econophysics and statistical physics", The European Physical Journal Special Topics, Vol. 194, pp. 87-119.
[7] Margara A., Cugola G. (2011), "High performance content-based matching using GPUs", Proceedings of the 5th ACM international conference on Distributed event-based system, New York, USA.
[8] NVIDIA Corporation. (2008), NVIDIA CUDA Compute Unified Device Architecture.
[9] Napoli C. et al., "A cloud-distributed GPU architecture for pattern identification in segmented detectors big-data surveys", The Computer Journal, Vol. 59(3), pp. 338-352.
[10] Matlab. (2016), se.mathworks.com. [ONLINE] Available at: https://se.mathworks.com/help/distcomp/gpu-computing.html.
[11] Matlab. (2015), se.mathworks.com. [ONLINE] Available at: https://se.mathworks.com/discovery/matlab-gpu.html.
[12] Caldeira J. F., Moura G. V. (2013), "Selection of a portfolio of pairs based on cointegration: A statistical arbitrage strategy", Revista Brasileira de Financas, Vol. 11(1), pp. 49-80.
[13] Bogoev D., Karam A. (2016), "An Empirical detection of High Frequency Trading Strategies", 6th International Conference of the Financial Engineering and Banking Society, June 10-12, 2016, Malaga.
[14] Cifu D. A. (2014), "FORM S-1, Registration Statement Under The Securities Act Of 1933", Virtu Financial, Inc.
[15] Dickey D., Fuller W. (1979), "Distribution of the Estimator for Autoregressive Time series with a Unit Root", Journal of the American Statistical Association, Vol. 74, pp. 427-431.
[16] Driaunys K., Masteika S., Sakalauksas V., Vaitonis M. (2014), "An algorithm-based statistical arbitrage high frequency trading system to forecast prices of natural gas futures", Transformations in business and economics, Vol. 13(3), pp. 96-109.
[17] Engle R. F., Granger C. W. J. (1987), "Co-integration and error correction: Representation, estimation, and testing", Econometrica, Vol. 55(2), pp. 251-276.
[18] Fox M. B., Glosten L. R., Rauterberg G. V. (2015), "The New Stock Market: Sense and Nonsense", 65 Duke L.J. 191.
[19] Herlemont D. (2013), "Pairs Trading, Convergence Trading, Cointegration", Quantitative Finance, Vol. 12(9).
[20] Kaya O. (2016), "High-frequency trading. Reaching the limits", Automated trader magazine, Vol. 41, pp. 23-27.
[21] Kirchner S. (2015), "High frequency trading: Fact and fiction", Policy: A Journal of Public Policy and Ideas, Vol. 31(4), pp. 8-20.
[22] Lau C. A., Xie W., Wu Y. (2016), "Multi-Dimensional Pairs Trading Using Copulas", European Financial Management Association 2016 Annual Meetings, June 29-July 2, 2016, Basel, Switzerland.
[23] Madhavaram G. R. (2013), "Statistical Arbitrage Using Pairs Trading With Support Vector Machine Learning", Saint Mary's University.
[24] Masteika S., Vaitonis M. (2015), "Quantitative Research in High Frequency Trading for Natural Gas Futures Market", Business Information Systems Workshops, Vol. 228, pp. 29-35.
[25] Miao G. J. (2014), "High Frequency and Dynamic Pairs Trading Based on Statistical Arbitrage Using a Two-Stage Correlation and Cointegration Approach", International Journal of Economics and Finance, Vol. 6(3), pp. 96-110.
[26] Miao G. J., Clements M. A. (2002), "Digital Signal Processing and Statistical Classification", Artech House, ISBN 1580531350.
[27] Miller R. S., Shorter G. (2016), "High Frequency Trading: Overview of Recent Developments", report, April 4, 2016, Washington D.C.
[28] Mushtaq R. (2011), "Augmented Dickey Fuller Test". Available at SSRN: https://ssrn.com/abstract=1911068.
[29] Ohara M. (2015), "High frequency market microstructure", Journal of Financial Economics, Vol. 116(2), pp. 257-270.
[30] Perlin M. S. (2009), "Evaluation of Pairs-trading strategy at the Brazilian financial market", Journal of Derivatives & Hedge Funds, Vol. 15(2), pp. 122-136.
[31] Vaitonis M. (2017), "Pairs Trading Using HFT in OMX Baltic Market", Baltic J. Modern Computing, Vol. 5(1), pp. 37-49.
[32] Vaitonis M., Masteika S. (2016), "Research in High Frequency Trading and Pairs Selection Algorithm with Baltic Region Stocks", In: Dregvaite G., Damasevicius R. Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, Vol. 639. Springer.
[33] Vidyamurthy G. (2004), "Pairs Trading: Quantitative Methods and Analysis", John Wiley & Sons, Inc., New Jersey, p. 210.
[34] Zubulake P., Lee S. (2011), "The High frequency game changer: how automated trading strategies have revolutionized the markets", Aite group. Wiley trading.
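The time stamp alignment described under "A. Data Normalization" (every time stamp must appear in every contract, carrying the last known price forward) can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the paper, and it is not the authors' implementation. The function name `align_timestamps`, the data layout, and the CL price are assumptions made for illustration; the NG prices are entry type A rows from Table I.

```python
def align_timestamps(contracts):
    """Give every contract a price at every time stamp seen in any contract.

    contracts: dict mapping a symbol to a list of (timestamp, price) tuples
    sorted by time. Time stamps are fixed-width strings, so plain string
    comparison orders them chronologically.
    """
    # Union of all time stamps across all contracts, in chronological order.
    all_ts = sorted({ts for ticks in contracts.values() for ts, _ in ticks})
    aligned = {}
    for symbol, ticks in contracts.items():
        known = dict(ticks)   # if a time stamp repeats, the last price wins
        last_price = None     # no price is known before the first tick
        filled = []
        for ts in all_ts:
            if ts in known:
                last_price = known[ts]
            # Carry the last known price forward to the new time stamp.
            filled.append((ts, last_price))
        aligned[symbol] = filled
    return aligned


# NG prices are taken from Table I; the CL price is a made-up placeholder.
contracts = {
    "NG": [("17:00:00.869053009", 3227), ("17:00:00.930168164", 3226)],
    "CL": [("17:00:00.825207610", 4500)],
}
aligned = align_timestamps(contracts)
```

How to treat time stamps that precede a contract's first tick is not stated in the paper; the sketch leaves them as `None`.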
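The price normalization of equation (2) subtracts the window mean from each price and divides by the window standard deviation. A small Python sketch follows (the paper itself uses MATLAB). The paper does not say whether the sample or the population standard deviation is meant, so the use of the population form (`pstdev`) is an assumption, as are the function name `normalize_window` and the handling of a constant window.

```python
from statistics import mean, pstdev


def normalize_window(prices):
    """Apply p = (P - mu) / sigma to every price in one normalization window."""
    mu = mean(prices)
    sigma = pstdev(prices)  # assumption: population standard deviation
    if sigma == 0:
        # A constant window carries no deviation from its mean (assumed handling).
        return [0.0 for _ in prices]
    return [(p - mu) / sigma for p in prices]


# Entry type A prices for NG taken from Table I.
window = [3227, 3226, 3226, 3227, 3238, 3238]
normalized = normalize_window(window)
```

After normalization the window has mean 0 and standard deviation 1, which makes the prices of different contracts directly comparable.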
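The spread of equation (1) needs a parameter relating the two series. One common way to estimate it, used as the first step of the Engle and Granger procedure, is an ordinary least squares regression of one series on the other. The Python sketch below shows that step only; it is not the authors' MATLAB code, it adds an intercept `alpha` that equation (1) does not contain, and the stationarity test on the resulting spread (the second step) is not implemented. The function name and the sample data are hypothetical.

```python
from statistics import mean


def hedge_ratio_and_spread(x, y):
    """Regress y on x by least squares and return (beta, alpha, residual spread)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx            # slope: the parameter multiplying x_t
    alpha = my - beta * mx      # intercept (an addition to equation (1))
    spread = [b - alpha - beta * a for a, b in zip(x, y)]
    return beta, alpha, spread


# Made-up illustrative series, not data from the paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
beta, alpha, spread = hedge_ratio_and_spread(x, y)
```

If the spread returned here is stationary, the pair is a candidate for trading; testing that would require an ADF test, for example from a statistics package.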
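The 12% to 36% improvement reported for the GPU can be related to Table II with a short calculation. The sketch below, in Python, assumes that "improvement" means the relative reduction in run time, (CPU time - GPU time) / CPU time; the paper does not state the formula, but this reading is consistent with the reported range. The function name is hypothetical and only three rows of Table II are included.

```python
def time_reduction_percent(cpu_seconds, gpu_seconds):
    """Relative reduction in run time when moving from CPU to CPU plus GPU."""
    return (cpu_seconds - gpu_seconds) / cpu_seconds * 100.0


# (CPU seconds, GPU seconds) for three trading days from Table II.
table_ii = {
    "2015-08-11": (4980.30, 3154.50),
    "2015-08-25": (3960.20, 3466.10),
    "2015-08-31": (5409.70, 4594.10),
}
reductions = {
    day: time_reduction_percent(cpu, gpu) for day, (cpu, gpu) in table_ii.items()
}
```

Under this definition, 2015-08-11 gives roughly a 36.7% reduction and 2015-08-25 roughly 12.5%, the two ends of the range quoted in the text.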