=Paper=
{{Paper
|id=Vol-2147/p09
|storemode=property
|title=Algorithmic Trading and Machine Learning Based on GPU
|pdfUrl=https://ceur-ws.org/Vol-2147/p09.pdf
|volume=Vol-2147
|authors=Mantas Vaitonis,Saulius Masteika,Konstantinas Korovkinas
|dblpUrl=https://dblp.org/rec/conf/system/VaitonisMK18
}}
==Algorithmic Trading and Machine Learning Based on GPU==
Mantas Vaitonis, Saulius Masteika, Konstantinas Korovkinas
Vilnius University, Kaunas Faculty, Muitinės str. 8, LT-44280 Kaunas, Lithuania
mantas.vaitonis@knf.vu.lt, saulius.masteika@knf.vu.lt, konstantinas.korovkinas@knf.vu.lt

Abstract— This paper investigates the speed improvements available when using a graphics processing unit (GPU) for algorithmic trading and machine learning. A modern GPU allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. Several issues related to implementing algorithmic trading and machine learning on GPU are discussed, including limited programming flexibility as well as the effect that proper memory layout can have on speed increases when using GPU devices. Empirical research on algorithmic trading on GPU is presented, which showed the advantage of the GPU over a CPU-only system. Moreover, machine learning methods on GPU are reviewed, and the findings of this paper may be applied in future work.

Keywords— high frequency trading; machine learning; GPU; high performance computing; genetic programming.

I. INTRODUCTION

Nowadays standard computers come with sequential CPUs or with multicore CPUs, which allow a limited number of processes to be executed in parallel. What is important here is that the GPU hardware such computers also include is strongly parallel and may operate independently of the main CPU. A modern GPU allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. In particular, GPUs offer hundreds of processing cores, but they can be used simultaneously only to perform data parallel computations. Moreover, GPUs usually have no direct access to the main memory and they do not offer hardware managed caches; two aspects that make memory management a critical factor to be carefully considered [1].

GPU architectures are specialized for compute-intensive, memory-intensive, highly parallel computation, and are therefore designed such that more resources are devoted to data processing than to caching or control flow. State of the art GPUs provide up to an order of magnitude more peak IEEE single-precision floating-point performance than their CPU counterparts. Additionally, GPUs have much more aggressive memory subsystems, typically endowed with more than 10x higher memory bandwidth than a CPU. Peak performance is usually impossible to achieve on general purpose applications, yet capturing even a fraction of peak performance yields significant speedup. GPU performance is dependent on finding high degrees of parallelism: a typical computation running on the GPU must express thousands of threads in order to effectively use the hardware capabilities. Algorithms for machine learning applications will need to consider such parallelism in order to utilize many-core processors. Applications which do not express parallelism will not continue improving their performance when run on newer computing platforms at the rates we have enjoyed in the past. Therefore, finding large scale parallelism is important for compute performance in the future. Programming for GPUs is then indicative of the future many-core programming experience [2].

When searching for “GPU back-testing software” almost no results appear. The technology is very difficult to use and to apply to general back-testing.

The problem lies in the way in which a GPU works and the way in which general purpose back-testing works. Most back-testing programs have a language like MQL4 or NinjaScript. These languages are used to construct trading systems that the simulator executes by performing some sort of parsing of the scripted code. This approach gives flexibility because researchers can code whichever strategy they can think of, with whatever logic, and the simulator will be able to handle it. The strategy coded is in essence a function that the simulator then uses to execute code within its back-testing engine. However, when trying to move this type of thinking to the GPU, researchers run into many problems [3].

The work reported in this paper aims to present a literature review of GPU benefits for algorithmic trading and machine learning. An overview of the uses of machine learning and algorithmic trading on GPU is presented. Both topics are presented separately, and the results will be used for future work on machine learning with high frequency trading on GPU. The paper also presents high frequency algorithmic trading results when applied on CPU and GPU.

The rest of the paper is organized as follows: the theory and problem statement are presented in Sections I and II. Sections III, IV, V and VI give an overview of GPU for hardware acceleration, high frequency trading, GPU in high performance computing and GPU in machine learning, together with the results and the summary of the research, followed by conclusions in Section VII.
II. OBSTACLES USING GPU

The GPU is a very limited machine in terms of programming flexibility. It is not possible just to code the system within a script and send it to a GPU back-tester. If researchers want the GPU to perform a trading system simulation, they will need to code the entire system and simulator within the same function and have the GPU run that in a batch process.

Introducing things like double loops and random access patterns is hard for the GPU. When writing simulations for a GPU it is necessary to ensure that everything that is random-access intensive or conditional intensive is pre-calculated and passed to the GPU (a minimal sketch of this pre-calculation approach is given at the end of this section). Therefore, for something that is “general purpose” it becomes very hard to pre-calculate and interactively build the entire simulator-plus-system code, load it into the GPU and perform the simulations. There are many ways in which GPU technology is currently being used in trading. Traditionally GPUs have been used to execute simulations that are very specific and parallelizable – such as pricing simulations, machine learning training and high frequency trading algorithms.

When looking for something very general, the GPU tends to be a hard solution. However, if one is interested in some particular trading problem, then there is a big chance that researchers will be able to benefit from the GPU if they are willing to spend the time, energy and money necessary to create a custom GPU implementation [3][4].
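As a concrete illustration of this pre-calculation idea, the following minimal CUDA sketch (our own illustrative example, not code from the papers cited above) assumes the branch-heavy entry/exit logic has already been evaluated on the CPU and stored as +1/-1/0 position signals; the GPU kernel is then left with purely data-parallel arithmetic, replaying one scenario per thread in a single batch. All names, sizes and the profit rule are assumptions made for the example.

<pre>
// Minimal CUDA sketch: conditional-heavy signal logic is pre-computed on the CPU,
// the GPU only performs regular, data-parallel profit arithmetic in a batch.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void replaySignals(const float *prices, const signed char *signals,
                              int nTicks, int nScenarios, float *profit)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;      // one scenario per thread
    if (s >= nScenarios) return;
    const signed char *sig = signals + (size_t)s * nTicks;
    float pnl = 0.0f;
    for (int t = 1; t < nTicks; ++t)                    // position held over [t-1, t)
        pnl += sig[t - 1] * (prices[t] - prices[t - 1]);
    profit[s] = pnl;
}

int main()
{
    const int nTicks = 10000, nScenarios = 4096;
    std::vector<float> hPrices(nTicks);
    std::vector<signed char> hSignals((size_t)nScenarios * nTicks, 1); // dummy: always long
    for (int t = 0; t < nTicks; ++t) hPrices[t] = 100.0f + 0.01f * t;  // dummy price path

    float *dPrices, *dProfit; signed char *dSignals;
    cudaMalloc(&dPrices, nTicks * sizeof(float));
    cudaMalloc(&dSignals, (size_t)nScenarios * nTicks);
    cudaMalloc(&dProfit, nScenarios * sizeof(float));
    cudaMemcpy(dPrices, hPrices.data(), nTicks * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dSignals, hSignals.data(), (size_t)nScenarios * nTicks, cudaMemcpyHostToDevice);

    int block = 256, grid = (nScenarios + block - 1) / block;
    replaySignals<<<grid, block>>>(dPrices, dSignals, nTicks, nScenarios, dProfit);

    std::vector<float> hProfit(nScenarios);
    cudaMemcpy(hProfit.data(), dProfit, nScenarios * sizeof(float), cudaMemcpyDeviceToHost);
    printf("profit of scenario 0: %f\n", hProfit[0]);
    cudaFree(dPrices); cudaFree(dSignals); cudaFree(dProfit);
    return 0;
}
</pre>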
III. GPU FOR HARDWARE ACCELERATION

Hardware acceleration is achieved by utilizing specific hardware to gain higher computational performance than that provided by a general purpose CPU. Devices intended for intense calculations include the Field-Programmable Gate Array (FPGA), IBM’s Cell Broadband Engine Architecture (Cell BE or, simply, Cell) and Graphics Processing Units (GPUs). Until recently the GPU remained on the fringes of HPC (high performance computing), mostly because of the high learning curve caused by the fact that low-level graphics languages were the only way to program GPUs. However, NVIDIA has since come out with a new line of graphics cards – Tesla [4].

One of the main features of NVIDIA GPUs is the ease of programmability made possible with CUDA – the Compute Unified Device Architecture. With a low learning curve, CUDA allows developers to tap into the enormous computing power of GPUs, yielding high performance benefits [5]. As mentioned in the introduction, we use CUDA, which allows for the implementation of algorithms using MATLAB with CUDA-specific extensions [5]. When a program using CUDA extensions and running on the CPU invokes a GPU kernel, many copies of this kernel – known as threads – are enumerated and distributed to the available multiprocessors, where their execution starts [4].
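The kernel/thread model described above can be made concrete with a small, self-contained CUDA example (an illustrative sketch, not part of the authors' MATLAB implementation): one kernel launch enumerates roughly a million threads, each of which identifies its own quote from its block and thread indices and computes a mid-price. Names and data are assumptions made for the example.

<pre>
// One launch spawns thousands of threads; each handles one quote.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void midPrice(const float *bid, const float *ask, float *mid, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        mid[i] = 0.5f * (bid[i] + ask[i]);
}

int main()
{
    const int n = 1 << 20;                           // ~1 million quotes
    std::vector<float> hBid(n, 100.0f), hAsk(n, 100.02f), hMid(n);

    float *dBid, *dAsk, *dMid;
    cudaMalloc(&dBid, n * sizeof(float));
    cudaMalloc(&dAsk, n * sizeof(float));
    cudaMalloc(&dMid, n * sizeof(float));
    cudaMemcpy(dBid, hBid.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dAsk, hAsk.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // 4096 blocks of 256 threads; the hardware scheduler distributes the blocks
    // over the available multiprocessors, as described in the text above.
    int block = 256, grid = (n + block - 1) / block;
    midPrice<<<grid, block>>>(dBid, dAsk, dMid, n);
    cudaMemcpy(hMid.data(), dMid, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("mid[0] = %.4f\n", hMid[0]);
    cudaFree(dBid); cudaFree(dAsk); cudaFree(dMid);
    return 0;
}
</pre>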
The two main criteria for algorithmic trading are speed – that is, the speed with which the same set of computations can be performed on multiple sets of data – and programmability. For this purpose, general-purpose hardware – such as an Intel Central Processing Unit (CPU) – is not well suited. The CPU is designed to execute commands in a linear fashion; however, the task at hand benefits most from parallelization, as the same calculations are required to be performed on multiple data. This is where parallelization and hardware acceleration come into play.

IV. HIGH FREQUENCY TRADING

The developments in computer technology have changed the way financial instruments are traded. A significant part of trades is handled without human intervention, with trading algorithms making the trading decisions. Although the concept of algorithmic trading is not brand new, the speed at which algorithmic trading operates has grown tremendously over the past ten years.

Trade execution time has shrunk from daily trading to microseconds and even nanoseconds. Due to the increase in speed, a huge number of orders and order cancellations are required. Profit chances for high frequency traders are very time-sensitive, and low latency for trade execution is of the main importance. Thus, HFT firms invest in high-speed connections and place their trading platforms close to the stock market servers via co-location [6].

Nowadays, financial markets are fully automated, consisting of algorithmic trading; thus, they are largely dominated by high frequency trading. High frequency trading platforms have replaced the traditional auction-like floor where traders compete on price [7]. The main focus of HFT is to beat time. The algorithm waits until a trader buys a certain amount of any financial instrument at any given time; the high frequency traders then use this information to change the price they quote in the market [8][9][10][11]. The economics and finance academic community considers HFT beneficial to the market because HFT provides liquidity and, therefore, facilitates the flow of commerce in the capital markets [11].

Given the fact that high frequency trading has to be done in milliseconds or even nanoseconds, all trading must be performed using a supercomputer. In real life, depending on the trade, trading opportunities can last from nanoseconds to minutes or even hours.

Trading strategies used by high frequency traders seek to exploit short-lived trading opportunities in the markets that would not be possible to find or identify in any other way than with the high-speed processing power of computers. These trading opportunities are very small abnormalities in the pricing of financial instruments that result in extremely low profit per trade. High frequency trading earns higher profit because it is possible to trade in big volumes; thus, profit can be generated from these small changes in prices. One of the advantages of HFT is that it provides liquidity and helps to ensure the efficiency of prices for financial assets [12].

V. GPU IN HIGH PERFORMANCE COMPUTING

High-frequency trading (HFT) is a specialized form of algorithmic trading, where the execution of computerized trading strategies is characterized by extremely short position-holding periods – just a few seconds or even down to milliseconds. The success of an HFT algorithm depends on its ability to react to a situation faster than others. This has given birth to another variant of HFT called Ultra High Frequency Trading (UHFT). Here, the execution of trades happens in sub-millisecond times. The technology used by UHFT traders includes co-location of servers with the exchange, direct market access, parallel processing on GPUs and special hardware like FPGAs [13].

The Consolidated Tape Association (CTA) oversees the collection, processing and dissemination of consolidated quote and trade data at NYSE. The Securities Information Processor (SIP) is the technology that enables collecting quote and trade data from the exchanges, consolidating it, and sending it out as a continuous stream of best bids and offers (quotes) and last sales (trades). The SIP has to work at enormous speed. On average, NYSE handles about 200,000 (2 lakh) quotes per second, out of which 28,000 per second get converted into trades. The traders talk to the exchanges using the FIX protocol; FIX stands for Financial Information eXchange. The standard is managed by a nonprofit organization called the FIX Trading Community. The messages consist of ASCII characters, and there is also an XML-based representation of the standard, called FIXML. Recently Citibank announced that it will provide FIX functionality to NSE in India [13].

There is increasing use of High Performance Computing platforms like GPU multiprocessing and FPGAs in HFT algorithms. These algorithms are fast and parallelizable, and they are specifically designed to make money by exploiting tiny, lightning-fast price changes in shares [13][14].
A. GPU in high frequency trading

During our research, an algorithmic pair trading strategy [8] was run on a CPU, an Intel i5-3230M at 2,6 GHz with two cores (2 MATLAB workers), and on a GPU, a GeForce 710M with 96 CUDA cores. First we applied the pair trading strategy only on the CPU and then on the CPU working together with the GPU.

The nanosecond data used for the experiment was provided by the Nanotick company. The futures contracts were from CME Group, which consists of NYMEX, COMEX and CBOT. Nanotick provided five different commodity futures contracts: NG (natural gas), BZ (Brent crude oil), CL (crude oil), HO (NY Harbor ULSD) and RB (RBOB Gasoline). The time period of the commodity futures contracts was from 01-08-2015 to 31-08-2015.

During the research, pair detection, detection of buy/sell signals, trading and profit calculation were parallelized when implemented on CPU and GPU [8]. When these functions were parallelized it was no longer necessary to wait for one function to stop before starting the other one; multiple calculations with multiple functions became possible. An illustrative sketch of one such parallelized step is given at the end of this subsection.

The research aim was not to measure the profit of the strategy but to improve the speed of the algorithm by using the GPU. The same pair trading strategy was applied to the CPU and later to the CPU working together with the GPU. The table below shows the number of records the pairs trading algorithm had to process and how much time it took using the CPU and the GPU.

TABLE I. CPU AND GPU COMPARISON

Date | Intel i5-3230M 2,6 GHz, 2 cores (in seconds) | GeForce 710M, 96 CUDA cores (in seconds) | Number of records processed
2015-08-03 till 2015-08-31 | 74777,4 | 58378,53 | 124789970

Table I shows the trading time of the algorithm using different hardware: the CPU (Intel i5-3230M 2,6 GHz, 2 cores) and the GPU (GeForce 710M, 96 CUDA cores). The total number of records processed was 124789970 for each simulation. Over the whole month the CPU-only run took 74777,4 s versus 58378,53 s with the GPU, i.e. roughly a 22% reduction in total simulation time. More detailed information is presented in the figure below, where the speedup difference per day is shown.

Fig. 1. Comparison of CPU and GPU using HFT in seconds

As shown in the figure above, the simulation speed of the pair trading algorithm improved by 12% to 36% when run on the GPU instead of just the CPU. The difference in speedup between days occurs due to the different number of trades made and the different number of trade signals. The more parameters it is possible to parallelize and move to the GPU, the bigger the speedup that can be achieved. During this experiment, the bigger the matrix of trades and pairs used, the more measurable was the speedup obtained by the GPU. The results show the importance of technical advantages in HFT and how important it is to tune the algorithm in order to make the most of the hardware it runs on.
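To illustrate the kind of data-parallel work that can be moved to the GPU, the following CUDA sketch (an illustrative reconstruction under simplifying assumptions, not the MATLAB code used in the experiment) detects entry signals for many instrument pairs at once: each thread owns one pair, computes the mean and standard deviation of its price spread and counts the ticks on which the normalized spread crosses an entry threshold. The threshold, data layout and dummy prices are assumptions.

<pre>
// Buy/sell-signal detection for many pairs in parallel: one pair per thread.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void countPairSignals(const float *pricesA, const float *pricesB,
                                 int nTicks, int nPairs, float entryZ, int *signals)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;       // one pair per thread
    if (p >= nPairs) return;

    const float *a = pricesA + (size_t)p * nTicks;
    const float *b = pricesB + (size_t)p * nTicks;

    float sum = 0.0f, sumSq = 0.0f;
    for (int t = 0; t < nTicks; ++t) {                   // spread statistics
        float s = a[t] - b[t];
        sum += s; sumSq += s * s;
    }
    float mean = sum / nTicks;
    float var  = sumSq / nTicks - mean * mean;
    float sd   = sqrtf(fmaxf(var, 1e-12f));

    int count = 0;
    for (int t = 0; t < nTicks; ++t)                     // entry signal: |z| > entryZ
        count += fabsf((a[t] - b[t] - mean) / sd) > entryZ;
    signals[p] = count;
}

int main()
{
    const int nTicks = 5000, nPairs = 2048;
    std::vector<float> hA((size_t)nPairs * nTicks), hB((size_t)nPairs * nTicks);
    for (size_t i = 0; i < hA.size(); ++i) {             // dummy, slightly diverging paths
        hA[i] = 50.0f + 0.0010f * (i % nTicks);
        hB[i] = 50.0f + 0.0008f * (i % nTicks);
    }

    float *dA, *dB; int *dSig;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dSig, nPairs * sizeof(int));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    int block = 256, grid = (nPairs + block - 1) / block;
    countPairSignals<<<grid, block>>>(dA, dB, nTicks, nPairs, 2.0f, dSig);

    std::vector<int> hSig(nPairs);
    cudaMemcpy(hSig.data(), dSig, nPairs * sizeof(int), cudaMemcpyDeviceToHost);
    printf("pair 0 produced %d entry signals\n", hSig[0]);
    cudaFree(dA); cudaFree(dB); cudaFree(dSig);
    return 0;
}
</pre>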
B. Stock trading using genetic programming on GPU

D. McKenney and T. White [14] presented their research on stock trading using genetic programming on GPU. Within this work, genetic programming (GP) was used in an attempt to solve the real-world problem of stock trading strategy generation. A GPU device was used to evaluate individuals within the GP population through stack-based interpretation (due to the lack of recursion support on many GPU devices). With a small amount of memory access optimization, a speedup factor of over 600 was reached when compared to a sequential evaluation of the same data running on a 2.4 GHz CPU. The effect of increasing the size of the training set (through the addition of more stocks and longer training periods) was also investigated. It was found that using small training sets resulted in the worst testing results, while the best test results were obtained when using the largest training sets. These results supported the hypothesis that analyzing more stocks over a longer period of time can generate a more general and effective stock trading strategy. The speedup gained by using GPU devices for evaluation enabled this large training set to be evaluated quickly, whereas a sequential implementation would make this approach unfeasible. Finally, several areas of improvement for both GP on GPU and stock trading strategy creation using GP were identified. Continuing work and addressing these possible areas of improvement may result in faster evaluation of individuals, as well as a much more profitable trading solution [14].
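The stack-based interpretation mentioned above can be sketched in CUDA as follows. This is an illustrative example in the spirit of [14], not the authors' implementation: each GP tree is flattened into a postfix program (since device code cannot rely on recursion), and each thread evaluates one individual with a small explicit stack. The opcode set, program length and inputs are assumptions made for the example.

<pre>
// Stack-based evaluation of a GP population on the GPU: one individual per thread.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

enum Op { PUSH_CONST, PUSH_INPUT, ADD, SUB, MUL };
const int PROG_LEN = 5, STACK_MAX = 8, N_INPUTS = 2;

__global__ void evalPopulation(const int *ops, const float *args, const float *inputs,
                               int nIndividuals, float *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;       // one individual per thread
    if (i >= nIndividuals) return;

    float stack[STACK_MAX];
    int sp = 0;
    for (int pc = 0; pc < PROG_LEN; ++pc) {              // walk the postfix program
        int op  = ops[i * PROG_LEN + pc];
        float a = args[i * PROG_LEN + pc];
        switch (op) {
            case PUSH_CONST: stack[sp++] = a; break;
            case PUSH_INPUT: stack[sp++] = inputs[(int)a]; break;
            case ADD: sp--; stack[sp - 1] += stack[sp]; break;
            case SUB: sp--; stack[sp - 1] -= stack[sp]; break;
            case MUL: sp--; stack[sp - 1] *= stack[sp]; break;
        }
    }
    result[i] = stack[0];                                // value left on the stack
}

int main()
{
    const int nIndividuals = 1024;
    // Every individual here encodes (input0 - input1) * 0.5 in postfix form.
    std::vector<int>   hOps(nIndividuals * PROG_LEN);
    std::vector<float> hArgs(nIndividuals * PROG_LEN);
    for (int i = 0; i < nIndividuals; ++i) {
        int   o[PROG_LEN] = { PUSH_INPUT, PUSH_INPUT, SUB, PUSH_CONST, MUL };
        float a[PROG_LEN] = { 0.0f,       1.0f,       0.f, 0.5f,       0.f };
        for (int j = 0; j < PROG_LEN; ++j) { hOps[i*PROG_LEN+j] = o[j]; hArgs[i*PROG_LEN+j] = a[j]; }
    }
    std::vector<float> hInputs = { 104.0f, 100.0f };      // e.g. two indicator values

    int *dOps; float *dArgs, *dInputs, *dRes;
    cudaMalloc(&dOps, hOps.size() * sizeof(int));
    cudaMalloc(&dArgs, hArgs.size() * sizeof(float));
    cudaMalloc(&dInputs, N_INPUTS * sizeof(float));
    cudaMalloc(&dRes, nIndividuals * sizeof(float));
    cudaMemcpy(dOps, hOps.data(), hOps.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dArgs, hArgs.data(), hArgs.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dInputs, hInputs.data(), N_INPUTS * sizeof(float), cudaMemcpyHostToDevice);

    int block = 256, grid = (nIndividuals + block - 1) / block;
    evalPopulation<<<grid, block>>>(dOps, dArgs, dInputs, nIndividuals, dRes);

    std::vector<float> hRes(nIndividuals);
    cudaMemcpy(hRes.data(), dRes, nIndividuals * sizeof(float), cudaMemcpyDeviceToHost);
    printf("individual 0 evaluates to %.2f (expected 2.00)\n", hRes[0]);
    cudaFree(dOps); cudaFree(dArgs); cudaFree(dInputs); cudaFree(dRes);
    return 0;
}
</pre>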
VI. GPU IN MACHINE LEARNING

The use of GPUs in machine learning has become widespread in recent years. One of the most promising machine learning algorithms is the SVM, which can be conveniently adapted to parallel architectures. During the last decade, many works have been devoted to accelerating the time-consuming training phase of SVMs on many-core GPUs. Catanzaro et al. in [2] first proposed GPUSVM for the binary classification problem and achieved a speedup of 9-35x over LIBSVM running on a traditional processor. Later, Herrero-Lopez et al. in [18] improved Catanzaro's work by adding support for multiclass classification. They achieved speedups in the range of 3-57x for training and 3-112x for classification. Carpenter in [19] presented cuSVM, a software package for high-speed Support Vector Machine (SVM) training and prediction that exploits the massively parallel processing power of Graphics Processors (GPUs). Other authors in papers [15][17][23] also reported that GPU optimization of SVM achieves better performance compared with CPU. Vaněk et al. in [20] introduced a novel GPU approach to support vector machine training: the Optimized Hierarchical Decomposition SVM (OHD-SVM). It uses a hierarchical decomposition iterative algorithm that allows matrix-matrix multiplication to be used to calculate the kernel matrix values. They declared that the algorithm is significantly faster than all other implementations on all datasets. The biggest difference was on the largest datasets, where they achieved a speed-up of up to 12 times in comparison with the fastest previously published GPU implementation.
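The appeal of casting kernel matrix computation as matrix-matrix multiplication can be shown with a short cuBLAS sketch (a generic illustration of the idea, not the OHD-SVM code): for a linear kernel, the Gram matrix K with K[i][j] = <x_i, x_j> of an entire batch of samples is a single GEMM call executed on the GPU. Matrix sizes and data are dummy assumptions; compile with -lcublas.

<pre>
// Linear kernel (Gram) matrix of n samples with d features via one cuBLAS GEMM.
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cstdio>
#include <vector>

int main()
{
    const int n = 1024;   // number of training samples
    const int d = 64;     // number of features

    // Row-major n x d sample matrix with dummy values.
    std::vector<float> hX((size_t)n * d);
    for (size_t i = 0; i < hX.size(); ++i) hX[i] = 0.001f * (float)(i % 97);

    float *dX, *dK;
    cudaMalloc(&dX, (size_t)n * d * sizeof(float));
    cudaMalloc(&dK, (size_t)n * n * sizeof(float));
    cudaMemcpy(dX, hX.data(), (size_t)n * d * sizeof(float), cudaMemcpyHostToDevice);

    // Row-major X (n x d) is column-major d x n with one sample per column,
    // so K = X_col^T * X_col yields the n x n Gram (linear kernel) matrix.
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                n, n, d,
                &alpha, dX, d, dX, d,
                &beta,  dK, n);

    std::vector<float> hK((size_t)n * n);
    cudaMemcpy(hK.data(), dK, (size_t)n * n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("K[0][0] = %f (squared norm of sample 0)\n", hK[0]);

    cublasDestroy(handle);
    cudaFree(dX); cudaFree(dK);
    return 0;
}
</pre>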
Another challenging research area is Deep Learning, whose algorithms largely involve simple matrix manipulations and are therefore well suited to implementation on graphics processors. Raina et al. in [21] developed general principles for massively parallelizing unsupervised learning tasks using graphics processors and showed that these principles can be applied to successfully scale up learning algorithms for both deep belief networks (DBNs) and sparse coding. Their implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. Dean et al. in [22] showed that training large deep learning models with billions of parameters using 16000 CPU cores could dramatically improve training performance. Krizhevsky et al. in [29] showed that training a large deep convolutional network with 60 million parameters and 650,000 neurons on a large data set achieved great performance on GPU processors [16]. Coates et al. in [24] presented their own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Their system is able to train networks with 1 billion parameters on just 3 machines in a couple of days, and they showed that it can comfortably scale to networks with over 11 billion parameters using just 16 machines – more than 6.5 times as large as the network reported in [22] (the largest previous network), while using fewer than 2% as many machines. Chen et al. in [25] implemented a variant of the deep belief network (DBN), called folded-DBN, on NVIDIA's Tesla K20 GPU. The results showed that, when comparing the execution time of the fine-tuning process, the GPU implementation yields a 7 to 11 times speedup over the CPU platform.

Other authors also confirmed in their research that the proposed models achieved better results on GPU. Hung and Wang in [26] proposed a GPU-accelerated PSO (GPSO) algorithm that uses the NVIDIA Tesla C1060 GPU to improve the timing efficiency of particle swarm optimization (PSO). Numerical results showed that the GPU architecture fits the PSO framework well by reducing computational time, achieving high parallel efficiency and finding better optimal solutions through the use of a large number of particles. Cai et al. in [27] proposed an approach to forecast large scale conditional volatility and covariance using neural networks on GPU. Tran and Cambria in [28] developed an ensemble application of the extreme learning machine (ELM) and GPU for real-time multimodal sentiment analysis that leverages the power of sentic memes (basic inputs of sentiments that can generate most human emotions). Their proposed multimodal system is shown to achieve an accuracy of 78%. In terms of processing speed, their method shows improvements of several orders of magnitude for feature extraction compared to CPU-based counterparts.

VII. CONCLUSIONS

In this article we have presented both the opportunities and challenges of the algorithmic trading and machine learning approach on GPU. An empirical study of algorithmic trading on GPU was presented, which showed the advantage of the GPU over the CPU.

High frequency trading combined with machine learning is a new and growing phenomenon. It provides interesting research opportunities in financial management, market dynamics, FPGA hardware and parallel computing on platforms like CUDA.

A review of works in the area of machine learning based on GPU is also presented in this paper; it leads to the conclusion that this technique is very promising for classification and forecasting tasks and could be used in big data areas. Systems implemented on GPU are able to process a huge volume of parameters faster than on CPU. The findings of this paper may be applied in future work.

ACKNOWLEDGMENT

We would like to show our gratitude to NANOTICK for providing high frequency data in microseconds for 5 commodity futures contracts.
REFERENCES

[1] Margara A., Cugola G. (2011), "High performance content-based matching using GPUs". In Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems, New York, USA.
[2] Catanzaro B., Sundaram N., Keutzer K. (2008), "Fast support vector machine training and classification on graphics processors". In Proceedings of the 25th International Conference on Machine Learning, pp. 104-111. ACM.
[3] MechanicalForex (2016), mechanicalforex.com. [ONLINE] Available at: http://mechanicalforex.com/2016/02/trading-and-the-gpu-wasted-power.html. [Accessed 12 January 2018].
[4] Preis T. (2011), "GPU-computing in econophysics and statistical physics". The European Physical Journal Special Topics, Vol. 194, pp. 87-119.
[5] NVIDIA Corporation (2008), NVIDIA CUDA Compute Unified Device Architecture.
[6] Kaya O. (2016), "High-frequency trading. Reaching the limits". Automated Trader Magazine, Vol. 41, pp. 23-27.
[7] Fox M. B., Glosten L. R., Rauterberg G. V. (2015), "The New Stock Market: Sense and Nonsense". 65 Duke L.J. 191.
[8] Herlemont D. (2013), "Pairs Trading, Convergence Trading, Cointegration". Quantitative Finance, Vol. 12(9).
[9] Zubulake P., Lee S. (2011), "The High Frequency Game Changer: How Automated Trading Strategies Have Revolutionized the Markets". Aite Group, Wiley Trading.
[10] Brogaard J., Hendershott J. T., Riordan R. (2013), "High frequency trading and price discovery". ECB Lamfalussy Fellowship Programme / Working Paper Series, No 1602, European Central Bank Press.
[11] Jaramillo C. (2016), "The Revolt against High-Frequency Trading: From Flash Boys, to Class Actions, to IEX". Review of Banking & Financial Law, Vol. 35, pp. 483-499.
[12] Kirchner S. (2015), "High frequency trading: Fact and fiction". Policy: A Journal of Public Policy and Ideas, Vol. 31(4), pp. 8-20.
[13] Limaye S. S. (2014), "Electronically aided high frequency trading". International Journal of Engineering Research and Applications, pp. 14-18.
[14] McKenney D., White T. (2012), "Stock Trading Strategy Creation Using GP on GPU". Soft Computing, Vol. 16(2), pp. 247-259.
[15] Salleh N. S. M., Baharim M. F. (2015), "Performance Comparison of Parallel Execution Using GPU and CPU in SVM Training Session". In Advanced Computer Science Applications and Technologies (ACSAT), 2015 4th International Conference on, pp. 214-217.
[16] Li X., Li K., Zhang G., Zheng W. (2015), "Deep Learning and Its Parallelization: Concepts and Instances".
[17] Sopyła K., Drozda P., Górecki P. (2012), "SVM with CUDA accelerated kernels for big sparse problems". In International Conference on Artificial Intelligence and Soft Computing, pp. 439-447.
[18] Herrero-Lopez S., Williams J. R., Sanchez A. (2010), "Parallel multiclass classification using SVMs on GPUs". In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 2-11.
[19] Carpenter A. (2009), "cuSVM: A CUDA implementation of support vector classification and regression". patternsonscreen.net/cuSVMDesc.pdf, pp. 1-9.
[20] Vaněk J., Michálek J., Psutka J. (2017), "A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training". IEEE Transactions on Parallel and Distributed Systems, 28(12), pp. 3330-3343.
[21] Raina R., Madhavan A., Ng A. Y. (2009), "Large-scale deep unsupervised learning using graphics processors". In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 873-880.
[22] Dean J., Corrado G., Monga R., Chen K., Devin M., Mao M., ..., Ng A. Y. (2012), "Large scale distributed deep networks". In Advances in Neural Information Processing Systems, pp. 1223-1231.
[23] Li Q., Salman R., Kecman V. (2010), "An intelligent system for accelerating parallel SVM classification problems on large datasets using GPU". In Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on, pp. 1131-1135.
[24] Coates A., Huval B., Wang T., Wu D., Catanzaro B., Andrew N. (2013), "Deep learning with COTS HPC systems". In International Conference on Machine Learning, pp. 1337-1345.
[25] Chen Z., Wang J., He H., Huang X. (2014), "A fast deep learning system using GPU". In Circuits and Systems (ISCAS), 2014 IEEE International Symposium on, pp. 1552-1555.
[26] Hung Y., Wang W. (2012), "Accelerating parallel particle swarm optimization via GPU". Optimization Methods and Software, 27(1), pp. 33-51.
[27] Cai X., Lai G., Lin X. (2013), "Forecasting large scale conditional volatility and covariance using neural network on GPU". The Journal of Supercomputing, 63(2), pp. 490-507.
[28] Tran H. N., Cambria E. (2018), "Ensemble application of ELM and GPU for real-time multimodal sentiment analysis". Memetic Computing, 10(1), pp. 3-13.
[29] Krizhevsky A., Sutskever I., Hinton G. E. (2012), "ImageNet classification with deep convolutional neural networks". In Advances in Neural Information Processing Systems, pp. 1097-1105.
[30] Bonanno F., Capizzi G., Sciuto G. L., Napoli C., Pappalardo G., Tramontana E. (2014), "A novel cloud-distributed toolbox for optimal energy dispatch management from renewables in IGSSs by using WRNN predictors and GPU parallel solutions". In International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), pp. 1077-1084.
[31] Napoli C., Pappalardo G., Tramontana E., Zappalà G. (2014), "A cloud-distributed GPU architecture for pattern identification in segmented detectors big-data surveys". The Computer Journal, 59(3), pp. 338-352.