<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Based on GPU</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mantas Vaitonis</string-name>
          <email>mantas.vaitonis@knf.vu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saulius Masteika</string-name>
          <email>saulius.masteika@knf.vu.lt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantinas Korovkinas</string-name>
          <email>konstantinas.korovkinas@knf.vu.lt</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vilnius University Kaunas Faculty</institution>
          ,
          <addr-line>Muitinės street. 8, LT-44280 Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vilnius University Kaunas Faculty</institution>
          ,
          <addr-line>Muitinės street. 8, LT-44280 Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vilnius University Kaunas Faculty</institution>
          ,
          <addr-line>Muitinės street. 8, LT-44280 Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>50</fpage>
      <lpage>54</lpage>
      <abstract>
        <p>This paper investigates the speed improvements available when a graphics processing unit (GPU) is used for algorithmic trading and machine learning. A modern GPU allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. Several issues related to implementing algorithmic trading and machine learning on GPUs are discussed, including limited programming flexibility and the effect that proper memory layout has on the speedups achievable on GPU devices. An empirical study of algorithmic trading on a GPU is presented, showing the advantage of the GPU over a CPU-only system. Machine learning methods on GPUs are also surveyed, and the findings of this paper may be applied in future work.</p>
      </abstract>
      <kwd-group>
        <kwd>high frequency trading</kwd>
        <kwd>machine learning</kwd>
        <kwd>GPU</kwd>
        <kwd>high performance computing</kwd>
        <kwd>genetic programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Standard computers today ship with sequential or multicore CPUs, which allow only a limited number of processes to execute in parallel. GPU hardware, by contrast, is strongly parallel and can operate independently of the main CPU. A modern GPU allows hundreds of operations to be performed in parallel, leaving the CPU free to execute other jobs. In particular, GPUs offer hundreds of processing cores, but these can be used simultaneously only for data-parallel computations. Moreover, GPUs usually have no direct access to main memory and do not offer hardware-managed caches; these two aspects make memory management a critical factor that must be considered carefully [1].</p>
      <p>GPU architectures are specialized for compute-intensive, memory-intensive, highly parallel computation, and are therefore designed so that more resources are devoted to data processing than to caching or control flow. State-of-the-art GPUs provide up to an order of magnitude more peak IEEE single-precision floating-point throughput than their CPU counterparts. Additionally, GPUs have much more aggressive memory subsystems, typically endowed with more than 10x higher memory bandwidth than a CPU. Peak performance is usually impossible to achieve in general-purpose applications, yet capturing even a fraction of peak performance yields a significant speedup. GPU performance depends on finding high degrees of parallelism: a typical computation running on the GPU must express thousands of threads in order to use the hardware capabilities effectively. Algorithms for machine learning applications will need to expose such parallelism in order to utilize many-core processors. Applications which do not express parallelism will not continue improving their performance on newer computing platforms at the rates we have enjoyed in the past. Therefore, finding large-scale parallelism is important for compute performance in the future, and programming for GPUs is indicative of the future many-core programming experience [2].</p>
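      <p>As an illustration of the data-parallel style described above, the sketch below (Python with NumPy, chosen here purely for illustration; the same principle applies to CUDA threads) contrasts a sequential, one-element-at-a-time formulation with one in which every element is computed independently and could map to its own GPU thread:</p>

```python
import numpy as np

# One logical "thread" per element: on a GPU each element would map to
# a CUDA thread; here NumPy's vectorized kernel plays that role.
prices = np.linspace(100.0, 110.0, 1_000_000)

# Scalar, sequential formulation (what a single CPU core would do).
def returns_sequential(p):
    out = [0.0] * (len(p) - 1)
    for i in range(len(p) - 1):
        out[i] = (p[i + 1] - p[i]) / p[i]
    return out

# Data-parallel formulation: every element is computed independently.
returns_parallel = (prices[1:] - prices[:-1]) / prices[:-1]
```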
      <p>A search for “GPU back-testing software” returns almost no results: the technology is difficult to use and to implement in a general-purpose back-tester.</p>
      <p>The problem lies in the mismatch between how a GPU works and how general-purpose back-testing works. Most back-testing programs use a scripting language such as MQL4 or NinjaScript. These languages are used to construct trading systems that the simulator executes by parsing the scripted code. This approach gives flexibility because researchers can code whatever strategy and logic they can think of and the simulator will be able to handle it. The coded strategy is in essence a function that the simulator then uses to execute code within its back-testing engine. However, when trying to move this way of working to the GPU, researchers run into many problems [3].</p>
      <p>The work reported in this paper aims to present a literature review of the benefits of GPUs for algorithmic trading and machine learning. An overview of the uses of machine learning and algorithmic trading on GPUs is presented. The two topics are treated separately, and the results will be used in future work on machine learning for high frequency trading on GPUs. The paper also presents the results of high frequency algorithmic trading when run on a CPU and on a GPU.</p>
      <p>The rest of the paper is organized as follows: the theory and the problem statement are presented in Sections 1 and 2. Sections 3, 4, 5 and 6 give an overview of GPUs for hardware acceleration, high frequency trading, GPUs in high performance computing and GPUs in machine learning. The results and the summary of the research are followed by conclusions in Section 7.</p>
      <p>II. OBSTACLES USING GPU</p>
      <p>The GPU is a very limited machine in terms of programming flexibility. It is not possible simply to code a system in a script and send it to a GPU back-tester. If researchers want the GPU to perform a trading system simulation, they need to code the entire system and simulator within the same function and have the GPU run it as a batch process.</p>
      <p>Constructs such as double loops and random access patterns are hard for the GPU. When writing simulations for a GPU it is necessary to ensure that everything that is random-access intensive or conditional intensive is pre-calculated and passed to the GPU. Something “general purpose” therefore becomes very hard to pre-calculate, and interactively building the entire simulator-plus-system code, loading it into the GPU and performing the simulations is difficult. There are nevertheless many ways in which GPU technology is currently used in trading: traditionally it has been used to execute simulations that are very specific and parallelizable, such as pricing simulations, machine learning training and high frequency trading algorithms.</p>
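      <p>A minimal sketch of this pre-calculation pattern, assuming Python/NumPy and an invented moving-average crossover rule (the text does not specify one): the branch-heavy signal logic is evaluated once on the CPU, so the “device” stage is left with branch-free array reductions over a whole batch of parameter sets.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 0.1, size=5000))

# CPU pre-pass: turn branch-heavy logic (crossover detection) into
# plain arrays the device can consume without conditionals.
fast = np.convolve(prices, np.ones(5) / 5, mode="valid")
slow = np.convolve(prices, np.ones(20) / 20, mode="valid")
n = min(len(fast), len(slow))
signal = np.sign(fast[-n:] - slow[-n:])           # +1 long, -1 short

# "GPU" stage: branch-free batch evaluation of P&L for many
# position-size parameters at once (one row per parameter set).
sizes = np.linspace(0.5, 2.0, 16)[:, None]        # 16 parameter sets
rets = np.diff(prices[-n:]) / prices[-n:-1]
pnl = (sizes * signal[:-1] * rets).sum(axis=1)    # one reduction per set
```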
      <p>
        For something very general, then, the GPU tends to be a hard solution. However, researchers interested in a particular trading problem stand a good chance of benefiting from the GPU if they are willing to spend the time, energy and money necessary to create a custom implementation [3][
        <xref ref-type="bibr" rid="ref3">4</xref>
        ].
      </p>
      <p>III. GPU FOR HARDWARE ACCELERATION</p>
      <p>
        Hardware acceleration is achieved by utilizing specific hardware to obtain higher computational throughput than a general-purpose CPU provides. Devices intended for intense calculation include Field-Programmable Gate Arrays (FPGAs), IBM’s Cell Broadband Engine Architecture (Cell BE or, simply, Cell) and Graphics Processing Units (GPUs). Until recently GPUs remained on the fringes of HPC (high performance computing), mostly because of the steep learning curve caused by the fact that low-level graphics languages were the only way to program them. However, NVIDIA has since come out with a line of graphics cards aimed at computation – Tesla [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ].
      </p>
      <p>
        One of the main features of NVIDIA GPUs is ease of programmability, made possible by CUDA – the Compute Unified Device Architecture. With a low learning curve, CUDA allows developers to tap into the enormous computing power of GPUs, yielding high performance benefits [5]. As mentioned in the introduction, we use CUDA, which allows algorithms to be implemented in MATLAB with CUDA-specific extensions [5]. When a program using CUDA extensions and running on the CPU invokes a GPU kernel, many copies of this kernel – known as threads – are enumerated and distributed to the available multiprocessors, where their execution starts [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ].
      </p>
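      <p>The launch mechanics just described can be emulated in plain Python as a sketch for intuition only (real CUDA threads run concurrently rather than in loops, and the names below are illustrative): the kernel is a scalar function, and the launch enumerates one logical thread per data element across a grid of blocks.</p>

```python
import math

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    # One "thread": computes a single element of out = a*x + y.
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(x):                           # bounds guard, as in real CUDA
        out[i] = a * x[i] + y[i]

def launch(kernel, n, block_dim, *args):
    # Enumerate every (block, thread) copy of the kernel; on a GPU these
    # copies are distributed to multiprocessors and execute in parallel.
    grid_dim = math.ceil(n / block_dim)
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, t, block_dim, *args)

x = [float(i) for i in range(10)]
y = [1.0] * 10
out = [0.0] * 10
launch(saxpy_kernel, len(x), 4, 2.0, x, y, out)
```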
      <p>The two main criteria for algorithmic trading are speed – that is, the speed with which the same set of computations can be performed on multiple sets of data – and programmability. For this purpose, general-purpose hardware such as an Intel Central Processing Unit (CPU) is not well suited. The CPU is designed to execute commands in a linear fashion, whereas the task at hand benefits most from parallelization, since the same calculations must be performed on multiple data; this is where parallelization and hardware acceleration come into play.</p>
    </sec>
    <sec id="sec-2">
      <title>HIGH FREQUENCY TRADING</title>
      <p>Developments in computer technology have changed the way financial instruments are traded. A significant share of trades is handled without human intervention, with trading algorithms making the trading decisions. Although the concept of algorithmic trading is not brand new, the speed at which algorithmic trading operates has grown tremendously over the past ten years.</p>
      <p>
        Trade execution times have shrunk from daily horizons to microseconds and even nanoseconds. Due to the increase in speed, a huge number of orders and order cancellations is required. Profit opportunities for high frequency traders are very time-sensitive, and low latency in trade execution is of primary importance. Thus, HFT firms invest in high-speed connections and place their trading platforms close to the stock market servers via co-location [
        <xref ref-type="bibr" rid="ref4">6</xref>
        ].
      </p>
      <p>
        Nowadays, financial markets are fully automated and built around algorithmic trading; as a result, they are largely dominated by high frequency trading. High frequency trading platforms have replaced the traditional auction-like floor where traders compete on price [
        <xref ref-type="bibr" rid="ref5">7</xref>
        ]. The main focus of HFT is to beat the competition on time. The algorithm waits until a trader buys a certain amount of a financial instrument at a given time; high frequency traders then use this information to change the price they quote in the market [
        <xref ref-type="bibr" rid="ref6">8</xref>
        ][
        <xref ref-type="bibr" rid="ref7">9</xref>
        ][
        <xref ref-type="bibr" rid="ref8">10</xref>
        ][
        <xref ref-type="bibr" rid="ref9">11</xref>
        ]. The economics and finance academic community considers HFT beneficial to the market because it provides liquidity and thereby facilitates the flow of commerce in the capital markets [
        <xref ref-type="bibr" rid="ref9">11</xref>
        ].
      </p>
      <p>Given that high frequency trading has to act within milliseconds or even nanoseconds, all trading must be performed on high-performance computing hardware. In practice, depending on the trade, trading opportunities can last from nanoseconds to minutes or even hours.</p>
      <p>
        Trading strategies used by high frequency traders seek opportunities to exploit short-lived mispricings in the markets that could not be found or identified without the high-speed processing power of computers. These opportunities are very small abnormalities in the pricing of financial instruments that yield an extremely low profit per trade. High frequency trading earns a higher profit because it is possible to trade in large volumes, so profit can be generated from these small changes in prices. One of the advantages of HFT is that it provides liquidity and helps to ensure the efficiency of prices for financial assets [
        <xref ref-type="bibr" rid="ref10">12</xref>
        ].
      </p>
      <p>V. GPU IN HIGH PERFORMANCE COMPUTING</p>
      <p>
        High-frequency trading (HFT) is a specialized form of algorithmic trading in which computerized trading strategies are executed with extremely short position-holding periods – just a few seconds or even down to milliseconds. The success of an HFT algorithm depends on its ability to react to a situation faster than others. This has given birth to another variant of HFT called Ultra High Frequency Trading (UHFT), in which trades are executed in sub-millisecond times. The technology used by UHFT traders includes co-location of servers with the exchange, direct market access, parallel processing on GPUs and special hardware such as FPGAs [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ].
      </p>
      <p>
        The Consolidated Tape Association (CTA) oversees the collection, processing and dissemination of consolidated quote and trade data at NYSE. The Securities Information Processor (SIP) is the technology that collects quote and trade data from the exchanges, consolidates it, and sends it out as a continuous stream of best bids and offers (quotes) and last sales (trades). The SIP has to work at enormous speed: on average, NYSE handles some 200,000 quotes per second, of which about 28,000 per second are converted into trades. Traders talk to the exchanges using the FIX (Financial Information eXchange) protocol. The standard is managed by a nonprofit organization called the FIX Trading Community. Messages consist of ASCII characters; an XML-based format of the standard is called FIXML. Recently Citibank announced that it will provide FIX functionality to the NSE in India [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ].
      </p>
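      <p>As a concrete illustration of the classic tag=value encoding used by FIX sessions, the sketch below builds and parses a minimal new-order message. It is deliberately simplified – real sessions also carry sequence numbers, sender/target identifiers and timestamps – but the BodyLength (tag 9) and CheckSum (tag 10) arithmetic follows the standard.</p>

```python
SOH = "\x01"  # FIX field delimiter

def fix_checksum(msg: str) -> str:
    # CheckSum (tag 10) = byte sum of the message up to the tag-10
    # field, modulo 256, rendered as exactly three digits.
    return f"{sum(msg.encode('ascii')) % 256:03d}"

def build_fix(fields):
    # fields: (tag, value) pairs following BeginString/BodyLength.
    body = SOH.join(f"{t}={v}" for t, v in fields) + SOH
    head = f"8=FIX.4.2{SOH}9={len(body)}{SOH}"   # BodyLength counts body bytes
    partial = head + body
    return partial + f"10={fix_checksum(partial)}{SOH}"

def parse_fix(msg):
    return dict(f.split("=", 1) for f in msg.split(SOH) if f)

# Hypothetical new order: MsgType=D, symbol NG, side buy, quantity 100.
order = build_fix([("35", "D"), ("55", "NG"), ("54", "1"), ("38", "100")])
```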
      <p>
        There is increasing use of high performance computing platforms such as GPU multiprocessing and FPGAs in HFT algorithms. These algorithms are fast and parallelizable, and they are specifically designed to make money by exploiting tiny, lightning-fast price changes in shares [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ][
        <xref ref-type="bibr" rid="ref12">14</xref>
        ].
      </p>
      </p>
      <sec id="sec-2-1">
        <title>A. GPU in high frequency trading</title>
        <p>
          During our research, the algorithmic trading strategy of [
          <xref ref-type="bibr" rid="ref6">8</xref>
          ] was run on an Intel i5-3230M 2.6 GHz CPU with two cores (two MATLAB workers) and a GeForce 710M GPU with 96 CUDA cores. First we applied the pair trading strategy on the CPU only, and then on the CPU working together with the GPU.
        </p>
        <p>The nanosecond-resolution data used for the experiment was provided by the Nanotick company. The futures contracts were from CME Group, which comprises NYMEX, COMEX and CBOT. Nanotick provided five different commodity futures contracts: NG (natural gas), BZ (Brent crude oil), CL (crude oil), HO (NY Harbor ULSD) and RB (RBOB gasoline). The time period of the commodity futures contracts was from 01-08-2015 to 31-08-2015.</p>
        <p>
          During the research, pair detection, detection of buy/sell signals, trading and profit calculation were parallelized when implemented on the CPU and GPU [
          <xref ref-type="bibr" rid="ref6">8</xref>
          ]. Once these functions were parallelized it was no longer necessary to wait for one function to finish before starting another; multiple calculations across multiple functions could run at the same time.
        </p>
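        <p>The independence of the per-pair computations is what makes this parallelization possible. A small sketch (Python, with synthetic price series and an assumed z-score entry rule standing in for the actual strategy of [8]) runs signal detection for several pairs concurrently:</p>

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(1)
# Synthetic contract price series standing in for NG, BZ, CL, HO, RB.
series = {s: 100 + np.cumsum(rng.normal(0, 0.05, 2000))
          for s in ["NG", "BZ", "CL", "HO", "RB"]}
pairs = [("NG", "BZ"), ("CL", "HO"), ("CL", "RB")]

def pair_signals(pair, entry=2.0):
    # Z-score of the price spread; |z| > entry marks a trade signal.
    a, b = series[pair[0]], series[pair[1]]
    spread = a - b
    z = (spread - spread.mean()) / spread.std()
    return pair, int(np.count_nonzero(np.abs(z) > entry))

# Each pair is independent, so signal detection for all pairs can run
# concurrently instead of waiting for one function to finish.
with ThreadPoolExecutor() as pool:
    signal_counts = dict(pool.map(pair_signals, pairs))
```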
        <p>The aim of the research was not to measure the profit of the strategy but to improve the speed of the algorithm by using the GPU. The same pair trading strategy was applied to the CPU and later to the CPU working together with the GPU. The table below shows the number of records the pairs trading algorithm had to process and how much time it took using the CPU and the GPU.</p>
        <p>More detailed information is presented in the figure below, which shows the speedup difference.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Comparison of CPU and GPU HFT simulation times, in seconds</title>
      <p>As shown in the figure above, the simulation speed of the pair trading algorithm improved by 12% to 36% when run on the GPU instead of the CPU alone. The speed difference between days arises from the different numbers of trades made and trade signals generated. The more parameters can be made parallel and moved to the GPU, the bigger the achievable speedup; in this experiment, the larger the matrices of trades and pairs, the more measurable the GPU speedup became. The results show the importance of technical advantages in HFT and how important it is to tune the algorithm to make the most of the hardware it runs on.</p>
      <sec id="sec-3-1">
        <title>B. Stock trading using genetic programming on GPU</title>
        <p>
          D. McKenney and T. White [
          <xref ref-type="bibr" rid="ref12">14</xref>
          ] presented research on stock trading using genetic programming on a GPU. In this work, genetic programming (GP) was used in an attempt to solve the real-world problem of stock trading strategy generation. A GPU device was used to evaluate individuals within the GP population through stack-based interpretation (due to the lack of recursion support on many GPU devices). With a small amount of memory access optimization, a speedup factor of over 600 was reached compared to a sequential evaluation of the same data running on a 2.4 GHz CPU. The effect of increasing the size of the training set (through the addition of more stocks and longer training periods) was also investigated. It was found that using small training sets produced the worst testing results, while the best test results were obtained with the largest training sets. These results support the hypothesis that analyzing more stocks over a longer period of time can generate a more general and effective stock trading strategy. The speedup gained by using GPU devices for evaluation enabled this large training set to be evaluated quickly, whereas a sequential implementation would make the approach infeasible. Finally, several areas of improvement were identified, both for GP on GPUs and for stock trading strategy creation using GP. Continuing work that addresses these areas may yield faster evaluation of individuals as well as a much more profitable trading solution [
          <xref ref-type="bibr" rid="ref12">14</xref>
          ].
        </p>
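        <p>Stack-based interpretation, the workaround for missing recursion mentioned above, can be sketched as follows (Python, with a hypothetical trading rule invented for illustration): the GP expression tree is flattened to postfix form and evaluated with an explicit stack, so no recursive calls are needed. On the device, one such interpreter instance would run per individual per data point.</p>

```python
# Stack-based interpretation of a GP individual: the expression tree is
# flattened to postfix and evaluated with an explicit stack, since many
# GPU devices do not support recursion.
def eval_postfix(program, inputs):
    stack = []
    for tok in program:
        if tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b}[tok])
        elif tok in inputs:            # terminal: named input variable
            stack.append(inputs[tok])
        else:                          # terminal: numeric constant
            stack.append(float(tok))
    return stack[0]

# Hypothetical rule (price * 0.9) - ma10, written in postfix:
rule = ["price", "0.9", "*", "ma10", "-"]
score = eval_postfix(rule, {"price": 100.0, "ma10": 95.0})
```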
        <p>VI. GPU IN MACHINE LEARNING</p>
        <p>
          GPUs have been widely used in machine learning in recent years. One of the most promising machine learning algorithms is the SVM, which can be conveniently adapted to parallel architectures. During the last decade, much work has been done on accelerating the time-consuming training phase of SVMs on many-core GPUs. Catanzaro et al. in [2] first proposed GPUSVM for the binary classification problem and achieved speedups of 9-35x over LIBSVM running on a traditional processor. Later, Herrero-Lopez et al. in [
          <xref ref-type="bibr" rid="ref16">18</xref>
          ] improved on Catanzaro’s work by adding support for multiclass classification, achieving speedups in the range of 3-57x for training and 3-112x for classification. Carpenter in [
          <xref ref-type="bibr" rid="ref17">19</xref>
          ] presented cuSVM, a software package for high-speed Support Vector Machine (SVM) training and prediction that exploits the massively parallel processing power of Graphics Processors (GPUs). Other authors, in papers [
          <xref ref-type="bibr" rid="ref13">15</xref>
          ][
          <xref ref-type="bibr" rid="ref15">17</xref>
          ][
          <xref ref-type="bibr" rid="ref21">23</xref>
          ], also reported that GPU implementations of the SVM achieve better performance than the CPU. Vaněk et al. in [
          <xref ref-type="bibr" rid="ref18">20</xref>
          ] introduced a novel GPU approach to support vector machine training: Optimized Hierarchical Decomposition SVM (OHD-SVM). It uses a hierarchical decomposition iterative algorithm that allows matrix-matrix multiplication to be used to calculate the kernel matrix values. They reported that the algorithm is significantly faster than all other implementations on all datasets; the biggest difference was on the largest datasets, where they achieved a speedup of up to 12 times over the fastest previously published GPU implementation.
        </p>
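        <p>The matrix-multiplication formulation that OHD-SVM exploits rests on the identity ||x − y||² = ||x||² + ||y||² − 2x·y, which reduces computing a whole kernel matrix to one dense matrix product – exactly the operation GPUs execute best. A small CPU-side NumPy sketch of the idea (an illustration of the general technique, not the authors’ implementation):</p>

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma=0.5):
    # Pairwise squared distances via the GEMM trick:
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq_x = (X ** 2).sum(axis=1)[:, None]
    sq_y = (Y ** 2).sum(axis=1)[None, :]
    d2 = sq_x + sq_y - 2.0 * X @ Y.T          # the GEMM dominates the cost
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 8))
K = rbf_kernel_matrix(X, X)
```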
        <p>
          Another challenging research area is deep learning, whose computations largely involve simple matrix manipulations and are therefore well suited to implementation on graphics processors. Raina et al. in [
          <xref ref-type="bibr" rid="ref19">21</xref>
          ] developed general principles for massively parallelizing unsupervised learning tasks using graphics processors and showed that these principles can be applied to successfully scale up learning algorithms for both deep belief networks (DBNs) and sparse coding. Their implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. Dean et al. in [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ] showed that training large deep learning models with billions of parameters using 16,000 CPU cores could dramatically improve training performance. Krizhevsky et al. in [
          <xref ref-type="bibr" rid="ref27">29</xref>
          ] showed that a large deep convolutional network with 60 million parameters and 650,000 neurons could be trained on a large data set with excellent performance using GPU processors [
          <xref ref-type="bibr" rid="ref14">16</xref>
          ]. Coates et al. in [
          <xref ref-type="bibr" rid="ref22">24</xref>
          ] presented their own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Their system is able to train networks with 1 billion parameters on just 3 machines in a couple of days, and they showed that it can scale comfortably to networks with over 11 billion parameters using just 16 machines – more than 6.5 times as large as the network reported in [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ] (the largest previous network), using fewer than 2% as many machines. Chen et al. in [
          <xref ref-type="bibr" rid="ref23">25</xref>
          ] implemented a variant of the deep belief network called the folded-DBN on NVIDIA’s Tesla K20 GPU. Comparing the execution time of the fine-tuning process, the GPU implementation achieved a 7- to 11-fold speedup over the CPU platform.
        </p>
        <p>
          Other authors have likewise reported that GPU-based models achieve better results. Hung and Wang in [
          <xref ref-type="bibr" rid="ref24">26</xref>
          ] proposed a GPU-accelerated PSO (GPSO) algorithm that uses the NVIDIA Tesla C1060 GPU to improve the timing efficiency of PSO. Numerical results showed that the GPU architecture fits the PSO framework well, reducing computation time, achieving high parallel efficiency and finding better optimal solutions by using a large number of particles. Cai et al. in [
          <xref ref-type="bibr" rid="ref25">27</xref>
          ] proposed an approach to forecasting large-scale conditional volatility and covariance using neural networks on a GPU. Tran and Cambria in [
          <xref ref-type="bibr" rid="ref26">28</xref>
          ] developed an ensemble application of the extreme learning machine (ELM) and GPUs for real-time multimodal sentiment analysis that leverages the power of sentic memes (basic inputs of sentiment that can generate most human emotions). Their proposed multimodal system achieves an accuracy of 78%; in terms of processing speed, their method shows improvements of several orders of magnitude for feature extraction compared to CPU-based counterparts.
        </p>
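        <p>The structure that makes PSO a good fit for the GPU is visible even in a CPU sketch: every particle is updated by the same arithmetic, so the whole swarm can be advanced with array operations. The example below (Python/NumPy; the coefficients are common textbook choices, not those of GPSO) minimizes the sphere function with a fully vectorized swarm update:</p>

```python
import numpy as np

def pso_minimize(f, dim=2, n_particles=64, iters=200, seed=0):
    # Vectorized particle swarm: positions, velocities and bests are
    # whole-swarm arrays, the layout that maps naturally to a GPU.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))    # positions
    v = np.zeros_like(x)                          # velocities
    pbest, pbest_val = x.copy(), f(x)
    for _ in range(iters):
        g = pbest[pbest_val.argmin()]             # global best position
        r1, r2 = rng.random((2, n_particles, 1))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v
        val = f(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
    return pbest[pbest_val.argmin()], pbest_val.min()

# Sphere function, evaluated for the whole swarm at once.
best_x, best_val = pso_minimize(lambda p: (p ** 2).sum(axis=1))
```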
      </sec>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSIONS</title>
      <p>In this article we have presented both the opportunities and the challenges of the algorithmic trading and machine learning approach on GPUs. An empirical study of algorithmic trading on a GPU was presented, demonstrating the advantage of the GPU over the CPU.</p>
      <p>High frequency trading and machine learning are new and growing phenomena. They provide interesting research opportunities in financial management, market dynamics, FPGA hardware, and parallel computing on platforms such as CUDA.</p>
      <p>A review of work in the area of machine learning on GPUs is also presented in this paper; it leads to the conclusion that the technique is very promising for classification and forecasting tasks and could be used in big data applications. Systems implemented on GPUs are able to process a huge volume of parameters faster than a CPU. The findings of this paper may be applied in future work.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENT</title>
      <p>We would also like to show our gratitude to Nanotick for providing high frequency data, timestamped in microseconds, for five commodity futures contracts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Margara A., Cugola G. (<year>2011</year>), <article-title>“High performance content-based matching using GPUs”</article-title>, <source>Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems</source>, New York, USA.</mixed-citation>
      </ref>
      <ref id="ref1a">
        <mixed-citation>[2] Catanzaro B., Sundaram N., Keutzer K. (<year>2008</year>, July), <article-title>“Fast support vector machine training and classification on graphics processors”</article-title>, <source>Proceedings of the 25th International Conference on Machine Learning</source>, pp. <fpage>104</fpage>-<lpage>111</lpage>, ACM.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>MechanicalForex.</surname>
          </string-name>
          (
          <year>2016</year>
          ), mechanicalforex.com. [ONLINE] Available at: http://mechanicalforex.com/
          <year>2016</year>
          /02/trading-and
          <article-title>-the-gpu-wastedpower</article-title>
          .
          <source>html. [Accessed 12 January</source>
          <year>2018</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Preis</surname>
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2011</year>
          ), “GPU - computing
          <source>in econophysics and statistical physics”</source>
          ,
          <source>The European Physical Journal Special Topics</source>
          , Vol.
          <volume>194</volume>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3a">
        <mixed-citation>[5] NVIDIA Corporation (<year>2008</year>), NVIDIA CUDA Compute Unified Device Architecture.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kaya</surname>
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2016</year>
          ), “High - frequency trading.
          <source>Reaching the limits”</source>
          ,
          <source>Automated trader magazine</source>
          . Vol.
          <volume>41</volume>
          , p.
          <fpage>23</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Fox</surname>
            <given-names>M. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glosten</surname>
            <given-names>L. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauterberg</surname>
            <given-names>G. V.</given-names>
          </string-name>
          (
          <year>2015</year>
          ), “The New Stock Market: Sense and Nonsense” , 65 Duke L.J.
          <volume>191</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Herlemont</surname>
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2013</year>
          ), “Pairs Trading, Convergence Trading, Cointegration”,
          <string-name>
            <surname>Quantitative</surname>
            <given-names>Finance</given-names>
          </string-name>
          , Vol.
          <volume>12</volume>
          (
          <issue>9</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Zubulake</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2011</year>
          ),
          <article-title>“The High frequency game changer: how automated trading strategies have revolutionized the markets”</article-title>
          , Aite group. Wiley trading.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Brogaard</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendershott</surname>
            <given-names>J. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riordan</surname>
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2013</year>
          ), “
          <article-title>High frequency trading and price discovery”</article-title>
          , ECB Lamfalussy Fellowship Programme / Working Paper Series, No. 1602, European Central Bank Press.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Jaramillo</surname>
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2016</year>
          ), “
          <article-title>The Revolt against High-Frequency Trading: From Flash Boys, to Class Actions, to IEX”</article-title>
          ,
          <source>Review of banking &amp; financial law</source>
          , Vol.
          <volume>35</volume>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Kirchner</surname>
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2015</year>
          ), “
          <article-title>High frequency trading: Fact and fiction”</article-title>
          ,
          <source>Policy: A Journal of Public Policy and Ideas</source>
          , Vol.
          <volume>31</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>8</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Limaye</surname>
            <given-names>S. S.</given-names>
          </string-name>
          (
          <year>2014</year>
          ), “
          <article-title>Electronically aided High frequency trading”</article-title>
          ,
          <source>International Journal of Engineering Research and Applications</source>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <surname>McKenny</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>White</surname>
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2012</year>
          ), “
          <article-title>Stock Trading Strategy Creation Using GP on GPU”</article-title>
          ,
          <source>Soft Computing</source>
          , Vol.
          <volume>16</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>247</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Salleh</surname>
            <given-names>N. S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baharim</surname>
            <given-names>M. F.</given-names>
          </string-name>
          (
          <year>2015</year>
          ), “
          <article-title>Performance Comparison of Parallel Execution Using GPU and CPU in SVM Training Session”</article-title>
          .
          <source>In Advanced Computer Science Applications and Technologies (ACSAT)</source>
          ,
          <year>2015</year>
          4th International Conference on, pp.
          <fpage>214</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Li</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2015</year>
          ), “
          <article-title>Deep Learning and Its Parallelization: Concepts and Instances”</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Sopyła</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drozda</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Górecki</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2012</year>
          ),
          <article-title>“SVM with CUDA accelerated kernels for big sparse problems”</article-title>
          .
          <source>In International Conference on Artificial Intelligence and Soft Computing</source>
          , pp.
          <fpage>439</fpage>
          -
          <lpage>447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Herrero-Lopez</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez</surname>
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2010</year>
          ),
          <article-title>“Parallel multiclass classification using SVMs on GPUs”</article-title>
          .
          <source>In Proceedings of the 3rd Workshop on general-purpose computation on graphics processing units</source>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Carpenter</surname>
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2009</year>
          ),
          <article-title>“CUSVM: A CUDA implementation of support vector classification and regression”</article-title>
          . patternsonscreen.net/cuSVMDesc.pdf, pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Vaněk</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michálek</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Psutka</surname>
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2017</year>
          ),
          <article-title>“A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training”</article-title>
          .
          <source>IEEE Transactions on Parallel and Distributed Systems</source>
          ,
          <volume>28</volume>
          (
          <issue>12</issue>
          ), pp.
          <fpage>3330</fpage>
          -
          <lpage>3343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Raina</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madhavan</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            <given-names>A. Y.</given-names>
          </string-name>
          (
          <year>2009</year>
          ), “
          <article-title>Large-scale deep unsupervised learning using graphics processors”</article-title>
          .
          <source>In Proceedings of the 26th annual international conference on machine learning</source>
          , pp.
          <fpage>873</fpage>
          -
          <lpage>880</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Dean</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            <given-names>M.</given-names>
          </string-name>
          , ... ,
          <string-name>
            <surname>Ng</surname>
            <given-names>A. Y.</given-names>
          </string-name>
          (
          <year>2012</year>
          ), “
          <article-title>Large scale distributed deep networks”</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pp.
          <fpage>1223</fpage>
          -
          <lpage>1231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Li</surname>
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salman</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kecman</surname>
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2010</year>
          ),
          <article-title>“An intelligent system for accelerating parallel SVM classification problems on large datasets using GPU”</article-title>
          .
          <source>In Intelligent Systems Design and Applications (ISDA)</source>
          ,
          <year>2010</year>
          10th International Conference on, pp.
          <fpage>1131</fpage>
          -
          <lpage>1135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Coates</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huval</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catanzaro</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrew</surname>
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2013</year>
          ), “
          <article-title>Deep learning with COTS HPC systems”</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          , pp.
          <fpage>1337</fpage>
          -
          <lpage>1345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Chen</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2014</year>
          ),
          <article-title>“A fast deep learning system using GPU”</article-title>
          .
          <source>In Circuits and Systems (ISCAS)</source>
          ,
          <source>2014 IEEE International Symposium on</source>
          , pp.
          <fpage>1552</fpage>
          -
          <lpage>1555</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Hung</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2012</year>
          ),
          <article-title>“Accelerating parallel particle swarm optimization via GPU”</article-title>
          .
          <source>Optimization Methods and Software</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>33</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Cai</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2013</year>
          ), “
          <article-title>Forecasting large scale conditional volatility and covariance using neural network on GPU”</article-title>
          .
          <source>The Journal of Supercomputing</source>
          ,
          <volume>63</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>490</fpage>
          -
          <lpage>507</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Tran</surname>
            <given-names>H. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2018</year>
          ), “
          <article-title>Ensemble application of ELM and GPU for real-time multimodal sentiment analysis”</article-title>
          .
          <source>Memetic Computing</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Krizhevsky</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            <given-names>G. E.</given-names>
          </string-name>
          (
          <year>2012</year>
          ), “
          <article-title>Imagenet classification with deep convolutional neural networks”</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          (pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Bonanno</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Capizzi</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sciuto</surname>
            <given-names>G. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pappalardo</surname>
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tramontana</surname>
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>A novel cloud-distributed toolbox for optimal energy dispatch management from renewables in igss by using wrnn predictors and gpu parallel solutions</article-title>
          .
          <source>In International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM)</source>
          , (pp.
          <fpage>1077</fpage>
          -
          <lpage>1084</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Napoli</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pappalardo</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tramontana</surname>
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zappalà</surname>
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>A cloud-distributed GPU architecture for pattern identification in segmented detectors big-data surveys</article-title>
          .
          <source>The Computer Journal</source>
          ,
          <volume>59</volume>
          (
          <issue>3</issue>
          ),
          <fpage>338</fpage>
          -
          <lpage>352</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>