<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Financial World, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Unconverged Learning of Pairs Trading Strategies with Representation Labeling Mechanism</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wei-Lun Kuo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tian-Shyr Dai</string-name>
          <email>cameldai@mail.nctu.edu.tw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wei-Che Chang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Management and Finance, National Chiao Tung University</institution>
          ,
          <addr-line>1001 University Road, Hsinchu</addr-line>
          ,
          <country country="TW">Taiwan</country>
          <addr-line>300, ROC</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science and Engineering, National Chiao Tung University</institution>
          ,
          <addr-line>1001 University Road, Hsinchu</addr-line>
          ,
          <country country="TW">Taiwan</country>
          <addr-line>300, ROC</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Data Science and Engineering, National Chiao Tung University</institution>
          ,
          <addr-line>1001 University Road, Hsinchu</addr-line>
          ,
          <country country="TW">Taiwan</country>
          <addr-line>300, ROC</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>0</volume>
      <fpage>1</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>A pairs trading strategy (PTS) constructs a market-neutral portfolio whose value typically moves back and forth around a mean price level; investors short (long) the portfolio when its value reaches the upside (downside) opening threshold and close the position when the value reverts to the mean to earn the price difference. Recent machine learning models select the open and stop-loss thresholds either heuristically or from a limited set, which significantly limits investment performance. We address this by creating a wider set of open/stop-loss threshold recommendations that generally cover all possible scenarios; however, regression- or classification-based deep learning methods for recommending these thresholds fail to converge. Thus, we design a representative labeling mechanism that selects representative open and stop-loss thresholds from all possible optimal thresholds according to the selection frequencies of the thresholds and the k-means algorithm. Experiments suggest that training the multi-scale residual network with stock pairs relabeled by representative thresholds yields better investment performance than other methods in the literature.</p>
      </abstract>
      <kwd-group>
        <kwd>Pairs trading</kwd>
        <kwd>Representation labelling</kwd>
        <kwd>ResNet</kwd>
        <kwd>Opening and stop-loss triggers tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A pairs trading strategy (abbreviated as PTS hereafter) is a popular market-neutral investment strategy introduced by Wall Street econometricians no later than the 1990s. Instead of guessing at unpredictable financial market trends, a PTS eliminates market tendency risk by simultaneously longing one stock and shorting another at a specific ratio. The net value of this long-short portfolio, referred to as the “spread”, moves back and forth around a certain mean price level without being influenced by financial market trends, as suggested by the “market-neutral” modifier. A portfolio with this mean-reverting property can be constructed by finding a pair of stocks with co-integration properties per the Johansen co-integration test [see 1]. We long (short) the portfolio when the spread falls below (rises above) the mean price level and reaches a lower (higher) opening threshold, and then close the portfolio when the spread converges to the mean level to earn the price difference.</p>
      <p>[...] generating high quality stock pairs for PTS, recommendations for a customized threshold for each stock pair have not been well-studied. In addition, a PTS is a “statistical” [...] Recently, reinforcement learning (abbreviated as RL [...] deep learning (abbreviated as DL hereafter) all fail to converge. To resolve this problem, we develop a representative labeling mechanism that selects 25 representative thresholds (determined by the Elbow method) to represent 300 thresholds, either by picking the most frequently selected thresholds or by using the k-means method. Each stock pair is then relabeled with a representative threshold, so that instead of learning from the 300-label stock pairs we learn from 25-relabeled stock pairs. Experiments show that training a multi-scale residual network (abbreviated as ResNet) proposed by Li et al. [<xref ref-type="bibr" rid="ref4">4</xref>] with relabeled stock pairs facilitates smooth and quick convergence. They also show that this representative labeling mechanism outperforms past work.</p>
      <p>MUFin21: International Workshop on Modelling Uncertainty in the Financial World, November [...]</p>
      <p>Our paper is organized as follows: Section 2 reviews PTS research and studies on relevant machine learning models. In Section 3, we discuss the construction of optimal open and stop-loss thresholds and the representative labeling mechanism adopted to address the failure to converge. Section 4 describes how we select and incorporate the multi-scale ResNet into our PTS trading model. The experimental results in Section 5 confirm the superiority of our models. Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-prelim">
      <title>2. Preliminaries</title>
    </sec>
    <sec id="sec-prelim-1">
      <title>2.1. Literature Review</title>
      <p>Reference [<xref ref-type="bibr" rid="ref5">5</xref>] shows that the techniques for finding stock pairs eligible for PTS can be classified into five approaches. Our stock pair generation method is based on the co-integration approach, as Rad et al. [<xref ref-type="bibr" rid="ref6">6</xref>] and Huck and Afawubo [<xref ref-type="bibr" rid="ref7">7</xref>] argue that this approach is better than other approaches. Engle and Granger [<xref ref-type="bibr" rid="ref8">8</xref>] and Johansen [<xref ref-type="bibr" rid="ref9">9</xref>] develop different statistical tests to determine whether the price processes of a stock pair possess the co-integration property; that is, there exists a linear combination of the two stock prices that makes the value process of this two-stock portfolio a stationary process. The stationary property ensures that statistical properties such as the mean of the value process do not change with time. Thus we can buy (sell) the portfolio when its value is below (above) the mean and cash out when the value converges back. These tests are used by Vidyamurthy [<xref ref-type="bibr" rid="ref10">10</xref>] and Rad et al. [<xref ref-type="bibr" rid="ref6">6</xref>] to detect stock pairs that are eligible for PTS.</p>
      <p>The works [<xref ref-type="bibr" rid="ref2">2</xref>], [<xref ref-type="bibr" rid="ref3">3</xref>], [<xref ref-type="bibr" rid="ref11">11</xref>], and [<xref ref-type="bibr" rid="ref12">12</xref>] use reinforcement learning (RL) to determine opening/stop-loss thresholds for PTS. Fallahpour et al. [<xref ref-type="bibr" rid="ref2">2</xref>] enumerate 39 actions (i.e., 39 combinations of open and stop-loss thresholds) and reduce the threshold selection problem to a multi-armed bandit problem solved using a single-state RL model. Our experiments show that this naive mechanism fails to capture various properties of different stock pairs and is outperformed by other approaches in terms of investment results. Kim and Kim [<xref ref-type="bibr" rid="ref3">3</xref>] instead use a deep Q-network (DQN), which outperforms the model of Fallahpour et al. [<xref ref-type="bibr" rid="ref2">2</xref>]. They heuristically set six overly simplistic actions, which significantly limits the profitability as shown later. In addition, they train each PTS-eligible stock pair with its own DQN, which necessitates a large number of DQNs. Confirming their observations, we find that co-integration properties for most stock pairs are not durable over a long period of time; thus only a small number of stock pairs contain enough data to train the DQN. We instead train our machine learning model on trading data from all stock pairs; the resultant model recommends thresholds for all stock pairs. Brim [<xref ref-type="bibr" rid="ref11">11</xref>] proposes the Double-DQN with three actions, but the low win rate limits the practical value of the model. Xu and Tan [<xref ref-type="bibr" rid="ref12">12</xref>] use deterministic policy gradient (DPG) to predict the open and stop-loss timing for PTS and the value weights of pairs to form a return-maximized portfolio. Hsu et al. [<xref ref-type="bibr" rid="ref13">13</xref>] use several deep learning models with opinions on social media to predict the price movement in PTS.</p>
      <p>Threshold selection problems also arise in other domains. To manage dynamic network traffic, the number of Virtual Network Function (VNF) instances needs to be chosen. Rahman et al. [<xref ref-type="bibr" rid="ref14">14</xref>] model this as a classification problem and use several machine learning models to predict the number of VNFs.</p>
      <p>The PTS literature largely adopts RL, but here we use DL with representative labeling. Since our experiments show that a DL model with only a few layers fails to learn efficiently and accurately, we adopt the residual network (ResNet) model proposed by He et al. [<xref ref-type="bibr" rid="ref15">15</xref>], which uses deeper layers to capture complex features/patterns in financial markets. He et al. [<xref ref-type="bibr" rid="ref15">15</xref>] provide extensive empirical data demonstrating that ResNets are simpler to optimize and achieve higher learning precision due to their greater numbers of hidden layers. Furthermore, Li et al. [<xref ref-type="bibr" rid="ref4">4</xref>] extend ResNet from a single scale to multiple scales by adding convolution kernels of various sizes to adaptively detect data features from different aspects. Our paper combines representative labeling with multi-scale ResNet to yield superior investment performance.</p>
    </sec>
    <sec id="sec-prelim-2">
      <title>2.2. Co-integration Method and PTS</title>
      <p>A trading duration (in this paper, a business day) is divided into a formation period and a trading period: data in the formation period is used to select PTS-eligible stock pairs to trade in the trading period. We use the co-integration approach [<xref ref-type="bibr" rid="ref10 ref16 ref17 ref18 ref6">16, 10, 17, 18, 6</xref>] to find eligible stock pairs from a stock pool, for instance, the 0050 constituent stocks from the Taiwan stock market. Let the i-th pair be composed of stocks S<sub>1</sub> and S<sub>2</sub>, and let the capital invested in these two stocks be β<sub>1</sub> : β<sub>2</sub> (if the stock pair is eligible). We extract the logarithmic stock price processes ln S<sub>1</sub>(t) and ln S<sub>2</sub>(t) from the formation period to form a two-dimensional vector Y(t) ≡ (ln S<sub>1</sub>(t), ln S<sub>2</sub>(t))′. The co-integration property of Y(t) can be tested using the Johansen co-integration test [see 1] with the following vector error correction model (VECM):
        <disp-formula id="eq-1"><label>(1)</label><tex-math>\[ \Delta Y(t) = \Pi Y(t-1) + \sum_{j=1}^{p-1} \Gamma_j \, \Delta Y(t-j) + \varepsilon_t , \]</tex-math></disp-formula>
        where ΔY(t) ≡ Y(t) − Y(t − 1), the rank of the 2 × 2 matrix Π denotes the number of co-integration relations, p − 1 denotes the VECM order, Γ<sub>j</sub> is also a 2 × 2 matrix, and ε<sub>t</sub> denotes a 2 × 1 white noise vector. We follow Lütkepohl et al. [<xref ref-type="bibr" rid="ref19">19</xref>] in using the power test, which decomposes Π as αβ′, where the 2 × 1 co-integration vector β ≡ (β<sub>1</sub>, β<sub>2</sub>)′ determines the ratios of the capital invested in the two stocks.</p>
      <p>If the i-th stock pair S<sub>1</sub> and S<sub>2</sub> passes the co-integration test, then we construct a portfolio by investing in the two stocks at the ratio β<sub>1</sub> : β<sub>2</sub>. The spread process of this portfolio,
        <disp-formula id="eq-2"><label>(2)</label><tex-math>\[ P_i(t) \equiv \beta_1 \ln S_1(t) + \beta_2 \ln S_2(t) , \]</tex-math></disp-formula>
        is market neutral and moves back and forth around the mean of the spread, E(P<sub>i</sub>(t)). We can also measure the variation of P<sub>i</sub>(t) by calculating its standard deviation σ<sub>i</sub>.</p>
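      <p>As a minimal sketch of Equation (2), the following Python fragment computes a spread series together with its mean and standard deviation. The price lists and the weights are hypothetical values chosen for illustration, and the use of the population standard deviation is an assumption rather than something the text specifies.</p>
      <preformat>
```python
import math
from statistics import fmean, pstdev


def spread_series(prices_1, prices_2, beta_1, beta_2):
    """Spread of Equation (2): beta_1 * ln S1(t) + beta_2 * ln S2(t)."""
    return [beta_1 * math.log(s1) + beta_2 * math.log(s2)
            for s1, s2 in zip(prices_1, prices_2)]


# Hypothetical per-minute prices and capital weights, for illustration only.
s1 = [100.0, 100.5, 101.0, 100.5, 100.0]
s2 = [50.0, 50.3, 50.4, 50.2, 50.0]
spread = spread_series(s1, s2, beta_1=1.0, beta_2=-1.0)

mean_level = fmean(spread)   # E(P_i(t)): the level the spread reverts to
volatility = pstdev(spread)  # sigma_i: population standard deviation (assumed)
print(round(mean_level, 6), round(volatility, 6))
```
      </preformat>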
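      <p>If we purchase this portfolio at time t and sell it at time t′, the profit (or loss) can be expressed as the product of the investment amount A and the difference of the spread P<sub>i</sub> of Equation (2):
        <disp-formula id="eq-3"><label>(3)</label><tex-math>\[ A \times \bigl( P_i(t') - P_i(t) \bigr) = A \times \left( \beta_1 \ln \frac{S_1(t')}{S_1(t)} + \beta_2 \ln \frac{S_2(t')}{S_2(t)} \right) \cong \frac{A \beta_1}{S_1(t)} \bigl[ S_1(t') - S_1(t) \bigr] + \frac{A \beta_2}{S_2(t)} \bigl[ S_2(t') - S_2(t) \bigr] , \]</tex-math></disp-formula>
        where ln (S<sub>j</sub>(t′) / S<sub>j</sub>(t)) denotes the return rate for investing in S<sub>j</sub> over the time period [t, t′], and Aβ<sub>j</sub> / S<sub>j</sub>(t) denotes the number of shares of S<sub>j</sub> traded at time t.<sup>2</sup></p>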
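      <p>The approximation in Equation (3) can be checked numerically. The sketch below compares the log-return form with the share-based form for one hypothetical trade; the investment amount, weights, and prices are illustrative assumptions, and the function names are not taken from the paper.</p>
      <preformat>
```python
import math


def exact_profit(amount, betas, open_prices, close_prices):
    """Log-return form of Equation (3): amount times the change in the spread."""
    return amount * sum(b * math.log(c / o)
                        for b, o, c in zip(betas, open_prices, close_prices))


def share_based_profit(amount, betas, open_prices, close_prices):
    """Share-based form: shares held (amount * beta / price) times price change."""
    return sum(amount * b / o * (c - o)
               for b, o, c in zip(betas, open_prices, close_prices))


betas = (1.0, -1.0)      # hypothetical capital ratio beta_1 : beta_2
opened = (100.0, 50.0)   # S1(t), S2(t)
closed = (101.0, 50.2)   # S1(t'), S2(t')
lhs = exact_profit(1000.0, betas, opened, closed)
rhs = share_based_profit(1000.0, betas, opened, closed)
print(round(lhs, 4), round(rhs, 4))
```
      </preformat>
      <p>For small price moves such as these, the two forms differ by well under one percent of the profit, which is why the share-based form is a convenient stand-in.</p>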
      <p>The market-neutral nature of Equation (2) allows us to long (short) the portfolio when the spread is below (above) its mean and close the position when it converges to the mean to make a profit, as illustrated in Figure 1. To increase the profit in Equation (3) while also covering the transaction cost, we find a suitable open threshold, defined as the product of a scalar O′<sub>i</sub> and the volatility σ<sub>i</sub>. We also find a stop-loss threshold, defined as the product of a scalar S′<sub>i</sub> and σ<sub>i</sub>, to prevent occasional failures of the market-neutral property from seriously influencing profits. The intersection of the spread with either element of the trigger pair (O′<sub>i</sub>, S′<sub>i</sub>) determines the timing to long/short the portfolio or to stop loss, respectively. Specifically, if the spread in the trading period reaches the upper opening trigger, then we short the portfolio with the value investment ratio β<sub>1</sub> : β<sub>2</sub> for stocks S<sub>1</sub> and S<sub>2</sub>. After shorting the portfolio, the spread may still rise to the upper stop-loss trigger, in which case we close the portfolio to stop loss. Otherwise, it may fall back to the mean, in which case we close the portfolio to gain a profit. On the other hand, if the spread falls to the lower opening trigger, then we long the portfolio. After longing the portfolio, the spread may still fall to the lower stop-loss trigger, in which case we close the portfolio to stop loss. Otherwise, it may rise back to the mean, in which case we close the portfolio to gain a profit. Finally, the portfolio may still be open at the end of the trading period, say, the closing time of the Taiwan Stock Exchange. In this case the portfolio is forced to close to avoid the risks of keeping cross-day positions.</p>
      <p>Figure 1. The red line, blue lines, and black lines denote the mean of P<sub>i</sub>(t), the triggers for opening the portfolio, and the triggers for stopping losses, respectively. The values of these triggers are listed on the right of these lines. The orange and green curves illustrate all possible scenarios for opening and closing the portfolio. After opening the portfolio, we use dashed curves and solid curves to denote that the portfolio is closed to take profit and to stop loss, respectively. The dotted curve denotes that the portfolio is closed at the end of the trading period. The trading period begins at time 0 and ends at time T. The opening and closing nodes occur at times t and t′, respectively.</p>
      <p><sup>2</sup> We long (short) S<sub>j</sub> if this value is positive (negative).</p>
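      <p>The trading rule above can be sketched as a small state machine. This is an illustrative simplification and not the authors' implementation: it takes at most one trade per trading period, ignores transaction costs, and places the triggers at the mean plus or minus a multiple of the volatility, all of which are assumptions. The function name and the outcome strings are hypothetical.</p>
      <preformat>
```python
def simulate_trade(spread, mean, sigma, open_k, stop_k):
    """Apply the open / stop-loss / mean-reversion rules to one spread path.

    Returns (outcome, gain), where gain is measured in spread units before
    transaction costs. At most one trade is taken in this sketch.
    """
    upper_open, lower_open = mean + open_k * sigma, mean - open_k * sigma
    upper_stop, lower_stop = mean + stop_k * sigma, mean - stop_k * sigma
    side, entry = 0, None                      # +1 long, -1 short, 0 flat
    for value in spread:
        if side == 0:
            if upper_stop > value >= upper_open:
                side, entry = -1, value        # short at the upper open trigger
            elif lower_open >= value > lower_stop:
                side, entry = 1, value         # long at the lower open trigger
        elif side == -1:
            if value >= upper_stop:
                return "stop_loss", entry - value
            if mean >= value:
                return "normal_close", entry - value
        else:
            if lower_stop >= value:
                return "stop_loss", value - entry
            if value >= mean:
                return "normal_close", value - entry
    if side == 0:
        return "no_trade", 0.0
    # Still open when the trading period ends: forced close.
    return "forced_close", side * (spread[-1] - entry)


print(simulate_trade([0.0, 0.5, 1.2, 0.8, 0.1, -0.1], 0.0, 1.0, 1.0, 2.0))
print(simulate_trade([0.0, 1.1, 1.5, 2.3], 0.0, 1.0, 1.0, 2.0))
print(simulate_trade([0.0, -1.2, -1.1], 0.0, 1.0, 1.0, 2.0))
```
      </preformat>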
    </sec>
    <sec id="sec-3">
      <title>3. Representation Labeling Mechanism</title>
      <p>We first describe our dataset and the preprocessing of the stock tick data, after which we discuss why it is difficult to train naive deep learning methods for regression or classification to pick feasible open and stop-loss thresholds for PTS. We then propose the core idea and several variations of the representation labeling mechanism (abbreviated as RLM hereafter), which addresses this problem by generating “representation” thresholds and labels suitable for training deep learning methods.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset and Preprocessing</title>
        <p>The dataset used to develop and examine pairs trading strategies is composed of the constituent stocks of the Taiwan Top 50 ETF (0050) from January 1, 2013 to December 31, 2018. We adopt a day-trading strategy without holding positions overnight, as day trades receive a 50% discount on transaction costs,<sup>3</sup> which significantly increases win rates and profits. Figure 2 illustrates the overall procedure of the proposed PTS; Step 1 describes the data preprocessing. We first set non-overlapping training and testing periods from 2013 to 2018. The stock tick data for each business day in the training period generates the spread features and labels needed to train the RLM model, whose performance is then verified on each business day of the testing period. Daily trading is conducted from 9:00 a.m. to 1:30 p.m. on each business day, which we divide into the formation period (the first 166 minutes, ignoring the first 16 minutes) and the trading period (i.e., the rest of the business day). We use the tick data from the formation period to calculate the weighted average stock price for each minute, and use the resultant time series to construct eligible stock pairs and corresponding investment ratios based on the Johansen co-integration method, as described in Section 2.2. The feature of the i-th stock pair is the spread process P<sub>i</sub><sup>F</sup> (P<sub>i</sub><sup>T</sup>) constructed by substituting the stock price processes of S<sub>1</sub> and S<sub>2</sub> during the formation period (trading period) into Equation (2).</p>
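        <p>The per-minute averaging step could look like the sketch below. The text does not say which weights are used, so volume weighting is an assumption here, as are the tick layout (seconds since the open, price, volume) and the function name.</p>
        <preformat>
```python
from collections import defaultdict


def minute_vwap(ticks):
    """Aggregate (seconds_since_open, price, volume) ticks into per-minute
    volume-weighted average prices, returned as {minute_index: price}."""
    value = defaultdict(float)
    volume = defaultdict(float)
    for seconds, price, size in ticks:
        minute = int(seconds // 60)
        value[minute] += price * size
        volume[minute] += size
    return {m: value[m] / volume[m] for m in sorted(volume) if volume[m] > 0}


# Hypothetical ticks: two trades in minute 0 and one trade in minute 1.
ticks = [(5, 100.0, 2), (40, 101.0, 2), (70, 102.0, 1)]
prices = minute_vwap(ticks)
print(prices)
```
        </preformat>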
      </sec>
      <sec id="sec-3-3">
        <title>3.2. Labeling: Finding the Optimal Trigger Pair</title>
        <p>Now we label the i-th spread process in the formation period, P<sub>i</sub><sup>F</sup>, with the optimal trigger threshold (O<sub>i</sub>, S<sub>i</sub>) that maximizes the profit when executing the PTS. Specifically, we long (short) the portfolio when the corresponding spread process in the trading period, P<sub>i</sub><sup>T</sup>, reaches −O<sub>i</sub>σ<sub>i</sub> (O<sub>i</sub>σ<sub>i</sub>) and stop loss when the process falls to −S<sub>i</sub>σ<sub>i</sub> (or rises to S<sub>i</sub>σ<sub>i</sub>), as defined in Figure 1; the corresponding PTS profit is calculated by Equation (3). Note that both open and stop-loss triggers can be any positive real number, which makes the search for the optimal trigger thresholds (or the labeling process) intractable. In the literature [<xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>], either fixed triggers are used or optimal triggers are found from a limited, heuristically determined set, which significantly weakens the performance as verified later. To search for the optimal trigger threshold over the whole solution space without incurring excessive computational resources, we first collect all the spread processes of all business days in the training period, after which we define the maximum deviation for each spread process during the formation period as max(P<sub>i</sub><sup>F</sup>(t) − E(P<sub>i</sub><sup>F</sup>(t))). A feasible stop-loss [...] cost and the price slippage, the opening trigger generally [...] range determined by 0.5 and the maximum deviation of [...] selecting the opening trigger threshold O′<sub>i</sub> from a set of candidate opening thresholds and the stop-loss trigger threshold S′<sub>i</sub> from a set of candidate stop-loss thresholds. In addition, the condition 1.5 × O′<sub>i</sub> &lt; S′<sub>i</sub> is enforced to prevent the two thresholds from being too close together, as such proximal thresholds increase the likelihood of closing the portfolio to stop loss immediately after opening the portfolio, which results in degraded trading performance.</p>
        <p>In addition, we add one more combination (10, 25) with extremely high open and stop-loss thresholds to filter out stock pairs that are not suitable for trading. Then we trade stock pair S<sub>1</sub> and S<sub>2</sub> by using the spread in the trading period P<sub>i</sub><sup>T</sup> and the trigger threshold (O′<sub>i</sub>, S′<sub>i</sub>) to determine the timing for opening and stopping loss as in [...] for training the proposed machine learning models.</p>
        <p><sup>3</sup> The transaction cost is 0.3% but reduces to 0.15% for day trading.</p>
        <p><sup>4</sup> Here we replace the minimum and maximum deviations with the nearby numbers 1.5 and 24, respectively.</p>
        <p>
          There are about 300 combinations<sup>5</sup> of opening and stop-loss trigger thresholds that have been selected by at least one stock pair. Note that many of the enumerated trigger thresholds are never selected by any stock pair. This is because stock prices are quoted as integral multiples of basic units (i.e., ticks) rather than continuously. Many trigger thresholds therefore do not fit the discrete changes of the spread process defined in Equation (2) and will never be selected as optimal trigger thresholds. This explains why heuristically setting trigger thresholds [<xref ref-type="bibr" rid="ref3">3</xref>] would significantly limit PTS performance. Since deriving proper trigger thresholds from discrete changes of the spread process could be difficult, we enumerate many thresholds and use Equation (3) to filter out improper ones.
        </p>
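        <p>The enumerate-and-filter labeling described in this section can be sketched as a plain grid search. The grids below are hypothetical, and <monospace>toy_profit</monospace> is a stand-in for the Equation (3) profit of an actual trade simulation; only the constraint 1.5 × open below stop and the extra pair (10, 25) come from the text.</p>
        <preformat>
```python
def candidate_pairs(open_grid, stop_grid, min_ratio=1.5, fallback=(10.0, 25.0)):
    """Enumerate (open, stop) pairs with min_ratio * open strictly below stop,
    plus the very high fallback pair that effectively skips trading."""
    pairs = [(o, s) for o in open_grid for s in stop_grid if s > min_ratio * o]
    if fallback not in pairs:
        pairs.append(fallback)
    return pairs


def label_pair(trading_spread, pairs, profit_fn):
    """Return the candidate pair with the highest profit on this spread."""
    return max(pairs, key=lambda pair: profit_fn(trading_spread, *pair))


def toy_profit(spread, open_k, stop_k):
    """Stand-in profit: earn open_k if the spread touches the open trigger
    without touching the stop trigger; lose the gap if it touches the stop."""
    peak = max(abs(x) for x in spread)
    if peak >= stop_k:
        return -(stop_k - open_k)
    if peak >= open_k:
        return open_k
    return 0.0


pairs = candidate_pairs([0.5, 1.0, 1.5, 2.0], [1.5, 2.5, 3.5])
label = label_pair([0.1, 1.3, 2.2, 0.4], pairs, toy_profit)
print(len(pairs), label)
```
        </preformat>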
      </sec>
      <sec id="sec-3-6">
        <title>3.3. Failure of Naive Regression/Classification Deep Learning Approaches</title>
        <p>Since the feasible ranges of both open and stop-loss thresholds are positive real numbers, it is natural to predict the optimal trigger threshold (O<sub>i</sub>, S<sub>i</sub>) by adopting a two-output-neuron regression-based deep neural network (RDNN) using the spread and stock processes (i.e., P<sub>i</sub><sup>F</sup>, S<sub>1</sub>(t), and S<sub>2</sub>(t)) during the formation period as input.</p>
        <p>The RDNN training loss is defined as the mean square
error (MSE) between the predicted thresholds and the
optimal thresholds produced in Section 3.2. The training
loss illustrated in Figure 3(a) does not converge as
training proceeds, which shows that RDNN does not capture
discontinuous relationships between open and stop-loss
thresholds and profits. This is because minor shifts in
either threshold can yield large changes in PTS profits,
as illustrated in Figure 1. For example, a slight increment
in the upper open threshold from the upper blue solid
line to the blue dashed line removes the chance to short
the portfolio for the solid orange spread and
reduces the profit to zero.</p>
        <p>Instead of predicting open and stop-loss thresholds with regression-based approaches, we could select the optimal trigger threshold from all possible thresholds generated by the method described in Section 3.4 using [...] which significantly reduces the number of labels without significantly sacrificing threshold quality. Our experiments show that this mechanism improves the ResNet training accuracy and the resultant PTS investment performance.</p>
        <p>Figure 3. (a) Losses of training DNN; (b) losses of training CNN; (c) losses of training ResNet.</p>
        <p><sup>5</sup> This value changes with the training set data.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.4. Representation labelling</title>
        <p>
          In contrast to machine-learning based work such as Kim and Kim [<xref ref-type="bibr" rid="ref3">3</xref>] and Fallahpour et al. [<xref ref-type="bibr" rid="ref2">2</xref>], which use RL to learn 6 to 60 trigger thresholds (i.e., actions), we propose a novel deep learning model with a representation labeling mechanism trained on 25 representative trigger thresholds (determined by the Elbow method) that represent the 300 optimal thresholds determined in Section 3.2. Experimental results in Section 5 show that using representations of the optimal trigger thresholds yields performance significantly better than that of the RL approach with heuristically selected thresholds.
        </p>
        <p>The representation mechanism resolves the training problem of non-convergence by reducing the number of classes and maintains trading performance via properly selected representations. Recall that we divide the i-th spread process defined in Equation (2) into a formation period part P<sub>i</sub><sup>F</sup> and a trading period part P<sub>i</sub><sup>T</sup>, and then substitute P<sub>i</sub><sup>T</sup> into Equation (4) to extract the optimal trigger threshold (O<sub>i</sub>, S<sub>i</sub>) that maximizes the benefit for trading the stock pair S<sub>1</sub> and S<sub>2</sub>. Then (O<sub>i</sub>, S<sub>i</sub>) can be viewed as the label for P<sub>i</sub><sup>F</sup>; the trigger threshold distributions are illustrated in Figure 4(a). Pink, yellow, green, and blue reflect the magnitude of the probability of choosing a threshold as the optimal threshold. By excluding thresholds with probabilities lower than 0.1% and 0.5%, we obtain Figures 4(b) and 4(c). We observe that the trigger threshold distribution is widespread and far from uniform. In addition, for some thresholds, the probability of selecting them (denoted by pink or yellow nodes) as optimal thresholds is much higher than that of other thresholds. This significant lack of smoothness could explain why regression-based DL fails to converge in Figure 3(a).</p>
        <p>We address this lack of training convergence by setting representative trigger thresholds either via k-means or by using the thresholds with the top-k highest probabilities. In the first method, we partition all trigger thresholds into a reasonable number of clusters by the k-means algorithm; the cluster number 25 is determined by the Elbow approach. The set of representation thresholds R is defined as the centers of these clusters, and we call this label setting Kmean(0). Then the threshold for the i-th spread process is relabeled by picking the representation threshold that maximizes profit as follows:
          <disp-formula id="eq-5"><label>(5)</label><tex-math>\[ (O_i, S_i) \equiv \mathop{\mathrm{argmax}}_{(O'_i, S'_i) \in R} \bigl[ \mathrm{Profit}\bigl( P_i^{T}, O'_i, S'_i \bigr) \bigr] . \]</tex-math></disp-formula>
        </p>
        <p>Note that each representation threshold selected by Kmean(0) (the black nodes) need not coincide with any optimal threshold, as k-means clustering calculates each cluster center by averaging; however, as mentioned above, slight shifts in the threshold (e.g., from the upper blue solid line to the dashed one in Figure 1) could yield significant changes to the profit. To prevent disturbances in low-probability thresholds from degrading the quality of the representation labels, we apply k-means on thresholds with probabilities larger than 0.1% and 0.5%, as illustrated in Figures 4(b) and 4(c); we term the resulting representation label settings Kmean(1) and Kmean(2), respectively. In addition, to ensure that the representative thresholds coincide with an optimal threshold, we could alternatively choose as representation thresholds those trigger thresholds with the top 25 highest probabilities (denoted by the orange nodes), as shown in Figure 4(d); we term this the HighFreq label setting. In Section 5, we show how these representation labeling mechanisms outperform previous approaches.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Learning Model Constructions</title>
      <p>
        Given the stock and spread price processes determined in Equation (2) as inputs and the representative thresholds generated in Section 3.4 as labels, we can compare
the investment performance of several deep learning models and select the best one as the model used in step 5 of Figure 2. The input X<sub>i</sub> = [S<sub>1</sub>, S<sub>2</sub>, P<sub>i</sub><sup>F</sup>] is formed by the price processes of the i-th stock pair S<sub>1</sub> and S<sub>2</sub> in the formation period and the corresponding spread process P<sub>i</sub><sup>F</sup> determined in Equation (2). The input X<sub>i</sub> with length 300 (i.e., the number of half-minute data points in the 150-minute formation period) is extended to 512 by padding the remaining positions with zeros. We number each of the 25 representative thresholds with a unique integer from the range [1, 25]. The number of the representative threshold recommended by the representative label mechanism (see Section 3.4), y<sub>i</sub>, is used as the label for the stock pair i. We train the “plain” CNN (i.e., without adopting ResNet), the single-scale ResNet [<xref ref-type="bibr" rid="ref15">15</xref>], and the multi-scale ResNet [<xref ref-type="bibr" rid="ref4">4</xref>] with input X<sub>i</sub> and ground truth y<sub>i</sub> for each stock pair i from the training period. The CNN includes a one-dimensional convolutional layer with three channels (the spread and the two stock price processes) and 25 1 × 5 kernel maps. The output is sent to the batch normalization layer [<xref ref-type="bibr" rid="ref20">20</xref>] to stabilize and speed up the training process; we use Leaky-ReLU activations. The results are passed in sequence through a one-dimensional convolutional layer with 50 kernel maps, a layer with 100 kernel maps, and a layer with 200 kernel maps; the final outputs are then sent to a fully connected layer.
      </p>
      <p>Figure 4. (a) The distribution of trigger thresholds; (b) trigger thresholds excluding the thresholds with probabilities lower than 0.1%; (c) trigger thresholds excluding the thresholds with probabilities lower than 0.5%; (d) trigger thresholds and the thresholds with the top-25 highest probabilities.</p>
      <p>Figure 5. Training accuracy and training loss versus epoch for the multi-scale ResNet, the CNN, and the single-scale ResNet.</p>
      <p>The single-scale ResNet uses one size-3 convolution kernel, which applies to one chain of residual blocks. The three-scale ResNet adds size-5 and size-7 convolution kernels and two corresponding chains of blocks.<sup>6</sup> The features extracted by the three convolution kernels (i.e., the outputs from the three chains of residual blocks) are concatenated to form a feature vector, which is then sent to a fully connected network.</p>
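      <p>Before any of these networks can be trained, each stock pair needs an input and an integer label. The sketch below shows one way such a training example could be assembled under the HighFreq setting: keep the most frequent optimal pairs, relabel a pair by the most profitable representative as in Equation (5), and zero-pad the three input channels to length 512. The function names and the toy data are hypothetical, and <monospace>toy_profit</monospace> stands in for the Equation (3) profit of a real trade simulation.</p>
      <preformat>
```python
from collections import Counter


def high_freq_representatives(optimal_labels, k=25):
    """HighFreq setting: keep the k most frequently selected optimal pairs."""
    return [pair for pair, _ in Counter(optimal_labels).most_common(k)]


def relabel(trading_spread, representatives, profit_fn):
    """Equation (5): 1-based index of the most profitable representative."""
    best = max(range(len(representatives)),
               key=lambda j: profit_fn(trading_spread, *representatives[j]))
    return best + 1


def pad_channels(channels, target_len=512):
    """Zero-pad each input channel (S1, S2, spread) to target_len samples."""
    return [list(c) + [0.0] * (target_len - len(c)) for c in channels]


def toy_profit(spread, open_k, stop_k):
    """Stand-in for the Equation (3) profit: reward open_k when it is reached."""
    return open_k if max(spread) >= open_k else 0.0


# Hypothetical optimal labels collected from six training pairs.
optimal = [(1.0, 2.5)] * 3 + [(2.0, 3.5)] * 2 + [(0.5, 1.5)]
reps = high_freq_representatives(optimal, k=2)

s1, s2 = [100.0, 100.5, 101.0, 100.2], [50.0, 50.1, 50.3, 50.1]
spread = [0.2, 0.9, 1.4, 0.3]
label = relabel(spread, reps, toy_profit)
inputs = pad_channels([s1, s2, spread])
print(reps, label, len(inputs), len(inputs[0]))
```
      </preformat>
      <p>The Kmean settings would replace <monospace>high_freq_representatives</monospace> with cluster centers; the relabeling and padding steps would stay the same.</p>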
      <p>The training results for these three models are shown
in Figure 5. The training accuracy measures the
percentage of correct predictions of all pairs in the training set;
training loss is measured by cross entropy. The
training accuracy for the CNN model, denoted by the orange
curves, increases slowly, while the training loss
oscillates significantly, which renders this model impractical.
Thus we use a residual network, which employs hidden
layers to capture complex features in financial markets.
Although both the single- and three-scale ResNet achieve
almost 100% accuracy and near-zero loss after enough
training epochs, the latter converges more
smoothly and quickly. Thus we adopt the three-scale
ResNet in the following experiments.</p>
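      <p>The two training indicators used above can be written down as a short sketch. It assumes that each pair has an integer class label (the index of its representative threshold) and that the model outputs one probability per class; the function names accuracy and cross_entropy and the toy inputs are illustrative only and are not taken from the original experiments.</p>
      <preformat>
```python
import math
from typing import List, Sequence


def accuracy(predicted_labels: Sequence[int], true_labels: Sequence[int]) -> float:
    """Percentage of pairs whose predicted label equals the true label."""
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return 100.0 * correct / len(true_labels)


def cross_entropy(predicted_probs: Sequence[Sequence[float]],
                  true_labels: Sequence[int]) -> float:
    """Mean negative log-probability that the model assigns to the true class."""
    total = 0.0
    for probs, label in zip(predicted_probs, true_labels):
        # Clamp to avoid log(0) when a true class receives zero probability.
        total -= math.log(max(probs[label], 1e-12))
    return total / len(true_labels)


# Toy example with two classes and two pairs.
probs = [[0.5, 0.5], [0.25, 0.75]]
labels = [0, 1]
print(accuracy([0, 1], labels))
print(cross_entropy(probs, labels))
```
</preformat>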
      <p>To determine the number of training epochs, the data
are divided into the training set and the validation set.
We train the model on the training set data and run the
resulting model on the validation dataset to calculate the
accuracy and loss. To fairly retrieve useful information
from the training dataset without overfitting, training is
halted when the win rate of the validation set reaches a
maximum.</p>
      <p><sup>6</sup> The structure of ResNet can be found in https://github.com/geekfeiw/Multi-Scale-1D-ResNet. The convolution kernel sizes 3, 5,
and 7 are suggested by that website.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Empirical Tests</title>
      <p>The maximum drawdown is the maximum cumulative
daily loss during the testing period. The win rate is
defined as the number of profit-making trades divided by
the total number of trades made in the testing period.</p>
      <p>The normal close rate is defined as the number of trades
whose spread process converges back to the mean<sup>7</sup>
divided by the total number of trades. The profit per open
is the average profit for each trade.</p>
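      <p>The four indicators defined above can be sketched as follows. The Trade record and its fields are hypothetical, and the maximum drawdown is interpreted here as the largest peak-to-trough decline of the cumulative daily profit, which is one common reading of "maximum cumulative daily loss"; the original computation may differ in detail.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trade:
    profit: float        # profit of the trade after costs
    normal_close: bool   # True if the spread converged back to the mean


def win_rate(trades: Sequence[Trade]) -> float:
    """Number of profit-making trades divided by the total number of trades."""
    return sum(1 for t in trades if t.profit > 0) / len(trades)


def normal_close_rate(trades: Sequence[Trade]) -> float:
    """Number of normally closed trades divided by the total number of trades."""
    return sum(1 for t in trades if t.normal_close) / len(trades)


def profit_per_open(trades: Sequence[Trade]) -> float:
    """Average profit for each trade."""
    return sum(t.profit for t in trades) / len(trades)


def max_drawdown(daily_profits: Sequence[float]) -> float:
    """Largest drop of cumulative daily profit from its running peak (assumed reading)."""
    cumulative = peak = worst = 0.0
    for profit in daily_profits:
        cumulative += profit
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst


trades = [Trade(2.0, True), Trade(-1.0, False), Trade(3.0, True), Trade(-0.5, True)]
print(win_rate(trades), normal_close_rate(trades), profit_per_open(trades))
print(max_drawdown([1.0, -2.0, -1.0, 4.0, -0.5]))
```
</preformat>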
      <p>In the experiments in Section 5.1, we first compare the
various DL methods and representative labeling methods
discussed in Sections 4 and 3.4, respectively. We find that combining
the multi-scale ResNet with KMean(0) (or HighFreq) produces
the best investment results; thus we use these settings in
the following experiments. Section 5.2 demonstrates that
the proposed mechanism for representative thresholds
outperforms past threshold selection mechanisms.</p>
      <p>We conducted experiments on the Taiwan Top 50 ETF
component stocks from 2013 to 2018 to back-test
improvements in PTS performance due to the proposed
representative labeling mechanisms. As illustrated in
Figure 2, information on stock pair eligibility and
investment ratios is obtained by applying the Johansen
co-integration test to half-minute average stock price
data during the formation period. We then label the
optimal trigger threshold for each stock pair (Sec. 3.2),
relabel each pair with a representative threshold (Sec. 3.4),
and train the ResNet model with stock pairs and
representative labels retrieved from each training day in the
training period. To evaluate the trading performance,
we take each trading day in the testing period,
retrieve stock pairs by applying the Johansen test to that day’s
formation period, and predict the representative
trigger threshold for each pair using the trained ResNet.
We then use the retrieved stock pairs and thresholds
to execute tick-by-tick pair trading in the day’s trading
period. The transaction tax is set to 0.15%, as defined by
the Taiwan Stock Exchange for day trading. To simulate
price slippage effects, all trades are executed one tick
after the spread process hits the trigger threshold.</p>
      <p>To compare the trading performance of different PTS
over the trading period, we list the (overall) profit, the
win rate, the normal close rate, the number of trades, the
Sharpe ratio (SR) calculated on a daily basis or a pair
basis, the maximum drawdown (MDD), the (maximum)
required capital, and the average profit (per trade), as
listed in the first column of each table. The (overall)
profit is the sum of the daily profit (or loss) over all trading
days in the testing period, where the profit of a day is
the sum of the profits from trading all PTS-eligible stock
pairs on that day. The profit of each trade is calculated by
Equation (3). The required capital for a day is measured
as the sum of the capital required to execute each PTS on
that day. The maximum required capital is defined as the
maximum of the required capital over all trading days in
the testing period. The daily (pair) return is then calculated
as the daily (pair) profit divided by the maximum
required capital (the capital required to trade the pair).
The Sharpe ratio, which estimates the excess investment
return divided by the corresponding risk, is calculated
either on a daily basis as
<disp-formula><tex-math>\frac{\text{Daily return} - \text{Risk-free return}}{\text{Standard deviation of daily return}},</tex-math></disp-formula>
or on a pair basis as
<disp-formula><tex-math>\frac{\text{Pair return} - \text{Risk-free return}}{\text{Standard deviation of pair return}}.</tex-math></disp-formula></p>
      <p><sup>7</sup> That is, the portfolio is neither closed to stop loss (like nodes 
of  ) nor forced to close (like node  ), as illustrated in Figure 1.</p>
      <sec id="sec-5-1">
        <title>5.1. Selection of Learning Models and Representative Labeling Mechanisms</title>
        <p>To ensure the efficiency of the training described in step 5
of Figure 2, it is necessary to select proper machine
learning models and representative labeling mechanisms.
Table 1 compares the performance when training with
the CNN, the single-scale ResNet, and the multi-scale ResNet.
The unstable, slow convergence of the CNN clearly yields
poor results. The multi-scale ResNet outperforms the single-scale
ResNet, as applying more convolution kernels with
different sizes extracts more information from the input
data. Accordingly, in subsequent experiments we use the
multi-scale ResNet as the training model.</p>
        <p>[Table 1 note] The training period, the validation period, and the testing period are listed in the first, second, and third rows, respectively.
The performance of the CNN model (CNN), the single-scale ResNet (S-ResNet), and the multi-scale ResNet (M-ResNet) is
compared using the performance indicators listed in the first column. The training data are the price processes of the spread
and the two stocks (of the pair). Representative thresholds are generated by HighFreq. SR and MDD are abbreviations of
Sharpe ratio and maximum drawdown, respectively.</p>
        <p>Table 2 compares the different representative labeling
mechanisms proposed in Section 3.4. In column 4, KMean(0),
KMean(1), and KMean(2) denote the representative label
settings that apply the k-means method to all
optimal thresholds (see Figure 4(a)), to the optimal
thresholds selected with probability larger than 0.1% (see
Figure 4(b)), and to those selected with probability larger than 0.5% (see Figure 4(c)), respectively. HighFreq
picks the trigger thresholds with the 25 highest
probabilities, as in Figure 4(d).</p>
        <p>We observe that both the win rate and the normal close
rate are high for these label mechanisms, as the spread
processes that pass the co-integration test described
in Section 2.2 are likely to be mean reverting.
This suggests that a mechanism with a large total
open number yields high profits and Sharpe ratios.
Also, the total open number for KMean(0) is the highest
of the four mechanisms, as it does not exclude information
from trigger thresholds with lower probabilities.
However, unlike the representative thresholds produced by
k-means, which typically do not coincide with optimal
thresholds because they are averages, every threshold
recommended by HighFreq is itself one of the 25 optimal
thresholds with the highest probabilities. Absent the
disturbances on opening/stop-loss thresholds, HighFreq
yields better pair-based results (i.e., SR (pair) and profit
per open) than the k-means-based mechanisms. Since
KMean(0) and HighFreq possess advantages in different
aspects, subsequent experiments use either KMean(0) or
HighFreq for comparison.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comparisons among Past Works</title>
        <p>The statistical tests used to determine eligible stock pairs
and investment weights (see Equation (2)) significantly
influence PTS performance, as illustrated in Table 3. In addition
to the co-integration test described in Section 2.2,
Kim and Kim [<xref ref-type="bibr" rid="ref3">3</xref>] propose the ordinary least squares (OLS)
and total least squares (TLS) methods. The win rate and
normal close rate of TLS and OLS are relatively low, and
the overall profits from 2016 to 2018 and the average
profit (for each trade) are nearly all negative. In addition,
the co-integration method yields more pairs that are
eligible for trading (i.e., have larger total open numbers) than
both TLS and OLS; thus the overall profit, the Sharpe
ratios, and the average profit of the co-integration method
are all significantly better. To fairly compare all machine
learning methods for threshold selection, for subsequent
experiments we generate pairs and investment weights
using the co-integration method.</p>
        <p>Table 4 compares different threshold selection
mechanisms with our representative labeling mechanism.
Fallahpour et al. [<xref ref-type="bibr" rid="ref2">2</xref>] reduce the threshold selection problem
to a multi-armed bandit problem and solve it using a
reinforcement learning model with 39 actions (i.e., open
and stop-loss thresholds) generated by Equation (4) with
much narrower sets of opening thresholds, {0.5, 1, ⋯, 3}, and
stop-loss thresholds, {0.5, 1, ⋯, 5}
(denoted by method 1). Kim and Kim [<xref ref-type="bibr" rid="ref3">3</xref>] use deep
reinforcement learning to select one of six heuristically
generated actions for trading (denoted by method 4).
Note that as the number of actions (the threshold choices)
in their papers is relatively small, their models do not
suffer from the training non-convergence problem described
in Section 3.3. However, limiting the number of actions
(or threshold choices) also limits PTS performance.</p>
        <p>For a fair comparison with the six actions of the DRL
approach [<xref ref-type="bibr" rid="ref3">3</xref>], we add the HighFreq performance with six
representative thresholds as method 3. HighFreq
outperforms DRL in almost every aspect, even though DRL
recommends more trading opportunities (i.e., has a higher
total open number), which requires higher (daily) trade
capital. However, DRL’s low win rate translates to lower
overall profits and SR metrics than HighFreq with six
representative thresholds. Increasing the number of
representative thresholds from 6 to 25 (denoted as method
2) increases both the win rate and the total open number,
improves profit and SR, and reduces risk (proxied
by MDD). In addition, the naive reinforcement learning
model proposed by Fallahpour et al. [<xref ref-type="bibr" rid="ref2">2</xref>] performs poorly,
with a win rate lower than 50% and negative profits. If
transaction costs are ignored, as in their experiments,
the profit of their RL model becomes positive; thus their
model fails to find proper thresholds to filter out
trades that are unprofitable due to transaction costs. Generally speaking,
the proposed representative threshold mechanism
outperforms other relevant work.</p>
      </sec>
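      <p>The count of 39 actions quoted for method 1 in Section 5.2 can be reproduced with a short enumeration. Equation (4) is not restated in this section, so the sketch assumes that an action is an (opening threshold, stop-loss threshold) pair whose stop-loss level lies strictly above its opening level, with both levels on a 0.5 grid; this assumption is adopted only because it yields the stated count. The function name threshold_grid is hypothetical.</p>
      <preformat>
```python
from typing import List, Sequence, Tuple


def threshold_grid(open_levels: Sequence[float],
                   stop_levels: Sequence[float]) -> List[Tuple[float, float]]:
    """Enumerate (open, stop-loss) pairs, keeping only those whose
    stop-loss level is strictly above the opening level (assumed constraint)."""
    return [(o, s) for o in open_levels for s in stop_levels if s > o]


open_levels = [0.5 * i for i in range(1, 7)]    # 0.5, 1.0, ..., 3.0
stop_levels = [0.5 * i for i in range(1, 11)]   # 0.5, 1.0, ..., 5.0
actions = threshold_grid(open_levels, stop_levels)
print(len(actions))  # 39
```
</preformat>
      <p>The opening levels contribute 9, 8, 7, 6, 5, and 4 admissible stop-loss levels respectively, which sums to 39.</p>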
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>To improve PTS investment performance, we adopt a
supervised learning paradigm, which differs greatly from existing
RL-based approaches, to recommend feasible
opening and stop-loss triggers. To address the lack of convergence
during training when using regression-based DL models
to learn trigger thresholds, we reformulate the problem
as a classification problem as follows. We first label each
spread process with an optimal trigger pair that
maximizes the trading profit. To keep a huge number of
labels from harming training performance, we relabel
each spread process with our proposed representative
labeling mechanism. Then we train the multi-scale ResNet
with stock pairs relabeled by representative thresholds.
Experimental results show that our proposed approach
outperforms other existing approaches in terms of
investment performance measures.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank the Ministry of Science and Technology (MOST)
for supporting our work under 109-2622-H-009 -001</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <source>Likelihood-Based Inference in Cointegrated Vector Autoregressive Models</source>
          , Oxford University Press,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fallahpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hakimian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Taheri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ramezanifar</surname>
          </string-name>
          ,
          <article-title>Pairs trading strategy optimization using the reinforcement learning method: a cointegration approach</article-title>
          ,
          <source>Soft Computing</source>
          <volume>20</volume>
          (
          <year>2016</year>
          )
          <fpage>5051</fpage>
          -
          <lpage>5066</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Optimizing the pairs-trading strategy using deep reinforcement learning with trading and stop-loss boundaries</article-title>
          ,
          <source>Complexity</source>
          <volume>2019</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Multi-scale residual network for image super-resolution</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>517</fpage>
          -
          <lpage>532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Krauss</surname>
          </string-name>
          ,
          <article-title>Statistical arbitrage pairs trading strategies: Review and outlook</article-title>
          ,
          <source>Journal of Economic Surveys</source>
          <volume>31</volume>
          (
          <year>2017</year>
          )
          <fpage>513</fpage>
          -
          <lpage>545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K. Y.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Faff</surname>
          </string-name>
          ,
          <article-title>The profitability of pairs trading strategies: distance, cointegration and copula methods</article-title>
          ,
          <source>Quantitative Finance</source>
          <volume>16</volume>
          (
          <year>2016</year>
          )
          <fpage>1541</fpage>
          -
          <lpage>1558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Huck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Afawubo</surname>
          </string-name>
          ,
          <article-title>Pairs trading and selection methods: is cointegration superior?</article-title>
          ,
          <source>Applied Economics</source>
          <volume>47</volume>
          (
          <year>2015</year>
          )
          <fpage>599</fpage>
          -
          <lpage>613</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Engle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. W. J.</given-names>
            <surname>Granger</surname>
          </string-name>
          ,
          <article-title>Co-integration and error correction: Representation, estimation, and testing</article-title>
          ,
          <source>Econometrica</source>
          <volume>55</volume>
          (
          <year>1987</year>
          )
          <fpage>251</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <article-title>Statistical analysis of cointegration vectors</article-title>
          ,
          <source>Journal of Economic Dynamics and Control</source>
          <volume>12</volume>
          (
          <year>1988</year>
          )
          <fpage>231</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Vidyamurthy</surname>
          </string-name>
          ,
          <source>Pairs Trading: Quantitative Methods and Analysis</source>
          , volume
          <volume>217</volume>
          , John Wiley &amp; Sons,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Brim</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning pairs trading with a double deep q-network</article-title>
          ,
          <source>in: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>0222</fpage>
          -
          <lpage>0227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Dynamic portfolio management based on pair trading and deep reinforcement learning</article-title>
          ,
          <source>in: 2020 The 3rd International Conference on Computational Intelligence and Intelligent Systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.-W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Chang</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Hedging via opinion-based pair trading strategy</article-title>
          ,
          <source>in: Companion Proceedings of the Web Conference 2020</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tornatore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <article-title>Auto-scaling vnfs using machine learning to improve qos and reduce cost</article-title>
          ,
          <source>in: 2018 IEEE International Conference on Communications (ICC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Sarmento</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Horta</surname>
          </string-name>
          ,
          <article-title>Enhancing a pairs trading strategy with the application of machine learning</article-title>
          ,
          <source>Expert Systems with Applications</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rudy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dunis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giorgioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Laws</surname>
          </string-name>
          ,
          <article-title>Statistical arbitrage and high-frequency data with an application to eurostoxx 50 equities</article-title>
          , Available at SSRN 2272605 (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Broumandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Reuber</surname>
          </string-name>
          ,
          <article-title>Statistical arbitrage and fx exposure with south american adrs listed on the nyse</article-title>
          ,
          <source>Financial Assets and Investing</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <fpage>5</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lütkepohl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Saikkonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trenkler</surname>
          </string-name>
          ,
          <article-title>Maximum eigenvalue versus trace tests for the cointegrating rank of a var process</article-title>
          ,
          <source>The Econometrics Journal</source>
          <volume>4</volume>
          (
          <year>2001</year>
          )
          <fpage>287</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ioffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <article-title>Batch normalization: Accelerating deep network training by reducing internal covariate shift</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Machine Learning</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>448</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>