MUFin21: International Workshop on Modelling Uncertainty in the Financial World, November 01-05, 2021

Unconverged Learning of Pairs Trading Strategies with Representation Labeling Mechanism

Wei-Lun Kuo²

Tian-Shyr Dai⁰ (cameldai@mail.nctu.edu.tw)

Wei-Che Chang¹

⁰ Department of Information Management and Finance, National Chiao Tung University, 1001 University Road, Hsinchu, Taiwan 300, ROC

¹ Institute of Computer Science and Engineering, National Chiao Tung University, 1001 University Road, Hsinchu, Taiwan 300, ROC

² Institute of Data Science and Engineering, National Chiao Tung University, 1001 University Road, Hsinchu, Taiwan 300, ROC

A pairs trading strategy (PTS) constructs a market-neutral portfolio whose value typically moves back and forth around a mean price level; investors short (long) the portfolio when its value reaches the upside (downside) opening threshold and close the position when the value reverts to the mean to earn the price difference. Recent machine learning models select the open and stop-loss thresholds either heuristically or from a limited set, which significantly limits investment performance. We address this by creating a wider set of open/stop-loss threshold recommendations that generally covers all possible scenarios; however, regression- or classification-based deep learning methods for recommending thresholds fail to converge. Thus, we design a representative labeling mechanism that selects representative open and stop-loss thresholds from all possible optimal thresholds according to the selection frequencies of the thresholds and the k-means algorithm. Experiments suggest that training the multi-scale residual network with stock pairs relabeled by representative thresholds yields better investment performance than other methods in the literature.

Keywords: Pairs trading, Representation labelling, ResNet, Opening and stop-loss triggers tuning

1. Introduction

A pairs trading strategy (abbreviated as PTS hereafter) is a popular market-neutral investment strategy introduced by Wall Street econometricians no later than the 1990s. Instead of guessing at unpredictable financial market trends, a PTS eliminates market tendency risk by simultaneously longing one stock and shorting another at a specific ratio. The net value of this long-short portfolio, referred to as the “spread”, moves back and forth around a certain mean price level without being influenced by financial market trends, as suggested by the “market-neutral” modifier. A portfolio with this mean-reverting property can be constructed by finding a pair of stocks with co-integration properties per the Johansen co-integration test [see 1]. We long (short) the portfolio when the spread moves below (above) the mean price level and reaches a lower (higher) opening threshold, and then close the portfolio when the spread converges to the mean level to earn the price difference. [...]ating high-quality stock pairs for PTS, recommendations for a customized threshold for each stock pair have not been well studied. In addition, a PTS is a “statistical” [...]

Recently, reinforcement learning (abbreviated as RL) [...] deep learning (abbreviated as DL hereafter) all fail to converge. To resolve this problem, we develop a representative labeling mechanism that selects 25 representative thresholds (determined by the Elbow method) to represent 300 thresholds, either by picking the most frequently selected thresholds or by using the k-means method. Each stock pair is then relabeled with a representative threshold, so that instead of learning from stock pairs carrying 300 labels, the model learns from stock pairs relabeled with 25 labels. Experiments show that training a multi-scale residual network (abbreviated as ResNet) proposed by Li et al. [4] with relabeled stock pairs facilitates smooth and quick convergence. The experiments also show that this representative labeling mechanism outperforms past work.

Our paper is organized as follows: Section 2 reviews PTS research and studies on relevant machine learning models. In Section 3, we discuss the construction of optimal open and stop-loss thresholds and the representative labeling mechanism adopted to address the failure to converge. Section 4 describes how we select and incorporate the multi-scale ResNet into our PTS trading model. The experimental results in Section 5 confirm the superiority of our models. Section 6 concludes the paper.

2. Preliminaries

2.1. Literature Review

[5] shows that the techniques for finding stock pairs eligible for PTS can be classified into five approaches. Our stock pair generation method is based on the co-integration approach, as Rad et al. [6] and Huck and Afawubo [7] argue that this approach is better than the others. Engle and Granger [8] and Johansen [9] develop different statistical tests to determine whether the price processes of a stock pair possess the co-integration property; that is, whether there exists a linear combination of the two stock prices that makes the value process of this two-stock portfolio a stationary process. Stationarity ensures that statistical properties such as the mean of the value process do not change with time. Thus we can buy (sell) the portfolio when its value is below (above) the mean and cash out when the value converges back. These tests are used by Vidyamurthy [10] and Rad et al. [6] to detect stock pairs that are eligible for PTS.

Studies [2], [3], [11], and [12] use the RL method to determine opening/stop-loss thresholds for PTS. Fallahpour et al. [2] enumerate 39 actions (i.e., 39 combinations of open and stop-loss thresholds) and reduce the threshold selection problem to a multi-armed bandit problem solved using a single-state RL model. Our experiments show that this naive mechanism fails to capture various properties of different stock pairs and is outperformed by other approaches in terms of investment results. Kim and Kim [3] instead use a deep Q-network (DQN), which outperforms Fallahpour et al. [2]'s model. They heuristically set six overly simplistic actions, which significantly limits profitability, as shown later. In addition, they train each PTS-eligible stock pair with a separate DQN, which necessitates a large number of DQNs. Confirming their observations, we find that the co-integration properties of most stock pairs are not durable over a long period of time; thus only a small number of stock pairs contain enough data to train the DQN. We instead train our machine learning model on trading data from all stock pairs; the resultant model recommends thresholds for all stock pairs. Brim [11] proposes a Double-DQN with three actions, but the low win rate limits the practical value of the model. Xu and Tan [12] use the deterministic policy gradient (DPG) to predict open and stop-loss timing for PTS and the value weights of pairs to form a return-maximized portfolio. Hsu et al. [13] use several deep learning models with opinions on social media to predict price movements in PTS. The threshold selection problem also arises in other domains. For example, to manage dynamic network traffic, the number of Virtual Network Function (VNF) instances must be chosen; Rahman et al. [14] model this as a classification problem and use several machine learning models to predict the number of VNFs.

The PTS literature largely adopts RL, but here we use DL with representative labeling. Since our experiments show that a DL model with only a few layers fails to learn efficiently and accurately, we adopt the residual network (ResNet) model proposed by He et al. [15], which uses deeper layers to capture complex features/patterns in financial markets. He et al. [15] provide extensive empirical data demonstrating that ResNets are easier to optimize and achieve higher learning precision due to their greater numbers of hidden layers. Furthermore, Li et al. [4] extend ResNet from a single scale to multiple scales by adding convolution kernels of various sizes to adaptively detect data features from different aspects. Our paper combines representative labeling with multi-scale ResNet to yield superior investment performance.

2.2. Co-integration Method and PTS

A trading duration, in this paper a business day, is divided into a formation period and a trading period: data in the formation period are used to select PTS-eligible stock pairs for the trading period. We use the co-integration approach [16, 10, 17, 18, 6] to find eligible stock pairs from a stock pool, for instance, the 0050 constituent stocks from the Taiwan stock market. Let the $i$-th pair be composed of stocks $S_{i1}$ and $S_{i2}$, and let the capital invested in these two stocks be $\beta_{i1} : \beta_{i2}$ (if the stock pair is eligible). We extract the logarithmic stock price processes $\ln S_{i1}(t)$ and $\ln S_{i2}(t)$ from the formation period to form a two-dimensional vector $Y_i(t) \equiv (\ln S_{i1}(t), \ln S_{i2}(t))'$. The co-integration property of $Y_i(t)$ can be tested using the Johansen co-integration test [see 1] with the following vector error correction model (VECM):

$$\Delta Y_i(t) = \Pi\, Y_i(t-1) + \sum_{k=1}^{p-1} \Gamma_k\, \Delta Y_i(t-k) + \varepsilon_t, \tag{1}$$

where $\Delta Y_i(t) \equiv Y_i(t) - Y_i(t-1)$, the rank of the $2 \times 2$ matrix $\Pi$ denotes the number of co-integration relations, $p-1$ denotes the VECM order, $\Gamma_k$ is also a $2 \times 2$ matrix, and $\varepsilon_t$ denotes a $2 \times 1$ white noise vector. We follow Lütkepohl et al. [19] in using the power test, which decomposes $\Pi$ as $\alpha\beta'$, where the $2 \times 1$ co-integration vector $\beta \equiv (\beta_{i1}, \beta_{i2})'$ determines the ratios of the capital invested in the two stocks.

If the $i$-th stock pair $S_{i1}$ and $S_{i2}$ passes the co-integration test, then we construct a portfolio by investing in the two stocks at the ratio $\beta_{i1} : \beta_{i2}$. The spread process of this portfolio,

$$P_i(t) \equiv \beta_{i1} \ln S_{i1}(t) + \beta_{i2} \ln S_{i2}(t), \tag{2}$$

is market neutral and moves back and forth around the mean of the spread, $E(P_i(t))$. We measure the variation of $P_i(t)$ by its standard deviation $\sigma_i$.
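A minimal Python sketch of Equation (2) and of the two statistics used to place the triggers. It assumes the co-integration vector $(\beta_{i1}, \beta_{i2})$ has already been estimated (for example by a Johansen test) and that $E(P_i(t))$ and $\sigma_i$ are the sample mean and sample standard deviation of the formation-period spread; the function names and price values are hypothetical.

```python
import math
import statistics


def spread_process(prices_1, prices_2, beta_1, beta_2):
    """Equation (2): P_i(t) = beta_1 * ln S_i1(t) + beta_2 * ln S_i2(t)."""
    return [beta_1 * math.log(s1) + beta_2 * math.log(s2)
            for s1, s2 in zip(prices_1, prices_2)]


def spread_statistics(spread):
    """Sample mean E(P_i(t)) and sample standard deviation sigma_i (assumed estimators)."""
    return statistics.fmean(spread), statistics.stdev(spread)


# Hypothetical formation-period prices and co-integration vector.
s1 = [100.0, 101.0, 100.5, 99.5, 100.0]
s2 = [50.0, 50.4, 50.3, 49.8, 50.1]
spread = spread_process(s1, s2, beta_1=1.0, beta_2=-2.0)
mean_p, sigma_p = spread_statistics(spread)
print(f"mean={mean_p:.6f}, sigma={sigma_p:.6f}")
```

A negative coefficient, as with `beta_2` here, corresponds to the short leg of the pair.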

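If we purchase this portfolio at time $t$ and sell it at time $t'$, the profit (or loss) can be expressed as the product of the investment amount $I$ and the difference of the spread:

$$I \times \left(P_i(t') - P_i(t)\right) = I \times \left(\beta_{i1} \ln\frac{S_{i1}(t')}{S_{i1}(t)} + \beta_{i2} \ln\frac{S_{i2}(t')}{S_{i2}(t)}\right) \cong \frac{I\beta_{i1}}{S_{i1}(t)}\left[S_{i1}(t') - S_{i1}(t)\right] + \frac{I\beta_{i2}}{S_{i2}(t)}\left[S_{i2}(t') - S_{i2}(t)\right], \tag{3}$$

where $\ln\left(S_{ij}(t')/S_{ij}(t)\right)$ denotes the return rate for investing in $S_{ij}$ over the time period $[t, t']$, and $I\beta_{ij}/S_{ij}(t)$ denotes the number of shares of $S_{ij}$ for trading at time $t$.²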
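To check the approximation in Equation (3), the sketch below computes both sides for one round trip. The investment amount `invest` (our $I$) and all prices are made-up values; share counts follow $I\beta_{ij}/S_{ij}(t)$, where a negative count means a short position.

```python
import math


def spread_profit(invest, betas, open_prices, close_prices):
    """Left side of Eq. (3): invest * (P_i(t') - P_i(t)) via log-price changes."""
    return invest * sum(b * math.log(c / o)
                        for b, o, c in zip(betas, open_prices, close_prices))


def share_profit(invest, betas, open_prices, close_prices):
    """Right side of Eq. (3): sum of shares * price change, shares = invest * beta / S(t)."""
    shares = [invest * b / o for b, o in zip(betas, open_prices)]
    pnl = sum(n * (c - o) for n, o, c in zip(shares, open_prices, close_prices))
    return pnl, shares


exact = spread_profit(1_000_000, (1.0, -2.0), (100.0, 50.0), (100.5, 50.1))
approx, shares = share_profit(1_000_000, (1.0, -2.0), (100.0, 50.0), (100.5, 50.1))
print(round(exact, 2), round(approx, 2), shares)
```

For these small price moves the two sides differ by less than 1%, which is the first-order approximation $\ln(1+x) \approx x$ at work.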

The market-neutral nature of the spread in Equation (2) allows us to long (short) the portfolio when the spread is below (above) its mean and close the position when it converges to the mean to make a profit, as illustrated in Figure 1. To increase the profit in Equation (3), which must also cover the transaction cost, we find a suitable opening threshold, defined as the product of a scalar $\theta'_O$ and the volatility $\sigma_i$. We also find a stop-loss threshold, defined as the product of a scalar $\theta'_S$ and $\sigma_i$, to prevent occasional failures of the market-neutral property from seriously hurting profits. The points where the spread $P_i(t)$ meets the levels set by the trigger pair $(\theta'_O, \theta'_S)$ determine the timing to long/short the portfolio or to stop loss, respectively. Specifically, if the spread reaches the upper opening trigger $E(P_i(t)) + \theta'_O\sigma_i$, we short the portfolio with the value investment ratio $\beta_{i1} : \beta_{i2}$ for stocks $S_{i1}$ and $S_{i2}$. After the portfolio is shorted, $P_i(t)$ may still rise to the upper stop-loss trigger $E(P_i(t)) + \theta'_S\sigma_i$, in which case we close the portfolio to stop loss. Otherwise, it may fall back to the mean, in which case we close the portfolio to gain a profit. On the other hand, if $P_i(t)$ falls to the lower opening trigger $E(P_i(t)) - \theta'_O\sigma_i$, we long the portfolio. After the portfolio is longed, $P_i(t)$ may still fall to the lower stop-loss trigger $E(P_i(t)) - \theta'_S\sigma_i$, in which case we close the portfolio to stop loss. Otherwise, it may rise back to the mean, in which case we close the portfolio to gain a profit. Finally, the portfolio may still be open at the end of the trading period, say, at the closing time of the Taiwan Stock Exchange. In this case the portfolio is forced to close to avoid the risk of holding cross-day positions.

² We long (short) if this value is positive (negative).

Figure 1: Trading-period scenarios for the spread $P_i(t)$, with trigger levels $E(P_i(t)) \pm \theta'_O\sigma_i$ and $E(P_i(t)) \pm \theta'_S\sigma_i$. The red line, blue lines, and black lines denote the mean of $P_i(t)$, the triggers for opening the portfolio, and the triggers for stopping losses, respectively. The values of these triggers are listed on the right of these lines. The orange and green curves illustrate all possible scenarios to open and to close the portfolio. After the portfolio is opened, dashed curves and solid curves denote that the portfolio is closed to take a profit and to stop loss, respectively. The dotted curve denotes that the portfolio is closed at the end of the trading period. The period begins at time 0 and ends at the close of the trading period. Two of the labeled nodes occur at times $t$ and $t'$, respectively.
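The trading rules above can be sketched as a small state machine over the trading-period spread. This is a simplified illustration, not the authors' code: it assumes at most one round trip per trading period (the scenarios drawn in Figure 1), ignores transaction costs and slippage, and reports profit in spread units (multiply by the investment amount to obtain Equation (3)).

```python
def trade_one_period(path, mean, sigma, theta_o, theta_s):
    """Simulate at most one PTS round trip on a trading-period spread path.

    Returns (spread_pnl, outcome), where outcome is 'no_trade', 'normal_close',
    'stop_loss', or 'forced_close' and spread_pnl is in spread units.
    """
    upper_open, lower_open = mean + theta_o * sigma, mean - theta_o * sigma
    upper_stop, lower_stop = mean + theta_s * sigma, mean - theta_s * sigma
    side, entry = 0, None  # side: -1 = short portfolio, +1 = long portfolio
    for p in path:
        if side == 0:
            if p >= upper_open:      # spread too high: short the portfolio
                side, entry = -1, p
            elif p <= lower_open:    # spread too low: long the portfolio
                side, entry = 1, p
        elif side == -1:
            if p >= upper_stop:      # kept rising: stop loss
                return entry - p, "stop_loss"
            if p <= mean:            # reverted to the mean: take profit
                return entry - p, "normal_close"
        else:
            if p <= lower_stop:      # kept falling: stop loss
                return p - entry, "stop_loss"
            if p >= mean:            # reverted to the mean: take profit
                return p - entry, "normal_close"
    if side == 0:
        return 0.0, "no_trade"
    last = path[-1]                  # still open at the close: forced close
    return (entry - last if side == -1 else last - entry), "forced_close"


pnl, outcome = trade_one_period([0.0, 1.2, 1.6, 0.5, -0.1],
                                mean=0.0, sigma=1.0, theta_o=1.5, theta_s=3.0)
print(outcome, round(pnl, 4))
```

Because the stop-loss level lies beyond the opening level and the mean lies on the other side, the stop-loss and profit-taking checks can never both fire on the same tick, so their order inside each branch does not matter.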

3. Representation Labeling Mechanism

We first describe our dataset and the preprocessing of the stock tick data, after which we discuss why it is difficult to train naive deep learning methods for regression or classification to pick feasible open and stop-loss thresholds for PTS. We then propose the core idea and several variations of the representation labeling mechanism (abbreviated as RLM hereafter), which addresses this problem by generating “representation” thresholds and labels suitable for training deep learning methods.

3.1. Dataset and Preprocessing

The dataset used to develop and examine pairs trading strategies is composed of the constituent stocks of the Taiwan Top 50 ETF (0050) from January 1, 2013 to December 31, 2018. We adopt a day-trading strategy without holding positions overnight, as day trades receive a 50% discount on transaction costs,³ which significantly increases win rates and profits. Figure 2 illustrates the overall procedure of the proposed PTS; Step 1 describes the data preprocessing. We first set non-overlapping training and testing periods from 2013 to 2018. The stock tick data for each business day in the training period generate the spread features and labels needed to train the RLM model, whose performance is then verified on each business day of the testing period. Daily trading is conducted from 9:00 a.m. to 1:30 p.m. on each business day, which is divided into the formation period (the first 166 minutes, excluding the first 16 minutes, i.e., 150 minutes) and the trading period (i.e., the rest of the business day). We use the tick data from the formation period to calculate the weighted average stock price for each minute, and use the resultant time series to construct eligible stock pairs and corresponding investment ratios based on the Johansen co-integration method, as described in Section 2.2. The feature of the $i$-th stock pair is the spread process $P_i(t)$ constructed by substituting the stock price processes of $S_{i1}$ and $S_{i2}$ during the formation period of each training-period day into Equation (2).
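A sketch of the per-interval weighted average price. It assumes volume weights (the text says only “weighted average”), forward-fills intervals without trades, and uses hypothetical `(seconds_since_9am, price, volume)` tick tuples. With the formation window from minute 16 to minute 166, one-minute bins give 150 points and half-minute bins give the 300 points used as model input in Section 4.

```python
from collections import defaultdict


def binned_vwap(ticks, start_sec, end_sec, bin_sec=60):
    """Volume-weighted average price per bin for ticks with start_sec <= t < end_sec.

    ticks: iterable of (seconds_since_9am, price, volume) tuples (hypothetical format).
    Bins without trades repeat the previous bin's value (None before the first trade).
    """
    value = defaultdict(float)
    volume = defaultdict(float)
    for t, price, vol in ticks:
        if start_sec <= t < end_sec and vol > 0:
            b = int((t - start_sec) // bin_sec)
            value[b] += price * vol
            volume[b] += vol
    series, last = [], None
    for b in range(int((end_sec - start_sec) // bin_sec)):
        if volume[b] > 0:
            last = value[b] / volume[b]
        series.append(last)
    return series


# Formation period: minute 16 to minute 166 after the 9:00 open.
ticks = [(960, 100.0, 1), (970, 102.0, 3), (1000, 101.0, 2)]
series = binned_vwap(ticks, 16 * 60, 166 * 60, bin_sec=30)
print(len(series), series[:3])
```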

3.2. Labeling: Finding the Optimal Trigger Pair

Now we label the $i$-th spread process in the formation period with the optimal trigger threshold $(\theta^*_{O,i}, \theta^*_{S,i})$ that maximizes the profit when executing the PTS. Specifically, we long (short) the portfolio when the corresponding spread process in the trading period falls to $E(P_i(t)) - \theta^*_{O,i}\sigma_i$ (or rises to $E(P_i(t)) + \theta^*_{O,i}\sigma_i$) and stop loss when the process falls to $E(P_i(t)) - \theta^*_{S,i}\sigma_i$ (or rises to $E(P_i(t)) + \theta^*_{S,i}\sigma_i$), as defined in Figure 1; the corresponding PTS profit is calculated by Equation (3). Note that both open and stop-loss triggers can be any positive real number, which makes the search for the optimal trigger thresholds (i.e., the labeling process) intractable. In the literature [3, 2], either fixed triggers are used or optimal triggers are found from a limited, heuristically determined set, which significantly weakens performance, as verified later. To search for the optimal trigger threshold over the whole solution space without incurring excessive computational cost, we first collect the spread processes of all business days in the training period, after which we define the maximum deviation of each spread process during the formation period as $\max_t \left(P_i(t) - E(P_i(t))\right)$. A feasible stop-loss range determined by 0.5 and the maximum deviation of [...]⁴ [...] selecting the opening trigger threshold $\theta'_O$ from one set and the stop-loss trigger threshold $\theta'_S$ from another [...]. In addition, [...] cost and the price slippage, the opening trigger generally [...]. The condition $1.5 \times \theta'_O < \theta'_S$ is enforced to prevent the two thresholds from being too close together, as such proximal thresholds increase the likelihood of closing the portfolio to stop loss immediately after opening it, which results in degraded trading performance.

In addition, we add one more combination, (10, 25), with extremely high open and stop-loss thresholds to filter out stock pairs that are not suitable for trading. Then we trade the stock pair $S_{i1}$ and $S_{i2}$ by using the spread in the trading period and the trigger threshold $(\theta'_O, \theta'_S)$ to determine the timing for opening and stopping loss as in [...] for training the proposed machine learning models.

³ The transaction cost is 0.3% but reduces to 0.15% for day trading.

⁴ Here we replace the minimum and maximum deviations with the nearby numbers 1.5 and 24, respectively.
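Since Equation (4) is not reproduced in this section, the grid below uses placeholder step sizes and bounds (assumptions, not the paper's values). The sketch shows only the two rules stated in the text, keeping $1.5\,\theta'_O < \theta'_S$ and appending the filtering pair (10, 25), plus the brute-force labeling step that keeps the profit-maximizing pair. `toy_profit` is a hypothetical stand-in for a PTS back-test of Equation (3) on one trading period.

```python
def frange(start, stop, step):
    """Inclusive arithmetic grid, rounded to avoid floating-point drift."""
    n = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(n + 1)]


def candidate_triggers(open_values, stop_values, ratio=1.5, extra=((10.0, 25.0),)):
    """All (theta_O, theta_S) pairs with ratio * theta_O < theta_S, plus the extra filter pairs."""
    pairs = [(o, s) for o in open_values for s in stop_values if ratio * o < s]
    for pair in extra:
        if pair not in pairs:
            pairs.append(pair)
    return pairs


def optimal_trigger(profit_fn, candidates):
    """Label = candidate pair with the largest back-tested profit (first one wins ties)."""
    return max(candidates, key=lambda pair: profit_fn(*pair))


def toy_profit(theta_o, theta_s):
    """Hypothetical profit surface peaking at (1.0, 3.0); replace with a real back-test."""
    return -(theta_o - 1.0) ** 2 - (theta_s - 3.0) ** 2


# Placeholder grid: step 0.5, opening in [0.5, 2.0], stop-loss in [0.5, 4.0].
grid = candidate_triggers(frange(0.5, 2.0, 0.5), frange(0.5, 4.0, 0.5))
print(len(grid), optimal_trigger(toy_profit, grid))
```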

There are about 300 combinations⁵ of opening and stop-loss trigger thresholds that have been selected by at least one stock pair. Note that many of the aforementioned enumerated trigger thresholds are never selected by any stock pair. This is because stock prices are quoted as integral multiples of basic units (i.e., ticks) rather than continuously. Thus many trigger thresholds do not fit the discrete changes of the spread process defined in Equation (2) and will never be selected as optimal trigger thresholds. This explains why heuristically setting trigger thresholds [3] significantly limits PTS performance. Since deriving proper trigger thresholds directly from the discrete changes of the spread process is difficult, we enumerate many thresholds and use Equation (3) to filter out improper ones.
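The selection frequencies mentioned above can be tallied directly from the per-pair labels; the same frequencies later drive the probability filter and the HighFreq setting of Section 3.4. The sketch below uses hypothetical helper names and toy labels: it converts labels into selection probabilities, applies a minimum-probability cut-off like the 0.1% and 0.5% filters, and ranks thresholds by frequency (HighFreq keeps the top 25). Breaking ties by threshold value is our own choice.

```python
from collections import Counter


def selection_probabilities(labels):
    """Fraction of stock pairs whose optimal trigger pair equals each threshold."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def filter_by_probability(probs, min_prob):
    """Keep thresholds selected with probability strictly larger than min_prob."""
    return {pair: p for pair, p in probs.items() if p > min_prob}


def top_k_thresholds(probs, k=25):
    """Most frequently selected thresholds; ties broken by threshold value."""
    ranked = sorted(probs.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]


labels = [(1.0, 3.0)] * 5 + [(1.5, 3.0)] * 3 + [(0.5, 2.0)] * 2
probs = selection_probabilities(labels)
print(top_k_thresholds(probs, k=2), filter_by_probability(probs, 0.25))
```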

3.3. Failure of Naive Regression/Classification Deep Learning Approaches

Since the feasible ranges of both open and stop-loss thresholds are positive real numbers, it is natural to predict the optimal trigger threshold $(\theta^*_{O,i}, \theta^*_{S,i})$ by adopting a two-output-neuron regression-based deep neural network (RDNN) that uses the spread and stock price processes (i.e., $P_i(t)$, $S_{i1}(t)$, and $S_{i2}(t)$) during the formation period as input.

The RDNN training loss is defined as the mean squared error (MSE) between the predicted thresholds and the optimal thresholds produced in Section 3.2. The training loss illustrated in Figure 3(a) does not converge as training proceeds, which shows that the RDNN does not capture the discontinuous relationship between the open and stop-loss thresholds and profits. This is because minor shifts in either threshold can yield large changes in PTS profits, as illustrated in Figure 1. For example, a slight increase in the upper open threshold from the upper blue solid line to the blue dashed line removes the chance to short the portfolio on the solid orange spread path and reduces the profit to zero.
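A toy example makes the discontinuity concrete. On the hypothetical path below, moving the upper opening level from 1.45 to 1.50 (a squared error of only 0.0025 under an MSE loss) changes the short-side profit from about 1.54 to zero, because the spread peaks at 1.49 and the trade never opens. The function is a simplified, short-only version of the Figure 1 rules written for illustration.

```python
def short_side_profit(path, mean, upper_open):
    """Short once when the spread first reaches upper_open, cover when it returns
    to the mean; force-close at the last point; 0.0 if the trade never opens."""
    entry = None
    for p in path:
        if entry is None:
            if p >= upper_open:
                entry = p
        elif p <= mean:
            return entry - p
    return 0.0 if entry is None else entry - path[-1]


path = [0.0, 0.8, 1.49, 0.6, -0.05]  # hypothetical spread in sigma units
for level in (1.45, 1.50):
    print(level, round(short_side_profit(path, 0.0, level), 4))
```

A regression loss sees the two levels as nearly identical, while the profit surface jumps between them; this is the mismatch that motivates switching to labels drawn from a finite set.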

Instead of predicting open and stop-loss thresholds with regression-based approaches, we could select the optimal trigger threshold from all possible thresholds generated by the method described in Section 3.4 using [...] which significantly reduces the number of labels without significantly sacrificing threshold quality. Our experiments show that this mechanism improves the ResNet training accuracy and the resultant PTS investment performance.

Figure 3: (a) Losses of training DNN; (b) losses of training CNN; (c) losses of training ResNet.

⁵ This value changes with the training set data.

3.4. Representation labelling

In contrast to machine-learning-based work such as Kim and Kim [3] and Fallahpour et al. [2], which use RL to learn 6 to 60 trigger thresholds (i.e., actions), we propose a novel deep learning model with a representation labeling mechanism that trains on 25 representative trigger thresholds (determined by the Elbow method) representing the 300 optimal thresholds determined in Section 3.2. Experimental results in Section 5 show that using representations of the optimal trigger thresholds yields performance significantly better than that of the RL approach with heuristically selected thresholds.

The representation mechanism resolves the non-convergence problem in training by reducing the number of classes, and it maintains trading performance via properly selected representations. Recall that we divide the $i$-th spread process defined in Equation (2) into a formation-period part and a trading-period part, and then substitute it into Equation (4) to extract the optimal trigger threshold $(\theta^*_{O,i}, \theta^*_{S,i})$ that maximizes the benefit of trading the stock pair $S_{i1}$ and $S_{i2}$. Then $(\theta^*_{O,i}, \theta^*_{S,i})$ can be viewed as the label for $P_i$; the trigger threshold distributions are illustrated in Figure 4(a). Pink, yellow, green, and blue reflect the magnitude of the probability of choosing a threshold as the optimal threshold. By excluding thresholds with probabilities lower than 0.1% and 0.5%, we obtain Figures 4(b) and 4(c). We observe that the trigger threshold distribution is widespread and far from uniform. In addition, for some thresholds (denoted by pink or yellow nodes), the probability of being selected as the optimal threshold is much higher than that of other thresholds. This significant lack of smoothness could explain why regression-based DL fails to converge in Figure 3(a).

We address this training convergence problem by setting representative trigger thresholds via either k-means or the thresholds with the top-25 highest probabilities. In the first method, we partition all trigger thresholds into a reasonable number of clusters by the k-means algorithm; the cluster number 25 is determined by the Elbow approach. The set of representation thresholds $\mathcal{R}$ is defined as the centers of the aforementioned clusters, and we call this label setting KMean(0). Then the threshold for the $i$-th spread process is relabeled by picking the representation threshold that maximizes profit as follows:

$$(\hat\theta_{O,i}, \hat\theta_{S,i}) \equiv \arg\max_{(\theta'_O, \theta'_S) \in \mathcal{R}} \left[\mathrm{Profit}(P_i, \theta'_O, \theta'_S)\right]. \tag{5}$$

Note that each representation threshold selected by KMean(0) (the black nodes) does not coincide with any optimal threshold, as k-means clustering calculates each cluster center by averaging; however, as mentioned above, slight shifts in the threshold (e.g., from the upper blue solid line to the dashed one in Figure 1) could yield significant changes in profit. To prevent disturbances from low-probability thresholds from degrading the quality of the representation labels, we apply k-means on thresholds with probabilities larger than 0.1% and 0.5%, as illustrated in Figures 4(b) and 4(c); we term the resulting representation label settings KMean(1) and KMean(2), respectively. In addition, to ensure that the representative thresholds coincide with optimal thresholds, we can alternatively choose as representation thresholds the trigger thresholds with the top 25 highest probabilities (denoted by the orange nodes), as shown in Figure 4(d); we term this the HighFreq label setting. In Section 5, we show how these representation labeling mechanisms outperform previous approaches.

Figure 4: (a) The distribution of trigger thresholds; (b) trigger thresholds excluding the thresholds with probabilities lower than 0.1%; (c) trigger thresholds excluding the thresholds with probabilities lower than 0.5%; (d) trigger thresholds and the thresholds with the top-25 highest probabilities.

4. Learning Model Constructions

Given the stock and spread price processes determined in Equation (2) as inputs and the representative thresholds generated in Section 3.4 as labels, we can compare the investment performance of several deep learning models and select the best one as the model used in step 5 of Figure 2. The input $x_i = [S_{i1}, S_{i2}, P_i]$ is formed by the price processes of the $i$-th stock pair $S_{i1}$ and $S_{i2}$ in the formation period and the corresponding spread process determined in Equation (2). The input, of length 300 (i.e., the number of half-minute data points in the 150-minute formation period), is extended to 512 by padding the remaining positions with zeros. We number each of the 25 representative thresholds with a unique integer in the range [1, 25]. The number $y_i$ of the representative threshold recommended by the representative labeling mechanism (see Section 3.4) is used as the label for stock pair $i$. We train the “plain” CNN (i.e., without adopting ResNet), the single-scale ResNet [15], and the multi-scale ResNet [4] with input $x_i$ and ground truth $y_i$ for each stock pair from the training period. The CNN includes a one-dimensional convolutional layer with three input channels (the spread and the two stock price processes) and 25 1 × 5 kernel maps. The output is sent to a batch normalization layer [20] to stabilize and speed up the training process; we use Leaky-ReLU activations. The results are passed sequentially through one-dimensional convolutional layers with 50, 100, and 200 kernel maps, and the final outputs are then sent to a fully connected layer. The single-scale ResNet uses one size-3 convolution kernel, which applies to one chain of residual blocks. The three-scale ResNet adds size-5 and size-7 convolution kernels and two corresponding chains of blocks.⁶ The features extracted by the three convolution kernels (i.e., the outputs from the three chains of residual blocks) are concatenated to form a feature vector, which is then sent to a fully connected network.

Figure 5: Training accuracy (%) and training loss versus epoch for the multi-scale ResNet, the single-scale ResNet, and the CNN.
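The KMean settings and the relabeling rule of Equation (5) can be sketched in plain Python. This uses a basic Lloyd's k-means with random initial centers (the paper does not state its initialization or implementation) on 2-D points $(\theta^*_O, \theta^*_S)$, assumed here to be one point per labeled stock pair. It then relabels a pair with the 1-based index of the representative that maximizes its back-tested profit; the lambda passed as `profit_fn` is a hypothetical stand-in for that back-test. Running it with k = 25 on all points, or on the points surviving the 0.1% or 0.5% filters, would mirror KMean(0), KMean(1), and KMean(2).

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D threshold points; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            j = min(range(k), key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            clusters[j].append((x, y))
        # Each center moves to the mean of its cluster; an empty cluster keeps its center.
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def relabel(profit_fn, representatives):
    """Equation (5): return (1-based label, representative) with the highest profit."""
    best = max(range(len(representatives)), key=lambda j: profit_fn(*representatives[j]))
    return best + 1, representatives[best]


points = [(1.0, 3.0), (1.1, 3.1), (0.9, 2.9), (5.0, 12.0), (5.2, 12.2), (4.8, 11.8)]
reps = sorted(kmeans(points, k=2))
label, rep = relabel(lambda o, s: -abs(o - 4.5), reps)  # hypothetical profit function
print(reps, label, rep)
```

The averaged centers, such as (1.0, 3.0) here, need not equal any observed optimal pair, which is exactly the property that distinguishes the KMean settings from HighFreq.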

The training results for these three models are shown in Figure 5. The training accuracy measures the percentage of correct predictions over all pairs in the training set; the training loss is measured by cross-entropy. The training accuracy of the CNN model, denoted by the orange curves, increases slowly, while its training loss oscillates significantly, which renders this model impractical. Thus we use a residual network, which employs deeper hidden layers to capture complex features in financial markets. Although both the single- and three-scale ResNets achieve almost 100% accuracy and near-zero loss after enough training epochs, the latter converges more smoothly and quickly. Thus we adopt the three-scale ResNet in the following experiments.
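For reference, the two training metrics named above can be computed as follows for a batch of classifier outputs. This is a generic softmax cross-entropy sketch in plain Python, not the authors' training code; labels are the 1-based representative-threshold numbers of Section 4, and the logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one output vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy_and_cross_entropy(batch_logits, labels):
    """Accuracy (%) and mean cross-entropy; labels are 1-based class numbers."""
    correct, loss = 0, 0.0
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__) + 1
        correct += pred == y
        loss -= math.log(probs[y - 1])
    n = len(labels)
    return 100.0 * correct / n, loss / n


acc, loss = accuracy_and_cross_entropy([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]], [1, 2])
print(acc, round(loss, 4))
```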

To determine the number of training epochs, the data are divided into a training set and a validation set. We train the model on the training set and run the resulting model on the validation set to calculate the accuracy and loss. To retrieve useful information from the training set without overfitting, training is halted when the win rate on the validation set reaches a maximum.

⁶ The structure of ResNet can be found at https://github.com/geekfeiw/Multi-Scale-1D-ResNet. The convolution kernel sizes 3, 5, and 7 are suggested by that website.

5. Empirical Tests

We conducted experiments on the Taiwan Top 50 ETF component stocks from 2013 to 2018 to back-test improvements in PTS performance due to the proposed representative labeling mechanisms. As illustrated in Figure 2, information on stock pair eligibility and investment ratios is obtained by applying the Johansen co-integration test to half-minute average stock price data during the formation period. We then label the optimal trigger threshold for each stock pair (Sec. 3.2), relabel each pair with a representative threshold (Sec. 3.4), and train the ResNet model with stock pairs and representative labels retrieved from each training day in the training period. To evaluate the trading performance, we extract each trading day $d$ from the testing period, retrieve stock pairs by applying the Johansen test to the formation period of day $d$, and predict the representative trigger threshold for each pair using the trained ResNet. We then use the retrieved stock pair and the threshold to execute tick-by-tick pair trading in the day's trading period. The transaction tax is set to 0.15%, as defined by the Taiwan Stock Exchange for day trading. To simulate price slippage effects, all trades are executed one tick after the spread process hits the trigger threshold.

To compare the trading performance of different PTS over the trading period, we list the (overall) profit, the win rate, the normal close rate, the number of trades, the Sharpe ratio (SR) calculated on a daily basis or a pair basis, the maximum drawdown (MDD), the (maximum) required capital, and the average profit (per trade), as illustrated in the first column of each table. The (overall) profit is the sum of the daily profit (or loss) over all trading days in the testing period, where the profit of day $d$ is the sum of the profits from trading all PTS-eligible stock pairs on that day. The profit of each trade is calculated by Equation (3). The required capital for day $d$ is measured as the sum of the capital required to execute each PTS on that day. The maximum required capital is defined as the maximum of the required capital over all trading days in the testing period. The daily (pair) return is then calculated as the daily (pair) profit divided by the maximum required capital (the capital required to trade the pair). The Sharpe ratio, which estimates the excess investment return divided by the corresponding risk, is calculated either on a daily basis as

$$\mathrm{SR}_{\text{daily}} = \frac{\text{Daily return} - \text{Risk-free return}}{\text{Standard deviation of daily return}},$$

or on a pair basis as

$$\mathrm{SR}_{\text{pair}} = \frac{\text{Pair return} - \text{Risk-free return}}{\text{Standard deviation of pair return}}.$$

The maximum drawdown is the maximum cumulative daily loss during the testing period. The win rate is defined as the number of profit-making trades divided by the total number of trades made in the testing period. The normal close rate is defined as the number of trades whose spread process converges back to the mean⁷ divided by the total number of trades. The profit per open is the average profit of each trade.

In the experiments in Section 5.1, we first compare the various DL methods and representative labeling methods discussed in Sections 4 and 3.4. We find that combining the multi-scale ResNet with KMean(0) (or HighFreq) produces the best investment results; thus we use these settings in the following experiments. Section 5.2 demonstrates that the proposed mechanism for representative thresholds outperforms past threshold selection mechanisms.

5.1. Selection of Learning Models and Representative Labeling Mechanisms

To ensure the efficiency of training described in step 5 of Figure 2, it is necessary to select the proper machine learning models and representative labeling mechanisms. Table 1 compares the performance when training with the CNN, the single-scale ResNet, and the multi-scale ResNet. The unstable, slow convergence of the CNN clearly yields poor results. The multi-scale ResNet outperforms the single-scale ResNet, as applying more convolution kernels with different sizes extracts more information from the input data. Accordingly, in subsequent experiments we use the multi-scale ResNet as the training model.

Table 1: The training period, the validation period, and the testing period are listed in the first, second, and third rows, respectively. The performance of the CNN model (CNN), the single-scale ResNet (S-ResNet), and the multi-scale ResNet (M-ResNet) is compared using the performance indicators listed in the first column. The training data are the price processes of the spread and of the two stocks in the pair. Representative thresholds are generated by HighFreq. SR and MDD are abbreviations of Sharpe ratio and maximum drawdown, respectively.

Table 2 compares the different representative labeling mechanisms proposed in Section 3.4. In column 4, KMean(0), KMean(1), and KMean(2) denote the representative label settings that apply the k-means method to all optimal thresholds (see Figure 4(a)), to the optimal thresholds selected with probability larger than 0.1% (see Figure 4(b)), and to those selected with probability larger than 0.5% (see Figure 4(c)), respectively. HighFreq picks the trigger thresholds with the top 25 highest probabilities, as in Figure 4(d). We observe that both the win rate and the normal close rate are high for these label mechanisms, as the spread processes that pass the co-integration test described in Section 2.2 are likely to have the mean-reverting property. This suggests that a mechanism with large total opening numbers yields high profits and Sharpe ratios. Also, the total open number for KMean(0) is the highest of the four mechanisms, as it does not exclude information from trigger thresholds with lower probabilities. However, unlike representative thresholds produced by k-means, which typically do not coincide with optimal thresholds due to the averaging calculation, every threshold recommended by HighFreq is directly an optimal threshold with one of the 25 highest probabilities. Absent these disturbances on opening/stop-loss thresholds, HighFreq yields better pair-based results (i.e., SR (pair) and profit per open) than the k-means-based mechanisms. Since KMean(0) and HighFreq possess advantages in different aspects, subsequent experiments use either KMean(0) or HighFreq for comparison.

5.2. Comparisons among Past Works

The statistical tests used to determine eligible stock pairs and investment weights (see Equation (2)) significantly influence PTS performance, as illustrated in Table 3. In addition to the co-integration test described in Section 2.2, Kim and Kim [3] propose the ordinary least squares (OLS) and total least squares (TLS) methods. The win rate and normal close rate of TLS and OLS are relatively low, and the overall profits from 2016 to 2018 and the average profit (for each trade) are nearly all negative. In addition, the co-integration method yields more pairs that are eligible for trading (i.e., have larger total open numbers) than both TLS and OLS; thus the overall profit, the Sharpe ratios, and the average profit of the co-integration method are all significantly better. To fairly compare all machine learning methods for threshold selection, in subsequent experiments we generate pairs and investment weights using the co-integration method.

Table 4 compares different threshold selection mechanisms with our representative labeling mechanism. Fallahpour et al. [2] reduce the threshold selection problem to a multi-armed bandit problem and solve it using a reinforcement learning model with 39 actions (i.e., open and stop-loss thresholds) generated by Equation (4) with much narrower sets $\theta'_O \in \{0.5, 1, \cdots, 3\}$ and $\theta'_S \in \{0.5, 1, \cdots, 5\}$ (denoted by method 1). Kim and Kim [3] use deep reinforcement learning to select one of six heuristically generated actions for trading (denoted by method 4). Note that as the number of actions (the threshold choices) in their papers is relatively small, their models do not suffer from the training non-convergence problem described in Section 3.3. However, limiting the number of actions (or threshold choices) also limits PTS performance.

[...] forms DRL in almost every aspect, even though DRL rec[...] during training when using regression-based DL models [...] representative thresholds. Increasing the number of rep[...] labels from harming training performance, we relabel [...]

⁷ That is, the portfolio is neither closed to stop loss nor forced to close at the end of the trading period, as illustrated in Figure 1.

For a fair comparison with the six actions of the DRL approach [ 3 ], we add the HighFreq performance with six representative thresholds as method 3: HighFreq outperommends more trading opportunities (i.e., has a higher total open number) which requires higher (daily) trade capital. However DRL’s low win rate translates to lower overall profits and SR metrics than HighFreq with six resentative thresholds from 6 to 25 (denoted as method 2) increases both the win rate and the total open number, and also improves profit and SR, and reduces risk (proxied by MDD). In addition, the naive reinforcement learning model proposed by Fallahpour et al. [ 2 ] performs poorly with a win rate lower than 50% and negative profits. If transaction costs are ignored, as in their experiments, the profit of their RL model becomes positive; thus their model fails to find proper thresholds to filter out unprofitable trades due to transaction costs. Generally speaking,
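The performance measures used in these comparisons (overall profit, maximum required capital, daily return, Sharpe ratio, and MDD) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: all function names are hypothetical, the Sharpe ratio uses the sample standard deviation with a constant per-period risk-free return and no annualization, and because the text does not spell out how MDD is computed, `max_drawdown` assumes the common definition of the largest peak-to-trough decline of cumulative profit.

```python
from statistics import mean, stdev
from typing import Sequence


def overall_profit(daily_profits: Sequence[float]) -> float:
    """Sum of the daily profits (or losses) over all trading days in the test period."""
    return sum(daily_profits)


def max_required_capital(daily_capital: Sequence[float]) -> float:
    """Maximum over trading days of the capital needed to execute that day's PTS."""
    return max(daily_capital)


def daily_returns(daily_profits: Sequence[float],
                  daily_capital: Sequence[float]) -> list[float]:
    """Daily profit divided by the maximum required capital of the test period."""
    cap = max_required_capital(daily_capital)
    return [p / cap for p in daily_profits]


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """(mean return - risk-free return) / sample standard deviation of returns.

    Assumption: constant per-period risk-free return, no annualization.
    """
    return (mean(returns) - risk_free) / stdev(returns)


def max_drawdown(daily_profits: Sequence[float]) -> float:
    """Assumed definition: largest peak-to-trough drop of the cumulative profit curve."""
    cumulative = 0.0  # cumulative profit before the first trading day
    peak = 0.0        # highest cumulative profit seen so far
    worst = 0.0       # largest drawdown seen so far
    for p in daily_profits:
        cumulative += p
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst
```

For a pair-based Sharpe ratio, the same `sharpe_ratio` would be applied to per-pair returns (pair profit divided by the capital required to trade that pair) instead of daily returns.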

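The two kinds of representative labeling compared in Table 2 can be sketched as below, assuming each spread process has already been labeled with a single optimal (opening, stop-loss) threshold pair. HighFreq keeps the k most frequently selected optimal thresholds; KMean(p) runs plain k-means (Lloyd's algorithm) on the optimal thresholds whose selection frequency exceeds p. The function names, the seeded random initialization, and the nearest-representative relabeling rule in `relabel` are illustrative assumptions and not necessarily the exact procedure of Section 3.4.

```python
import random
from collections import Counter
from typing import Sequence

# An (opening threshold, stop-loss threshold) pair.
Threshold = tuple[float, float]


def high_freq(optimal: Sequence[Threshold], k: int = 25) -> list[Threshold]:
    """HighFreq: keep the k optimal thresholds selected most often across spreads."""
    return [t for t, _ in Counter(optimal).most_common(k)]


def _sq_dist(a: Threshold, b: Threshold) -> float:
    """Squared Euclidean distance between two threshold pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmean_labels(optimal: Sequence[Threshold], k: int, min_prob: float = 0.0,
                 iters: int = 100, seed: int = 0) -> list[Threshold]:
    """KMean(p) sketch: k-means on optimal thresholds selected with frequency > min_prob.

    Repeated thresholds are kept, so frequently selected ones pull the centers harder.
    """
    counts = Counter(optimal)
    n = len(optimal)
    kept = [t for t in optimal if counts[t] / n > min_prob]
    rng = random.Random(seed)
    # Initialize with k distinct surviving thresholds (requires at least k of them).
    centers = rng.sample(sorted(set(kept)), k)
    for _ in range(iters):
        clusters: list[list[Threshold]] = [[] for _ in range(k)]
        for t in kept:  # assignment step: attach each threshold to its nearest center
            nearest = min(range(k), key=lambda c: _sq_dist(t, centers[c]))
            clusters[nearest].append(t)
        new_centers = []
        for c, members in enumerate(clusters):  # update step: center = member mean
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:  # leave an empty cluster's center where it was
                new_centers.append(centers[c])
        if new_centers == centers:  # centers no longer move
            break
        centers = new_centers
    return centers


def relabel(optimal: Sequence[Threshold], reps: Sequence[Threshold]) -> list[Threshold]:
    """Assumed rule: map each spread's optimal threshold to its nearest representative."""
    return [min(reps, key=lambda r: _sq_dist(t, r)) for t in optimal]
```

Under these assumptions, KMean(0), KMean(1), and KMean(2) would correspond to `min_prob` values of 0, 0.001, and 0.005; the number of clusters k is left to the caller. Because k-means centers are averages, they generally are not themselves optimal thresholds, whereas every `high_freq` output is one, which mirrors the trade-off discussed above.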
6. Conclusions

To improve PTS investment performance, we adopt a supervised learning paradigm, which differs greatly from existing RL-based approaches, to recommend feasible opening and stop-loss triggers. To address the lack of convergence during training when regression-based DL models are used to learn trigger thresholds, we reformulate the problem as a classification problem as follows. We first label each spread process with an optimal trigger pair that maximizes the trading profit. To prevent a huge number of labels from harming training performance, we relabel each spread process with our proposed representative labeling mechanism. Then we train the multi-scale ResNet with stock pairs relabeled by representative thresholds. Experimental results show that our proposed approach outperforms other existing approaches in terms of investment performance measures.

Acknowledgments

We thank the Ministry of Science and Technology (MOST) for supporting our work under grant 109-2622-H-009-001.

References

[1] Johansen, Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Oxford University Press, 1995.

[2] Fallahpour, Hakimian, Taheri, E. Ramezanifar, Pairs trading strategy optimization using the reinforcement learning method: a cointegration approach, Soft Computing 20 (2016) 5051–5066.

[3] Kim, H. Y. Kim, Optimizing the pairs-trading strategy using deep reinforcement learning with trading and stop-loss boundaries, Complexity 2019 (2019) 1–20.

[4] Li, Fang, Mei, G. Zhang, Multi-scale residual network for image super-resolution, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 517–532.

[5] Krauss, Statistical arbitrage pairs trading strategies: Review and outlook, Journal of Economic Surveys 31 (2017) 513–545.

[6] Rad, R. K. Y. Low, Faff, The profitability of pairs trading strategies: distance, cointegration and copula methods, Quantitative Finance 16 (2016) 1541–1558.

[7] Huck, Afawubo, Pairs trading and selection methods: is cointegration superior?, Applied Economics 47 (2015) 599–613.

[8] R. F. Engle, C. W. J. Granger, Co-integration and error correction: Representation, estimation, and testing, Econometrica 55 (1987) 251–276.

[9] Johansen, Statistical analysis of cointegration vectors, Journal of Economic Dynamics and Control 12 (1988) 231–254.

[10] Vidyamurthy, Pairs Trading: Quantitative Methods and Analysis, volume 217, John Wiley & Sons, 2004.

[11] Brim, Deep reinforcement learning pairs trading with a double deep Q-network, in: 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2020, pp. 0222–0227.

[12] Xu, Tan, Dynamic portfolio management based on pair trading and deep reinforcement learning, in: 2020 The 3rd International Conference on Computational Intelligence and Intelligent Systems, 2020, pp. 50–55.

[13] T.-W. Hsu, C.-C. Chen, H.-H. Huang, M. Chang Chen, H.-H. Chen, Hedging via opinion-based pair trading strategy, in: Companion Proceedings of the Web Conference 2020, 2020, pp. 69–70.

[14] Rahman, Ahmed, Huynh, Tornatore, Mukherjee, Auto-scaling VNFs using machine learning to improve QoS and reduce cost, in: 2018 IEEE International Conference on Communications (ICC), IEEE, 2018, pp. 1–6.

[15] He, Zhang, S. Ren, Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[16] S. M. Sarmento, Horta, Enhancing a pairs trading strategy with the application of machine learning, Expert Systems with Applications (2020).

[17] Rudy, Dunis, G. Giorgioni, Laws, Statistical arbitrage and high-frequency data with an application to Eurostoxx 50 equities, Available at SSRN 2272605 (2010).

[18] Broumandi, T. Reuber, Statistical arbitrage and FX exposure with South American ADRs listed on the NYSE, Financial Assets and Investing 3 (2012) 5–18.

[19] Lütkepohl, Saikkonen, Trenkler, Maximum eigenvalue versus trace tests for the cointegrating rank of a VAR process, The Econometrics Journal 4 (2001) 287–310.

[20] Ioffe, Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 448–456.