Research on DQN Stock Trading Strategy Based on Investor's Compound Sentiment

Ningjing Yang* (yangningjing@wust.edu.cn)
Science College, Wuhan University of Science and Technology, 430065 Wuhan, China

Deyi Li (lideyi@wust.edu.cn)
Science College, Wuhan University of Science and Technology, 430065 Wuhan, China
Hubei Provincial Key Laboratory of Metallurgical Industry Process System Science, Wuhan University of Science and Technology, 430081 Wuhan, China

Yicheng Gong (gongyicheng@wust.edu.cn)
Science College, Wuhan University of Science and Technology, 430065 Wuhan, China
Hubei Provincial Key Laboratory of Metallurgical Industry Process System Science, Wuhan University of Science and Technology, 430081 Wuhan, China

Guici Chen (chenguici@wust.edu.cn)
Science College, Wuhan University of Science and Technology, 430065 Wuhan, China
Hubei Provincial Key Laboratory of Metallurgical Industry Process System Science, Wuhan University of Science and Technology, 430081 Wuhan, China

Keywords: stock trading; sentiment analysis; VADER; reinforcement learning

Both individual and institutional investor sentiment affect investors' stock trading decisions. Most existing investor sentiment analyses ignore institutional investor sentiment and treat individual investor sentiment as overall investor sentiment. Quantifying the two types of investor sentiment, combining them into a composite investor sentiment, and exploring its impact on stock investment strategies is conducive to optimizing investment decisions and investment returns. Considering both individual and institutional investors in the stock market, this paper proposes a composite stock sentiment score that measures overall investor sentiment through text mining and VADER sentiment analysis. Based on this score, it constructs a DQN single-stock trading model using the reinforcement learning DQN algorithm. Experimental comparison with the buy-and-hold and plain DQN strategies on real stock data shows that investor sentiment can effectively optimize stock trading strategies and improve investment returns, and that composite investor sentiment improves the strategy more than individual investor sentiment alone.

Introduction

The efficient market hypothesis holds that market participants are all rational economic agents, that prices in the financial market already reflect all market information, and that rational agents make reasonable decisions based on stock prices. However, the efficient market hypothesis cannot explain some phenomena, such as the equity premium puzzle and the herding effect. Such phenomena suggest that investors' behavior is often influenced by imitative learning and emotional contagion within groups, and that changes in investor sentiment can also affect stock prices and trading volumes. Studying investor sentiment is therefore important for studying the financial market.

The measurement of investor sentiment in the financial market is an evolving topic. Initially, researchers usually chose a single indicator to reflect investor sentiment. Fisher [1] and Schmeling [2] used the consumer confidence index as an indicator of investor sentiment and found that investor sentiment could predict stock market returns to a certain extent. Chinese scholars mostly use the CCTV watch index to measure investor sentiment, and Gao [3] found that investor sentiment is related to short-term market returns. Subsequent studies found that a single indicator is not representative enough to reflect investor sentiment, so scholars have tried to construct investor sentiment from multiple basic indicators. Baker [4] constructed a sentiment index through principal component analysis of 6 basic indicators. Liu [5] also used multiple market variables to build an investor sentiment index and showed empirically that sentiment has a positive effect on the market reaction to surplus announcements. However, a sentiment indicator composed of multiple indicators results from the mutual equilibrium of several macro variables other than sentiment; it expresses overall market sentiment and cannot directly represent investors' sentiment.

With the development of technology, more and more financial investors gather in financial forums and express their opinions there. Through text analysis, researchers can extract investor sentiment directly from forum posts. Smailović [6] and Bollen [7] showed that sentiment analysis of large Twitter text datasets could effectively predict stock market movements. Li [8][9] constructed a quantitative trader that uses publicly available online news, social media data, and company-specific news sentiment data to predict stock price movements. Picasso [10] combined technical analysis indicators with sentiment extracted from news articles to build a robust stock price prediction model. These studies extracted investor sentiment directly from social media through sentiment analysis and achieved good results. However, they focus only on the sentiment of individual investors and ignore the sentiment of institutional investors.

In recent years, deep reinforcement learning has been successfully applied to optimize stock trading strategies and portfolio allocation. Xiong [11] used the deep deterministic policy gradient algorithm to conduct stock trading, which significantly improved the trading profit. Carta [12] proposed a method of repeatedly training DQN agents to reduce strategic risk and maximize investment return. Xu [13] proposed a deep reinforcement learning automated trading algorithm that incorporates a CNN; experimental results on real stock data show that the method significantly outperforms other benchmark methods. Koratamaddi [14] proposed an adaptive deep reinforcement learning method that trains agents to allocate their portfolios. This method uses not only historical stock price data but also the market sentiment of the portfolio, and experiments show that it achieves a more robust investment return than the existing baseline. Existing research thus shows that reinforcement learning algorithms have been successfully applied to simulate stock trading and to optimize stock trading strategies.

Financial investors consist mainly of individual and institutional investors. Based on these two types of investors, this paper investigates the impact of investor sentiment on the trading strategy of a single stock. It uses the reinforcement learning DQN algorithm to simulate the trading behavior of a single stock, and compares and analyzes the impact of individual investor sentiment alone and of composite investor sentiment on the stock trading strategy.

DQN stock trading model based on composite investor sentiment

To study the influence of investors' composite sentiment on stock trading strategies and investment returns, this paper combines sentiment analysis and reinforcement learning to construct a DQN single-stock trading model (named DQN-CE) based on investors' composite sentiment.

Fig.1 Framework of DQN stock trading model based on composite investor sentiment

The model shown in Fig. 1 can be divided into two modules: the investor sentiment quantification module and the reinforcement learning stock trading module. The investor sentiment quantification module measures the overall sentiment of investors. In this module, VADER sentiment analysis is applied to the relevant texts of individual and institutional investors to calculate the individual investor sentiment score $z^{\mathrm{ind}}_{i,t}$ and the institutional investor sentiment score $z^{\mathrm{ins}}_{i,t}$. The composite investor sentiment score sequence $\{e_{i,t}\}$ is constructed from these two types of investor sentiment scores and is then fed, together with the stock price data, into the reinforcement learning stock trading module. The reinforcement learning stock trading module simulates stock trading behavior and learns stock trading strategies.

Investor sentiment quantification module

VADER is based on a vast dictionary containing sentiment intensity scores for thousands of words, punctuation marks, and web terms. The score of each word in the text is evaluated by querying the intensity score in the dictionary. The sentiment score of the complete text can be obtained by weighting. It usually uses a compound sentiment score to indicate the sentiment tendency of the whole text. A compound sentiment score of -1 is the most negative, and 1 is the most positive.

As shown in the investor sentiment quantification module in Fig. 1, the composite investor sentiment score $e_{i,t}$ is expressed as the mean of the individual investor sentiment score and the institutional investor sentiment score.

$$e_{i,t} = \frac{z^{\mathrm{ind}}_{i,t} + z^{\mathrm{ins}}_{i,t}}{2}, \quad e_{i,t} \in [-1, 1] \tag{1}$$

In equation (1), $z^{\mathrm{ind}}_{i,t}$ and $z^{\mathrm{ins}}_{i,t}$ respectively represent the sentiment scores of individual investors and institutional investors for a single stock $i$ on day $t$. The composite sentiment score $e_{i,t}$ represents the overall sentiment of all investors towards stock $i$, which is more representative than the sentiment of one specific type of investor. The daily composite sentiment scores of a single stock constitute the sequence $\{e_{i,t}\}$.
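As a minimal illustration of equation (1), the following Python sketch averages the two daily scores. The function name `composite_sentiment` and the range check are assumptions made for this example and do not come from the paper.

```python
def composite_sentiment(z_individual: float, z_institutional: float) -> float:
    """Equation (1): mean of the individual and institutional daily scores."""
    for name, z in (("individual", z_individual), ("institutional", z_institutional)):
        # Assumption: both inputs are VADER compound scores, hence in [-1, 1].
        if not -1.0 <= z <= 1.0:
            raise ValueError(f"{name} score {z} is outside [-1, 1]")
    return (z_individual + z_institutional) / 2.0


# Hypothetical scores for one stock on one day.
print(composite_sentiment(0.5, -0.25))
```

Because both inputs lie in [-1, 1], their mean also lies in [-1, 1], which matches the range stated in equation (1).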

The text for individual investor sentiment analysis of a single stock is obtained from the respective stock forum on the Eastern Wealth website. All posts are grouped by date to calculate the sentiment score, and the sentiment scores of posts are weighted according to the number of reads and comments on the posts. The individual investor sentiment score $z^{\mathrm{ind}}_{i,t}$ for day $t$ is calculated as in equation (2).

$$z^{\mathrm{ind}}_{i,t} = \frac{\sum_{j=1}^{n} \log(a_j + b_j) \cdot z_j}{\sum_{j=1}^{n} \log(a_j + b_j)} \tag{2}$$

In equation (2), $n$ denotes the number of posts about stock $i$ on day $t$, $a_j$ and $b_j$ denote the number of reads and comments on post $j$, respectively, and $z_j$ denotes the sentiment score of post $j$. The individual investor sentiment score $z^{\mathrm{ind}}_{i,t}$ represents the overall sentiment tendency of individual investors towards stock $i$ on day $t$.
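The weighting in equation (2) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the name `individual_sentiment` is hypothetical, posts with no reads and no comments are skipped because the logarithm of zero is undefined, and a day with zero total weight is treated as neutral. The paper does not specify either edge case.

```python
import math
from typing import Sequence, Tuple

# (reads a_j, comments b_j, sentiment score z_j) for one post
Post = Tuple[int, int, float]


def individual_sentiment(posts: Sequence[Post]) -> float:
    """Equation (2): log(reads + comments)-weighted mean of post scores for one day."""
    weighted_sum = 0.0
    total_weight = 0.0
    for reads, comments, score in posts:
        if reads + comments < 1:
            continue  # assumption: log(0) is undefined, so such posts are skipped
        weight = math.log(reads + comments)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        return 0.0  # assumption: no usable weight is treated as neutral sentiment
    return weighted_sum / total_weight


# Hypothetical posts: a widely read positive post and a less read negative one.
posts = [(99, 1, 0.8), (9, 1, -0.4)]
print(individual_sentiment(posts))
```

In this example the first post carries twice the weight of the second (log 100 versus log 10), so the daily score is pulled towards the positive post.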

The text for institutional investor sentiment analysis is derived from individual stock research reports on stock reporting websites. The complete research reports are too lengthy to analyze conveniently. Therefore, only the report titles are used for sentiment analysis to calculate the institutional investor sentiment score $z^{\mathrm{ins}}_{i,t}$, as shown in equation (3).

$$z^{\mathrm{ins}}_{i,t} = \frac{\sum_{k=1}^{n} z_k}{n} \tag{3}$$

In equation (3), $n$ denotes the number of stock research reports issued for single stock $i$ on day $t$, and $z_k$ denotes the sentiment score of research report $k$. The institutional investor sentiment score indicates the sentiment tendency of all institutional investors towards stock $i$ on day $t$.
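Equation (3) is a plain daily mean, which the sketch below computes for several days at once. The function name and the use of ISO date strings as day keys are assumptions for illustration. Days without any research report simply do not appear in the result; how the paper handles such days is not stated.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def institutional_sentiment_by_day(reports: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Equation (3): per-day mean of research-report title scores."""
    by_day: Dict[str, List[float]] = defaultdict(list)
    for day, score in reports:
        by_day[day].append(score)
    # Sort by day so the result can be read as a time-ordered sequence.
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Hypothetical (day, title score) pairs for one stock.
reports = [("2021-01-04", 0.5), ("2021-01-04", 0.25), ("2021-01-05", -0.5)]
print(institutional_sentiment_by_day(reports))
```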

Reinforcement learning based stock trading module

In the reinforcement learning stock trading module in Fig. 1, the DQN network receives information about the stock and takes trading actions according to the trading rules. The environment and trading rules of the real stock market are too complex for the agent to observe in full. To help the agent learn stock trading strategies quickly, we propose three assumptions that simplify the stock trading environment.

Assumption (1): The amount of funds and stocks owned by the agent is limited, which is not enough to affect the stock trading environment.

Assumption (2): An agent can choose to trade once a day, choosing to buy, hold or sell the current stock, and the stock price is based on the closing price of the day.

Assumption (3): The number of shares per trade is the number of all shares held by the agent.

Under these assumptions about a single stock's trading environment and trading rules, we formalize the stock trading process as a Markov decision process, expressed as the tuple (s, a, p, r, γ), where s is the state, a the action, p the state transition probability, r the reward, and γ the discount factor.

Each state $s$ in the state set $S$ is represented as a tuple $(v_{i,t}, h_{i,t}, b_t, e_{i,t})$. $v_{i,t}$ denotes the closing price of stock $i$ on day $t$, $v_{i,t} \in \mathbb{R}_{+}$; $h_{i,t}$ denotes the number of shares of stock $i$ held by the agent on day $t$, $h_{i,t} \in \mathbb{R}_{+}$; $b_t$ denotes the agent's remaining cash balance, $b_t \in \mathbb{R}_{+}$. $e_{i,t}$ is the composite sentiment score of stock $i$ calculated by the investor sentiment quantification module.

$A$ is the action space of the agent, with $a_t \in \{-1, 0, 1\}$, where

$$a_t = \begin{cases} -1, & \text{sell} \\ 0, & \text{hold} \\ 1, & \text{buy} \end{cases} \tag{4}$$
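To make the action space concrete, the sketch below applies one action to the agent's holdings under Assumptions (2) and (3), trading at the day's closing price. Several details are assumptions for illustration only: the names `Portfolio` and `apply_action`, buying with all available cash, allowing fractional shares, and charging the 0.05% transaction cost from the experimental setup on every trade.

```python
from dataclasses import dataclass

SELL, HOLD, BUY = -1, 0, 1


@dataclass
class Portfolio:
    cash: float    # remaining cash balance
    shares: float  # number of shares held


def apply_action(p: Portfolio, action: int, close_price: float,
                 cost_rate: float = 0.0005) -> Portfolio:
    """Apply one daily action at the closing price and return the new holdings."""
    if action == BUY and p.cash > 0:
        # Assumption: spend all cash, cost included, fractional shares allowed.
        bought = p.cash / (close_price * (1 + cost_rate))
        return Portfolio(cash=0.0, shares=p.shares + bought)
    if action == SELL and p.shares > 0:
        # Assumption (3): sell every share held; the cost reduces the proceeds.
        proceeds = p.shares * close_price * (1 - cost_rate)
        return Portfolio(cash=p.cash + proceeds, shares=0.0)
    # HOLD, or an action that cannot be executed, leaves the holdings unchanged.
    return Portfolio(cash=p.cash, shares=p.shares)


start = Portfolio(cash=1000.0, shares=0.0)
after_buy = apply_action(start, BUY, close_price=10.0)
after_sell = apply_action(after_buy, SELL, close_price=12.0)
print(after_buy, after_sell)
```

Treating an infeasible action (selling with no shares, buying with no cash) as a hold is one possible design choice; masking such actions before selection would be another.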

The choice of trading action is based on a greedy rule, and the model initially sets a greedy value ε. The program generates a random number from the uniform distribution on (0, 1) and compares it with ε. If the random number is less than the greedy value, the action with the largest Q value in the current state among the three actions is selected as the trading action; otherwise, one of {-1, 0, 1} is selected at random as the trading action. Under this convention, ε is the probability of acting greedily rather than the probability of exploring.
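The selection rule above can be sketched in a few lines of Python. The function name `select_action`, the ordering of the Q-values as (sell, hold, buy), and the optional `rng` argument are assumptions for illustration. The comparison direction follows the text: a random draw below the greedy value triggers the greedy action.

```python
import random
from typing import Optional, Sequence

ACTIONS = (-1, 0, 1)  # sell, hold, buy


def select_action(q_values: Sequence[float], greedy_value: float,
                  rng: Optional[random.Random] = None) -> int:
    """Pick the greedy action with probability `greedy_value`, else a random one."""
    rng = rng or random
    if rng.random() < greedy_value:
        # Exploit: index of the largest Q-value, mapped back to the action label.
        best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
        return ACTIONS[best_index]
    # Explore: any of the three actions, chosen uniformly.
    return rng.choice(ACTIONS)


# With greedy_value = 1.0 the choice is always the arg-max action (here: hold).
print(select_action([0.1, 0.5, 0.2], greedy_value=1.0))
```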

The reward is the immediate reward received by the agent after executing a trading action. The agent adjusts its subsequent trading actions according to the received reward, and the accumulated reward over all trading days is used to measure the quality of the trading strategy. Considering the impact of both stock price changes and investor sentiment on stock trading strategies, the reward of the model is defined as the sum of two parts, as shown in equation (5).

$$r_t = \frac{v_{i,t+1} - v_{i,t}}{v_{i,t}} + 0.1\, a_t \cdot e_{i,t} \tag{5}$$

In equation (5), $\frac{v_{i,t+1} - v_{i,t}}{v_{i,t}}$ is the return reward part of stock $i$, where $v_{i,t}$ is the closing price of stock $i$ on day $t$ and $v_{i,t+1}$ is the closing price on day $t + 1$. $0.1\, a_t \cdot e_{i,t}$ is the sentiment reward part of stock $i$; the factor 0.1 scales it so that the sentiment reward part and the return reward part have the same order of magnitude. The agent can therefore also adjust its strategy through the sentiment component of the reward.
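A direct transcription of equation (5) into Python is given below as a sketch. The function and parameter names are assumptions for illustration, and the 0.1 weight is exposed as a parameter only for readability.

```python
def reward(close_today: float, close_next: float, action: int,
           sentiment: float, sentiment_weight: float = 0.1) -> float:
    """Equation (5): next-day price return plus a weighted sentiment term."""
    price_return = (close_next - close_today) / close_today
    sentiment_term = sentiment_weight * action * sentiment
    return price_return + sentiment_term


# Hypothetical day: price rises from 10 to 11, the agent buys, sentiment is 0.5.
print(reward(10.0, 11.0, action=1, sentiment=0.5))
```

Buying (action 1) when sentiment is positive, or selling (action -1) when it is negative, makes the sentiment term positive, while holding (action 0) removes it.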

The action-value function $Q^{\pi}(s, a)$ represents the expected cumulative reward obtained by selecting action $a$ in state $s$ and then following the policy $\pi$, as expressed in equation (6), where $s'$ is the next state and $a'$ the action taken in it.

$$Q^{\pi}(s, a) = \mathbb{E}_{s' \sim p}\left[ r(s, a, s') + \gamma Q^{\pi}(s', a') \right] \tag{6}$$

The goal of Q-learning is to maximize $Q^{\pi}(s, a)$, i.e., to maximize the expected future return of a given state-action pair, as shown in equation (7).

$$Q^{*}(s, a) = \mathbb{E}_{s' \sim p}\left[ r(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right] \tag{7}$$

As shown in Fig. 1, the agent, which is composed of two networks, continuously interacts with the environment to generate experience data $(s, a, r, s')$ and stores it in the experience replay buffer. Once a sufficient amount of experience data has been stored, a mini-batch is randomly sampled from the buffer to train the neural network. $Q(s, a; \theta)$ is the output of the main Q-network, which evaluates the value of the current state-action pair, and its parameters $\theta$ are updated at every training step. $Q(s, a; \theta^{-})$ is the output of the target Q-network, which is used to calculate the target value $y$, as shown in equation (8).

$$y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}) \tag{8}$$
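The experience replay described above can be sketched with the Python standard library. The class name `ReplayBuffer`, the fixed capacity with oldest-first eviction, and uniform sampling without replacement are assumptions for illustration; the paper does not state the buffer size or eviction rule.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# One transition (s, a, r, s'); states are kept as plain tuples here.
Transition = Tuple[tuple, int, float, tuple]


class ReplayBuffer:
    """Fixed-size store of transitions with uniform random mini-batch sampling."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque drops the oldest transition once capacity is reached.
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def push(self, state: tuple, action: int, reward: float, next_state: tuple) -> None:
        self._items.append((state, action, reward, next_state))

    def __len__(self) -> int:
        return len(self._items)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement; requires batch_size <= len(self).
        return random.sample(list(self._items), batch_size)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push((float(step),), 0, 0.0, (float(step + 1),))
print(len(buffer), buffer.sample(2))
```

Sampling at random rather than in time order weakens the correlation between consecutive trading days within a mini-batch, which is the usual motivation for experience replay in DQN.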

Thus, when the agent takes action $a$ in the environment, the loss function $L(\theta)$ can be calculated as shown in equation (9).

$$L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right] \tag{9}$$
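To show how the target value and the squared-error loss fit together, the following framework-free sketch evaluates both on a mini-batch. The two networks are represented by plain callables that return three Q-values in the order (sell, hold, buy); this representation, the function names, and averaging over the batch in place of the expectation are assumptions for illustration. A real implementation would use neural networks and automatic differentiation.

```python
from typing import Callable, Sequence, Tuple

ACTIONS = (-1, 0, 1)  # sell, hold, buy
QFunction = Callable[[tuple], Sequence[float]]
Transition = Tuple[tuple, int, float, tuple]  # (s, a, r, s')


def td_target(reward: float, next_state: tuple, target_q: QFunction, gamma: float) -> float:
    """Target value: reward plus discounted best target-network value in s'."""
    return reward + gamma * max(target_q(next_state))


def dqn_loss(batch: Sequence[Transition], main_q: QFunction,
             target_q: QFunction, gamma: float) -> float:
    """Mean squared difference between targets and main-network Q-values."""
    squared_errors = []
    for state, action, reward, next_state in batch:
        y = td_target(reward, next_state, target_q, gamma)
        q_sa = main_q(state)[ACTIONS.index(action)]
        squared_errors.append((y - q_sa) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical constant Q-functions, used only to exercise the arithmetic.
main_q = lambda s: [0.0, 1.0, 2.0]
target_q = lambda s: [0.5, 0.0, 1.0]
batch = [((0.0,), 1, 1.0, (1.0,))]
print(dqn_loss(batch, main_q, target_q, gamma=0.5))
```

In the usage example the target is 1.0 + 0.5 * 1.0 = 1.5 and the main network's value for the buy action is 2.0, so the squared error is 0.25.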

The gradient of the loss function $L(\theta)$ in equation (9) is computed and used to update the parameters $\theta$ of the main Q-network. After a fixed number of iterations, the main Q-network copies its parameters $\theta$ to the target Q-network parameters $\theta^{-}$, which completes one round of learning.
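The periodic parameter copy can be sketched as a small counter object. The class name `TargetSync`, the dictionary representation of parameters, and the interval are assumptions for illustration; the paper does not give the number of iterations between copies. In PyTorch the copy itself is typically written as `target_net.load_state_dict(main_net.state_dict())`.

```python
import copy
from typing import Dict, List

Params = Dict[str, List[float]]


class TargetSync:
    """Copies main-network parameters into the target network every N updates."""

    def __init__(self, sync_every: int) -> None:
        self.sync_every = sync_every
        self.updates = 0

    def step(self, main_params: Params, target_params: Params) -> bool:
        """Call once per gradient update; returns True when a copy was made."""
        self.updates += 1
        if self.updates % self.sync_every != 0:
            return False
        # Deep copy so later updates to the main network do not leak through.
        target_params.clear()
        target_params.update(copy.deepcopy(main_params))
        return True


main = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}
sync = TargetSync(sync_every=3)
print([sync.step(main, target) for _ in range(3)], target)
```

Keeping the target parameters fixed between copies keeps the target value stable while the main network is being updated.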

Experiments and results analysis

Experimental setup

The experiments used Python 3.7 to implement the model, PyTorch to build and train the deep learning network and the reinforcement learning DQN agent, the matplotlib 3.1 library to visualize the data, and the VADER module in the NLTK library to calculate the sentiment scores. The transaction cost is set at 0.05% of the transaction amount, the mini-batch size is 512, and the initial learning rate is 0.01.

The experimental data come from Yahoo Finance and cover four stocks that have attracted much attention in the A-share market: Sany, Gree Electric, China Merchants Bank, and UNIS. The data for each stock are divided into two parts: the data from January 2018 to December 2020 are used as the training dataset for model training (denoted as $M_{\mathrm{train}}$), and the data from January 2021 to January 2022 are used as the test dataset for model back-testing (denoted as $M_{\mathrm{test}}$).

To explore the effect of investor sentiment on stock trading strategies and investment returns, two models are constructed: a DQN stock trading model based on individual investor sentiment (denoted as DQN-SE) and a DQN stock trading model based on the composite investor sentiment score (denoted as DQN-CE). To assess their effectiveness, they are compared with two benchmark strategies, the buy-and-hold strategy (denoted as B&H) and the plain DQN trading strategy (denoted as DQN), and all models are evaluated using cumulative stock returns and Sharpe ratios.
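The two evaluation metrics can be computed from a series of daily strategy returns, as in the sketch below. The paper does not state its exact conventions, so the compounding of daily returns, the zero risk-free rate, the sample standard deviation, and the annualization by 252 trading days are all assumptions for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cumulative_return(daily_returns: Sequence[float]) -> float:
    """Compound daily returns into a total return over the whole period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(daily_returns: Sequence[float], risk_free_daily: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized mean excess return per unit of volatility (assumed conventions)."""
    excess = [r - risk_free_daily for r in daily_returns]
    volatility = stdev(excess)  # sample standard deviation; needs >= 2 returns
    if volatility == 0.0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical daily returns of one strategy over four trading days.
returns = [0.01, -0.005, 0.02, 0.0]
print(cumulative_return(returns), sharpe_ratio(returns))
```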

Experimental results and analysis

In the DQN-SE model, only the influence of individual investor sentiment on stock trading strategies and returns is considered. The individual investor sentiment score $z^{\mathrm{ind}}_{i,t}$ replaces the composite investor sentiment score as the sentiment component of the state and in equation (5), so that the reward becomes $r_t = \frac{v_{i,t+1} - v_{i,t}}{v_{i,t}} + 0.1\, a_t \cdot z^{\mathrm{ind}}_{i,t}$. The DQN-SE, DQN-CE, and DQN models were back-tested on the test set $M_{\mathrm{test}}$ after 1000 rounds of training on the training set $M_{\mathrm{train}}$.

Fig.2 Sany cumulative return

As can be seen in Fig. 2 and Fig. 3, the stock prices of Sany and Gree Electric showed a clear downward trend, and all four trading strategies incurred losses. Among them, the return curve of the DQN-CE strategy is higher than those of the other strategies: DQN-CE has the smallest loss and B&H the largest.

Fig.4 UNIS cumulative return

As shown in Fig. 4, the overall trend of the UNIS stock price is upward, and all four trading strategies are profitable. The return curve of the DQN-CE strategy is the highest, with a return of over 30%; the DQN-SE strategy also has a return of over 20%, and the DQN and B&H strategies both have a return of around 10%.

Fig.5 China Merchants Bank cumulative return

As shown in Fig. 5, the return curves of the four trading strategies on China Merchants Bank vary widely. The DQN-CE, DQN, and B&H strategies show positive returns, with B&H gaining the most, while the DQN-SE strategy shows negative returns.

Compared to the B&H strategy, the three trading strategies DQN, DQN-SE, and DQN-CE achieved higher returns on three stocks, Sany, Gree Electric, and UNIS, but the B&H strategy had the highest return curve on China Merchants Bank. China Merchants Bank is a stable, large-capitalization stock whose price rose fairly steadily during the training period but fluctuated cyclically and more strongly during the testing period. Because individual investor sentiment inevitably lags the trend of stock price movement, it is not easy to profit from it in such a market.

Compared to the DQN strategy model without investor sentiment analysis, the DQN-SE and DQN-CE models with investor sentiment analysis have higher return curves on all four stocks, with the DQN-CE model having the highest return curve. The results suggest that investor sentiment improves stock investment returns and that composite investor sentiment has a more pronounced effect than individual investor sentiment.

After analyzing the models' investment return performance, the performance of the different trading strategies is measured using the Sharpe ratio, which combines investment return and investment risk and describes the excess return that an investor can earn per unit of risk taken. As can be seen from Tab.1, for Sany and Gree Electric, whose share prices keep falling and which are in a loss-making state, the DQN-CE strategy reduces losses while also reducing trading risk. For UNIS, whose stock price is in an uptrend, the trading return of the DQN-CE strategy is 35.52%, and its trading risk is also the lowest. For China Merchants Bank, which has a volatile stock price, using investor sentiment neither optimizes the trading strategy nor improves investment returns; the traditional B&H strategy would have generated higher investment returns. Compared to the DQN-SE strategy, the DQN-CE strategy performed better on all four stocks, with an average improvement in returns of about 8.16%.

Tab.1 Performance evaluation of the four models

The experimental results show that the DQN-CE strategy performs better than DQN-SE. The DQN-CE strategy performs best on stocks whose prices show a clear trend, so it is more suitable for such stocks, and it is less applicable to stocks whose prices fluctuate within a range with no obvious trend.

Conclusion

This paper combines two types of investor sentiment into a composite investor sentiment score to measure investor sentiment in the stock market, simulates stock trading through the reinforcement learning DQN algorithm, and conducts an empirical analysis using real data from four stocks. The results show that: (1) for stocks with a more obvious price trend, investor sentiment analysis helps optimize stock trading strategies and reduce investment risks; (2) compared to individual investor sentiment alone, composite investor sentiment is more conducive to optimizing trading strategies and improving investment returns.

References

[1] Fisher K L, Statman M. Consumer Confidence and Stock Returns. The Journal of Portfolio.
[2] Schmeling M. Investor Sentiment and Stock Returns: Some International Evidence. Journal of Empirical Finance, 2009, 16(3).
[3] Gao L. Equity Transfers and Market Reactions. Journal of Emerging Market Finance, 2008, 7(3).
[4] Baker M, Wurgler J. Investor Sentiment and the Cross-Section of Stock Returns. The Journal of Finance, 2006, 61(4).
[5] Liu J, Wen Y L, Xu J L. Corporate governance, investor sentiment and surplus response. Statistics and Decision Making, 2020, 36(07).
[6] Smailović J, Grčar M. Stream-based Active Learning for Sentiment Analysis in the Financial Domain. Information Sciences, 2014, 285.
[7] Bollen J, Mao H. Twitter Mood Predicts the Stock Market. Journal of Computational Science, 2011, 2(1).
[8] Li Q, Wang T, Gong Q. Media-aware Quantitative Trading Based on Public Web Information. Decision Support Systems, 2014, 61.
[9] Li Q, Wang T J, Li P. The Effect of News and Public Mood on Stock Movements. Information Sciences, 2014, 278.
[10] Picasso A, Merello S. Technical Analysis and Sentiment Embeddings for Market Trend Prediction. Expert Systems with Applications, 2019, 135.
[11] Xiong Z R, Liu X Y, Zhong S. Practical Deep Reinforcement Learning Approach for Stock Trading. CoRR, 2018, abs/1811.07522.
[12] Carta S, Ferreira A, Podda A S. Multi-DQN: An Ensemble of Deep Q-learning Agents for Stock Market Forecasting. Expert Systems with Applications, 2021, 164: 113820.
[13] Xu J, Zhu Y K, Xing C H. Research on financial trading algorithms based on deep reinforcement learning. Computer Engineering and Applications, 2022, 58(07).
[14] Koratamaddi P, Wadhwani K, Gupta M. Market Sentiment-aware Deep Reinforcement Learning Approach for Stock Portfolio Allocation. Engineering Science and Technology, an International Journal, 2021, 24(4).