<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Analysis of the Causes and Evolution of Class Solidification Based on Q-Learning and Hawk-Dove Game</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Xuan</forename><surname>Zhou</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yicheng</forename><surname>Gong</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Hubei Province Key Laboratory of Systems Science in Metallurgical Process</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yuqiang</forename><surname>Feng</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ningjing</forename><surname>Yang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Analysis of the Causes and Evolution of Class Solidification Based on Q-Learning and Hawk-Dove Game</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8BBEEB1EF323655BAAE9FB0F652FF2B1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T05:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>class solidification</term>
					<term>hawk-dove game</term>
					<term>reinforcement learning</term>
					<term>Q-learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>If the phenomenon of class solidification intensifies, society will lack the impetus for social production. Class solidification can be understood as a state of low mobility formed by the interaction of aggressive and concessive strategies among different classes, and it can be regarded as an equilibrium of a repeated hawk-dove game between classes. To find measures that promote mobility between classes, a repeated class competition game model is first constructed and theoretically analyzed. Then, the Q-learning method is used to simulate how boundedly rational classes seek the best decision through trial and error in reality, so as to test the explanatory power of the model. The simulation experiments conclude that the mobility of classes is inversely proportional to the strength ratio between classes and to the income generated. Finally, the model is improved by transforming the benefits of both parties into their own strength; the experiment shows that class mobility increases by 11.6% compared with before the improvement.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Class solidification has always been one of the enduring topics in the study of social stratification. Without class mobility, society cannot form an effective incentive mechanism that makes individual members believe their social and economic status can be improved through their own efforts. In the long run, the lack of class mobility will hinder the sustainable development of society <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref>. Although there are few empirical studies of class solidification with a long time span, it has been found that people's attention to class solidification is closely related to the degree of social stability <ref type="bibr" target="#b3">[3]</ref><ref type="bibr" target="#b5">[4]</ref>.</p><p>The asymmetric hawk-dove game is often used to study the strategic choices and the balance of strengths among classes of unequal strength, and its theoretical basis is developing steadily. Song Bo and Huang Jing established a hawk-dove game model from the perspective of asymmetric cooperation when studying the stability of strategic alliances <ref type="bibr" target="#b6">[5]</ref>, and obtained its mixed-strategy Nash equilibrium. Meanwhile, Bu Zhenxing interpreted the Sino-Japanese relationship using an infinite hawk-dove game model <ref type="bibr" target="#b8">[6]</ref>, and proposed that one's own strength, competition for interests, and similar factors all significantly affect game strategy. At present, research on the repeated hawk-dove game generally adopts the evolutionary game perspective, which assumes that the strengths of both players are asymmetric and fixed; this is in line with the actual situation of class solidification. 
However, it offers no good starting point when the strengths of the two sides can change.</p><p>With the continuous development of reinforcement learning, its theoretical model has become basically consistent with the framework of game theory. Masuda and Nakamura, in the setting of the iterated prisoner's dilemma, numerically tested the performance of a reinforcement learning model and used it to explore the relationship between learning and evolution <ref type="bibr" target="#b9">[7]</ref>. Q-learning is the most commonly used RL algorithm for two-player game problems: it can be used online without a model of its environment, and it is well suited to repeated games against unknown opponents <ref type="bibr" target="#b10">[8]</ref>. Zhang Chunyang et al. implemented the Q-learning algorithm in the prisoner's dilemma and compared it with other algorithms <ref type="bibr" target="#b11">[9]</ref>; Liu Weibing et al. combined reinforcement learning with evolutionary games and established a multi-agent reinforcement learning model in an evolutionary game <ref type="bibr" target="#b12">[10]</ref>. Their simulation results show that the multi-agent reinforcement learning model can make the players keep learning and seek the best strategy.</p><p>In this paper, Q-learning and the asymmetric repeated hawk-dove game are combined into one model; from the perspective of class solidification, the evolution and causes of this phenomenon are discussed by simulating the changes in social classes' strategy choices, and some effective suggestions are obtained through simulation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Main framework</head><p>The research framework of this paper is shown in Figure <ref type="figure">1</ref> (main framework flow chart). According to Figure <ref type="figure">1</ref>, the thesis is divided into three parts. The first part abstracts the phenomenon of class solidification as a low-mobility equilibrium formed by the constant strategic interaction of different classes in social practice, where the social classes choose between aggressive and concessive strategies in line with the characteristics of the hawk-dove game. Thus, an asymmetric repeated hawk-dove game model with parameters is constructed, where the parameters are the asymmetry factor <formula xml:id="formula_0">μ = k_a : k_b</formula> and the unit benefit of conflict m = V/C between the parties to the game (where V is the benefit obtained and C is the cost paid by both parties when a conflict arises); this is named the class competition game model. Meanwhile, the theoretical solution of the game model is obtained.</p><p>In the second part, under the premise of bounded rationality of the players, the model is simulated through reinforcement learning to test the explanatory power of game theory in practice, and to explore the relationship between the strategy choices and the parameters.</p><p>The third part improves the parameters of the class competition game model and simulates it, compares the experimental results before and after the improvement, and, by revealing the phenomenon of class mobility, proposes measures to relieve class solidification and realize class mobility.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Analysis of asymmetric repeated class competition game based on hawk-dove game</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">An asymmetric class competition model based on the hawk-dove game</head><p>The construction of the class competition game model consists of three parts: (1) Strategy space of the game subjects: assume that both social class A and social class B have two strategies to choose from, i.e., the hawk strategy and the dove strategy; (2) The parameters of the game model are the benefit of the game V and the cost C paid by both parties when a conflict arises (in general V &lt; C); (3) The asymmetry factor μ = k_a : k_b. The resulting payoffs are shown in Table <ref type="table" target="#tab_0">1</ref>:</p><formula xml:id="formula_1">(H, H): (k_a(V - C)/4, k_b(V - C)/4); (H, D): (V, 0); (D, H): (0, V); (D, D): (k_a V, k_b V)</formula><p>Using the underlining method on Table <ref type="table" target="#tab_0">1</ref>, there are two pure-strategy Nash equilibria, (H, D) and (D, H). There is also a mixed-strategy Nash equilibrium, in which p_0 is the probability that a social class chooses the dove strategy:</p><formula xml:id="formula_2">(p_0, 1 - p_0) = ((C - V) / ((C - V) + 4 k_a k_b V), 1 - (C - V) / ((C - V) + 4 k_a k_b V))</formula><p>For the above asymmetric hawk-dove game, writing the unit gain of the conflicting parties as m = V/C, the dove probability</p><formula xml:id="formula_4">p_0 = (C - V) / ((C - V) + 4 k_a k_b V)</formula><p>becomes</p><formula xml:id="formula_5">p_0 = (1 - m)(1 + μ)^2 / ((1 + μ)^2 - m(μ - 1)^2)<label>(2)</label></formula><p>From Eq. 2, it can be seen that the probability p_0 is related to the parameters μ and m: when μ is constant, the larger m is, the smaller p_0 is, i.e., the larger the unit gain from conflict, the smaller the probability of choosing the dove strategy; when m is constant, p_0 is minimized when μ = 1, i.e. 
the probability of choosing the dove strategy is smallest for classes of equal strength; the greater the difference in strength, the greater the probability of choosing the dove strategy.</p></div>
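As a numerical sanity check on Eq. (2), the dove probability p_0 can be computed directly from μ and m. This is a minimal sketch, not the paper's code; the function name is ours:

```python
def dove_probability(mu: float, m: float) -> float:
    """Mixed-strategy dove probability p_0 of Eq. (2), with mu = k_a/k_b and m = V/C."""
    assert 0 < m < 1, "the model assumes V < C, i.e. 0 < m < 1"
    return (1 - m) * (1 + mu) ** 2 / ((1 + mu) ** 2 - m * (mu - 1) ** 2)

# Evaluating the function reproduces the qualitative claims of the text:
# for fixed mu, p_0 falls as m grows; for fixed m, p_0 is smallest at mu = 1
# and grows as the strength gap widens (it is symmetric in mu and 1/mu).
```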
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">A class competition model based on Q-learning</head><p>The purpose of this paper is to study the equilibrium reached by social classes with different strength ratios after a long, repeated game, so the environment for the Q-learning simulation experiments is set as follows: (1) States</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>S_t = {"hh", "hd", "dh", "dd"}</head><p>The reward set R_a is denoted</p><formula xml:id="formula_7">{(V - C)k_a^t / 4, V, 0, k_a^t V}</formula><p>and the reward set R_b is denoted {(V - C)k_b^t / 4, V, 0, k_b^t V};</p><p>(4) The probability that a social class chooses an action follows the Boltzmann distribution:</p><formula xml:id="formula_9">p(a_i) = exp(Q(s, a_i)/λ) / Σ_{a ∈ A} exp(Q(s, a)/λ)</formula><p>This gives the Q-value update formulas:</p><formula xml:id="formula_10">Q_A^{t+1}(s_t, a_1, a_2) = (1 - α) Q_A^t(s_t, a_1, a_2) + α[r_a + γ max_{b'} Q_A^t(s_{t+1}, b')] <label>(3)</label>; Q_B^{t+1}(s_t, a_1, a_2) = (1 - α) Q_B^t(s_t, a_1, a_2) + α[r_b + γ max_{b'} Q_B^t(s_{t+1}, b')] <label>(4)</label></formula></div>
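The Boltzmann action selection and the updates of Eqs. (3)-(4) can be sketched as below. This is a simplified per-player view rather than the paper's exact joint-action table Q(s, a_1, a_2); the dictionary layout and helper names are our own, with α = 0.1 and γ = 0.9 from Section 4.1 as defaults and an assumed temperature λ = 1:

```python
import math
import random

ACTIONS = ["h", "d"]                  # hawk, dove
STATES = ["hh", "hd", "dh", "dd"]     # previous-round joint actions

def boltzmann_choice(Q, state, lam=1.0):
    """Sample an action with probability proportional to exp(Q(s, a)/lam)."""
    weights = [math.exp(Q[(state, a)] / lam) for a in ACTIONS]
    r = random.random() * sum(weights)
    acc = 0.0
    for a, w in zip(ACTIONS, weights):
        acc += w
        if r <= acc:
            return a
    return ACTIONS[-1]

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_b' Q(s',b')), as in Eqs. (3)-(4)."""
    best_next = max(Q[(next_state, b)] for b in ACTIONS)
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * (reward + gamma * best_next)
```

Each class would keep its own table; after both actions are sampled, the joint action becomes the next state and each table is updated with that class's own reward.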
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Simulation experiments on class competition model based on Q-learning</head><p>In this section, the laws of social class strategy selection are discovered through simulation experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Parameter Setting</head><p>The parameters of the Q-learning model in this paper are γ = 0.9 and α = 0.1. Considering their realistic implications, the following two parameters are set: (1) The unit benefit of conflict between the two sides of the game, m = V/C, where the conflict cost C is set to 50 and the benefit V is set to 20; (2) The asymmetry factor μ of the two social classes: according to the difference in strength values of the classes, the simulation experiment is divided into five cases by strength ratio, from 0.5:0.5 to 0.1:0.9. Considering the stability of the model, the initial total strength of both sides is set to 10,000, the initial strength is distributed in proportion to the ratio, and the asymmetry factors μ of class A and class B do not change within each experiment.</p></div>
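Under these settings (C = 50, V = 20, so m = V/C = 0.4), the stage payoffs of Table 1 can be written as a short helper. This is a sketch with our own function name, not the paper's code:

```python
V, C = 20.0, 50.0   # benefit and conflict cost from Section 4.1 (m = V/C = 0.4)

def payoffs(a, b, k_a, k_b):
    """Stage payoffs (r_a, r_b) of the class competition game for actions 'h'/'d'."""
    if a == "h" and b == "h":          # mutual conflict
        return (V - C) * k_a / 4, (V - C) * k_b / 4
    if a == "h" and b == "d":          # A takes the whole benefit
        return V, 0.0
    if a == "d" and b == "h":          # B takes the whole benefit
        return 0.0, V
    return k_a * V, k_b * V            # (d, d): benefit split by strength share
```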
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">A test of the explanatory power of game theory in practice</head><p>After 20,000 iterations of the model, the different classes stabilize on the (H, D), (D, H), and (D, D) strategy combinations at about 10,000 iterations, but the probability of choosing each of these three combinations is related to the parameters μ and m. Therefore, the control-variables method is used to compare the frequency changes of the social classes when the strength ratio μ differs but m is unchanged (taken as m = 20/50) with those when m differs but μ is unchanged (taken as μ = 0.5:0.5); the model is repeated 100 times and the frequencies are counted once the strategies are stable, giving Figure 2.</p><p>From Figure <ref type="figure" target="#fig_4">2</ref>-a, it can be seen that as the ratio μ = k_a : k_b decreases, that is, as the strength gap between the classes widens, the more the two sides tend to choose the (D, D) strategy combination, while the probability of choosing the (D, H) and (H, D) combinations gradually decreases, which accords with the theoretical values. This result also means that the established Q-learning game model can basically represent the real game situation. However, when μ = k_a : k_b = 0.1:0.9, the weak side always chooses to compromise and give in, the strong side always chooses the conflict strategy, and the probability of cooperation between the two sides is reduced.</p><p>From Figure 2-b, it can be seen that as the game benefit V becomes larger and m increases, the unit gain of conflict between the two sides increases, the chance of both sides choosing the (D, D) combination becomes smaller, the choices of (D, H) and (H, D) generally become more frequent, and the chance of both social classes settling for the status quo becomes smaller.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Simulation experiments on class competition model with variable</head><p>parameters based on Q-learning</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Parameter Setting</head><p>This section considers how class solidification can be broken. The strength of the social classes changes during the game because of various factors, which can be internal or external. The model therefore sets the strength ratio of the two sides after each game according to their cumulative gains: i.e., the strength value k_a^t of social class A and the strength value k_b^t of social class B are determined by the cumulative gains of the classes:</p><formula xml:id="formula_13">k_a^t = R_a^{t-1} / (R_a^{t-1} + R_b^{t-1}), k_b^t = R_b^{t-1} / (R_a^{t-1} + R_b^{t-1})</formula><p>The rest of the parameter settings are the same as those used for class solidification; this is defined as the class competition model with variable parameters.</p></div>
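The strength update above can be sketched as follows (the helper name is ours; it assumes the cumulative gains are positive, since the paper does not specify how a negative cumulative payoff is handled):

```python
def update_strengths(R_a, R_b):
    """Next-round strength shares k_a^t, k_b^t from cumulative gains R_a^{t-1}, R_b^{t-1}."""
    total = R_a + R_b
    assert total > 0, "shares are only meaningful for positive cumulative gains"
    return R_a / total, R_b / total
```

Because the shares always sum to 1, the total strength stays fixed while its distribution tracks each class's accumulated payoff.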
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Comparison of experimental results of the class competition model before and after the parameter change</head><p>The strategies of both sides are trained by Q-learning for 20,000 iterations. During the experiment, it was found that after about 10,000 games, both sides stabilize on the three strategy combinations (H, D), (D, H), and (D, D), and the probability of each combination is again related to the parameters μ and m. At the same time, there are obvious differences in the choice of strategy combination between the participants before and after the model improvement. The two models are each simulated 100 times, and the probabilities of the (D, D) outcome are compared, as shown in Figure <ref type="figure" target="#fig_1">3</ref>: Fig. <ref type="figure" target="#fig_1">3</ref> Plot of the probability of choosing the (D, D) strategy combination against the μ value before and after the parameter improvement. It can be seen from Figure <ref type="figure" target="#fig_1">3</ref> that as the μ value (fixed or initial) gets smaller, i.e., as the class strength gap gets bigger, the probability of choosing (D, D) gradually increases, meaning that the bigger the class gap, the easier it is for a class to lie flat. 
Particularly, when the classes have very different strengths (μ = 0.1:0.9), the lower class basically does not choose the aggressive strategy but chooses to lie flat when the μ value remains unchanged, while about 26% of the lower classes are willing to choose the aggressive strategy when the value can change.</p><p>The difference is that the improved variable-parameter class competition model, on the whole, reduces the probability of a class choosing (D, D) by 11.6% compared with the class competition model under class solidification, while the probabilities of choosing (H, D) and (D, H) grow on average by 6.3% and 4.6%. It can be seen that the immutability of class status makes the disadvantaged classes more inclined to lie flat, while the variability of class status effectively stimulates the motivation of different classes, continuously promotes mobility between social classes, and improves the dynamics of social development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Outlook</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Conclusion</head><p>In this paper, the asymmetric repeated hawk-dove game is used to model and theoretically analyze class competition, and simulation experiments based on the Q-learning algorithm show that different classes are likely to stabilize on either aggressive or acquiescent strategies, that the parameters μ and m are roughly inversely proportional to the probability of the classes choosing (D, D), and that the model can be improved by changing the values of k_a and k_b so as to reduce the probability of choosing (D, D) by 11.6%. Finally, based on the simulation results, this paper discusses three suggestions for avoiding class solidification and improving social mobility: (1) Reduce the gap between rich and poor. In this study, we found that the probability of both classes stabilizing at (D, D) is inversely proportional to the strength ratio μ of the two sides. The greater the disparity in strength, the greater the probability of choosing to rest on one's laurels in social competition, and the lower the probability of choosing to forge ahead. Meanwhile, when the difference in strength between the two classes is too great, the weaker side will hardly adopt the strategy of forging ahead but will lie flat in the face of challenges and opportunities, which increases the degree of class solidification. Therefore, in order to avoid the aggravation of class solidification, measures to reduce the gap between rich and poor should be promoted, so as to increase the motivation of the different classes to move up; (2) Appropriate burden-reduction mechanisms. The study shows that the probability of both classes settling at (D, D) is also inversely proportional to the unit benefit m of conflict between the two parties. 
The increase in the unit gain m of conflict can be interpreted as the gain obtained by a class becoming larger at a given conflict cost, which reduces the probability of both classes settling for the status quo and effectively increases their motivation to keep improving in social competition. When the unit gain is higher, a class will be more willing to take the initiative to improve itself at the same conflict cost, instead of losing motivation in continuous internal conflict. Therefore, in order to avoid the phenomenon of class solidification, it is necessary to appropriately increase the unit gain from conflict, encouraging people of different classes to keep advancing and achieve a greater possibility of a class leap; (3) A fair and just social environment. The study shows that changing k_a and k_b so that gains translate into the classes' own strength values effectively reduces the probability of both sides choosing to settle for the status quo by 11.6% on average, and increases the probability of choosing to be aggressive by 11%. This indicates that when social classes find, through continuous strategic interaction, that their own strength can be improved by effort, their motivation to change themselves increases. Therefore, establishing a fair and just social environment, in which different social classes interact strategically on reasonable terms and the interests of both sides are effectively protected, is an effective way to avoid social solidification and improve social mobility.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>resources are allocated by social class A and social class B).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>( 3 )</head><label>3</label><figDesc>S_t = {"hh", "hd", "dh", "dd"}, using the strategy choices of the previous round of social classes A and B as the state of this round, e.g. "hh"; (2) Social class A and social class B have the same action set A = {h, d}, i.e., both A and B have two actions to choose from: the "hawk" strategy (h) and the "dove" strategy (d); (3) The reward R of the model represents the reward R_a of social class A and the reward R_b of social class B after adopting different actions, where the reward set R_a is denoted</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Since the repeated game is not just a simple repetition, the social classes choose their strategies with a certain randomness, so the Boltzmann distribution is chosen to represent the probability that a social class selects an action.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 2</head><label>2</label><figDesc>Figure 2 Plot of the variation of strategy with μ and m</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>k_a^t</head><label></label><figDesc>The strength value k_a^t of social class A and the strength value k_b^t of social class B are determined by the cumulative gains of the classes.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Class competition model revenue matrix</figDesc><table><row><cell></cell><cell>Social class B: H</cell><cell>Social class B: D</cell></row><row><cell>Social class A: H</cell><cell>(k_a(V - C)/4, k_b(V - C)/4)</cell><cell>(V, 0)</cell></row><row><cell>Social class A: D</cell><cell>(0, V)</cell><cell>(k_a V, k_b V)</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Acknowledgments</head><p>This work was financially supported by the National Natural Science Foundation of China (72031009) and Hubei Province Key Laboratory of Systems Science in Metallurgical Process(Y202105)</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Traveling habitus&apos; and the new anthropology of class: proposing a transitive tool for analyzing social mobility in global migration</title>
		<author>
			<persName><forename type="first">Jaafar</forename><surname>Alloul</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mobilities</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="178" to="193" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Wealth Inequality and Social Mobility: A Simulation-Based Modelling Approach</title>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">Cardiff Economics Working Papers</title>
		<imprint>
			<biblScope unit="volume">196</biblScope>
			<biblScope unit="page" from="307" to="329" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Social Mobility and Stability of Democracy: Re-evaluating De Tocqueville</title>
		<author>
			<persName><forename type="first">D</forename><surname>Acemoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Egorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sonin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">CEPR Discussion Papers</title>
		<imprint>
			<biblScope unit="volume">133</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1041" to="1105" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Political Man: The Social Bases of Politics</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lipset</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Political Science Quarterly</title>
		<imprint>
			<biblScope unit="volume">75</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="326" to="328" />
			<date type="published" when="1960">1960</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Stability analysis of strategic alliance from the perspective of asymmetric cooperation-based on the game model of hawk and dove</title>
		<author>
			<persName><forename type="first">Bo</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jing</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">Soft Science</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="28" to="31" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Using infinite game model of hawk and dove to study international relations-taking Sino-Japanese relations as an example</title>
		<author>
			<persName><forename type="first">Bu</forename><surname>Zhenxing</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Sichuan Provincial Party School of CPC</title>
		<imprint>
			<biblScope unit="issue">01</biblScope>
			<biblScope unit="page" from="99" to="104" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>J</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Numerical analysis of a reinforcement learning model with the dynamic aspiration level in the iterated Prisoner&apos;s dilemma</title>
		<author>
			<persName><forename type="first">N</forename><surname>Masuda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nakamura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Theoretical Biology</title>
		<imprint>
			<biblScope unit="volume">278</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="55" to="62" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>J</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Emergence of cooperation and a fair system optimum in road networks:A game-theoretic and agent-based modelling approach</title>
		<author>
			<persName><forename type="first">N</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ben-Elia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Research in Transportation Economics</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="46" to="55" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Q-learning algorithm and its implementation in prisoner&apos;s dilemma</title>
		<author>
			<persName><forename type="first">Zhang</forename><surname>Chunyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chen</forename><surname>Xiaoping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liu</forename><surname>Guiquan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cai</forename><surname>Qingsheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Engineering and Application</title>
		<imprint>
			<biblScope unit="issue">13</biblScope>
			<biblScope unit="page" from="121" to="122" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Multi-agent reinforcement learning model in evolutionary game</title>
		<author>
			<persName><forename type="first">Liu</forename><surname>Weibing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wang</forename><surname>Xianjia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">System Engineering Theory and Practice</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">03</biblScope>
			<biblScope unit="page" from="28" to="33" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
