<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Analysis of the Causes and Evolution of Class Solidification Based on Q-Learning and Hawk-Dove Game</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Xuan</forename><surname>Zhou</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yicheng</forename><surname>Gong</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Hubei Province Key Laboratory of Systems Science in Metallurgical Process</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yuqiang</forename><surname>Feng</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ningjing</forename><surname>Yang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Science</orgName>
								<orgName type="institution">Wuhan University of Science and Technology</orgName>
								<address>
									<postCode>430065</postCode>
									<settlement>Wuhan</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Analysis of the Causes and Evolution of Class Solidification Based on Q-Learning and Hawk-Dove Game</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8BBEEB1EF323655BAAE9FB0F652FF2B1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T05:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>class solidification</term>
					<term>hawk-dove game</term>
					<term>reinforcement learning</term>
					<term>Q-learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>If the phenomenon of class solidification intensifies, society will lack the impetus for social production. Class solidification can be understood as a state of low mobility formed by the interaction of aggressive and concessive strategies among different classes, and it can be regarded as an equilibrium of a repeated hawk-dove game between classes. To find measures that promote mobility between classes, a repeated class competition game model is first constructed and theoretically analyzed. Then, the Q-learning method is used to simulate how boundedly rational classes seek the best decision through trial and error in reality, so as to test the explanatory power of the model. The simulation experiments conclude that the mobility of classes is inversely proportional to the strength ratio between classes and to the income generated. Finally, the model is improved by transforming the benefits of both parties into their own strength; the experiment shows that class mobility increases by 11.6% compared with before the improvement.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Class solidification has always been one of the enduring topics in the study of social stratification. Without class mobility, society cannot form an effective incentive mechanism that makes individual members believe their social and economic status can be improved through their own efforts. In the long run, the lack of class mobility will hinder the sustainable development of society <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref>. Although there are few empirical studies of class solidification with a long time span, it has been found that people's attention to class solidification is closely related to the degree of social stability <ref type="bibr" target="#b3">[3]</ref><ref type="bibr" target="#b5">[4]</ref>.</p><p>The asymmetric hawk-dove game is often used to study the strategic choices and the balance of strengths among classes of unequal strength, and its theoretical basis is developing steadily. Song Bo and Huang Jing established a hawk-dove game model from the perspective of asymmetric cooperation when studying the stability of strategic alliances <ref type="bibr" target="#b6">[5]</ref>, and obtained its mixed-strategy Nash equilibrium. Meanwhile, Bu Zhenxing interpreted the Sino-Japanese relationship using an infinite hawk-dove game model <ref type="bibr" target="#b8">[6]</ref>, and proposed that one's own strength, competition for interests, and similar factors all significantly affect game strategy. At present, research on the repeated hawk-dove game generally adopts the evolutionary game perspective, which assumes that the strengths of both players are asymmetric and fixed; this is in line with the actual situation of class solidification. 
However, it offers no good starting point when the strengths of the two sides can change.</p><p>With the continuous development of reinforcement learning, its theoretical model has become basically consistent with the framework of game theory. Masuda and Nakamura, in the setting of the iterated prisoner's dilemma, numerically tested the performance of a reinforcement learning model and used it to explore the relationship between learning and evolution <ref type="bibr" target="#b9">[7]</ref>. Q-learning is the most commonly used RL algorithm for two-player game problems: it can be used online without a model of its environment, and it is well suited to repeated games against unknown opponents <ref type="bibr" target="#b10">[8]</ref>. Zhang Chunyang et al. implemented the Q-learning algorithm in the prisoner's dilemma and compared it with other algorithms <ref type="bibr" target="#b11">[9]</ref>; Liu Weibing et al. combined reinforcement learning with evolutionary games and established a multi-agent reinforcement learning model in an evolutionary game <ref type="bibr" target="#b12">[10]</ref>. Their simulation results show that the multi-agent reinforcement learning model can make the players keep learning and seek the best strategy.</p><p>In this paper, Q-learning and the asymmetric repeated hawk-dove game are combined into one model; from the perspective of class solidification, the evolution and causes of this phenomenon are discussed by simulating the changes in social classes' strategy choices, and some effective suggestions are obtained through simulation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Main framework</head><p>The research framework of this paper is shown in Figure <ref type="figure">1</ref> (main framework flow chart). According to Figure <ref type="figure">1</ref>, the thesis is divided into three parts. The first part abstracts the phenomenon of class solidification as a low-mobility equilibrium formed by the constant strategic interaction of different classes in social practice, where the social classes choose between aggressive and concessive strategies in line with the characteristics of the hawk-dove game. Thus, an asymmetric repeated hawk-dove game model with parameters is constructed, where the parameters are the asymmetry factor <formula xml:id="formula_0">μ = k_a : k_b</formula> and the unit benefit of conflict m = V/C between the parties to the game (where V is the benefit obtained and C is the cost paid by both parties when a conflict arises); this is named the class competition game model. Meanwhile, the theoretical solution of the game model is obtained.</p><p>In the second part, under the premise of bounded rationality of the players, the model is simulated through reinforcement learning to test the explanatory power of game theory in practice, and to explore the relationship between the strategy choices and the parameters.</p><p>The third part improves the parameters of the class competition game model and simulates it, compares the experimental results before and after the improvement, and, by revealing the phenomenon of class mobility, proposes measures to relieve class solidification and realize class mobility.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Analysis of asymmetric repeated class competition game based on hawk-dove game</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">An asymmetric class competition model based on the hawk-dove game</head><p>The construction of the class competition game model consists of three parts: (1) Strategy space of the game subjects: assume that both social class A and social class B have two strategies to choose from, i.e., the hawk strategy and the dove strategy; (2) The parameters of the game model are the benefit of the game V and the cost C paid by both parties when a conflict arises (in general V &lt; C); (3) The asymmetry factor μ = k_a : k_b. The resulting payoffs are shown in Table <ref type="table" target="#tab_0">1</ref>:</p><formula xml:id="formula_1">(H, H): (k_a(V - C)/4, k_b(V - C)/4); (H, D): (V, 0); (D, H): (0, V); (D, D): (k_a V, k_b V)</formula><p>Using the underlining method on Table <ref type="table" target="#tab_0">1</ref>, there are two pure-strategy Nash equilibria, (H, D) and (D, H). There is also a mixed-strategy Nash equilibrium, in which p_0 is the probability that a social class chooses the dove strategy:</p><formula xml:id="formula_2">(p_0, 1 - p_0) = ((C - V) / ((C - V) + 4 k_a k_b V), 1 - (C - V) / ((C - V) + 4 k_a k_b V))</formula><p>For the above asymmetric hawk-dove game, writing the unit gain of the conflicting parties as m = V/C, the dove probability</p><formula xml:id="formula_4">p_0 = (C - V) / ((C - V) + 4 k_a k_b V)</formula><p>becomes</p><formula xml:id="formula_5">p_0 = (1 - m)(1 + μ)^2 / ((1 + μ)^2 - m(μ - 1)^2)<label>(2)</label></formula><p>From Eq. 2, it can be seen that the probability p_0 is related to the parameters μ and m: when μ is constant, the larger m is, the smaller p_0 is, i.e., the larger the unit gain from conflict, the smaller the probability of choosing the dove strategy; when m is constant, p_0 is minimized when μ = 1, i.e. 
the probability of choosing the dove strategy is smallest for classes of equal strength; the greater the difference in strength, the greater the probability of choosing the dove strategy.</p></div>
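As a numerical sanity check on Eq. (2), the dove probability p_0 can be computed directly from μ and m. This is a minimal sketch, not the paper's code; the function name is ours:

```python
def dove_probability(mu: float, m: float) -> float:
    """Mixed-strategy dove probability p_0 of Eq. (2), with mu = k_a/k_b and m = V/C."""
    assert 0 < m < 1, "the model assumes V < C, i.e. 0 < m < 1"
    return (1 - m) * (1 + mu) ** 2 / ((1 + mu) ** 2 - m * (mu - 1) ** 2)

# Evaluating the function reproduces the qualitative claims of the text:
# for fixed mu, p_0 falls as m grows; for fixed m, p_0 is smallest at mu = 1
# and grows as the strength gap widens (it is symmetric in mu and 1/mu).
```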
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">A class competition model based on Q-learning</head><p>The purpose of this paper is to study the equilibrium reached by social classes with different strength ratios after a long, repeated game, so the environment for the Q-learning simulation experiments is set as follows: (1) States</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>S_t = {"hh", "hd", "dh", "dd"}</head><p>The reward set R_a is denoted</p><formula xml:id="formula_7">{(V - C)k_a^t / 4, V, 0, k_a^t V}</formula><p>and the reward set R_b is denoted {(V - C)k_b^t / 4, V, 0, k_b^t V};</p><p>(4) The probability that a social class chooses an action follows the Boltzmann distribution:</p><formula xml:id="formula_9">p(a_i) = exp(Q(s, a_i)/λ) / Σ_{a ∈ A} exp(Q(s, a)/λ)</formula><p>This gives the Q-value update formulas:</p><formula xml:id="formula_10">Q_A^{t+1}(s_t, a_1, a_2) = (1 - α) Q_A^t(s_t, a_1, a_2) + α[r_a + γ max_{b'} Q_A^t(s_{t+1}, b')] <label>(3)</label>; Q_B^{t+1}(s_t, a_1, a_2) = (1 - α) Q_B^t(s_t, a_1, a_2) + α[r_b + γ max_{b'} Q_B^t(s_{t+1}, b')] <label>(4)</label></formula></div>
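The Boltzmann action selection and the updates of Eqs. (3)-(4) can be sketched as below. This is a simplified per-player view rather than the paper's exact joint-action table Q(s, a_1, a_2); the dictionary layout and helper names are our own, with α = 0.1 and γ = 0.9 from Section 4.1 as defaults and an assumed temperature λ = 1:

```python
import math
import random

ACTIONS = ["h", "d"]                  # hawk, dove
STATES = ["hh", "hd", "dh", "dd"]     # previous-round joint actions

def boltzmann_choice(Q, state, lam=1.0):
    """Sample an action with probability proportional to exp(Q(s, a)/lam)."""
    weights = [math.exp(Q[(state, a)] / lam) for a in ACTIONS]
    r = random.random() * sum(weights)
    acc = 0.0
    for a, w in zip(ACTIONS, weights):
        acc += w
        if r <= acc:
            return a
    return ACTIONS[-1]

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_b' Q(s',b')), as in Eqs. (3)-(4)."""
    best_next = max(Q[(next_state, b)] for b in ACTIONS)
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * (reward + gamma * best_next)
```

Each class would keep its own table; after both actions are sampled, the joint action becomes the next state and each table is updated with that class's own reward.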
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Simulation experiments on class competition model based on Q-learning</head><p>In this section, the laws of social class strategy selection are discovered through simulation experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Parameter Setting</head><p>The parameters of the Q-learning model in this paper are γ = 0.9 and α = 0.1. Considering their realistic implications, the following two parameters are set: (1) The unit benefit of conflict between the two sides of the game, m = V/C, where the conflict cost C is set to 50 and the benefit V is set to 20; (2) The asymmetry factor μ of the two social classes: according to the difference in strength values of the classes, the simulation experiment is divided into five cases by strength ratio, from 0.5:0.5 to 0.1:0.9. Considering the stability of the model, the initial total strength of both sides is set to 10,000, the initial strength is distributed in proportion to the ratio, and the asymmetry factors μ of class A and class B do not change within each experiment.</p></div>
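Under these settings (C = 50, V = 20, so m = V/C = 0.4), the stage payoffs of Table 1 can be written as a short helper. This is a sketch with our own function name, not the paper's code:

```python
V, C = 20.0, 50.0   # benefit and conflict cost from Section 4.1 (m = V/C = 0.4)

def payoffs(a, b, k_a, k_b):
    """Stage payoffs (r_a, r_b) of the class competition game for actions 'h'/'d'."""
    if a == "h" and b == "h":          # mutual conflict
        return (V - C) * k_a / 4, (V - C) * k_b / 4
    if a == "h" and b == "d":          # A takes the whole benefit
        return V, 0.0
    if a == "d" and b == "h":          # B takes the whole benefit
        return 0.0, V
    return k_a * V, k_b * V            # (d, d): benefit split by strength share
```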
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">A test of the explanatory power of game theory in practice</head><p>After 20,000 iterations of the model, the different classes stabilize on the (H, D), (D, H), and (D, D) strategy combinations at about 10,000 iterations, but the probability of choosing each of these three combinations is related to the parameters μ and m. Therefore, the control-variables method is used to compare the frequency changes of the social classes when the strength ratio μ differs but m is unchanged (taken as m = 20/50) with those when m differs but μ is unchanged (taken as μ = 0.5:0.5); the model is repeated 100 times and the frequencies are counted once the strategies are stable, giving Figure 2.</p><p>From Figure <ref type="figure" target="#fig_4">2</ref>-a, it can be seen that as the ratio μ = k_a : k_b decreases, that is, as the strength gap between the classes widens, the more the two sides tend to choose the (D, D) strategy combination, while the probability of choosing the (D, H) and (H, D) combinations gradually decreases, which accords with the theoretical values. This result also means that the established Q-learning game model can basically represent the real game situation. However, when μ = k_a : k_b = 0.1:0.9, the weak side always chooses to compromise and give in, the strong side always chooses the conflict strategy, and the probability of cooperation between the two sides is reduced.</p><p>From Figure 2-b, it can be seen that as the game benefit V becomes larger and m increases, the unit gain of conflict between the two sides increases, the chance of both sides choosing the (D, D) combination becomes smaller, the choices of (D, H) and (H, D) generally become more frequent, and the chance of both social classes settling for the status quo becomes smaller.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Simulation experiments on class competition model with variable</head><p>parameters based on Q-learning</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Parameter Setting</head><p>This section considers how class solidification can be broken. The strength of the social classes changes during the game because of various factors, which can be internal or external. The model therefore sets the strength ratio of the two sides after each game according to their cumulative gains: i.e., the strength value k_a^t of social class A and the strength value k_b^t of social class B are determined by the cumulative gains of the classes:</p><formula xml:id="formula_13">k_a^t = R_a^{t-1} / (R_a^{t-1} + R_b^{t-1}), k_b^t = R_b^{t-1} / (R_a^{t-1} + R_b^{t-1})</formula><p>The rest of the parameter settings are the same as those used for class solidification; this is defined as the class competition model with variable parameters.</p></div>
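The strength update above can be sketched as follows (the helper name is ours; it assumes the cumulative gains are positive, since the paper does not specify how a negative cumulative payoff is handled):

```python
def update_strengths(R_a, R_b):
    """Next-round strength shares k_a^t, k_b^t from cumulative gains R_a^{t-1}, R_b^{t-1}."""
    total = R_a + R_b
    assert total > 0, "shares are only meaningful for positive cumulative gains"
    return R_a / total, R_b / total
```

Because the shares always sum to 1, the total strength stays fixed while its distribution tracks each class's accumulated payoff.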
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Comparison of experimental results of the class competition model before and after the parameter change</head><p>The strategies of both sides are trained by Q-learning for 20,000 iterations. During the experiment, it was found that after about 10,000 games, both sides stabilize on the three strategy combinations (H, D), (D, H), and (D, D), and the probability of each combination is again related to the parameters μ and m. At the same time, there are obvious differences in the choice of strategy combination between the participants before and after the model improvement. The two models are each simulated 100 times, and the probabilities of the (D, D) outcome are compared, as shown in Figure <ref type="figure" target="#fig_1">3</ref>: Fig. <ref type="figure" target="#fig_1">3</ref> Plot of the probability of choosing the (D, D) strategy combination against the μ value before and after the parameter improvement. It can be seen from Figure <ref type="figure" target="#fig_1">3</ref> that as the μ value (fixed or initial) gets smaller, i.e., as the class strength gap gets bigger, the probability of choosing (D, D) gradually increases, meaning that the bigger the class gap, the easier it is for a class to lie flat. 
Particularly, when the classes have very different strengths (μ = 0.1:0.9), the lower class basically does not choose the aggressive strategy but chooses to lie flat when the μ value remains unchanged, while about 26% of the lower classes are willing to choose the aggressive strategy when the value can change.</p><p>The difference is that the improved variable-parameter class competition model, on the whole, reduces the probability of a class choosing (D, D) by 11.6% compared with the class competition model under class solidification, while the probabilities of choosing (H, D) and (D, H) grow on average by 6.3% and 4.6%. It can be seen that the immutability of class status makes the disadvantaged classes more inclined to lie flat, while the variability of class status effectively stimulates the motivation of different classes, continuously promotes mobility between social classes, and improves the dynamics of social development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Outlook</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Conclusion</head><p>In this paper, the asymmetric repeated hawk-dove game is used to model and theoretically analyze class competition, and simulation experiments based on the Q-learning algorithm show that different classes are likely to stabilize on either aggressive or acquiescent strategies, that the parameters μ and m are roughly inversely proportional to the probability of the classes choosing (D, D), and that the model can be improved by changing the values of k_a and k_b so as to reduce the probability of choosing (D, D) by 11.6%. Finally, based on the simulation results, this paper discusses three suggestions for avoiding class solidification and improving social mobility: (1) Reduce the gap between rich and poor. In this study, we found that the probability of both classes stabilizing at (D, D) is inversely proportional to the strength ratio μ of the two sides. The greater the disparity in strength, the greater the probability of choosing to rest on one's laurels in social competition, and the lower the probability of choosing to forge ahead. Meanwhile, when the difference in strength between the two classes is too great, the weaker side will hardly adopt the strategy of forging ahead but will lie flat in the face of challenges and opportunities, which increases the degree of class solidification. Therefore, in order to avoid the aggravation of class solidification, measures to reduce the gap between rich and poor should be promoted, so as to increase the motivation of the different classes to move up; (2) Appropriate burden-reduction mechanisms. The study shows that the probability of both classes settling at (D, D) is also inversely proportional to the unit benefit m of conflict between the two parties. 
The increase in the unit gain m of conflict can be interpreted as the gain obtained by a class becoming larger at a given conflict cost, which reduces the probability of both classes settling for the status quo and effectively increases their motivation to keep improving in social competition. When the unit gain is higher, a class will be more willing to take the initiative to improve itself at the same conflict cost, instead of losing motivation in continuous internal conflict. Therefore, in order to avoid the phenomenon of class solidification, it is necessary to appropriately increase the unit gain from conflict, encouraging people of different classes to keep advancing and achieve a greater possibility of a class leap; (3) A fair and just social environment. The study shows that changing k_a and k_b so that gains translate into the classes' own strength values effectively reduces the probability of both sides choosing to settle for the status quo by 11.6% on average, and increases the probability of choosing to be aggressive by 11%. This indicates that when social classes find, through continuous strategic interaction, that their own strength can be improved by effort, their motivation to change themselves increases. Therefore, establishing a fair and just social environment, in which different social classes interact strategically on reasonable terms and the interests of both sides are effectively protected, is an effective way to avoid social solidification and improve social mobility.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>resources are allocated by social class A and social class B).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>( 3 )</head><label>3</label><figDesc>S_t = {"hh", "hd", "dh", "dd"}, using the strategy choices of the previous round of social classes A and B as the state of this round, e.g. "hh"; (2) Social class A and social class B have the same action set A = {h, d}, i.e., both A and B have two actions to choose from: the "hawk" strategy (h) and the "dove" strategy (d); (3) The reward R of the model represents the reward R_a of social class A and the reward R_b of social class B after adopting different actions, where the reward set R_a is denoted</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Since the repeated game is not just a simple repetition, the social classes choose their strategies with a certain randomness, so the Boltzmann distribution is chosen to represent the probability that a social class selects an action.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 2</head><label>2</label><figDesc>Figure 2 Plot of the variation of strategy with μ and m</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>k_a^t</head><label></label><figDesc>The strength value k_a^t of social class A and the strength value k_b^t of social class B are determined by the cumulative gains of the classes.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Class competition model revenue matrix</figDesc><table><row><cell></cell><cell>Social class B: H</cell><cell>Social class B: D</cell></row><row><cell>Social class A: H</cell><cell>(k_a(V - C)/4, k_b(V - C)/4)</cell><cell>(V, 0)</cell></row><row><cell>Social class A: D</cell><cell>(0, V)</cell><cell>(k_a V, k_b V)</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Acknowledgments</head><p>This work was financially supported by the National Natural Science Foundation of China (72031009) and Hubei Province Key Laboratory of Systems Science in Metallurgical Process(Y202105)</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Traveling habitus&apos; and the new anthropology of class: proposing a transitive tool for analyzing social mobility in global migration</title>
		<author>
			<persName><forename type="first">Jaafar</forename><surname>Alloul</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mobilities</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="178" to="193" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Wealth Inequality and Social Mobility: A Simulation-Based Modelling Approach</title>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">Cardiff Economics Working Papers</title>
		<imprint>
			<biblScope unit="volume">196</biblScope>
			<biblScope unit="page" from="307" to="329" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Social Mobility and Stability of Democracy: Re-evaluating De Tocqueville</title>
		<author>
			<persName><forename type="first">D</forename><surname>Acemoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Egorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sonin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">CEPR Discussion Papers</title>
		<imprint>
			<biblScope unit="volume">133</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1041" to="1105" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Political Man: The Social Bases of Politics</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lipset</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Political Science Quarterly</title>
		<imprint>
			<biblScope unit="volume">75</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="326" to="328" />
			<date type="published" when="1960">1960</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Stability analysis of strategic alliance from the perspective of asymmetric cooperation-based on the game model of hawk and dove</title>
		<author>
			<persName><forename type="first">Bo</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jing</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">Soft Science</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="28" to="31" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Using infinite game model of hawk and dove to study international relations-taking Sino-Japanese relations as an example</title>
		<author>
			<persName><forename type="first">Bu</forename><surname>Zhenxing</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Sichuan Provincial Party School of CPC</title>
		<imprint>
			<biblScope unit="issue">01</biblScope>
			<biblScope unit="page" from="99" to="104" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>J</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Numerical analysis of a reinforcement learning model with the dynamic aspiration level in the iterated Prisoner&apos;s dilemma</title>
		<author>
			<persName><forename type="first">N</forename><surname>Masuda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nakamura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Theoretical Biology</title>
		<imprint>
			<biblScope unit="volume">278</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="55" to="62" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>J</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Emergence of cooperation and a fair system optimum in road networks:A game-theoretic and agent-based modelling approach</title>
		<author>
			<persName><forename type="first">N</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ben-Elia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Research in Transportation Economics</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="46" to="55" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Q-learning algorithm and its implementation in prisoner&apos;s dilemma</title>
		<author>
			<persName><forename type="first">Zhang</forename><surname>Chunyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chen</forename><surname>Xiaoping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liu</forename><surname>Guiquan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cai</forename><surname>Qingsheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Engineering and Application</title>
		<imprint>
			<biblScope unit="issue">13</biblScope>
			<biblScope unit="page" from="121" to="122" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Multi-agent reinforcement learning model in evolutionary game</title>
		<author>
			<persName><forename type="first">Liu</forename><surname>Weibing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wang</forename><surname>Xianjia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">System Engineering Theory and Practice</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">03</biblScope>
			<biblScope unit="page" from="28" to="33" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
