=Paper=
{{Paper
|id=Vol-3878/23_main_long
|storemode=property
|title=History Repeats: Historical Phase Recognition from Short Texts
|pdfUrl=https://ceur-ws.org/Vol-3878/23_main_long.pdf
|volume=Vol-3878
|authors=Fabio Celli,Valerio Basile
|dblpUrl=https://dblp.org/rec/conf/clic-it/CelliB24
}}
==History Repeats: Historical Phase Recognition from Short Texts==
Fabio Celli (1,*), Valerio Basile (2)
(1) Gruppo Maggioli, Via Bornaccino 101, Santarcangelo di Romagna, 47822, Italy
(2) Università di Torino, Via Pessinetto 12, 10149, Torino, Italy
Abstract
This paper introduces a new multi-class classification task: the prediction of the Structural-Demographic phase of historical
cycles - such as growth, impoverishment and crisis - from text describing historical events. To achieve this, we leveraged
data from the Seshat project, annotated it following specific guidelines and then evaluated the consistency between three
annotators. The classification experiments, with transformers and Large Language Models, show that 2 of 5 phases can be
detected with good accuracy. We believe that this task could have a great impact on comparative history and can be helped
by event extraction in NLP.
Keywords
Cultural Analytics, Structural Demographic Theory, LLMs, NLP for the Humanities
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding author.
Email: fabio.celli@maggioli.it (F. Celli); valerio.basile@unito.it (V. Basile)
URL: https://github.com/facells/fabio-celli-publications (F. Celli); https://www.unito.it/persone/vabasile (V. Basile)
ORCID: 0000-0002-7309-5886 (F. Celli); 0000-0001-8110-6832 (V. Basile)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1. Introduction And Background
In the last decade, at least since Brexit [1], many countries in the world have experienced a generalized polarization, and
phenomena of toxic language online have grown [2]. Hate speech [3], misogyny [4], conspiracy theories [5] and related
phenomena are just visible manifestations of deep structural social crises, ushering in periods of shifting world order [6].
While crises may appear sudden, they are often rooted in underlying factors like demographics, geopolitics, technological
advancements, and historical-economic cycles. Using the scientific method, mathematical modelling and the Structural
Demographic Theory (SDT) [7], it was possible to formalise secular cycles [8], which typically last between 75 and 100
years [9], and to predict outbreaks of political instability in complex societies based on the rate of past crises [10].
The SDT defines three actors and five phases of the secular cycle. The three key actors are:
• The population, which is the source of the society’s resources and manpower, represents approximately 90% of the entire
society and is the part that follows instructions to produce goods and wealth, consuming only a small part of it.
• The elites, who typically cover around 8% of the society, are the groups of people in charge of finding potential
solutions to the problems of the society and are eligible to become part of the state. Who is considered part of the elite
and how someone gains or loses elite status depends on the type of government and the power dynamics within a society.
• The state, formed by roughly 2% of the society, is the government that enforces its will and manages resources from the
population. It is composed of one or more elite groups, depending on the social structure, and it crystallizes the culture
to keep the society alive.
The actors interact in five phases during the secular cycle, progressively increasing social and political instability:
1. The growth phase. During this phase a fresh and effective culture creates social cohesion, the economy is growing
rapidly and the state is expanding its control over the population. This leads to increased economic prosperity and
stability but raises the problem of sustainability. Periods of reconstruction immediately following wars, like post-war
Italy in the 1950s, are examples of this phase.
2. The population immiseration phase. The population continues to grow in number while the economy slows down. This
happens because over the long term the rate of return on capital is typically greater than the growth rate of population
salaries [11]; as a result the elites get richer and the population gets poorer. Moreover, demography has a strong impact
on the wealth of the population: the more workers of the same type are available, the less likely their wages are to grow.
The state’s ability to extract resources from the population reaches its limits in this phase. This
can lead to increasing inequality, and social unrest begins. The United States in the 1890s and 1970s are examples of
this phase.
3. The elite overproduction phase. The population tries to access the elite ranks but overloads the social lift
mechanisms, which reduces the capability of the elite to solve problems in the society and raises the probability of
societal instability. The USSR in the 1950s and the US in the 1990s are examples of this phase.
4. The state stress phase. The state’s ability to govern the population and foster cooperation between population and
elites begins to decline, and the elites become increasingly fragmented. This can lead to widespread violence and civil
war. Moreover, the state tends to be in financial distress as a consequence of the slowed economy and internal
fragmentation, thus any triggering event that the state cannot manage can escalate into a crisis. Germany in the 1920s is
an example.
5. The crisis, collapse or recovery phase. The state is either reformed by the elites who find an agreement or overthrown
by internal or external forces. At the end of this phase a new social equilibrium is found and a new period of stability
begins, restarting the cycle. Examples are France in the 1790s, the UK in the 1940s, and the US in the 1860s under civil
war and in the 1930s under the New Deal reforms.
The dynamics described by the SDT are represented in Figure 1 [12]. SDT has been used to explain a wide range of
historical events, including the French Revolution, the American Civil War [13], the fall of the Qing Dynasty [14], the
Russian Revolution and the instability in the US in recent years.
Figure 1: Time chart depicting the dynamics and phases described by the Structural-Demographic Theory.
In this paper we propose a novel multi-class classification task: given a text describing the historical events of a
decade, find the appropriate SDT phase label. To do so we exploited historical data from the Seshat project, produced
textual descriptions for decades in the history of human societies and annotated each decade with SDT phases following
specific annotation guidelines. We computed inter-annotator agreement between 3 annotators and experimented with LLMs in
classification. The paper is structured as follows: Section 2 describes the data, Section 3 the annotation guidelines and
their evaluation, Section 4 the classification experiments, and Section 5 the conclusion and directions for future work.
2. Data
It is not easy to design a dataset for historical data. There are specific datasets for event detection from text [15],
for paleoclimatology [16], for census analysis through time [17] and for information extraction from historical
documents [18], but there are few long-term historical datasets for Structural-Demographic analysis. Crucially, the Seshat
project [19] produced a dataset that contains machine-readable historical information about global history. The basic
concept of Seshat is to provide quantitative and structured or semi-structured data about the evolution of societies,
defined as political units (polities), from 35 sampling points across the globe in a time window from roughly 10000 BC to
1900 CE, sampled with a time-step of 100 years. A time-step of 100 years is too coarse-grained to track the internal
phases of the secular cycle, thus we resampled the data with a time-step of 10 years, manually integrating data and
descriptions from Seshat and from Wikipedia. To do so, we followed these general guidelines:
• For each polity in Seshat, create a number of rows to represent each decade. There must be no gaps between decades. If
needed, add polities to fill the gaps by searching in Wikipedia.
• Read the description of the polity provided in Seshat, identify dates and map the content to the corresponding decade.
• Search Wikipedia to find more information about the polity that can be mapped into decades. Fill in as many decades as
possible. When dates are uncertain within a specific time period, use the median decade of that period.
• Summarize the content to fit about 400 characters. Focus on the following types of events: wars or battles; reforms;
rulers; population; elites; disasters or epidemics; alliances or treaties; socio-economic context; famines or financial
stress; protests or movements; changes of elite; religions and philosophies. When possible, report the references about
the information found.
Figure 2: Distribution of the sampling zones. There are two sampling zones per world region: North America (US, Mexico),
Oceania (Hawaii, Madang - Papua New Guinea), South America (Ecuador, Peru), Europe (France, Italy), Africa (Egypt, Ghana),
Middle East (Levant, Iraq), Eurasia (Turkey, Siberia), South Asia (Uttar Pradesh - India, Java - Indonesia), East Asia
(Henan - China, Japan).
We also extended the data to include the polities until the 2010s CE. In order to limit the long and time-consuming
manual data wrangling, we reduced the number of sampling zones from 35 to 18, but at the same time we kept the original
variety of world regions [20]. This, combined with the extension of the time window, allowed us to obtain 366 polities
(roughly the same number of polities as Seshat) and 3540 rows with a textual description. We will call “Chronos” the
dataset we produced. It contains the following features:
• timestamp of each decade,
• the Age indicating the periods of history (prehistoric, ancient, medieval, early-modern, modern, post-modern),
• the sampling zone as reported in Figure 2,
• the world regions related to the sampling zones,
• a Polity ID formatted with a standard method: 2 letters to indicate the area of origin of the culture, 3 letters to
indicate the name of the polity, 1 letter to indicate the type of society (c=culture/community; n=nomads; e=empire;
k=kingdom; r=republic) and 1 letter to indicate the periodization (t=terminal; l=late; m=middle; e=early; f=formative;
i=initial; *=any). For example, “EsSpael” is the late Spanish Empire, “ItRomre” is the early Roman Republic and “CnWwsk*” is
the period of the Warring States under the Wei Chinese dynasty,
• a short textual description of the decade in Italian and English.
Short texts can contain one or more events and references. Consider the following examples extracted from the Chronos
dataset:
1. introduction of iron from Vietnam by 300 BC [Bellwood P. 1997. Prehistory of the Indo-Malaysian Archipelago: Revised
Edition pp. 268-307]. Old Malay as lingua franca.
2. Siege of Constantinople in 626. The Byzantines won. Problems in the succession to the throne: Kavadh II is killed in
628. Years of war with Bizantines had exhausted the Sasanids who were further weakened by economic decline; religious
unrest and increasing power of the provincial landholders. King Yazdegerd III (r. 632-651) could not stand against the
Islamic conquest of Persia.
Example 1 contains a socio-economic context about the Buni culture of Indonesia, and example 2 contains events about war,
rulers, socio-economic context, religion and elite change in the late Sasanian Empire. The events in the short textual
description are specific to the SDT and help annotators in their decisions about the historical phase labels. For example,
a good socio-economic context may be a clue of a growth phase and a disaster may trigger a crisis phase. For this reason we
did not exploit the labels proposed in the literature, such as second-level HTOED categories or the HISTO classes [21].
However, we acknowledge that this is an aspect that requires further research. All events included in the texts were
manually detected, and the data collectors were trained to recognize key events from the examples provided in the
literature about SDT [12].
3. Annotation and Evaluation
The main problem with the annotation of phases of historical cycles is its interpretability. While everyone agrees the
1789-1799 period in France was a time of crisis, reaching a consensus on the impact of the 1860s French intervention in
Mexico proves more difficult. Did it trigger a phase of impoverishment or of elite overproduction? Moreover, did the rise
of Mao Zedong as leader of China in the 1950s begin a phase of growth or continue the previous crisis?
We defined the following guidelines for the annotation:
1. Read the textual description to identify key events: wars, reforms, rulers, population, elites, disasters, epidemics,
alliances or treaties, socio-economic context, famines or financial stress, protests or movements, religions.
2. Use polity identifiers to find the start and end points of cultures. The end of a culture represents a crisis period.
3. Starting from the beginning of a culture, initially assign the sequence of labels of a standard secular cycle model:
1,1,2,2,3,3,4,4,4,5 and then evaluate whether to keep or change the labels in each decade. It is possible to have longer
or shorter cycles. There can be only one label 5 (crisis) per cycle. A polity can have one or more cycles.
4. Having in mind the key events in the textual description, select one of the following labels to describe the decade:
• 1=Growth. A society is generally poor when it experiences renewal or change followed by demographic (but not always
territorial or economic) growth. Reforms, alliances, wars won or similar events are potential indicators of this phase.
• 2=Impoverishment of the population. Potential economic and/or territorial expansion slows while demography continues to
expand. The elite takes much of the wealth and defines the status symbols. Stability and external attacks are potential
indicators of this phase.
• 3=Overproduction of the elites. The wealthy seek to translate their wealth into positions of authority and prestige. The
population becomes poor. Movements, protests, and wars are potential indicators of this phase.
• 4=State stress. The elites want to institutionalize their advantages in the form of low taxes and privileges that lead
the state into fiscal difficulties. Wars, protests and changes in the elite are potential indicators of this phase.
• 5=Crisis. A triggering event such as a war, revolt, famine or disaster that the state is unable to manage leads to a new
configuration of society. Emigration of elites, subjugation to other societies, civil wars or profound reforms are
potential indicators of this phase.
5. Use the progressive order of the phases if no textual description is available for the decade.
6. Make sure there is a progressive order of the labels (e.g. phase 3 must follow phase 2). All labels can be repeated in
the following decade except the crisis phase, which conventionally lasts one decade.
Trial    Examples  Raters  Labels  K
base     93        3       5       0.206
trained  93        3       5       0.455
Table 1: Inter-Annotator Agreement (Fleiss’ Kappa) on the annotation of secular cycle phases.
A single annotator annotated the entire corpus.
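The default labelling of guideline 3 and the ordering constraint of guideline 6 can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build Chronos: the function names are hypothetical, repeating the standard sequence for polities longer than ten decades is an assumption, and so is the rule that a crisis must be followed by a new growth phase (the guidelines only state that the cycle restarts).

```python
from typing import List

# Default label sequence of a standard secular cycle (guideline 3).
STANDARD_CYCLE = [1, 1, 2, 2, 3, 3, 4, 4, 4, 5]
CRISIS = 5


def default_labels(n_decades: int) -> List[int]:
    """Initial labels for a polity: the standard cycle, repeated if needed.

    Assumption: polities longer than one cycle simply restart the sequence.
    Annotators are expected to revise these labels decade by decade.
    """
    return [STANDARD_CYCLE[i % len(STANDARD_CYCLE)] for i in range(n_decades)]


def is_valid_sequence(labels: List[int]) -> bool:
    """Check a label sequence against one reading of guideline 6.

    Allowed transitions: repeat the same phase (except crisis),
    advance to the next phase, or restart at phase 1 after a crisis.
    """
    if not all(1 <= label <= CRISIS for label in labels):
        return False
    for prev, curr in zip(labels, labels[1:]):
        if prev == CRISIS:
            # Assumption: the crisis lasts one decade, then the cycle restarts.
            if curr != 1:
                return False
        elif curr not in (prev, prev + 1):
            return False
    return True
```

A checker of this kind could flag decades where a manual revision accidentally skips a phase, for example `is_valid_sequence([1, 3])` returns `False`.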
We evaluated the annotation with two different trials involving students who were not experts in history. We compared a
subset of data annotated by two students to the same subset annotated by the principal annotator. The first trial was done
just following the guidelines after a general explanation of the SDT. The second trial was done, with different students,
following the guidelines after a training session, where the annotation was discussed and agreed upon. Results, reported in
Table 1, show that with a training session the agreement rises considerably (from slight to moderate). The base agreement
level is comparable to the one observed in the annotation of hate speech among 5 trained judges on a non-binary scheme,
which obtained a Fleiss K=0.19 [22] [23]. The distribution of the labels in the Chronos dataset is depicted in Figure 3. In
the standard secular cycle model, the stress phase (label 4) is the most common; it is followed by the crisis phase (label
5), which is the least common. The other three phases (labels 1, 2, and 3) occur with roughly equal frequency in the data.
Figure 3: Distribution of the labels in the Chronos dataset.
4. Classification and Discussion
In order to test the robustness of the Chronos dataset, we performed cross-validation classification experiments. The
setting is straightforward: each row of the dataset is considered independently of the others, and we apply a supervised
classification model to predict the human-annotated label, i.e., the phase (from 1 to 5).
In these experiments, we ignored rows for which no textual description is available and we used the chance baseline of
F1 = 0.2. As learning model, we fine-tuned RoBERTa large (https://huggingface.co/FacebookAI/roberta-large) [24] for the
English textual descriptions and Italian BERT XXL (https://huggingface.co/dbmdz/bert-base-italian-xxl-cased) for the
Italian texts. We used a learning rate of 10^-6 and applied early stopping and model checkpointing, validating each fold on
10% of the training set.
We performed 5-fold cross validation and measured the precision, recall, and F1 score of the predicted labels compared
against the gold standard. Table 2 shows the results of the experiments.
English
Phase  Precision  Recall  F1-score
1      0.542      0.486   0.513
2      0.338      0.256   0.291
3      0.242      0.048   0.080
4      0.319      0.601   0.416
5      0.330      0.364   0.346
Italian
Phase  Precision  Recall  F1-score
1      0.489      0.510   0.499
2      0.321      0.211   0.254
3      0.191      0.044   0.071
4      0.290      0.660   0.403
5      0.397      0.186   0.254
Table 2: Results of 5-fold multiclass classification experiments. Results above the baseline (0.2) are marked in bold.
The classification performance shows that the textual descriptions in our dataset are sufficient to predict the
corresponding phase to a certain extent, although in quite an imbalanced way. In particular, the classification of phases 1
and 4 achieves moderately good results, while phase 3 is almost never predicted, despite the rather balanced distribution
of labels in the dataset.
Figure 4: Confusion matrices of the classification of English (above) and Italian (below) decade descriptions.
The confusion matrices in Figure 4 further highlight
interesting trends. While the biases of the models in terms of phases are clear, it is worth noticing that
misclassification often happens between contiguous phases. This suggests that a more refined, regression-based learning
setting could be more favorable to this kind of data. Finally, we performed a pilot experiment with a large language model,
namely Llama 3 70B (https://huggingface.co/meta-llama/Meta-Llama-3-70B), prompting the model to elicit zero-shot
classifications of the phases given the textual descriptions in English. The prompt we used for the model is shown in
Figure 5. No particular decoding strategy was applied for this experiment.

Structural Demographic Theory predicts outbreaks of political instability in complex societies, based on three actors:
the population, the elite, and the state. Each decade is associated with one of five phases:

1. The ’growth’ phase, when a fresh and effective culture creates social cohesion, the economy is growing rapidly and the
state is expanding its control over the population;

2. The ’population immiseration’ phase, when the population continues to grow while the economy slows;

3. The ’elite overproduction’ phase, when the population tries to access the elite ranks but overloads the social lift
mechanisms and yields a reduced capability of the elite to solve problems in the society;

4. The ’state stress’ phase, when the state’s ability to govern the population and foster cooperation between population
and elites begins to decline, and the elites become increasingly fragmented;

5. The ’crisis, collapse or recovery’ phase, when the state is either reformed by the elites or overthrown by internal or
external forces;

Act as a highly intelligent historian chatbot. You will be given the description of a decade and you are asked to predict
the phase number. Please output only a number from 1 to 5.

Decade: textual description

Phase:

Figure 5: Prompt for zero-shot classification experiments with Llama 3 70B.
Despite the size of this model, the classification performance was poor, 5–10 F1 points below the supervised
classification results in the best run. Interestingly, the zero-shot classification exhibited a similar pattern in terms of
individual labels, with the model strongly biased towards phases 1 and 4, and unable to properly predict phases 2 and 3.
We suggest that, while phases 1 and 4 have similar types of events in most societies (e.g. reforms or wars won in phase 1,
famines or financial problems in phase 4), there is much more variability for phases 2, 3 and 5. It must be noted that
these experiments only scratch the surface of the learning capabilities of the Chronos dataset. In particular, in this
setting, the temporal interdependence of the decades is not considered, and specific algorithms should be applied in the
future to capture this temporal structure.
5. Conclusion and Future
We introduced a new classification task named historical phase recognition. We believe that, once we improve their
performance, classification algorithms trained for this task will allow us to automatically annotate many more polities
with secular cycles, with a potentially disruptive improvement in the study of comparative history.
We believe that inter-annotator agreement can be further improved by having domain experts annotate the data.
Additionally, the automatic extraction of events from short historical texts, or the definition of guidelines for their
annotation, can be a valuable tool both in the annotation and classification tasks. By combining these two approaches, we
can improve the dataset and make it more reliable.
For the future we plan to improve the performance of classification by including the temporal interdependence factors,
and to improve the inter-annotator agreement, also calculating the agreement between labels generated by models and by
humans. In the future it would be interesting to add event structure annotations such as TimeML in Chronos. The poor
performance in zero-shot classification using an LLM is likely a function of the sophisticated reasoning and world
knowledge required to perform the task. The LLM could benefit from more advanced prompting strategies (e.g. few-shot or
chain-of-thought) or even supervision in the form of fine-tuning.
The Chronos dataset is accessible online in viewer/commenter mode
(https://docs.google.com/spreadsheets/d/1OW6CtmUudN3WTJ1VvWRZYZdTWVEjDJGns6Q8_I6EBwk/edit?usp=sharing). Edit and download
access is available upon request.
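The agreement values in Table 1, and the planned comparison between model-generated and human labels, both rely on Fleiss' kappa. As a reminder of what that statistic measures, here is a minimal sketch in plain Python. The function name and the toy ratings are hypothetical and are not taken from Chronos; the sketch assumes every item is rated by the same number of raters, and it is undefined (division by zero) when all ratings use a single label.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """Fleiss' kappa for items that each received the same number of ratings.

    ratings[i] holds the labels assigned to item i, one per rater
    (human annotators or, hypothetically, model predictions).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number of ratings")

    # counts[i][lab]: how many raters assigned label `lab` to item i
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c[lab] ** 2 for lab in labels) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall label proportions.
    total = n_items * n_raters
    proportions = [sum(c[lab] for c in counts) / total for lab in labels]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (invented data): three decades, three raters, phases 1 to 5.
toy_ratings = [[1, 1, 2], [4, 4, 4], [5, 4, 5]]
toy_kappa = fleiss_kappa(toy_ratings, labels=[1, 2, 3, 4, 5])
```

To compare a model with the human annotators, its predictions could be appended as one more rating per item, which is one possible design and not necessarily the one the authors intend.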
Acknowledgments
This work was supported by the European Commission grant 101120657: European Lighthouse to Manifest Trustworthy and Green
AI - ENFIELD.
References
[1] F. Celli, E. Stepanov, M. Poesio, G. Riccardi, Predicting brexit: Classifying agreement is better than sentiment and
pollsters, in: Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in
Social Media (PEOPLES), 2016, pp. 110–118.
[2] M. Lai, F. Celli, A. Ramponi, S. Tonelli, C. Bosco, V. Patti, Haspeede3 at evalita 2023: Overview of the political and
religious hate speech detection task, in: M. Lai, S. Menini, M. Polignano, V. Russo, R. Sprugnoli, G. Venturi (Eds.),
Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop
(EVALITA 2023), Parma, Italy, September 7th-8th, 2023, volume 3473 of CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[3] D. Nozza, F. Bianchi, G. Attanasio, Hate-ita: Hate speech detection in italian social media text, in: Proceedings of
the Sixth Workshop on Online Abuse and Harms (WOAH), 2022, pp. 252–260.
[4] E. W. Pamungkas, A. T. Cignarella, V. Basile, V. Patti, et al., Automatic identification of misogyny in english and
italian tweets at evalita 2018 with a multilingual hate lexicon, in: CEUR Workshop Proceedings, 1, CEUR-WS, 2018, pp. 1–6.
[5] S. S. Tekiroglu, Y.-L. Chung, M. Guerini, Generating counter narratives against online hate speech: Data and
strategies, arXiv preprint arXiv:2004.04216 (2020).
[6] R. Dalio, Principles for dealing with the changing world order: Why nations succeed or fail, Simon and Schuster, 2021.
[7] J. A. Goldstone, Demographic structural theory: 25 years on, Cliodynamics 8 (2017).
[8] A. V. Korotaev, Introduction to social macrodynamics: Secular cycles and millennial trends in Africa, Editorial URSS,
2006.
[9] P. Turchin, S. A. Nefedov, Secular cycles, in: Secular Cycles, Princeton University Press, 2009.
[10] P. Turchin, A. Korotayev, The 2010 structural-demographic forecast for the 2010–2020 decade: A retrospective
assessment, PloS one 15 (2020).
[11] T. Piketty, Capital in the twenty-first century, Harvard University Press, 2014.
[12] D. Hoyer, J. S. Bennett, H. Whitehouse, P. François, K. Feeney, J. Levine, J. Reddish, D. Davis, P. Turchin,
Flattening the curve: Learning the lessons of world history to mitigate societal crises, osf.io (2022).
[13] P. Turchin, A Structural-Demographic Analysis of American History, Beresta Books Chaplin, 2016.
[14] G. Orlandi, D. Hoyer, H. Zhao, J. S. Bennett, M. Benam, K. Kohn, P. Turchin, Structural-demographic analysis of the
qing dynasty (1644–1912) collapse in china, Plos one 18 (2023) e0289748.
[15] R. Sprugnoli, S. Tonelli, One, no one and one hundred thousand events: Defining and processing events in an
inter-disciplinary perspective, Natural language engineering 23 (2017) 485–506.
[16] B. J. Van Bavel, D. R. Curtis, M. J. Hannaford, M. Moatsos, J. Roosen, T. Soens, Climate and society in long-term
perspective: Opportunities and pitfalls in the use of historical datasets, Wiley Interdisciplinary Reviews: Climate
Change 10 (2019) e611.
[17] R. Abramitzky, L. Boustan, K. Eriksson, J. Feigenbaum, S. Pérez, Automated linking of historical data, Journal of
Economic Literature 59 (2021) 865–918.
[18] F. Boschetti, C. Andrea, D. Felice, G. Lebani, P. Lucia, P. Paolo, V. Giulia, M. Simonetta, et al., Computational
analysis of historical documents: An application to italian war bulletins in world war i and ii, in: Proceedings of the
LREC 2014 Workshop on Language resources and technologies for processing and linking historical documents and archives
(LRT4HDA 2014), ELRA, 2014.
[19] P. Turchin, H. Whitehouse, P. François, D. Hoyer, A. Alves, J. Baines, D. Baker, M. Bartokiak, J. Bates, J. Bennet,
et al., An introduction to seshat: Global history databank, Journal of Cognitive Historiography 5 (2020) 115–123.
[20] F. Celli, Feature Engineering for Quantitative Analysis of Cultural Evolution, Technical Report, Center for Open
Science, 2022.
[21] R. Sprugnoli, S. Tonelli, Novel event detection and classification for historical texts, Computational Linguistics 45
(2019) 229–265.
[22] F. Del Vigna, A. Cimino, F. Dell’Orletta, M. Petrocchi, M. Tesconi, Hate me, hate me not: Hate speech detection on
facebook, in: Proceedings of the first Italian conference on cybersecurity (ITASEC17), 2017, pp. 86–95.
[23] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection:
a systematic review, Language Resources and Evaluation 55 (2021) 477–523.
[24] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A
robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019). URL: http://arxiv.org/abs/1907.11692.
arXiv:1907.11692.