<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking Historical Phase Recognition from Text and Events</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Celli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Rovera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maggioli Research</institution>
          ,
          <addr-line>Santarcangelo di Romagna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents preliminary studies on a benchmark for the Historical Phase Recognition task. This task explores the application of computational linguistics to the study of long-term historical dynamics. We compare the utility of Event Tagging and BERT embeddings for classifying the phases of secular cycles defined by the Structural-Demographic Theory. We explore this task both as five-class classification (crisis, growth, population immiseration, elite overproduction, State stress) and as binary classification (rise, decline), on the basis of human- and LLM-annotated labels. Our findings reveal that Event Tagging, when aligned with human annotations, yields good performance in multi-class classification, but not in binary classification. Conversely, using BERT to extract features directly from text yields better performance with LLM-generated labels, in particular on the binary classification task. We also report higher inter-annotator agreement between LLMs than between humans when labeling historical phases.</p>
      </abstract>
      <kwd-group>
        <kwd>Historical Phase Recognition</kwd>
        <kwd>Cultural Analytics</kwd>
        <kwd>Structural Demographic Theory</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Historical Phase Recognition is a novel task that aims
at the classification of phases of past societies according
to existing theoretical frameworks. This task, based on
the idea that history is a complex adaptive system [1]
like language [2], can be useful for exploring and
comparing societal adaptation processes in their long-term
trends [3], to find replicable patterns. Societies have
historical and structural dimensions [4] and evolve through
dynamics that create cycles [5], following irreversible
developmental paths that eventually cause them to break
down [6] or recover. Crucially, much historical
information is expressed in natural language [7] and is
available from open sources like Wikipedia [8, 9]; hence
computational linguistics tasks such as event detection
[10] can offer a great contribution to this line of research.</p>
      <p>A theoretical framework in this area that has proven to
be suitable for computational analysis is the Structural-Demographic
Theory (SDT) [11]. By integrating this theory with data modeling
techniques, researchers were able to make remarkably accurate
predictions about the global crises that unfolded in the 2020s [12].
This predictive power underscores the value of SDT as a tool
for analyzing complex socio-political dynamics within
historical datasets [13]. Specifically, the SDT posits that
historical cycles are characterized by five distinct phases:</p>
      <p>• 0. Crisis (widespread conflict that results in a
restructuring of the socio-political order);</p>
      <p>• 1. Growth (a new order creates social cohesion,
triggering high productivity and increasing
competition for social status);</p>
      <p>• 2. Population immiseration (increased
competition for status and resources leads to rising
inequality);</p>
      <p>• 3. Elite overproduction (inequalities lead to
radical factionalism and frustrated individuals who
may become agents of instability) and</p>
      <p>• 4. State stress (the rising instability brings fiscal
distress and both lead the State towards potential
crises with widespread conflicts, restarting the
cycle).</p>
      <p>SDT has proven to be a valuable framework for
understanding a diverse array of historical occurrences. For
instance, it has been applied to analyze the underlying
causes of the French Revolution, the elite rivalries that
fueled the American Civil War [14], and the factors
contributing to the collapse of the Qing Dynasty [15].
Furthermore, SDT is also employed to analyze contemporary
historical events, ranging from the Egyptian revolution
of 2011 [16] to the political instability experienced in the
US in 2021 [17].</p>
      <p>Previous work in Historical Phase Recognition [18]
released the Chronos dataset, annotated by humans, and
demonstrated that systems can learn models with
performance above chance, although far from perfect. Recent
research in the field reports that LLMs can reach human
performance in Historical Phase labeling and that
the intra-annotator agreement of LLMs is consistent [19].</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics,
September 24–26, 2025, Cagliari, Italy.
* Corresponding author.
Email: fabio.celli@maggioli.it (F. Celli); m.rovera@fbk.eu (M. Rovera).
ORCID: 0000-0002-7309-5886 (F. Celli).
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).</p>
      <p>Still, there is no benchmark for Historical Phase Recognition,
and some research questions about this task
remain unanswered, for instance:</p>
      <p>• (RQ1) Can Event Tagging provide a generalization
that helps Historical Phase Recognition?</p>
      <p>• (RQ2) Can LLMs-as-annotators reach a higher
consensus than humans in SDT labeling?</p>
      <p>• (RQ3) Which kind of label is easier to model, the
one made by humans or by LLMs?</p>
      <p>• (RQ4) Is it easier to perform Historical Phase
Recognition as a 5-class or as a binary classification
task?</p>
      <p>To answer RQ1 we use EventNet-ITA, a Frame Parser<sup>1</sup>
trained on a large Italian corpus annotated with semantic
frames of events<sup>2</sup>. This tool provides a fast and effective
method for extracting Event Frames in Italian, achieving
an F1-score of 0.9 for Frame Identification
and 0.72 for Frame Element Identification on the original
dataset [20]. To answer RQ2 we employ GPT4 [21] and
Llama 3.1-405b [22] as annotators, producing a new SDT
annotation of the data. To answer RQ3 we adopt a
perspectivist approach [23], running the classification task on
different label sets and even on combinations of labels.</p>
      <p>Lastly, to answer RQ4, we aggregate phases 1 and 2
under the label "rise" and phases 3, 4, and 0 under the label
"decline," and then perform a binary classification task.</p>
      <p>The paper is structured as follows: In Section 2 we
describe how we created a benchmark from the Chronos
dataset to promote the reproducibility of future
experiments. In Section 3 we describe our experimental design,
with annotation guidelines, prompts, analysis of labels
and the results of the classification experiments. Finally,
in Section 4, we draw our conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <sec id="sec-2-1">
        <p>Previous work on the Historical Phase Recognition task
made a huge effort to produce annotated data [18], but the
results of the previous classifications are not fully
replicable. Hence we decided to develop a benchmark with
fixed training and test sets out of the Chronos dataset.</p>
        <p>The Chronos dataset, built upon the Seshat historical
databank [24] and augmented with Wikipedia content,
provides time-series data, in Italian and English, of
historical events for 366 polities across 18 sampling zones,
spanning from the Neolithic to the 2010s CE. Each row in
the dataset represents a historical decade of a polity
in a sampling zone. Textual descriptions of the selected
events that happened in the decade include information
about wars, reforms, rulers, population, elites, disasters,
alliances, socio-economic context, famines, protests, elite
changes, and religions. Descriptions are summarized to
an average of 400 characters per decade, with source
references when available. Each entry includes a
timestamp, historical age, sampling zone, world region, and
a standardized Polity ID encoding origin, name, societal
type, and periodization. The dataset contains more than
9000 rows, but most of them have no textual description,
especially those in remote times. Moreover, there are
duplicates, as some polities expanded over more than
one sampling zone and were sampled more than once.
The dataset also contains a flag to indicate whether the
historical information reported is recorded or supposed.
Using this information we created a benchmark.</p>
      </sec>
      <sec id="sec-2-2">
        <p><sup>1</sup> https://huggingface.co/mrovera/eventnet-ita; <sup>2</sup> https://huggingface.co/datasets/mrovera/eventnet-ita</p>
        <sec id="sec-2-2-1">
          <title>2.1. Annotation and Agreement</title>
          <p>First, we extracted event tags from the historical
descriptions in Italian with EventNet-ITA. Then we removed
duplicates and selected the rows with tags, text and recorded
information. We obtained 1422 rows with data spanning
from antiquity to the 2010s. The data also included the
original SDT labels, annotated by hand following
these guidelines:</p>
          <p>1. Read the textual description to identify key
events: wars, reforms, rulers, population, elites,
disasters, epidemics, alliances or treaties,
socioeconomic context, famines or financial stress,
protests or movements, religions.</p>
          <p>2. Use polity identifiers to find the start and end
points of cultures. The end of a culture represents
a crisis period.</p>
          <p>3. Starting from the beginning of a culture, initially
assign the sequence of labels of a standard
secular cycle model: 1,1,2,2,3,3,4,4,4,0 and then
evaluate whether to keep or change the labels in each
decade. It is possible to have longer or shorter
cycles. There can be only one label 0 (crisis) per
cycle. A polity can have one or more cycles.</p>
          <p>4. Having in mind the key events in the textual
description, select one of the following labels to
describe the decade. 1=Growth. A society is
generally poor when it experiences renewal or change
followed by demographic (but not always
territorial or economic) growth. Reforms, alliances,
wars won or similar events are potential
indicators of this phase. 2=Impoverishment of the
population. Potential economic and/or territorial
expansion slows while demography continues to
expand. The elite takes much of the wealth and
defines the status symbols. Stability and
external attacks are potential indicators of this phase.
3=Overproduction of the elites. The wealthy seek
to translate their wealth into positions of
authority and prestige. The population becomes poor.
Movements, protests, and wars are potential
indicators of this phase. 4=State stress. The elites
want to institutionalize their advantages in the
form of low taxes and privileges that lead the
state into fiscal difficulties. Wars, protests and
changes in the elite are potential indicators of
this phase. 0=Crisis. A triggering event such as
a war, revolt, famine or disaster that the state is
unable to manage leads to a new configuration
of society. Emigration of elites, subjugation to
other societies, civil wars or profound reforms
are potential indicators of this phase.</p>
          <p>5. Use the progressive order of the phases if no
textual description is available for the decade.</p>
          <p>6. Make sure there is a progressive order of the
labels (e.g. phase 3 must follow phase 2). All labels
can be repeated in the following decade except
the crisis phase, which conventionally lasts one
decade.</p>
          <p>The annotation in the Chronos dataset was validated
with three human annotators, who independently
labeled a sample of 93 examples from the data. The initial
agreement was low (Fleiss’ k 0.206) because a single
disagreement propagates through the rest of the
sequence, but after a training session and the use of a
standard pattern to start with (the sequence of secular
cycle labels 1,1,2,2,3,3,4,4,4,0), the agreement between
humans rose to Fleiss’ k 0.455.</p>
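          <p>For readers who want to reproduce this kind of agreement figure, the following is a minimal sketch of Fleiss' k in plain Python. It is not the authors' implementation: the function name <monospace>fleiss_kappa</monospace> and the toy ratings are illustrative assumptions, and the sketch assumes every item is labeled by the same number of annotators.</p>
          <preformat>
```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """ratings: list of items, each a list with one label per annotator."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][c] = number of annotators who gave item i the label c
    counts = [Counter(item) for item in ratings]
    # Observed agreement: share of agreeing annotator pairs, averaged over items
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement, derived from the overall label proportions
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (hypothetical labels): 4 decades, 3 annotators, SDT phases 0-4
toy = [[1, 1, 1], [2, 2, 3], [4, 4, 0], [3, 2, 2]]
kappa = fleiss_kappa(toy, categories=[0, 1, 2, 3, 4])
print(round(kappa, 3))
```
          </preformat>
          <p>The sketch treats the five phases as unordered categories, so a disagreement between adjacent phases weighs as much as one between distant phases.</p>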
          <p>In order to answer RQ3 (whether it is easier to predict
labels annotated by humans or LLMs) we produced new
labels using GPT4 (1.8 trillion parameters) and Llama
3.1 (405 billion parameters) with the prompt reported in
Figure 1 and a temperature of 0.5. We provided the input
data in chunks containing sequential decades of one or
two polities per run. Although the prompt explicitly
required the models to assume that the sequence of labels follows a
standard secular cycle model like the one used by
humans (1,1,2,2,3,3,4,4,4,0), the LLMs sometimes produced
unordered labels as output.</p>
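          <p>Unordered outputs of this kind can be detected automatically. The sketch below is one possible check, assuming the ordering rule stated in the prompt (a cycle cannot turn back or skip phases, and phase 0 is followed by phase 1); the names <monospace>ALLOWED_NEXT</monospace> and <monospace>order_violations</monospace> are hypothetical and not part of the released benchmark.</p>
          <preformat>
```python
# Allowed next phases, following the ordering rule stated in the prompt
ALLOWED_NEXT = {0: {1}, 1: {1, 2}, 2: {2, 3}, 3: {3, 4}, 4: {4, 0}}


def order_violations(labels):
    """Return the positions whose label breaks the secular-cycle order."""
    return [
        i
        for i in range(1, len(labels))
        if labels[i] not in ALLOWED_NEXT[labels[i - 1]]
    ]


standard_cycle = [1, 1, 2, 2, 3, 3, 4, 4, 4, 0]
unordered = [1, 2, 4, 3, 0, 0]  # hypothetical LLM output

print(order_violations(standard_cycle))  # []
print(order_violations(unordered))       # [2, 3, 4, 5]
```
          </preformat>
          <p>The check only applies to consecutive decades of the same polity, so sequences would need to be grouped by polity before calling it.</p>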
          <p>In order to create a benchmark, we split the data into
a training set (1222 instances) and a test set (200 instances). The
labels have comparable distributions in the training and
test sets, as reported in Figure 2. While human and LLM
labels approximate a log-normal distribution, the
averaged labels approximate a normal distribution. This is
because averaging labels with big misalignments (such
as label "1" and label "4") tends to produce more labels "2",
which became a wastebasket label.</p>
          <p>We computed the inter-annotator agreement over all
1422 examples and pairs of annotators, greatly expanding
the experiments presented in the literature. We evaluated
results with k statistics and Krippendorff’s α [25].
Although pairs that mix human and LLM annotations have
an agreement comparable to previous results, here GPT4 […]
0.5; moreover, these findings closely match the results
obtained when both humans and LLMs received identical
instructions and the temperature was set to zero [19].</p>
          <p>Figure 1: Prompt used for the LLM annotation.</p>
          <p>Act as an expert historian and consider the Structural
Demographic Theory (SDT). Given a set of descriptions
of historical decades for different polities, label each
description with one of the following secular cycle phases
(sdtphase):</p>
          <p>0=crisis (in this phase may happen societal collapse
patterns, power transitions, conflicts, administrative or
social structure changes, and external influences. Look for
signs of civil wars, military coups, environmental factors,
population movements, reform of tax systems, trade
network disruptions, class conflicts, and foreign invasions).</p>
          <p>1=growth (a society recovers from a crisis finding a new
fresh culture that creates social cohesion. to recognize this
phase examine the power structure patterns, legitimacy
of rule, social organization, cultural elements, military
aspects, and social changes. Look for the presence of strong
elite classes, religious legitimation of power, centralized
administrative systems, trade networks, cultural practices,
territorial expansion, and population movements);</p>
          <p>2=population impoverishment (growth slows and inequalities
begin to emerge. to recognize this phase evaluate the
power dynamics, economic patterns, military aspects,
cultural/religious elements, administrative features, and
infrastructure development. Look for succession struggles,
trade route development, territorial conquests, religious
tolerance, bureaucratic reforms, and construction projects);</p>
          <p>3=elite overproduction (the number elite aspirants rises
and the social lift mechanisms deteriorate. To recognize
this phase assess power dynamics, governance, economic
patterns, social structures, cultural and technological
development, and common catalysts for change. Look for
power struggles, trade system developments, social unrest
between elite and population, religious developments,
and military conflicts),</p>
          <p>4=state stress (elites struggle to
institutionalize their advantages. to recognize this phase
review political instability, power struggles, economic
challenges, military conflicts, administrative changes, and
social/religious tensions. Look for succession disputes,
financial crises, territorial loss, reforms to advantage
specific elite groups, social unrest and religious conflicts).</p>
          <p>Initially assume that the sequence of labels follows a
standard secular cycle model: 1,1,2,2,3,3,4,4,4,0 and then
evaluate whether to keep or change the labels in each
decade. Evaluate each label on the basis of the preceding
and following ones. It is possible to have longer or shorter
cycles. A cycle cannot turn back and cannot skip phases. So
if in 1940 there is a phase 0, in 1950 there should be a phase
1, in 1960 there can be a phase 1 or phase 2. If in 1960 there
is a phase 2, in 1970 there can be a phase 2 or phase 3, not
a phase 4. If in 1970 there is a phase 3, in 1980 there can
be a phase 3 or 4, and if in 2000 there is phase 4, in 2010
there can be a phase 0 or another phase 4. The decade after
phase 0 the cycle restarts from phase 1.</p>
          <p>This is an example of the input (json): ⟨⟩
and this is the desired output (csv): ⟨⟩
set of descriptions to label (json): ⟨⟩</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <p>The evaluation with Krippendorff’s α, which could better
capture the importance of label order, shows results
similar to the ones computed with Fleiss’ and Cohen’s k,
suggesting that there might be disagreements on distant
labels, like 0 and 4. Results are reported in Table 1.</p>
        <p>Table 1: Results of inter-annotator agreement between pairs of
Histor[…]</p>
        <sec id="sec-2-3-1">
          <title>2.2. Contents</title>
          <p>The final dataset contains the following features:</p>
          <p>• a decade ID formatted with a standard method:
2 letters to indicate the area of origin of the
culture, 3 letters to indicate the name of the
polity, 1 letter to indicate the type of
society (c=culture/community; n=nomads; e=empire;
k=kingdom; r=republic), 1 letter to indicate
the periodization (t=terminal; l=late; m=middle;
e=early; f=formative; i=initial; *=any) and a
number corresponding to the decade. For example,
"EgPdyk*-2960" is the pre-dynastic kingdom of
Egypt in the 2960s BCE, "ItRomrm-220" is the middle
Roman Republic in the 220s BCE, and "TrOttet1850" is the
terminal phase of the Ottoman Empire in the 1850s;</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <p>• a short Italian textual description of the decade
(the one used for the experiments);</p>
        <p>• a short English textual description of the decade;</p>
        <p>• the list of tags extracted from text;</p>
        <p>• human-annotated SDT labels;</p>
        <p>• SDT labels annotated with GPT4;</p>
        <p>• SDT labels annotated with Llama3.1;</p>
        <p>• the average of all the SDT labels, turned into
integer values;</p>
        <p>• the average of the SDT labels generated with LLMs,
turned into integer values;</p>
        <p>• the binary labels annotated by humans, obtained
from SDT labels (1,2=rise; 3,4,0=decline);</p>
        <p>• the binary labels annotated by LLMs, obtained
from SDT labels (1,2=rise; 3,4,0=decline).</p>
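        <p>The decade ID format and the rise/decline aggregation listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code shipped with the benchmark: the names <monospace>parse_decade_id</monospace> and <monospace>to_binary</monospace> are hypothetical, and the regular expression assumes that IDs always follow the 2+3+1+1 letter layout followed by an optionally negative decade.</p>
        <preformat>
```python
import re

SOCIETY = {"c": "culture/community", "n": "nomads", "e": "empire",
           "k": "kingdom", "r": "republic"}
PERIOD = {"t": "terminal", "l": "late", "m": "middle", "e": "early",
          "f": "formative", "i": "initial", "*": "any"}

# area (2 letters), polity (3 letters), society type, periodization, decade
DECADE_ID = re.compile(r"^([A-Za-z]{2})([A-Za-z]{3})([cnekr])([tlmefi*])(-?\d+)$")


def parse_decade_id(decade_id):
    """Split a decade ID into its documented components."""
    match = DECADE_ID.match(decade_id)
    if match is None:
        raise ValueError("unexpected decade ID: " + decade_id)
    area, polity, society, period, decade = match.groups()
    return {"area": area, "polity": polity, "society": SOCIETY[society],
            "period": PERIOD[period], "decade": int(decade)}


def to_binary(sdt_phase):
    """Aggregate the five SDT phases: 1,2 = rise; 3,4,0 = decline."""
    return "rise" if sdt_phase in (1, 2) else "decline"


print(parse_decade_id("ItRomrm-220"))
print([to_binary(phase) for phase in [1, 1, 2, 2, 3, 3, 4, 4, 4, 0]])
```
        </preformat>
        <p>Negative decades stand for years BCE, as in the "ItRomrm-220" example.</p>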
        <p>Examples of the data follow<sup>3</sup>:</p>
      </sec>
      <sec id="sec-2-5">
        <p>1. JpKamk*1290, “al tempo del reggente Hōjō
Sadatoki (r. 1284–1301) per il principe Hisaaki il
clan Hōjō era alleato del clan Adachi. Tuttavia un
complotto di Adachi Yasumori per usurpare gli
Hōjō portò al colpo di stato noto come incidente
Shimotsuki. vinse Hojo.”,“at the time of Regent
Hōjō Sadatoki (r. 1284–1301) for Prince Hisaaki
the Hōjō clan was allied of the Adachi clan.
However a plot by Adachi Yasumori to usurp the Hōjō
resulted in the coup known as Shimotsuki
incident. the Hōjō won.”, process*PROCESS_START
activists*POLITICAL_ACTIONS
invader*INVADING PROCESS_START
POLITICAL_ACTIONS INVADING,4,4,4,4,4,0,0</p>
        <p>2. IqBabke-1750, “possibile apertura di una rotta
commerciale per beni di lusso e minerale di
stagno verso il Levante (Caanan) e l’Anatolia
orientale (occupata dagli Assiri).”,“possible
opening of a commercial route for luxury goods
and tin ore towards the Levant (Caanan) and
eastern Anatolia (occupied by Assyrians).”,
land*OCCUPANCY occupier*OCCUPANCY
OCCUPANCY,2,2,2,2,2,1,1</p>
        <p>3. EgMamke1340,“peste nera ad Alessandria nel
1347. Serie di sultani di breve durata.”,“black death
in Alexandria in 1347. Series of short lived
Sultans.”, old*TAKE_PLACE_OF killer*KILLING
cause*DEATH place*DEATH time*DEATH
TAKE_PLACE_OF KILLING DEATH,4,1,3,3,2,0,1</p>
        <p><sup>3</sup> EVENT_FRAMES are shown in uppercase, frame_elements in smallcaps.</p>
        <p>Example 1 describes the Japanese Kamakura period in the
1290s and is a case where all the annotations agree on
phase 4 (or 0, "decline" in the case of binary labels).
Example 2 reports a description of Kassite Babylon in the 1750s
BCE and is a case where all annotations agree on phase 2
(or 1, "rise"). Example 3 describes Mamluk Egypt in the 1340s
and is a case of disagreement between annotations.</p>
        <p>We ordered the data alphabetically using the text
column, thus obtaining a pseudo-randomization of the
instances and breaking the temporal sequences. We dubbed
this dataset "Chronos benchmark", which is freely
available on Huggingface<sup>4</sup>.</p>
        <p><sup>4</sup> https://huggingface.co/datasets/facells/chronos-historical-sdt-benchmark</p>
        <sec>
          <title>3. Analysis and Discussion</title>
          <p>In order to answer RQ1 (whether Event Tagging is useful
to recognize different phases), we performed an analysis
of events per label. To do so, we extracted wordclouds
including only the examples where all annotators agreed
on the same label. Figure 3 reports the wordclouds for
the binary classification task. As introduced in Section
2, Event Frames are shown in uppercase, while Frame
Elements are shown in small caps, along with their Frame, in the
format frame_element*EVENT_FRAME. The larger and
bolder a word, the more strongly it is associated with that
particular phase. From the wordclouds it is clear that there
are overlapping Event Frames between the two phases
(e.g., CONQUERING, WAR, CHANGE_OF_LEADERSHIP,
BEAT_OPPONENT), while the same Frame Elements
seem to have different frequencies in the two phases.</p>
          <p>Figure 3: Wordclouds of Event tags in the binary
classification task. The wordclouds include only the examples where
all annotations agreed on the same label. Event frames are
represented in uppercase while frame elements in lowercase.</p>
          <p>Things are much more complicated in the multi-class
classification task, depicted in Figure 4. In summary, the
wordclouds show a progression where there are many
overlaps of Event Frames between phases, in particular
the BEAT_OPPONENT and CONQUERING events.
However, Frame Elements help distinguish between phases:
theme*CONQUERING clearly appears in the growth and
crisis phases, while other low-frequency elements, such
as process*PROCESS_START and goal*ATTEMPT, are
distinctive of phases 3 and 4 respectively. In general,
wordclouds with smaller words, like the ones for phases
2, 3 and 4, highlight the need to capture weak signals for
the classification tasks.</p>
          <p>Overall, the similarity of the tags between phases
illustrates well how difficult the Historical Phase
Recognition task is.</p>
        </sec>
        <sec id="sec-2-5-1">
          <title>3.1. Experiments</title>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <p>In order to answer the research questions listed in Section 1,
we performed two distinct tasks: a multi-class
classification and a binary classification. Both tasks have
comparable settings, with 768 features extracted with
a frequency token matrix from the EventNet-ITA tags
(events) and 768 features extracted with
BERT-ItalianXXL (bert). To ensure replicability, we used Learnipy
[26], a suite of algorithms for data science and machine
learning in Colab Notebooks available online<sup>5</sup>.</p>
        <p>Table 2 reports the balanced accuracy of different
classification models, Naive Bayes (nb), Gradient Boosting
(xgb) and Linear Discriminant Analysis (lda), using the two
feature extraction methods (events, bert) to predict the
5 SDT phases. The models were trained and evaluated
on different sets of labels: human-annotated (human), an
average of LLM annotations (llms), and an average of all
annotations (all). The baseline for this task is 0.2.</p>
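        <p>The 0.2 baseline is presumably the chance level of balanced accuracy with five classes: balanced accuracy is the mean of the per-class recalls, so a classifier that always predicts the same phase scores 1/5. The sketch below illustrates this under that assumption; <monospace>balanced_accuracy</monospace> is written from scratch here and the labels are a hypothetical toy set, not benchmark data.</p>
        <preformat>
```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        positions = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in positions if y_pred[i] == label)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)


# Toy labels covering the five SDT phases with uneven frequencies
y_true = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]
always_two = [2] * len(y_true)  # constant predictor: the "wastebasket" label

print(balanced_accuracy(y_true, always_two))  # 0.2
print(balanced_accuracy(y_true, y_true))      # 1.0
```
        </preformat>
        <p>Unlike plain accuracy, this score does not reward a model for favoring the most frequent label, which matters given the uneven label distributions described in Section 2.</p>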
        <p>Interestingly, the combination of human labels, event
tags and an algorithm that captures weak signals
(Gradient Boosting) yields good performance, suggesting
that for the 5-class classification the event-based features
align well with the human understanding of the SDT
phases. However, the most robust results are achieved
using event tags on the average of all labels, possibly because of
the normal distribution that results from averaging the
labels. In contrast, BERT struggles with human labels: the
results show an average balanced accuracy lower than
the baseline.</p>
        <p>This might indicate that the contextual embeddings
from BERT, while powerful, do not directly capture the
nuances of the SDT phases as effectively as the event-based
features when aligned with human annotations.
However, the best performance when using labels averaged
from LLMs is achieved with BERT features and Linear
Discriminant Analysis. This hints that the patterns
captured by BERT might be more consistent with the way
LLMs interpret and label the SDT phases, although less
transparent.</p>
        <p>An interesting point is that event tags show consistent
performance across different label sets (human, all, llms).
The event tagger features consistently provide
competitive results, often outperforming or closely matching
BERT, with the advantage of being transparent. This
highlights the value of explicit event information for this
Historical Phase Recognition task. Overall, performance
still needs improvement. While some results surpass the
baseline of 0.2, the balanced accuracy scores indicate
that accurately classifying the 5 SDT phases remains a
challenging task.</p>
        <p>Table 3 presents the results of the binary
classification task, where the 5 SDT phases were aggregated into
"rise" (phases 1 and 2) and "decline" (phases 0, 3, and 4).
The same feature extraction methods and classification
algorithms were used on human-derived binary labels
(human) and LLM-averaged binary labels (llms).</p>
      </sec>
      <sec id="sec-2-7">
        <p><sup>5</sup> https://colab.research.google.com/drive/1G1VNHUCoDTso6wIWmrdvM21Z6D1PC6nL?usp=sharing</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>In conclusion, this study has taken initial steps in
leveraging computational linguistics for the complex task
of Historical Phase Recognition within the
Structural-Demographic Theory framework. Our investigation into
the utility of Event Tagging revealed its promise,
particularly when aligned with human-annotated data,
achieving the most robust performance in the 5-class
classification task. This suggests that explicitly identified event
structures resonate with human understanding of SDT’s
nuanced phases. Conversely, while powerful, BERT
embeddings struggled to capture these nuances as effectively
on human labels, hinting at a potential mismatch between
their learned representations and the human interpretation
of SDT.</p>
      <p>Interestingly, BERT showed better performance with
LLM-generated labels, indicating a possible alignment
in their interpretation patterns, albeit with a loss of
transparency compared to event tags. Answering RQ1
(whether Event Tagging is useful): our results show that
event tags help Historical Phase Recognition when
coupled with human annotations. With
LLM-generated labels, instead, transformer models seem the best
choice. In general, our results show similar improvements
over the baseline in the multi-class and binary
classification tasks. Hence, answering RQ3 (which kind of label […]</p>
      <p>Declaration on Generative AI</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Turchin</surname>
          </string-name>
          , Political instability may be a contributor
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>in the coming decade</article-title>
          ,
          <source>Nature</source>
          <volume>463</volume>
          (
          <year>2010</year>
          )
          <fpage>608</fpage>
          -
          <lpage>608</lpage>
          . F.C.
          <article-title>: conceptualization, experiments</article-title>
          and main [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Turchin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korotayev</surname>
          </string-name>
          ,
          <article-title>The 2010 structuralmanuscript text; M.R.: data enrichment with Event demographic forecast for the 2010-2020 decade: Tagging, manuscript editing. All authors edited and A retrospective assessment</article-title>
          ,
          <source>PloS one 15</source>
          (
          <year>2020</year>
          ).
          <article-title>reviewed the manuscript</article-title>
          . [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Turchin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Structural-Demographic Analysis</surname>
          </string-name>
          of
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>American</given-names>
            <surname>History</surname>
          </string-name>
          , Beresta Books Chaplin,
          <year>2016</year>
          . Acknowledgments [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Orlandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Benam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Turchin</surname>
          </string-name>
          ,
          <article-title>Structural-demographic analysis of the qing dynasty (1644-1912) collapse in china</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>18</volume>
          (
          <year>2023</year>
          ) e0289748.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          This research was supported by the European Commission, grant 101120657: European Lighthouse to Manifest Trustworthy and Green AI-ENFIELD. [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Korotayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zinkina</surname>
          </string-name>
          ,
          <article-title>Egypt's 2011 revolution: A</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>revolutions in the 21st century: The new waves of revolutions, and the causes and effects of disruptive political change</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>651</fpage>
          -
          <lpage>683</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Maldonado</surname>
          </string-name>
          ,
          <article-title>History as an increasingly complex system</article-title>
          ,
          <source>History and Cultural Identity: Retrieving the Past, Shaping the Future</source>
          (
          <year>2011</year>
          )
          <fpage>129</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Turchin</surname>
          </string-name>
          ,
          <article-title>End times: elites, counter-elites, and the path of political disintegration</article-title>
          ,
          <source>Penguin</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <article-title>History repeats: Historical phase recognition from short texts</article-title>
          ,
          <source>Proceedings of CLiC-it 2024</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basso Fossali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ollagnier-Beldame</surname>
          </string-name>
          ,
          <source>Language is a complex adaptive system: Explorations and evidence</source>
          , Language Science Press,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <article-title>Large language models rival human performance in historical labeling</article-title>
          , in:
          <source>Proceedings of ARDUOUS 2025, co-located with ECAI</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Toynbee</surname>
          </string-name>
          ,
          <source>A study of history</source>
          , Munich: List.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Henry</surname>
            ,
            <given-names>William P.</given-names>
          </string-name>
          ,
          <source>Greek Historical Writing: A Historiographical Essay</source>
          (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rovera</surname>
          </string-name>
          ,
          <article-title>Eventnet-ita: Italian frame parsing for events</article-title>
          , in:
          <source>Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Luhmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Baecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gilgen</surname>
          </string-name>
          ,
          <source>Introduction to systems theory</source>
          , Polity Cambridge,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dalio</surname>
          </string-name>
          ,
          <source>Principles for dealing with the changing world order: Why nations succeed or fail</source>
          , Simon and Schuster,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Baktash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dawodi</surname>
          </string-name>
          ,
          <article-title>Gpt-4: A review on advancements and opportunities in natural language processing</article-title>
          ,
          <source>arXiv preprint arXiv:2305.03195</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Wallerstein</surname>
          </string-name>
          ,
          <article-title>Historical systems as complex systems</article-title>
          ,
          <source>European Journal of Operational Research</source>
          <volume>30</volume>
          (
          <year>1987</year>
          )
          <fpage>203</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Deroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maity</surname>
          </string-name>
          ,
          <article-title>Code generation and algorithmic problem solving using llama 3.1 405b</article-title>
          ,
          <source>arXiv preprint arXiv:2409.19027</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Porter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amodeo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Armal</surname>
          </string-name>
          ,
          <article-title>A natural language processing approach to understanding context in the ex- [...]</article-title>
          ,
          <source>[...] &amp; Management</source>
          <volume>59</volume>
          (
          <year>2022</year>
          ) 102735.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Abercrombie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pedrani</surname>
          </string-name>
          ,
          <article-title>[...] processing: a survey</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fisichella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ceroni</surname>
          </string-name>
          ,
          <article-title>Event detection in wikipedia edit history improved by documents web [...]</article-title>
          ,
          <source>[...]tive Computing</source>
          <volume>5</volume>
          (
          <year>2021</year>
          ) 34.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>P.</given-names>
            <surname>Turchin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Whitehouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>François</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hoyer</surname>
          </string-name>
          , [...]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bennet</surname>
          </string-name>
          , et al.,
          <article-title>An introduction to seshat: Global history databank</article-title>
          ,
          <source>Journal of Cognitive Historiography</source>
          <volume>5</volume>
          (
          <year>2020</year>
          )
          <fpage>115</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rovera</surname>
          </string-name>
          ,
          <article-title>A knowledge-based framework for events representation and reuse from historical archives</article-title>
          , in:
          <source>European Semantic Web Conference</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>845</fpage>
          -
          <lpage>852</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          ,
          <article-title>Computing Krippendorff's alpha-reliability</article-title>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <article-title>One, no one and one hun- [...] events in an inter-disciplinary perspective</article-title>
          ,
          <source>Natural language engineering</source>
          <volume>23</volume>
          (
          <year>2017</year>
          )
          <fpage>485</fpage>
          -
          <lpage>506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>F.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Casadei</surname>
          </string-name>
          ,
          <article-title>Learnipy: a Repository [...] Coding</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2022</year>
          . URL: https://github.com/facells/fabio-celli-publications/blob/main/docs/2022_learnipy_techreport.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Goldstone</surname>
          </string-name>
          ,
          <article-title>Demographic structural theory: 25 years on</article-title>
          ,
          <source>Cliodynamics</source>
          <volume>8</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>