<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>REVERINO: REgesta generation VERsus latIN summarizatiOn</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanni Puccetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Righi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilaria Sabbatini</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Esuli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Istituto di Scienza e Tecnologie dell'Informazione “A. Faedo”</institution>
          ,
          <addr-line>via G. Moruzzi 1, 56124, Pisa PI</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Modena e Reggio Emilia - Dipartimento di Educazione e Scienze Umane</institution>
          ,
          <addr-line>viale Timavo, 93 - 42121, Reggio Emilia RE</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work we introduce the REVERINO dataset, a collection of 4533 pairs of Latin regesta with their respective full-text medieval pontifical documents, extracted from two collections: Epistolae saeculi XIII e regestis pontificum Romanorum selectae. (1216-1268) and Les Registres de Gregoire IX (1227/41). We describe the pipeline used to extract the text from the images of the printed pages and we provide a high-level analysis of the corpus. After developing REVERINO, we use it as a benchmark to test the ability of Large Language Models (LLMs) to generate the regestum of a given Latin text. We test three of the best-performing LLMs, GPT-4o, Llama 3.1 70b and Llama 3.1 405b, and find that GPT-4o is the best at generating text in Latin. Interestingly, we also find that for Llama models it can be beneficial to first generate a text in English and then translate it into Latin to write better regesta.</p>
      </abstract>
      <kwd-group>
        <kwd>Regesta</kwd>
        <kwd>Latin Text Summarization</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Digital Humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        ITSERR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (Italian Strengthening of the ESFRI RI RESILIENCE) is an interdisciplinary and distributed Research Infrastructure for Religious Studies. In the context of this project, REVERINO is a novel dataset of regesta paired with the medieval Latin texts they summarize and their apparatus. The dataset is designed to recreate the methodology of regesta generation, and specifically to support the creation of an Artificial Intelligence-based tool for summarizing medieval documents, with a particular focus on pontifical documents. The decision to employ the system of regesta for organizing, indexing, and summarizing medieval texts through generative AI stems from the integration of various scholarly needs, which we explore and test in depth. To create a new automated organizational process tailored to historical documents, we have chosen to focus on automatic summarization, drawing on an established and scientifically validated methodology, namely the creation of regesta, a practice refined by humanists and scholars since the 19th century. Scholars studying medieval charters often need to explore specific topics, historical figures, or places within vast corpora of sources, sometimes employing a comparative or longue durée approach. These corpora remain widely dispersed and are preserved across various libraries and archives that are geographically distant and differently organized, making them difficult to access. This is particularly true for the extensive documentation produced by royal and papal chanceries. Starting from this observation, we decided to create a specifically designed dataset for the development of a tool for the summarization of documents produced by medieval pontiffs (c. 1200 to 1350).
      </p>
      <sec id="sec-1-1">
        <title>1.1. Regesta</title>
        <p>A regestum, from the Latin word for list, enumeration, or specification, is a summary of a document made for the use of stakeholders and scholars, making the document content readily available without the need to consult it in its entirety. Each regestum contains some essential information, namely: 1) the name of the author (i.e. the Pope); 2) the name of the recipient; 3) an abstract of the content (with the object and the operative verb); 4) the date (calculated from the year of pontificate); and 5) the place of production of the document. A regestum always has a reference full-text document, of which it is the “summary”, and both come together with an apparatus, a formal text indicating the collection and the manuscript where the regestum is found.</p>
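        <p>The five essential elements listed above, plus the two companion components, suggest a simple record structure. The following is an illustrative sketch only; the field names are ours and not part of any REVERINO schema, and the sample values are invented:</p>
        <preformat>
```python
from dataclasses import dataclass

@dataclass
class RegestumRecord:
    # The five essential pieces of information of a regestum
    author: str      # the Pope issuing the document
    recipient: str
    abstract: str    # the content, with the object and the operative verb
    date: str        # calculated from the year of pontificate
    place: str       # place of production of the document
    # The two companion components
    full_text: str   # the medieval document the regestum summarizes
    apparatus: str   # collection and manuscript where the regestum is found

# Hypothetical example record (values are illustrative, not from the corpus)
record = RegestumRecord(
    author="Gregorius IX",
    recipient="episcopo Mutinensi",
    abstract="mandat ut causam audiat",
    date="anno pontificatus primo",
    place="Laterani",
    full_text="Gregorius episcopus servus servorum Dei...",
    apparatus="Reg. Vat. 14",
)
```
        </preformat>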
        <p>
          While the three components, regestum, full text and apparatus, are conceptually close, collecting them together can be challenging. Indeed, there are three main issues when trying to retrieve both a regestum and its full text from a collection, or when creating a new one: a) the regestum and the corresponding document are often not collected in the same volume (as in the Potthast collection [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]); b) these modern printed collections are not easily accessible and readable; c) the publication of new editions or the update of existing regesta collections is extremely time-consuming. For these reasons, many regesta collections have been created in the past, especially of medieval documents produced by royal and papal chanceries, but few of these have been updated or created from scratch in recent years.
        
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Text summarization</title>
        <p>In the Natural Language Processing (NLP) literature, automatic text summarization is the task of rewriting the content of a text passage into a shorter form while retaining the relevant information, without involving a writer in the process. Regesta fit well in this framework, since they are summaries of longer texts meant to expose specific information in an easy-to-consult form. Therefore, REVERINO is well suited to be both an easy-to-inspect dataset of Latin regesta and a training and testing benchmark for Latin text summarization.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. The REVERINO Dataset</title>
      <p>There is an extensive number of printed collections of regesta in Latin, edited by several scholars during the 19th and 20th centuries; however, only a few are digitally available, generally as sets of high-quality digital images, and only very rarely with a full-text, machine-readable version (often produced with outdated or low-quality OCR). One of the most relevant fields in which regesta have been produced is the corpus of the pontifical acts and letters issued by the popes and the papal chancery during the Middle Ages. This corpus guarantees the presence of large collections of regesta and extended texts that were already edited and published in printed versions during the 19th and 20th centuries, such as Jaffé and Potthast’s Regesta Pontificum Romanorum; the collection published by the Bibliothèque des Écoles françaises d’Athènes et de Rome (BEF); and the editorial series of the Monumenta Germaniae Historica (MGH).</p>
      <sec id="sec-2-1">
        <title>2.1. Data Selection</title>
        <p>Regesta collections are only available as images of full pages; this poses a first obstacle to the creation of a large-scale corpus of these documents. Each manuscript has different pagination, layout, writing font, image quality, format, etc., and thus requires a custom pipeline for the extraction of the text into a machine-readable format. Nevertheless, manuscripts from one collection undergo a similar digitization procedure and can therefore be processed together through a single pipeline to extract the content of all the documents.</p>
        <p>
          Given the need for a custom approach for each corpus, we choose to limit our collection to two sets of printed collections; specifically, we identify two main sources:
1. MGH: Epistolae saeculi XIII e regestis pontificum Romanorum selectae. (1216-1268) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
2. Auvray: Les Registres de Gregoire IX (1227/41) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
        
        <p>While conceptually similar, these two collections are formally different: MGH is written in a single-column format while Auvray uses two columns; the first has apparati visually separated from the original document, while for the second they are part of the regestum. The collections thus also show several smaller visual differences linked to the layout and the font used.</p>
        <p>Finally, from a qualitative perspective, they collect and summarize the documents related to two different medieval popes: Gregory IX and Honorius III. In particular, Auvray collects only the documents related to pope Gregory IX, while MGH collects the documents issued by both Gregory IX and Honorius III. These collections were chosen as a starting corpus because they allow for the collection of different types of regesta. Indeed, although the creation of a regestum is based on specific rules shared in the research domain, different scholars inevitably produce different regesta from the same document. It is therefore important to consider regesta produced by different scholars in different periods.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Curation</title>
        <p>The pipeline leading from a collection of images of printed pages to the REVERINO corpus is composed
of 4 steps: Annotation, Training, Extraction and Post-processing.</p>
        <p>
          Annotation We manually annotate a selected set of pages from each collection of regesta to use as a training dataset; this is done on a local instance of the eScriptorium platform [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]; an example of the interface can be seen in Figure 1. Our pipeline involves both segmentation of the written parts of each image and OCR. Annotating data for the latter is too time-consuming and existing models are effective enough; therefore, we limit ourselves to annotating pages to train a segmentation model and rely on existing OCR models.
        </p>
        <p>The models in eScriptorium ingest annotations with two kinds of information: 1. areas isolating the parts of a page that contain text, and 2. lines identifying the text of a line and its position on the page. Thus, each page is annotated in two steps: first the relevant areas are outlined, and then each line is marked.</p>
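        <p>The two kinds of annotation can be pictured as nested structures. The following is a minimal sketch with our own naming, not the actual eScriptorium data model:</p>
        <preformat>
```python
from dataclasses import dataclass, field

@dataclass
class Line:
    # a single text line: its transcription and its position on the page
    text: str
    box: tuple  # (x, y, width, height) on the page image

@dataclass
class Region:
    # an area isolating a part of the page that contains text
    label: str            # e.g. "regestum", "full_text", "apparatus"
    lines: list = field(default_factory=list)

# One annotated region with one line (hypothetical coordinates)
region = Region(label="full_text")
region.lines.append(Line(text="Gregorius episcopus", box=(120, 340, 800, 42)))
```
        </preformat>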
        <p>
          Training To adapt models to the outline of a manuscript, we start from a working segmentation
model provided by eScriptorium, catmus print large [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This model works sufficiently well on MGH and we can use it as is. By contrast, the Auvray collection has a two-column format, and we need to train the model on a dataset collected in the Annotation step. We alternate between Training and Annotation to fix the limitations of each trained model, reaching a total of 91 annotated pages. This process led to high-quality results in segmenting the layout of the Auvray manuscript and extracting text.
        </p>
        <p>Extraction Once the model has been trained, we use it to process all the pages in each collection,
obtaining text lines that the model is able to identify along with their position on the page, and thus
giving us a clean continuous stream of text spanning the full document.</p>
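        <p>A minimal sketch of how positional information can turn recognized lines into a continuous stream follows; the sort key is our own simplification, and a real layout such as Auvray's two columns would need column-aware ordering:</p>
        <preformat>
```python
def reading_order(lines):
    """lines: list of (text, x, y) tuples for one page.
    Sort top-to-bottom, then left-to-right, and join into one stream."""
    ordered = sorted(lines, key=lambda t: (t[2], t[1]))
    return " ".join(t[0] for t in ordered)

# Hypothetical recognizer output, deliberately out of order
stream = reading_order([
    ("servus servorum Dei", 100, 120),
    ("Gregorius episcopus", 100, 80),
])
```
        </preformat>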
        <p>Post-Processing The last step consists in using a series of heuristics based on the content and the
position information in the page of each text line extracted, to separate each regestum from the longer
text it summarizes and from the apparatus.</p>
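        <p>As an illustration of this kind of heuristic (the cues below are hypothetical, not the exact rules we use), a line stream could be split on simple content markers:</p>
        <preformat>
```python
def split_record(lines):
    """Split one record's lines into regestum, full text and apparatus.
    Hypothetical cues: a blank line ends the regestum; a line starting
    with a source siglum such as 'Reg.' starts the apparatus."""
    regestum, full_text, apparatus = [], [], []
    section = regestum
    for line in lines:
        if line.startswith("Reg."):
            section = apparatus
            section.append(line)
        elif line == "":
            if section is regestum:
                section = full_text
        else:
            section.append(line)
    return " ".join(regestum), " ".join(full_text), " ".join(apparatus)
```
        </preformat>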
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data Statistics</title>
        <p>From a quantitative perspective, MGH is composed of a total of 2283 regesta and full-text pairs and Auvray of 3983. However, for Auvray, several of the extracted full texts were only short passages, often quotations or incipits, that do not contain the information needed to generate a regestum; we therefore drop them. After this cleaning, 2250 regesta remain in Auvray.</p>
        <p>While MGH and Auvray are similar, in that they are both collections of regesta, they show several differences: they collect documents written by different popes, they are edited by different scholars, and on top of this they differ as printed publications. Indeed, due to the layout of the two collections, MGH is composed of single-column pages while Auvray is composed of two-column pages. Also, the quality of the second dataset is generally lower: due to minor errors in OCR quality, a few characters and numbers are wrongly transcribed by our custom model. Therefore we keep two separate splits of the dataset.</p>
        <p>
          To provide a qualitative understanding of the difference, we use t-SNE [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], after encoding the regesta using LaBERTa, a Latin adaptation of BERT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Figure 2 shows the resulting t-SNE plot of samples from the two manuscripts, MGH and Auvray.
        </p>
        <p>Table 1 (Shared Prompt): 1. The name of the author (i.e. the Pope); 2. The name of the recipient; 3. An abstract of the content (with the object and the operative verb); 4. The date (calculated from the year of pontificate); 5. The place. TEXT: ...</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Text Summarization in Latin</title>
      <p>
        Text summarization is a long standing task in Natural Language processing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which in the past was
tackled through dedicated approaches often based on the retrieval of similar passages. Since LLMs have
shown the ability to generate free-form text [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] they are currently the best performing systems for
summarizing texts. An example of a widely used benchmark is the XSUM dataset [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which is a dataset composed of BBC news articles along with their one-sentence summaries; the task consists in generating the summary given the full article.
      </p>
      <p>
        To evaluate text summarization, the most widely used metrics are based on text overlap; the most common is Rouge [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: given an integer n, Rouge-n measures the number of overlapping n-grams between the generated and the reference text. We focus on Rouge-1, Rouge-2 and Rouge-L: the first two measure respectively the number of overlapping words (1-grams) and the number of overlapping word pairs (2-grams) between the reference and the generated text. The third, Rouge-L, measures the length of the longest common subsequence of words between the reference and the generated text.
      </p>
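      <p>These overlap measures are straightforward to sketch. The following is an illustrative whitespace-tokenized implementation; production Rouge implementations add stemming and other normalization:</p>
      <preformat>
```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(reference, generated, n):
    """Recall-oriented n-gram overlap between reference and generated text."""
    ref = Counter(ngrams(reference.split(), n))
    gen = Counter(ngrams(generated.split(), n))
    overlap = sum(min(ref[g], c) for g, c in gen.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

def rouge_l(reference, generated):
    """Longest common subsequence of words, divided by reference length."""
    a, b = reference.split(), generated.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)] / len(a) if a else 0.0
```
      </preformat>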
      <p>
        An alternative metric, Bleu [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], is also based on quantifying text overlap, but it is precision-oriented: it measures how many of the generated n-grams appear in the reference, rather than how much of the reference is covered.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>
          To understand how well LLMs can summarize text in Latin, we measure the performance of three
powerful LLMs, Llama 3.1 70b, Llama 3.1 405b and GPT-4o. The first two are openly available language
models released from Meta [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] while the third is a closed source model from OpenAI [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. We test these models in two settings: in the first, format, the model is asked to generate the regestum directly based on the full text it refers to; in the second, backtranslate, when presented with the full text, the model is asked to first write a “regestum” in English and then to translate it into Latin.
        </p>
        <p>Each setting, format and backtranslate, is identified by the prompt we provide to the LLM to make it generate the regesta. Table 1 shows the prompt used for each setting as well as a Shared Prompt, appended to the setting-specific one, in which we request that the model include at least the key elements of a regestum, as mentioned in Section 1.1: the author, the recipient, the summary, the date and the place. Finally, to guide the model during generation, we add two in-context examples, i.e. two full texts with their respective regesta.</p>
        <p>We let the models generate up to 8048 tokens and we use greedy decoding, i.e. we pick the most likely word and avoid any form of sampling during inference, since the regestum is meant to be a short and detailed summary; we will ablate different sampling techniques in future work.</p>
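        <p>Greedy decoding reduces to repeatedly taking the argmax of the model's next-token distribution. A toy sketch follows; the probability function here is a stand-in, not a real LLM:</p>
        <preformat>
```python
def greedy_decode(next_token_probs, prompt, max_new_tokens=8048, eos="[EOS]"):
    """Pick the single most likely token at every step; no sampling."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # mapping: token to probability
        best = max(probs, key=probs.get)   # argmax over the vocabulary
        if best == eos:
            break
        tokens.append(best)
    return tokens

# Toy stand-in distribution that deterministically continues "a", then "b".
def toy_model(tokens):
    continuation = {1: "a", 2: "b"}
    nxt = continuation.get(len(tokens))
    if nxt is None:
        return {"[EOS]": 1.0}
    return {nxt: 0.9, "noise": 0.1}
```
        </preformat>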
        <p>To evaluate model performance we use both a quantitative and a qualitative analysis. First, the quantitative analysis is based on Rouge and Bleu, measuring the similarity between the synthetic regesta generated by an LLM and the original ones from the REVERINO dataset summarizing the same text. Second, the qualitative analysis is based on inspecting in detail a subset of the machine-generated regesta to understand which of the 5 key properties they lack.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Quantitative Results</title>
        <p>Table 2 shows the Rouge and Bleu scores achieved by the three models we test: Llama 3.1 70b, Llama 3.1 405b and GPT-4o. The first finding is that no model can generate regesta proficiently: none of those we tested achieves a Rouge higher than 0.40 or a Bleu above 0.15. We can also see that GPT-4o strongly outperforms both Llama models. The highest Rouge-1 is 0.38, achieved by GPT-4o on MGH; GPT-4o also has the highest Rouge-1 on Auvray, although at a significantly lower value, 0.28, which we attribute to the lower quality of the Auvray dataset.</p>
        <p>The wide gap between Rouge-1 and Rouge-2 shows that LLMs generally output texts that share the general context (higher 1-gram overlap) but find it harder to generate actually similar texts (lower 2-gram overlap).</p>
        <p>The two versions of Llama, Llama 3.1 70b and Llama 3.1 405b, show a small performance gap, indicating that it is not useful to use the larger and more costly Llama 3.1 405b. Comparing format and backtranslate, the first is the best setting for the GPT-4o model, while the opposite is true for Llama models, which show higher performance when asked to write in English before translating into Latin. We performed a limited prompt tuning that resulted in the choice of the format and backtranslate settings, and we will further explore this aspect in future work.</p>
        <p>Finally, we notice that in rare cases, between 2 and 6 for MGH and between 2 and 3 for Auvray, the guardrails preventing GPT-4o from answering questions involving violence make it refuse to generate a regestum, hence the lower values in the N. Samples column; Llama models do not incur this issue.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Qualitative Results</title>
        <p>For a more in-depth analysis of the model abilities observed in the quantitative results, we select a corpus of 20 pairs of extended texts and regesta (10 from MGH and 10 from Auvray) to conduct a qualitative analysis of the results. The regesta are manually checked by a domain expert in the different versions produced by GPT-4o, Llama 3.1 70b and Llama 3.1 405b, for a total of 120 artificially created regesta reviewed and compared with their original versions. Through this manual inspection it is possible to identify the reasons for the results presented in Table 2 and possibly take corrective actions in the future.</p>
        <p>In agreement with Table 2, the qualitative analysis also shows that GPT-4o performs better than both Llama models. This mainly concerns the generation of Latin text and thus the summarization of the document content in the regestum form. Indeed, Llama models encounter more problems in text generation, as shown by the fact that their best results are obtained when the summary is created in English and then translated into Latin (i.e., backtranslate). More broadly, it can also be observed that the systems perform better in the case of MGH but, as already mentioned, this can be traced back to how the dataset is constructed.</p>
        <p>Finally, the qualitative analysis reveals the most critical failures in automatic summarization (i.e., automatic regesta creation). One of the problems identified concerns the recognition of the documents’ author, namely the Pope. Indeed, in the case of MGH, which collects documents from multiple popes, the systems show difficulties in recognizing the correct author: GPT-4o correctly recognizes the Pope in 11 of the 20 cases taken into account. Another critical element concerns dating, which in these medieval texts is based on the year of pontificate and not on the modern dating system. Although Llama 3.1 70b, Llama 3.1 405b and GPT-4o recognize and identify the dating system used in the extended text of the medieval document, and show that they have the tools to accomplish the conversion, the systems have difficulty providing a correct date (either because they do not offer one or because they miscalculate it). Out of 20 manually inspected records generated by GPT-4o, only 3 correctly report the date of the document. The result improves for the recognition of the document recipient (often reported in the first line of the extended text), which in the same sample is recognized correctly by GPT-4o in 15 out of 20 cases.</p>
        <p>Finally, it should be noted that in a few cases, since our prompt requests the data topica (the place) to be extracted, the systems correctly extract it even when the original regestum does not report this information. This leads to a lower score in the table, but to a qualitatively better result in regesta generation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this work we have developed the REVERINO dataset, a dataset of 4533 pairs of regesta with their respective full texts (and apparatus). The texts in this dataset come from two collections of regesta, Epistolae saeculi XIII e regestis pontificum Romanorum selectae. (1216-1268) (MGH) and Les Registres de Gregoire IX (1227/41) (Auvray). To collect the dataset we followed a pipeline composed of four steps: annotation, training, extraction and post-processing.</p>
      <p>Despite containing more than 4000 samples, REVERINO is too small to be used as a training set for a language model that automatically generates regesta; however, it can be used as a benchmark to test the ability of existing LLMs to perform summarization in Latin and thus to develop better tools and methodologies in the future.</p>
      <p>We have tested three of the best-performing LLMs; our general finding is that these models cannot be used as-is to summarize texts in Latin. More precisely, we find that GPT-4o is the best, and that models from the Llama family are less able to generate text in Latin. Interestingly, for both Llama 3.1 70b and Llama 3.1 405b we find that initially writing in English is an effective technique to generate better regesta.</p>
      <p>We also want to underline the limitations of our work: the samples in our dataset are automatically extracted, and therefore a share of them contain transcription errors and imperfections. However, we use the dataset only as a benchmark, and it is still too small to serve as a training dataset for a text-summarization model.</p>
      <p>Despite these limitations, we hope that REVERINO will foster future work on the development of Language Models proficient in Latin, and we will continue improving it by extending it to grow larger than 10k samples and by using it to train a custom Language Model specifically tailored to the generation of regesta in Latin.</p>
      <p>This work was supported by the project "Italian Strengthening of ESFRI RI RESILIENCE" (ITSERR) funded by the European Union under the NextGenerationEU funding scheme (CUP:B53C22001770006).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>ITSERR (Italian Strengthening of the ESFRI RI RESILIENCE)</article-title>
          ,
          <year>2024</year>
          . URL: https://itserr.it.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Potthast</surname>
          </string-name>
          , Regesta Pontificum Romanorum, Rudolf de Decker,
          <year>1874</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Pertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rodenberg</surname>
          </string-name>
          ,
          <article-title>Epistolae saeculi XIII e regestis pontificum Romanorum selectae</article-title>
          .
          <source>(1216-1268)</source>
          , volume
          <volume>1</volume>
          -3, Weidmann,
          <year>1894</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Auvray</surname>
          </string-name>
          , Les Registres de Gregoire IX (1227/41), volume
          <volume>1</volume>
          -3, Bibliothèque des Écoles françaises d'Athènes et de Rome,
          <year>1890</year>
          -
          <fpage>1918</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tissot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stokes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stoekl Ben Ezra</surname>
          </string-name>
          ,
          <article-title>escriptorium: An open source platform for historical document analysis</article-title>
          ,
          <year>2019</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>19</lpage>
          . doi:10.1109/ICDARW.2019.10032.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          , T. Clérice, Catmus-print [large],
          <year>2024</year>
          . URL: https://doi.org/10.5281/zenodo.10592716. doi:10.5281/zenodo.10592716.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L. van der</given-names>
            <surname>Maaten</surname>
          </string-name>
          , G. Hinton,
          <article-title>Visualizing data using t-sne</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          . URL: http://jmlr.org/papers/v9/vandermaaten08a.html.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Riemenschneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <article-title>Exploring large language models for classical philology, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL'23), Association for Computational Linguistics</article-title>
          , Toronto, Canada,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.13698, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gambhir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Recent automatic text summarization techniques: a survey</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>47</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sigler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balcan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          Curran Associates, Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Narayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          ,
          <article-title>Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization</article-title>
          , in:
          <string-name>
            <given-names>E.</given-names>
            <surname>Riloff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hockenmaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tsujii</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Brussels, Belgium,
          <year>2018</year>
          , pp.
          <fpage>1797</fpage>
          -
          <lpage>1807</lpage>
          . URL: https://aclanthology.org/D18-1206. doi:10.18653/v1/D18-1206.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          , in:
          <source>Text Summarization Branches Out</source>
          , Association for Computational Linguistics
          , Barcelona, Spain,
          <year>2004</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          . URL: https://aclanthology.org/W04-1013.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Papineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Bleu: a method for automatic evaluation of machine translation</article-title>
          , in:
          <string-name>
            <given-names>P.</given-names>
            <surname>Isabelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Charniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics
          , Philadelphia, Pennsylvania, USA,
          <year>2002</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          . URL: https://aclanthology.org/P02-1040. doi:10.3115/1073083.1073135.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>Meta Llama Team</string-name>
          ,
          <article-title>The Llama 3 herd of models</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.21783. arXiv:2407.21783.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] OpenAI,
          <article-title>GPT-4 technical report</article-title>
          ,
          <source>CoRR</source>
          abs/2303.08774 (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2303.08774. doi:10.48550/ARXIV.2303.08774. arXiv:2303.08774.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>