<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Usage of Language Model for the Filling of Lacunae in Ancient Latin Inscriptions: A Case Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Brunello</string-name>
          <email>andrea.brunello@uniud.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuela Colombi</string-name>
          <email>emanuela.colombi@uniud.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Locaputo</string-name>
          <email>locaputo.alessandro@spes.uniud.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Magnani</string-name>
          <email>stefano.magnani@uniud.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Saccomanno</string-name>
          <email>nicola.saccomanno@uniud.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Serra</string-name>
          <email>giuseppe.serra@uniud.it</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>This paper investigates the efficacy of LatinBERT in the task of infilling ancient Latin inscriptions. We contrast the baseline LatinBERT model with a version fine-tuned specifically for this task. A comprehensive experimental design evaluates the influence of various lacunae features, such as their length and relative position within the text, on the infilling process. In contrast to the results presented in LatinBERT's original publication, our findings indicate suboptimal performance. Interestingly, a parallel study of Greek inscriptions using models like PYTHIA and Ithaca demonstrated vastly superior performance in similar tasks. This disparity underscores the need for the development of more proficient models tailored for Latin inscriptions. Moreover, our study emphasizes the importance of robust and systematic evaluation methodologies to accurately assess model performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Epigraphy</kwd>
        <kwd>Lacunae</kwd>
        <kwd>Latin</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Over the years, many prominent collections of ancient inscriptions, such as the Corpus Inscriptionum Latinarum, the Corpus Inscriptionum Graecarum, and L’Année épigraphique, have been digitized and gathered into digital corpora. Notable examples of such corpora are the EAGLE project (Europeana network of Ancient Greek and Latin Epigraphy, https://www.eagle-network.eu/), an online corpus that gathers inscriptions from various European epigraphic databases, and the Cuneiform Digital Library Initiative (https://cdli.mpiwg-berlin.mpg.de/), which preserves texts and images of cuneiform inscriptions.</p>
      <p>
        This recent increase in the availability of such digitized corpora has enabled the application
of machine learning methods to the field of epigraphy. For instance, PYTHIA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Ithaca [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
are two neural networks designed for filling lacunae in ancient Greek inscriptions, with the
aim of expediting the restoration process by assisting epigraphists.
      </p>
      <p>Works such as PYTHIA and Ithaca emphasize how these tools can serve as useful
companions to epigraphists, demonstrating the ability of the models to enhance human
capabilities.</p>
      <p>
        In light of the success with Greek inscriptions, in this work we focus on Latin ones. Specifically,
we study the capacity of LatinBERT, a BERT-based [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] model trained on Latin, to autonomously
restore lacunae in ancient Latin texts. Through a thorough experimental design, based on
a public dataset of ancient Latin inscriptions, we evaluate how LatinBERT’s performance is
impacted by the inherent characteristics of the lacunae. As we will see, the observed results
are markedly inferior to those presented in the original LatinBERT article, highlighting two
fundamental issues: the need for a higher-performing model for infilling Latin
inscriptions, and the necessity of devising robust and systematic evaluation workflows for this
task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The first specialized neural network designed to aid epigraphists in restoring ancient Greek
inscriptions is PYTHIA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which utilizes a bi-directional LSTM to produce 20 hypotheses for
filling the specified lacuna. The same bi-directional LSTM architecture was later employed
in the restoration of Akkadian inscriptions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Both of the aforementioned models require
the epigraphist to specify not only the location of the gaps to be filled but also their
dimensions in characters. To overcome this limitation, the Blank Language Model (BLM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a
Transformer-based model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] capable of filling gaps with an arbitrary number of characters,
was introduced. When evaluated on the same dataset used for assessing PYTHIA, BLM
demonstrated similar accuracy.
      </p>
      <p>
        Rather than focusing on the infilling task alone, Ithaca [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] addressed the problem together with
two other fundamental tasks in the epigraphist’s workflow: the temporal and spatial attribution
of ancient Greek inscriptions. Ithaca’s architecture is inspired by BigBird [7] (another
Transformer-based language model), with its output passed on to three different Multi-Layer
Perceptrons, one for each epigraphical task. The model’s Top-1 accuracy (62%) in filling the
lacunae surpassed PYTHIA’s Top-1 accuracy (32%). Moreover, the authors showed that the
best performance could be achieved when employing Ithaca to assist trained epigraphists,
improving their accuracy from 25% to 72%.
      </p>
      <p>
When it comes to Latin, the only model whose performance has been assessed on the
infilling problem is LatinBERT [8], a BERT-based [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] masked language model pre-trained
on an extensive corpus of 642.7 million Latin words, encompassing texts from the Classical
age to contemporary documents originating from the Latin Wikipedia. In the paper introducing
LatinBERT, one of the case studies considered was the infilling of literary
documents extracted from the Latin Library (http://thelatinlibrary.com/). Unlike the other previously mentioned models,
the performance of LatinBERT was not assessed by artificially creating gaps and comparing
them with the model’s predictions, but rather by measuring the concordance of the model’s
predictions with the emendations made by an epigraphist, on which it scored a Top-1 accuracy of
33.1%.
      </p>
      <p>In this regard, our experimental workflow is radically different. First, we show how LatinBERT’s
performance, when tested in the same setting as the other previously described approaches,
becomes unsatisfactory for the filling of ancient inscriptions. Then, we fine-tune a LatinBERT
model specifically for the infilling of ancient Latin inscriptions, and finally we study its
performance and how it changes when dealing with different types of lacunae.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The dataset used for our experiments has been obtained from the Epigraphik-Datenbank
Clauss/Slaby (EDCS, https://db.edcs.eu/), the most comprehensive collection of ancient inscriptions from the
Roman Empire. It also includes information from 45 external corpora, including the Corpus
Inscriptionum Latinarum and the inscriptions that are part of the EAGLE project, for a total of
over 537,000 inscriptions.</p>
        <p>Most of the inscriptions retrieved from EDCS are marked up according to a custom notation
that slightly differs from the standard Leiden Conventions [9], a set of rules and symbols used
by corpus editors to annotate inscriptions [10]. This markup includes the expansion of
abbreviated words, the restoration of erroneously omitted characters, and proposals for missing letters.
The inscriptions were therefore filtered, discarding empty and duplicated ones, and cleaned
of such notation, resulting in a total of 211,601 cleaned inscriptions. During this process,
due to the scarcity of data and the lack of ground truth, we decided to retain all the
emendations proposed by the editors, including the integration of some of the lacunae.</p>
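        <p>As a minimal sketch of this cleaning step, the snippet below strips two illustrative markup patterns (editorial expansions in parentheses, restorations in square brackets) while keeping the editors' readings; the actual EDCS notation is richer, so these patterns are assumptions rather than the real rules used for the dataset.</p>

```python
import re

def clean_inscription(text: str) -> str:
    """Remove illustrative Leiden-style markup, keeping the editors' readings."""
    text = re.sub(r"\(([^)]*)\)", r"\1", text)   # expansions: num(ero) -> numero
    text = re.sub(r"\[([^\]]*)\]", r"\1", text)  # restorations: [v]ixit -> vixit
    return re.sub(r"\s+", " ", text).strip()

print(clean_inscription("qui [v]ixit de num(ero) Zal(iorum)"))  # qui vixit de numero Zaliorum
```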
        <p>The preprocessed dataset was subsequently divided into three subsets: a training set, a
validation set, and a test set, with a split of 60% for the training set and 20% each for the
validation and test sets. For the experiments, only the test-set inscriptions whose number
of tokens falls between the first and third quartiles are considered (Figure 1), resulting in a
total of 22,926 inscriptions. This ensures a balanced and representative sample that
avoids extreme outliers.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model</title>
        <p>The training data served for our realization of LatinBERT-epi, a specialized version of LatinBERT
fine-tuned specifically for the infilling of lacunae in ancient Latin inscriptions. The model
underwent fine-tuning for 15 epochs, determined by its performance on the validation set,
with an early stopping patience of 5 epochs and a learning rate of 1e-5. The fine-tuning was
conducted on a single NVIDIA RTX A5000 GPU with 24 GB of VRAM. The test set described
above, limited to inscriptions between the first and third quartiles, was then used to
establish the final performance.</p>
        <p>It is important to note that, while the results of our experiments are significant, they may not
be directly comparable to those of Ithaca, due to differences in the fundamental units of operation:
LatinBERT operates on tokens corresponding to sub-words, whereas Ithaca predicts at the
character level. On top of that, Ithaca requires the epigraphist to specify both the exact number
and position of the missing characters, and uses this information to generate predictions of exactly the
requested size. In contrast, LatinBERT only requires knowledge of the positions,
without regard for the length of the lacunae; this can be a disadvantage, as its predictions may
not always match the size of the gap.</p>
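        <p>The early-stopping regime described above (at most 15 epochs, patience of 5 on the validation metric) can be sketched in plain Python; <code>train_one_epoch</code> and <code>validate</code> are hypothetical stand-ins for the actual LatinBERT fine-tuning and evaluation steps, which are not reproduced here.</p>

```python
def fine_tune(train_one_epoch, validate, max_epochs=15, patience=5):
    """Run up to max_epochs, stopping when validation stalls for `patience` epochs."""
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        loss = validate(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop early
    return best_epoch, best_loss

# Toy validation curve: the optimum is reached at epoch 2, so training
# stops after five further epochs without improvement.
losses = [1.0, 0.8, 0.6, 0.7, 0.9, 0.65, 0.61, 0.62, 0.63, 0.64, 0.66, 0.7, 0.7, 0.7, 0.7]
best_epoch, best_loss = fine_tune(lambda e: None, lambda e: losses[e], max_epochs=15, patience=5)
print(best_epoch, best_loss)  # 2 0.6
```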
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>In the following experiments, to better reflect real-world scenarios, we refrain from masking
entire words and instead focus on sub-word tokens. This decision is based on the understanding
that erosion typically affects portions of inscriptions rather than removing entire words.</p>
      <p>[Figure: an example funerary inscription, "Hic requiescit in pace / Paulus vir laudabilis servus dei miles / de numero Zaliorum qui vixit annis / plus minus XL depositus est / in pace sub die tertium Kalendas Februarias / per indictionem XI", shown with its Leiden-style editorial markup (e.g., "Hic requiesc&lt;i=E&gt;t in pace / Paulus v(i)r l(audabilis) ser&lt;v=B&gt;us d(e)i mil&lt;es=IX&gt; ..."), in its cleaned and tokenized form ("hic requiesc it in pace ..."), and with the PoS tag assigned to each token (e.g., PRON VERB VERB ADP NOUN for "hic requiesc it in pace").]</p>
      <p>In Experiment 4.1, we assess the performance of the fine-tuned model compared to the base
model by applying the same Masked Language Model objective used during pre-training. In
Experiment 4.2, we analyze how model accuracy varies based on the location of the lacuna
within the inscription. In Experiment 4.3, the models are evaluated based on the number of
characters that make up the lacuna. Finally, in Experiment 4.4, we study the performance of the
models by masking tokens according to their PoS (part-of-speech) tags in the sentence.</p>
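      <p>As a sketch of the masking protocol used in Experiment 4.1 (a toy whitespace split stands in for LatinBERT's actual sub-word tokenization):</p>

```python
import random

def mask_random_tokens(tokens, fraction=0.15, seed=0):
    """Mask a random `fraction` of the tokens (at least one), as in the MLM objective."""
    rng = random.Random(seed)  # seeded for reproducibility
    n = max(1, round(len(tokens) * fraction))
    positions = set(rng.sample(range(len(tokens)), n))
    return [("[MASK]" if i in positions else t) for i, t in enumerate(tokens)], positions

tokens = "dis manibus sacrum paulus vixit annis xl".split()
masked, positions = mask_random_tokens(tokens)
print(masked.count("[MASK]"))  # 1: round(7 * 0.15) = 1 masked token
```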
      <sec id="sec-4-2">
        <p>[Table 1: Top-1, Top-10, and Top-50 accuracy (mean ± std) of LatinBERT-base (0.0242 ± 0.0008, 0.0402 ± 0.0007, 0.0628 ± 0.0012) and LatinBERT-epi (0.0832 ± 0.0003, 0.1189 ± 0.0016, 0.1547 ± 0.0010).]</p>
        <sec id="sec-4-2-1">
          <title>4.1. First experiment: Mask 15% of the tokens</title>
          <p>To compare the performance of the LatinBERT base model (from now on referred to as
LatinBERT-base) and the one fine-tuned on the inscriptions, we evaluated both by applying the
same MLM (Masked Language Model) objective used in the pre-training phase. Specifically,
we masked 15% of the tokens in each inscription and measured the accuracy at 1, 10, and 50.
As shown in Table 1, the performance of LatinBERT-base on inscriptions is far from that
presented in the original paper, where the reported Top-1 accuracy was 33.1%. This may
also be due to the fact that LatinBERT-base is predominantly pre-trained on literary documents.
Although these documents are written in a language similar to that of the inscriptions, they
exhibit a different syntactic structure, which is less strict than the one found in inscriptions. This is
why we also considered LatinBERT-epi, which, although outperforming
the base model, still exhibits lower accuracy than expected.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2. Second experiment: Lacunae occurring in different positions</title>
          <p>Lacunae can occur in any part of the text and can extend for any given length. Considering
this, the two models are here evaluated in three different scenarios: when the gap occurs at
the beginning, in the middle, or at the end of the text. For each scenario, we masked
consecutive spans of tokens equal to 10%, 20%, and 30% of the total number of tokens of each inscription.</p>
          <p>When examining the results in Table 2, it is important to consider that the set of inscriptions
used to evaluate the models contains short inscriptions (Figure 1). To ensure that at least one
token per text is masked, the number of tokens to be masked has been calculated as the ceiling
of the specified percentage. Thus, in many cases, masking 10% of the tokens results in the same
number of tokens being masked as when masking 20% of them.</p>
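          <p>This masking scheme, with the ceiling rule guaranteeing at least one masked token per inscription, can be sketched as follows:</p>

```python
import math

def mask_span(tokens, pct, position):
    """Mask ceil(pct * len) consecutive tokens at the beginning, middle, or end."""
    n = min(len(tokens), math.ceil(len(tokens) * pct))
    if position == "beginning":
        start = 0
    elif position == "middle":
        start = (len(tokens) - n) // 2
    else:  # "end"
        start = len(tokens) - n
    return [("[MASK]" if start <= i < start + n else t) for i, t in enumerate(tokens)]

toks = ["dis", "manibus", "paulus", "vixit", "annis"]
print(mask_span(toks, 0.10, "beginning"))  # ceil(0.5) = 1 token masked
print(mask_span(toks, 0.20, "beginning"))  # also 1 token: same mask as 10%
```

<p>With a five-token inscription, masking 10% and 20% both reduce to a single masked token, which is exactly the overlap between percentage levels discussed above.</p>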
          <p>It should also be noted that inscriptions are often highly formulaic. For instance, funerary
inscriptions very commonly begin with ‘Dis Manibus’ (‘to the spirits of the dead’). Since this kind of inscription is
prevalent in the dataset and typically quite short, it helps explain why the performance of the
fine-tuned model is best when masking only a few tokens at the beginning. Meanwhile, the
lowest overall performance reported for LatinBERT-epi occurs when the lacuna lies in the middle
part. This can be attributed to the fact that, unlike the beginning, the middle part exhibits
higher variability even in formulaic inscriptions: in funerary ones, for instance, it is where
the name of the deceased is mentioned, which is very challenging for the model to predict.</p>
          <p>[Table 2: Top-1, Top-10, and Top-50 accuracy of the two models when masking 10%, 20%, and 30% of the tokens at the beginning, middle, and end of each inscription.]</p>
          <p>It is surprising to see that both Top-10 and Top-50 accuracy for LatinBERT-epi are higher
when the lacunae occur at the end of the text than when they occur in the middle: in the former
case, the context is limited to just one side, while in the latter, the model can take advantage
of context on both the left and right.</p>
        </sec>
        <sec id="sec-4-2-3">
          <title>4.3. Third experiment: Mask tokens of different length</title>
          <p>Intuitively, a model should find it easier to fill single small gaps rather than long ones. To
evaluate this aspect, we mask tokens based on their length, ranging from single-character
tokens to tokens with a length of 9 characters (Table 3). For each token length, the metrics are
computed using a subset of the test set, consisting of inscriptions that contain at least one
token of that length.</p>
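          <p>The per-length selection can be sketched as follows (whitespace tokens again stand in for LatinBERT's sub-words):</p>

```python
def subset_for_length(inscriptions, length):
    """Keep inscriptions containing at least one token of exactly `length` characters."""
    return [ins for ins in inscriptions if any(len(t) == length for t in ins)]

def mask_by_length(tokens, length):
    """Mask every token of exactly `length` characters."""
    return [("[MASK]" if len(t) == length else t) for t in tokens]

corpus = [["hic", "requiescit", "in", "pace"], ["dis", "manibus"]]
print(len(subset_for_length(corpus, 4)))  # 1: only the first contains a 4-char token
print(mask_by_length(corpus[0], 2))       # masks "in"
```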
          <p>This experiment highlights the difficulty of the model in correctly predicting tokens of length
equal to 5. This can be put into context by looking at the number of unique tokens per token
length (Figure 3a): tokens of length 5 are among those with the highest variability in the
dataset, and thus are the hardest to predict.</p>
        </sec>
        <sec id="sec-4-2-4">
          <title>4.4. Fourth experiment: Mask according to the PoS tag</title>
          <p>For this experiment, the accuracy is measured according to the Part-of-Speech role of the
masked token. Thus, to distinguish between the different roles of each token, it is necessary to
train an additional model for this sole purpose.</p>
          <p>The Part-of-Speech (PoS) tagging of the test set was performed using a specialized version
of LatinBERT, fine-tuned specifically for the PoS tagging task (unlike the unsupervised
pre-training of LatinBERT, this fine-tuned model is a classifier trained in a supervised manner).
The model was trained on 18,184 tokens from the Perseus Latin Treebank [11], a corpus comprising
Classical Latin texts sourced from the Perseus Digital Library [12]; each of these tokens has
been manually tagged with the corresponding PoS tag, which serves as the ground truth for
the classifier. It is worth noting that, while the Perseus Digital Library contains documents
contemporaneous with the considered inscriptions, it primarily consists of literary sources
with distinct syntactic structures, which becomes evident when comparing the frequency of
occurrence of each PoS tag in the two (Figure 3b). Consequently, the PoS tagging accuracy of
the model on inscriptions may be lower than the 94.3% reported in the LatinBERT paper for
Classical documents.</p>
          <p>[Table 4: accuracy of LatinBERT-base and LatinBERT-epi when masking tokens according to their PoS tag (PRON, ADV, ADJ, VERB, NOUN, DET, PUNCT, SCONJ, CCONJ, PROPN, ADP, PART, NUM, AUX, INTJ).]</p>
          <p>In Table 4, the lowest performance is observed when masking coordinating conjunctions
(CCONJ), subordinating conjunctions (SCONJ), particles (PART), and interjections (INTJ). One
possible explanation for this is their infrequent use in inscriptions, which prevents the fine-tuned
model from learning how to correctly fill them.</p>
          <p>The only PoS tag for which the base model outperforms the fine-tuned one is the prediction
of auxiliary verbs (AUX). This can be attributed to the fact that inscriptions typically prioritize
brevity and conciseness, resulting in limited usage of auxiliary verbs. Another challenging task
for both models is predicting numerals (NUM) because, similar to proper nouns, they can be
difficult to infer from the context, as often there are multiple solutions that, while not correct,
still make perfect sense.</p>
          <p>Overall, LatinBERT-epi’s accuracy is higher than the base model’s for every PoS tag, highlighting
and confirming the different syntactic structures of inscriptions and literary documents.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>As hypothesized, the Latin used in literary documents, on which the pre-training of LatinBERT
relied, greatly differs from the Latin that appears in ancient inscriptions,
both due to a different syntactic structure and to the evolution that the language has undergone
over the centuries. Thus, it should not entirely come as a surprise that the performance of the
base model is lower than that reported in the original LatinBERT paper. Nevertheless, the
fine-tuned model, while improving the accuracy, still reported underwhelming performance,
especially when compared to the PYTHIA and Ithaca results. In light of this, it is important to point
out how the LatinBERT evaluation was conducted: the authors did not randomly
mask parts of the text, but rather measured the concordance of the model’s predictions
with epigraphists’ emendations; to do so, they restricted themselves to those inscriptions with
a single emendation consisting of a single word of at least two characters, so Experiment 4.3 is the
one closest to their experimental setting. However, it is important to notice that, when using
PYTHIA and Ithaca, the epigraphist has to specify which characters (their number and position)
the model has to predict, thus providing the model with additional information regarding the
characteristics of the lacunae. This is not the case with LatinBERT, where the only information
provided by the epigraphist concerns the location of the lacunae.</p>
      <p>The experiments did not uncover a single specific aspect in which LatinBERT is lacking, but rather
showed consistent difficulties in correctly filling the gaps. Given the lower-than-expected
performance of our model, and the fact that many papers in this field emphasize
collaboration with epigraphists rather than an in-depth analysis of model performance, we recognize the
importance of establishing a well-defined pipeline of experiments to assess language models’
accuracy in filling lacunae, and of developing a model based on Ithaca’s architecture for Latin as well.</p>
      <p>We believe that the pipeline should satisfy at least the following requirements:
• It must consider the various positions where lacunae can occur, given that inscriptions
are often highly formulaic. Consequently, certain parts may be easier to predict than
others, as emerged in Experiment 4.2.
• It must favor models with a higher Top-10 accuracy over those with a
higher Top-50 accuracy, since these tools are expected to be used in conjunction with an
epigraphist, who has to evaluate every prediction of the model.</p>
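      <p>The two requirements can be made concrete with a small sketch: a gap counts as a hit at k when the ground truth appears among the model's k best candidates, and candidate models are ranked by Top-10 accuracy before Top-50 (the model names and scores below are hypothetical).</p>

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of gaps whose ground truth is among the k best candidates."""
    hits = sum(1 for ranked, gold in zip(ranked_predictions, targets) if gold in ranked[:k])
    return hits / len(targets)

def rank_models(scores):
    """Order model names by Top-10 accuracy first, then by Top-50."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)

# Two gaps, each with the model's ranked candidate fillings.
preds = [["pace", "domino", "vita"], ["servus", "miles", "vir"]]
golds = ["pace", "vir"]
print(top_k_accuracy(preds, golds, 1))  # 0.5: only the first gap is Top-1 correct
print(top_k_accuracy(preds, golds, 3))  # 1.0

# scores: name -> (Top-10, Top-50); B wins despite a lower Top-50.
print(rank_models({"A": (0.12, 0.40), "B": (0.15, 0.30)}))  # ['B', 'A']
```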
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this work, we presented a fine-tuned version of LatinBERT for filling lacunae in ancient
Latin inscriptions, and evaluated it by comparing its performance to that of the baseline
LatinBERT model in the task of filling lacunae without human intervention in different
scenarios, analyzing how the features of the inscriptions affect the model’s predictions. The
experiments highlighted the suboptimal performance of LatinBERT in this task, which, when
compared to the results shown by PYTHIA and Ithaca on ancient Greek inscriptions,
underscores both the necessity of establishing a comprehensive and standardized set of experiments
to more accurately assess the performance of these models and the need for a more proficient
Latin-specific model.</p>
      <p>The remark made in the PYTHIA and Ithaca papers about involving domain experts to better evaluate
these models remains valid, although such involvement is not always feasible, especially for
less-studied languages.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by the Department Strategic Plan (DSP) of the University of
Udine, Interdepartmental Projects: Artificial Intelligence, Artificial Intelligence for Cultural
Heritage (AI4CH); PRIN 2022, Project code: 2022YTE579.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[7] M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontañón, P. Pham, A. Ravula, Q. Wang, L. Yang, A. Ahmed, Big bird: Transformers for longer sequences, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems 33 (NeurIPS 2020), December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html.</p>
      <p>[8] D. Bamman, P. J. Burns, Latin BERT: A Contextual Language Model for Classical Philology, 2020. arXiv:2009.10053.</p>
      <p>[9] C. Bruun, J. Edmondson, The Oxford Handbook of Roman Epigraphy, Oxford University Press, 2014.</p>
      <p>[10] J. Flanders, C. Roueché, Introduction to EpiDoc Guidelines, 2006. URL: https://epidoc.stoa.org/gl/latest/intro-eps.html, accessed on October 17, 2023.</p>
      <p>[11] D. Bamman, G. Crane, The Design and Use of a Latin Dependency Treebank, 2006.</p>
      <p>[12] Perseus Digital Library, 2023. URL: http://www.perseus.tufts.edu, accessed on September 19, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Assael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sommerschield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Prag</surname>
          </string-name>
          ,
          <article-title>Restoring ancient text using deep learning: A case study on Greek epigraphy</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>6367</fpage>
          -
          <lpage>6374</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          -1668.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Assael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sommerschield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shillingford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bordbar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pavlopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chatzipanagiotou</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Androutsopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Prag</surname>
          </string-name>
          , N. de Freitas,
          <article-title>Restoring and attributing ancient texts using deep neural networks</article-title>
          ,
          <source>Nature</source>
          <volume>603</volume>
          (
          <year>2022</year>
          )
          <fpage>280</fpage>
          -
          <lpage>283</lpage>
          . doi:
          <volume>10</volume>
          .1038/ s41586-022-04448-z.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:
          <year>1810</year>
          .04805.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fetaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lifshitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Aaron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gordin</surname>
          </string-name>
          ,
          <article-title>Restoration of fragmentary Babylonian texts using recurrent neural networks</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>117</volume>
          (
          <year>2020</year>
          )
          <fpage>22743</fpage>
          -
          <lpage>22751</lpage>
          . doi:
          <volume>10</volume>
          .1073/pnas.2003794117.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Quach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Barzilay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaakkola</surname>
          </string-name>
          ,
          <source>Blank Language Models</source>
          ,
          <year>2020</year>
          . arXiv:
          <year>2002</year>
          .03079.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>