<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Dating of Medieval Charters from Denmark⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>University of Copenhagen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>paggio}@hum.ku.dk</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Malta</institution>
        </aff>
      </contrib-group>
      <fpage>58</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>Dating of medieval text sources is a central task common to the field of manuscript studies. It is a difficult process requiring expert philological and historical knowledge. We investigate the issue of automatic dating of a collection of about 300 charters from medieval Denmark, in particular how n-gram models based on different transcription levels of the charters can be used to assign the manuscripts to a specific temporal interval. We frame the problem as a classification task by dividing the period into bins of 50 years and using these as classes in a supervised learning setting to develop SVM classifiers. We show that the more detailed facsimile transcription, which captures palaeographic characteristics of a text, provides better results than the diplomatic level, where such distinctions are normalised. Furthermore, both character and word n-grams show promising results, the highest accuracy reaching 74.96 %. This level of classification accuracy corresponds to being able to date almost 75 % of the charters with a 25-year error margin, which philologists use as a standard of the precision with which medieval texts can be dated manually.</p>
      </abstract>
      <kwd-group>
        <kwd>Automatic dating</kwd>
        <kwd>Medieval charters</kwd>
        <kwd>Language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Dating of medieval text sources is a central task common to the field of manuscript
studies. The majority of medieval manuscripts are without explicit reference to
the time and place they were produced and by whom they were written. This
knowledge, however, is crucial in order to interpret the content and context of a
source. For example, philological research on historical text is very dependent on
correct interpretation of word forms, which is only possible when knowing the
origin of the given source. The dating of medieval texts is often a long and
laborious process requiring expert philological and historical knowledge. Introducing
automatic methods to facilitate this process, therefore, is a valuable effort.</p>
      <p>⋆ This study was conducted at the University of Copenhagen within the
project Script and Text in Time and Space, a core group project supported
by the Velux Foundations. A general description of the project is
available from
https://humanities.ku.dk/research/digital-humanities/projects/writingand-texts-in-time-and-space/. We thank the project, in particular Alex Speed
Kjeldsen, for making the data available to this study.</p>
      <p>Dating may rely on a range of different criteria, including characteristics of
the handwriting in a document, the material state of the parchment or paper,
reference to historical events in the manuscript, linguistic evidence, etc.
Generally, precise dating is very difficult to achieve, and an error margin of 25 years is
considered acceptable.</p>
      <p>In this paper, we investigate the issue of automatic dating of charters from
medieval Denmark, in particular how n-gram models based on different
transcription levels of the charters can be used to assign the manuscripts to a specific
temporal interval.</p>
      <p>While seeking to establish how far Natural Language Processing
(NLP) methods can take us in attacking the problem of medieval manuscript
dating, we also want to determine how different levels of transcription produced
according to recommended philological standards contribute to this task. In
particular, we will look at two levels of transcription, namely (i) a facsimile
transcription in which variations in handwriting are represented, and (ii) a diplomatic
transcription in which such differences are normalised, but where differences in
spelling are still present. To the best of our knowledge, this is the first attempt at
capitalising on the use of different philological transcription levels for automatic
dating of documents.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Background</title>
      <p>Previous attempts at automatic dating of medieval charters fall into two basic
groups depending on whether they use visual features of the printed materials
obtained through image processing, or whether they use language models. In a
few cases, a combination of visual and language features is used. An additional
and orthogonal distinction concerns whether the task is approached as
continuous dating along the timeline or classification into a number of time intervals.</p>
      <p>
        Visual features capturing the strokes of handwritten characters were used for
instance in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for the automatic dating of a collection of 1,706 medieval
documents from the Dutch language area. Dating was treated as a regression
problem in the former study, and as classification in 25-year intervals in the
latter, which reports a mean absolute error of 20.9 years for the whole dataset.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] visual features were extracted to train various models, in
particular regression models as well as Convolutional Neural Networks (CNN)
for continuous dating of medieval charters from the Svenskt Diplomatariums
Huvudkartotek (SDHK) collection, which contains over 11,000 charters in Latin
and Swedish, of which about 5,300 are transcribed. These studies report absolute
errors of 18.3 and 36.8 years at the 50th and 75th percentiles, respectively, for a
Support Vector Regression (SVR) model. This corresponds to dating 50 %
of the dataset with an error of at most 18.3 years and 75 % with an error of at most 36.8
years. For a CNN model, the absolute error is 10 and 22 years at the 50th
and 75th percentiles, respectively.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], visual features were combined with language models to train several
Gaussian Mixture Models (GMM) for the same regression task. While the visual
features model the changes in pen stroke over the years, the language features
are character n-grams aiming at capturing changes over time of short character
sequences. The combined image and language model performs with an absolute
error of 12 years for 50 %, and 22 years for 75 % of the dataset, and constitutes an
improvement compared to similar GMM models only trained on visual features.
      </p>
      <p>In NLP research, dating of documents is usually approached as a temporal
document classification task. In contrast with the studies mentioned above,
visual features extracted from physical texts are ignored and instead the various
approaches try to capitalise on the way the lexicon, the morphology or the
syntax of a language changes over the years. The evaluation measures reported in
NLP classification studies do not generally refer to error measures, but rather to
precision or accuracy relative to different granularities of the temporal intervals
(or bins) used, and compared to a more or less naive baseline.</p>
      <p>
        An example of methods based on lexical knowledge is presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where
temporal text classification is based on change in term usage, while [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used the
Google Books Ngram corpus to identify neologisms and archaisms for the dating
of French journalistic texts. Similarly, in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] the same lexical resource was used
to assign political terms to temporal epochs of varying length depending on their
usage change. Stylistic features such as average sentence and word length, lexical
density, and lexical richness were used in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for the temporal classification of
Portuguese historical texts.
      </p>
      <p>
        A diachronic text evaluation task [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] was proposed as part of the SemEval
2015 initiative. The task consisted in the temporal classification of newspaper
text snippets from 1700 to 2010 into time intervals of different sizes. The best
model was a multiclass Support Vector Machine (SVM) classifier using stylistic
features such as character, word and part of speech (POS) tag n-grams, but also
external estimates from the Google syntactic n-gram database, and achieved an
accuracy of 54.3 % on the 20-year interval classification task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Word n-grams
of order 1–3 with and without their POS tags were also used in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to train
models for the temporal classification of Portuguese historical documents from
the period 16th to early 20th century. The study reports an accuracy of 74.1 %
for the best SVM classifier obtained in the task of temporal classification in
100-year temporal bins.
      </p>
      <p>
        Character n-grams were used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to calculate the distance between
historical varieties of Portuguese. The authors argue that character sequences capture
not only morphological and lexical, but also phonological differences between
language varieties. Interestingly for our purposes, the study experiments with
two different styles of transcription of the original texts, and shows that the best
results in distinguishing historical variants are obtained with the transcriptions
that preserve the original spelling instead of standardising it.
      </p>
      <p>To sum up, language features, in particular word and character n-grams, have
been applied with a certain degree of success to temporal text classification of
relatively modern text collections and to quantify the difference between
historical texts of different periods, but not to any large extent to the specific task of
medieval manuscript dating. Some evidence, however, has been presented that
they may contribute a useful addition to image-based features for that task. Our
goal in this paper is to provide additional evidence in this direction.</p>
    </sec>
    <sec id="sec-3">
      <title>3 The charters of St. Clara Convent</title>
      <p>
        The study revolves around the collection of charters that belonged to St. Clara
Convent outside the city of Roskilde in Denmark. The charters document the
property and status of the convent and they date from when it was founded in
1256 till it was closed after the Reformation. In 1561 the properties and buildings
of St. Clara became part of the University of Copenhagen and so did its archives.
The collection of charters is now part of the Arnamagnaean Collection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The St. Clara Convent archive contains 471 charters in total. They are written
in several languages, most of them in Latin and Danish, and a few in Swedish
and Low German. Most of the charters are originals, handwritten on parchment
and with a seal attached, while others are copies of original charters and are
handwritten on paper.</p>
      <p>
        Most of the original charters can be time-stamped, either from explicit dates
in the text or indirectly from knowledge about the scribe and the persons
mentioned. The copies are more difficult to date: they do not have an explicit time
stamp from when they were written and, since the content is a copy of earlier text,
the historical context cannot provide the dating of the charter either. Instead,
knowledge about spelling variation and palaeographic differences, historical
linguistics, or material evidence about paper and ink, may be used to assign a
possible date [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The charters of the collection are being prepared for a digital scholarly
edition. First, digital photographs of the handwritten manuscripts are produced.
Then, the charters are transcribed at different levels of detail, namely
facsimile, diplomatic, and normalised levels. These three levels of transcription are
recommended by Menota (Medieval Nordic Text Archive) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as means of
encoding medieval manuscripts through text representations as close to the original
manuscript as possible. In the facsimile transcription, palaeographic
characteristics, in other words differences due to handwriting, are encoded. For example,
different ways of writing a ’d’ are represented by distinct characters (e.g., ’d’
or ’ꝺ’) and different types of abbreviating diacritics are preserved. In the
diplomatic transcription such differences are normalised so that characters that are
not phonologically contrastive are unified to a single character at this level
of transcription. Furthermore, all abbreviating symbols are expanded. Finally,
at the normalised level spelling variation is reduced to a common standard. In
addition to the three levels of transcription, all text in the charters will be
lemmatised.
      </p>
      <p>To illustrate the differences between the levels of transcription,
consider the first line of the text in Figure 1(b):
(a) ſoꝛoꝝ ⁊ onafteɼí ear u ı pofteru
(b) soror(um) (et) monasterij earu(m) in posterum
(c) sororum et monasterii earum in posterum</p>
      <p>In the facsimile transcription (a) different abbreviations are annotated, e.g.,
the Tironian et, ⁊, which was used by scribes in the medieval period, and different
allographs are represented by different characters, e.g., í vs. ı, and  vs. .
At the diplomatic level (b) the abbreviations have been expanded, marked by
parentheses, and the allographic variation has been normalised, so that we only
find one type of i and only one m. In the normalised transcription (c) spelling is
standardised. In this particular example this only affects monasterij, where the
final i spelled as j is standardised to give monasterii.</p>
      <p>So far 293 charters have been transcribed at the facsimile level and a
diplomatic transcription has been generated automatically. Out of these, 291 are
originals and are dated either through explicit dating or based on the content of
the text. Two of the charters are copies of original manuscripts and have not yet
been dated. The original of one of these copies is among the transcribed documents, while
the original of the other is not known. The 291 transcribed and dated charters constitute
the dataset of this article. Using both the facsimile and the autogenerated
diplomatic level of the dated originals, we wish to test how these different levels of
transcription (capturing spelling variation and palaeographic differences) can be
used to model the production date of the medieval charters.</p>
      <p>To give an idea of how much variation is reduced between the facsimile and
diplomatic transcriptions of the charters, in Table 1 we report token counts related to the
two levels for the Latin documents and those in Danish. From the counts it can
be calculated that the reduction in word token counts is 14 % for the Latin
manuscripts and 3 % for the Danish ones, while it amounts to 43 % for Latin
and 42 % for Danish when we look at character token counts.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Methodology</title>
      <p>
        In this study, dating of documents is dealt with as a classification task in which
the charters are classified as belonging to a time interval, or bin. Two sets of
bins are considered: dividing the documents into (i) four classes of 100 years
(corresponding to the 13th, 14th, 15th, and the 16th centuries), and (ii) eight
classes of 50 years each (two classes per century, i.e., 1250-1300, 1300-1350, ...,
1550-1600). The division of the timeline into periods is naive in the sense that
the boundaries are not based on any knowledge about linguistics or historical
periods; it is simply a division of the timeline into series of bins of equal size.
Furthermore, framing the problem as a classification task is a simplification,
since it makes no assumptions about documents from two time spans close to
each other being closer than two documents belonging to time spans further
away. However, if accurate, such a method would still be useful in providing
an approximate assessment of their possible date of production. For instance,
classifying the documents in 50-year bins can be seen as a way to date the
documents with a 25-year error margin by assigning all the documents belonging
to one category the median year of the range. We also performed classification
in 100-year bins, in spite of it being very coarse-grained, to position our work
against results from the literature, in particular [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
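      <p>The binning scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and we assume half-open intervals aligned to century boundaries (so that the year 1300 falls in the 1300-1350 bin), a detail the description leaves open.</p>
      <preformat>
```python
# Hypothetical helpers illustrating equal-width temporal bins.
# Assumption: bins are half-open [start, end) and aligned to START_YEAR.
START_YEAR = 1200


def bin_label(year, width):
    """Return the (start, end) interval of the equal-width bin containing `year`."""
    start = START_YEAR + ((year - START_YEAR) // width) * width
    return (start, start + width)


def bin_midpoint(interval):
    """Midpoint year of a bin, usable as a point estimate for charters in it."""
    start, end = interval
    return (start + end) // 2


if __name__ == "__main__":
    for year in (1256, 1300, 1499, 1561):
        fifty = bin_label(year, 50)
        hundred = bin_label(year, 100)
        print(year, fifty, bin_midpoint(fifty), hundred)
```
      </preformat>
      <p>Under these assumptions, a charter placed in the correct 50-year bin lies at most 25 years from the midpoint of that bin, which is the sense in which 50-year classification corresponds to a 25-year error margin.</p>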
      <p>A number of classification experiments are reported here. In all of them,
each of the charters is represented as a vector, the values of which correspond
to the frequency in the charter of either word or character n-grams of order
1-3. Different experiments are run with n-grams extracted from the facsimile
and diplomatic levels, and we also tried combining unigrams and bigrams with
trigrams, again separately for the two transcription levels.</p>
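      <p>As an illustration of this representation, the following sketch counts character and word n-grams of order 1-3 for a single transcription. It is not the feature extraction used in the study: the function names, the whitespace tokenisation and the start and end markers "^" and "$" are assumptions made for the example.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical document start/end markers; the symbols used in the study are not specified.
BOS, EOS = "^", "$"


def char_ngrams(text, n):
    """Count character n-grams of order n, with start/end markers added."""
    padded = BOS + text + EOS
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams of order n over whitespace-separated tokens."""
    tokens = [BOS] + text.split() + [EOS]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def combined_features(text, orders=(1, 2, 3), unit="char"):
    """Merge the counts for several n-gram orders into one sparse feature mapping."""
    extract = char_ngrams if unit == "char" else word_ngrams
    features = Counter()
    for n in orders:
        features.update(extract(text, n))
    return features


if __name__ == "__main__":
    line = "sororum et monasterii earum in posterum"
    print(char_ngrams(line, 3).most_common(3))
    print(word_ngrams(line, 2).most_common(3))
```
      </preformat>
      <p>Each charter then corresponds to one such mapping from n-grams to frequencies; stacking the mappings over a shared vocabulary gives the sparse vectors passed to the classifier.</p>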
      <p>
        SVMs were used to classify the charters. First of all, SVMs are known to
work well with sparse representations, which are a potential problem when using
n-grams of a larger order together with a relatively small dataset. Secondly,
when applied to document classification tasks such as the identification of similar
languages (e.g., discriminating between Dutch and Flemish) this model provides
state-of-the-art results [
        <xref ref-type="bibr" rid="ref19 ref9">9, 19</xref>
        ]. The task of document dating is somewhat related
to this task if one considers the stages of development of a language to be similar
to dialects or very closely related languages.
      </p>
      <p>When carrying out the experiments, two baselines are considered. The first
one always chooses the most frequent class, which has slightly different
likelihoods of being correct depending on the size of the time spans. The second one
picks the most frequent class for each of the languages. Here we chose to group
the Swedish and Low German charters together with the Danish ones as one group. Since
the two language groups considered (Latin and Danish, misc) are not equally
distributed, the average accuracy of this baseline is a weighted mean of the
accuracy that would be reached for each language group separately. Again, two different
measures are obtained depending on the intervals chosen. We
added the second baseline because the distribution of the
documents over languages correlates with time (as we will see in the following
section). Thus, to check that the model is not ’only’ performing language
classification, we use this baseline to test whether the models can outperform it.
The reader should keep in mind that we do not provide explicit knowledge of
the source language of a charter to the models that we train in the next section.
That knowledge is only available implicitly through the text given as input.</p>
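      <p>The two baselines can be expressed compactly as follows. The sketch is illustrative only: the function names are hypothetical and the ten toy labels in the usage example are invented, not taken from the dataset.</p>
      <preformat>
```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the single most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def per_language_baseline(labels, languages):
    """Accuracy of predicting the most frequent class within each language group.

    Summing the per-group majority counts and dividing by the total number of
    documents equals the mean of the per-group accuracies weighted by group size.
    """
    by_group = {}
    for label, language in zip(labels, languages):
        by_group.setdefault(language, []).append(label)
    correct = sum(Counter(group).most_common(1)[0][1] for group in by_group.values())
    return correct / len(labels)


if __name__ == "__main__":
    # Invented toy data: five "latin" and five "danish_misc" documents.
    labels = ["1250-1300"] * 3 + ["1300-1350"] * 2 + ["1450-1500"] * 3 + ["1500-1550"] * 2
    languages = ["latin"] * 5 + ["danish_misc"] * 5
    print(majority_baseline(labels))
    print(per_language_baseline(labels, languages))
```
      </preformat>
      <p>Because the second baseline may choose a different class per language group, it can never score below the first one, which makes it the stricter reference point when language and time are correlated.</p>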
      <p>In all the experiments 10-fold cross validation was used to evaluate the
different models.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Dataset</title>
      <p>As mentioned earlier, the dataset used in this study consists of the 291
charters from the collection that have been transcribed and dated so far. Two different
levels are included: (i) the facsimile transcription, where allographic variation
is annotated, and (ii) the diplomatic transcription, where it is normalised, while
spelling variation is still maintained. The dataset contains documents in the four
languages represented in the collection. The distribution of the charters amongst
the different languages can be seen in Table 2.</p>
      <p>In Figures 2 and 3 the charters are grouped into bins of 50 and 100 years,
respectively. As can be seen, the Latin documents are dominant in the 13th
and 14th centuries, whereas there is a shift during the 15th century to documents
being written in Danish. The two Low German documents are from 1350-1400
and from 1400-1450, while the two Swedish ones are from 1500-1550.
Furthermore, since there are more Latin documents than Danish, the total distribution
of documents over time is skewed such that documents from the middle of the
13th century to the end of the 14th century constitute almost 50 % of the
documents in total (see the cumulative proportions in Table 3).</p>
      <p>Fig. 3: Plot of the distribution of the charters in 100-year bins (y-axis: number of charters).</p>
      <p>All the transcriptions were preprocessed by removing all dates and adding
document start and end symbols to preserve this positional information
when representing the documents as n-gram counts.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Results and discussion</title>
      <p>Two baselines were chosen, as already mentioned, one always choosing the most
frequent class, and the other also relying on knowledge of the language groups.
Table 4 shows the accuracy and weighted F1 score reached by these baselines
depending on the bin size. Whilst the accuracy is the proportion of correct
predictions, the weighted F1 score is based on precision and recall, and is computed
by averaging the F1 scores of the individual classes, each weighted by the
proportion of instances belonging to that class. We chose this measure instead of a simple
F1 score to account for the imbalance between the classes.</p>
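      <p>For concreteness, a support-weighted F1 score can be computed as in the sketch below. It follows the usual definition (per-class F1 weighted by the share of true instances in each class); we assume, but cannot confirm, that this matches the exact implementation used for the reported scores. The function name and the toy labels are hypothetical.</p>
      <preformat>
```python
from collections import Counter


def weighted_f1(true_labels, predicted_labels):
    """Mean of per-class F1 scores, weighted by each class's share of true instances."""
    support = Counter(true_labels)
    total = len(true_labels)
    score = 0.0
    for cls, n_true in support.items():
        # True positives: documents of this class that were also predicted as it.
        tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == cls and p == cls)
        n_pred = sum(1 for p in predicted_labels if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += (n_true / total) * f1
    return score


if __name__ == "__main__":
    true = ["a", "a", "a", "b"]
    print(weighted_f1(true, ["a", "a", "b", "b"]))
    print(weighted_f1(true, ["a", "a", "a", "a"]))  # majority-class predictions
```
      </preformat>
      <p>The second call shows why the measure is informative for imbalanced data: a classifier that only ever predicts the majority class receives no credit for the classes it never predicts.</p>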
      <p>A total of 32 different experiments were run, 16 for each transcription level.
We trained models using unigrams, bigrams and trigrams, as well as a
combination of unigrams, bigrams and trigrams, built from characters and from words,
and using labels for the two different time interval sizes. The results for
accuracy are displayed in Tables 5 and 6, while the results for the weighted F1
scores are displayed in Tables 7 and 8. Tables 5 and 7 show the results obtained
using the facsimile transcription and Tables 6 and 8 show those relating to the
diplomatic level. The accuracy and weighted F1 scores correspond to the average
scores obtained by the classifier over the 10 folds in each experiment.</p>
      <p>In the tables, the highest accuracy and F1 score have been highlighted for the
word and character models respectively. The models at the facsimile and
diplomatic levels for the 50-year bins show the same pattern for both accuracy and F1
scores: In the character models, the scores increase when increasing the order of
the n-gram. In word models, in contrast, the scores decrease when increasing the
order of the n-gram. One possible explanation of why this happens is that the
dimension of the vector representations built over word n-grams is too high, and
therefore causes the models to overfit to the training data. Compared to the
facsimile character model that has 269 unique unigrams and 29,387 trigrams (159
and 12,384, respectively, using the diplomatic transcription), the word model has
21,704 unique unigrams and 64,363 trigrams (19,595 and 59,400, respectively,
using the diplomatic transcription). One possible way of circumventing this issue
could be to perform some type of feature selection on the input features. In this
way one would be able to reduce noise at the same time as reducing the high
dimensionality of the input. Such a reduction in dimensionality would also limit
the sparseness of the input space compared to the amount of data available to
the models.</p>
      <p>In general the models using the facsimile transcription exhibit higher
accuracy and F1 scores than the corresponding models using the diplomatic
transcription. When using only single character counts, for example, we reach an
accuracy of 70 %, compared to 60 % with the same model using the diplomatic
level. The fact that using the facsimile transcription yields more accurate results
than relying on the diplomatic transcription confirms our expectations given
the knowledge we have of the importance of palaeographic differences for the
dating of medieval text. However, when increasing the order of the n-gram for
the character models the difference becomes smaller. It would be interesting to
compare what patterns the models trained on different transcription levels
actually capture. If the character models at the diplomatic level account for variation
in spelling, it makes sense that the models would need higher order n-grams in
order to capture the context of different character sequences. However, if the
models at the facsimile level capture shifts in character inventories, the context
of the individual characters might be of less importance.</p>
      <p>
        With the exception of the trigram word models, all the experiments yield
higher accuracy and F1 scores than the two baselines. This suggests that the
proposed models not only learn the distribution of the documents over
time and languages, but that they are also able to model more fine-grained
temporal differences. It is also interesting to observe that the best results obtained
on the 100-year bins are in line with, or for most models above, the 74.1 %
state-of-the-art accuracy reported for a similar task [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Furthermore, although
there is a decrease in accuracy when going from 100 to 50 years, our best models
trained on the facsimile transcription still perform in line with or slightly better
than the state-of-the-art. However, none of the models manages to predict the
date of a document with a 25-year error margin with 100 % accuracy. At best,
the character models using the facsimile transcription were able to correctly
predict almost 75 % of the charters with a 25-year error margin. As was discussed
previously, treating dating as a classification problem, in which time is viewed
as a finite number of distinct classes, may yield misleading results. For example,
if a charter from the very end of the 15th century were assigned to the 1500-1550
time span, such a prediction would, within a classification framework, yield an
accuracy of 0 % just as would be the case if a charter from the 13th century were
assigned to the same time span. In the former case, however, the absolute error
would only be 26 years.
      </p>
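      <p>The contrast between classification correctness and absolute error in this example can be made concrete with a small sketch. We assume here that a predicted bin is turned into a point estimate by taking its midpoint, and we pick 1499 and 1275 as illustrative years for the two charters; the helper names are hypothetical.</p>
      <preformat>
```python
def midpoint(interval):
    """Midpoint year of a (start, end) bin."""
    start, end = interval
    return (start + end) // 2


def absolute_error(true_year, predicted_interval):
    """Distance in years between the true date and the midpoint of the predicted bin."""
    return abs(true_year - midpoint(predicted_interval))


def is_correct(true_year, predicted_interval):
    """Classification view: correct only if the year lies inside the half-open bin."""
    start, end = predicted_interval
    return true_year in range(start, end)


if __name__ == "__main__":
    predicted = (1500, 1550)
    for true_year in (1499, 1275):  # illustrative years, not charters from the dataset
        print(true_year, is_correct(true_year, predicted), absolute_error(true_year, predicted))
```
      </preformat>
      <p>Both predictions count as equally wrong for accuracy, although the first misses the midpoint of the predicted bin by 26 years and the second by 250.</p>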
      <p>Figure 4 shows confusion matrices of the cross validation errors for the
highlighted models from Tables 5–8. This provides a more fine-grained view of the
type of errors the models make in their predictions. The rows of the matrices
represent how the documents belonging to a specific time interval were classified by
the model. The numbers in the cells specify what proportion of those documents
was correctly classified and what proportion was misclassified as belonging to
other time spans. The individual time spans are temporally ordered along this
axis. Thus, the cells on the diagonal represent the correctly classified documents
within the different categories, and temporally close time spans are also closer to
each other in the matrix. Firstly, a general trend across the matrices is that even
when the models make a wrong prediction, in most cases, it is still a qualified
guess. In fact, wrongly classified documents are mostly assigned to a time span
close to the correct one. Secondly, when considering the numbers in the
diagonal, it can be seen that the models have higher accuracy scores for the earlier
time intervals. This is likely to be a consequence of the dataset composition, in
that there are more examples from the early periods in the dataset compared to
the later ones (see Table 3). Thus, while 88 % of the documents from the period
1250-1300 were correctly assigned to their time bin, this was only true for half
of the documents from the period 1500-1550. Furthermore, none of the documents
from the period 1550-1600 was categorised correctly, but then only two charters
from this period were present in the dataset used in this study.</p>
      <p>Fig. 4: Confusion matrices for cross validation error in normalised counts:
(a) 1-3-gram char model (facsimile); (b) 1-gram word model (facsimile);
(c) 1-3-gram char model (diplomatic); (d) 1-gram word model (diplomatic).</p>
    </sec>
    <sec id="sec-7">
      <title>7 Conclusions and future work</title>
      <p>In this paper we investigated how state-of-the-art NLP models can be applied
to the problem of dating medieval charters from Denmark from the period 1200 to 1600. We
framed the problem as a classification task by dividing the period into bins of 50
years and using these as classes in a supervised learning setting to develop SVM
classifiers. Furthermore, we investigated how different levels of transcription of
the text can be used to facilitate this task.</p>
      <p>We showed that the more detailed facsimile transcription, which captures
palaeographic characteristics of a text, provided better results than the
diplomatic level, where such distinctions are normalised. Moreover, both
character and word n-grams showed promising results, the highest accuracy reaching
74.96 %. This level of accuracy corresponds to being able to date almost 75 %
of the charters with a 25-year error margin, which philologists use as a standard
for the precision with which medieval texts can be dated manually.</p>
      <p>
        Looking into the accuracy results of the experiments in more depth, we
showed that there was a substantial difference in how well documents from
individual bins could be predicted, ranging from 88 % accuracy for the 1250-1300
documents to 52 % accuracy for documents from the years 1500-1550. We
argued that this difference is likely to be due to the fact that some of the bins
are represented by only a few dozen documents. NLP methods often assume that a
large amount of training data is available. However, small datasets are a common
circumstance when working with historical text sources. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] the authors were
able to increase the precision of their models drastically by adding synthetically
generated documents to the dataset. Similar methods would be interesting to
apply to our current collection. However, one danger when using such methods
is that they may be prone to overfitting. A possible way to control for this
unwanted effect would be to validate the model with documents from outside the
      </p>
      <p>As discussed in Section 3, some charters from the medieval period are copies
of earlier text, making them difficult to date. In connection with the documents
misclassified by the models, it would be interesting to do a more thorough
inspection of these to see if some of them were indeed unidentified copies missed in
the manual labelling process. The problem of outlier detection motivates another
future line of work in which the temporal ranking of a collection of documents
is just as important as the actual dating.</p>
      <p>
        In this paper we compared the different models by looking at their
performance measured in terms of accuracy. We have not yet, however, investigated
the behaviour of the different models or the individual predictions they make. The
interpretability of machine learning models is currently a much debated topic
among NLP researchers, the motivation for this line of work being the
importance of creating trust in the models’ predictions and a wish to infer causality
in the natural world from synthetic learning settings [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Both causality and trust are relevant when studying automatic methods for
the dating of historical documents. Considering causality first, knowing
which types of feature were relevant when using the
facsimile and diplomatic transcriptions, respectively, would help us answer
questions about how the charters developed over time. Similarly, when comparing
word and character models, it would be useful to determine if the most
predictive features reflect lexical, phonological and morphological characteristics of the
manuscripts that a philologist would recognise as being relevant to their dating.
As for trust, knowing why the models fail or succeed is a crucial step if we wish to apply
them to currently undated documents, such as the copies we mentioned
earlier. Being able to interpret such models might in turn contribute to studies
of the diachronic developments within text collections using different linguistic
annotations as a basis for such analyses.</p>
      <p>In this study we looked at two different levels of transcription of the
charters, one that captured palaeographic characteristics and the other where such
differences were normalised. In the future it would be interesting to repeat the
study with other levels of transcription, e.g., the normalised level, or to
include different types of linguistic annotation such as POS tags or morphological
features. Moreover, as mentioned earlier, it is not only the text that provides
clues to when a manuscript was written; physical evidence about
ink and parchment does so too. In this respect a line of future work could be to investigate
how methods of ensemble learning can contribute to the problem of dating
documents, by combining the textual evidence outlined in this paper with evidence
from image processing or multispectral measurements of the material.</p>
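      <p>One simple form such an ensemble could take is soft voting, sketched below. The sketch assumes that each model outputs a probability distribution over the 50-year bins, which is an assumption rather than a property of the classifiers used here. The model names, weights and probabilities are hypothetical.</p>
      <preformat>
```python
def soft_vote(distributions: list[dict[str, float]], weights: list[float]) -> dict[str, float]:
    """Weighted average of per-bin probability distributions from several models."""
    total = sum(weights)
    bins = sorted({b for dist in distributions for b in dist})
    return {
        b: sum(w * dist.get(b, 0.0) for dist, w in zip(distributions, weights)) / total
        for b in bins
    }


# Invented probabilities for a single charter (illustration only).
text_model = {"1300-1350": 0.6, "1350-1400": 0.4}
image_model = {"1300-1350": 0.25, "1350-1400": 0.75}

combined = soft_vote([text_model, image_model], weights=[1.0, 1.0])
best_bin = max(combined, key=combined.get)
print(best_bin, combined)
```
      </preformat>
      <p>With equal weights the two sources contribute equally; unequal weights would let the more reliable source dominate, and would have to be tuned on held-out data.</p>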
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abe</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsumoto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Text categorization with considering temporal patterns of term usages</article-title>
          .
          <source>In: Proceedings of the 2010 IEEE International Conference on Data Mining Workshops</source>
          . pp.
          <fpage>800</fpage>
          -
          <lpage>807</lpage>
          . ICDMW '10, IEEE Computer Society, Washington, DC, USA (
          <year>2010</year>
          ). https://doi.org/10.1109/ICDMW.2010.186
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Garcia-Fernandez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ligozat</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinarelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernhard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>When was it written? Automatically determining publication dates</article-title>
          . In:
          <string-name><surname>Grossi</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Sebastiani</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Silvestri</surname>, <given-names>F.</given-names></string-name>
          (eds.)
          <source>String Processing and Information Retrieval</source>
          . pp.
          <fpage>221</fpage>
          -
          <lpage>236</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Adkomstbreve i Skt. Clara Klosters arkiv</article-title>
          . In:
          <string-name><surname>Driscoll</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Óskarsdóttir</surname>, <given-names>S.</given-names></string-name>
          (eds.)
          <source>66 håndskrifter fra Arne Magnussons samling</source>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>139</lpage>
          . Museum Tusculanum (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Haugen</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruvik</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Driscoll</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johansson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyrkjebø</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wills</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The Menota handbook: Guidelines for the electronic encoding of Medieval Nordic primary sources</article-title>
          .
          <source>The Medieval Nordic Text Archive</source>
          ,
          <edition>2</edition>
          edn. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sammara</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burgers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Towards style-based dating of historical documents</article-title>
          .
          <source>In: 14th International Conference on Frontiers in Handwriting Recognition</source>
          . pp.
          <fpage>265</fpage>
          -
          <lpage>270</lpage>
          (
          <year>Sep 2014</year>
          ). https://doi.org/10.1109/ICFHR.2014.52
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A polar stroke descriptor for classification of historical documents</article-title>
          .
          <source>In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR)</source>
          . pp.
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          (
          <year>Aug 2015</year>
          ). https://doi.org/10.1109/ICDAR.2015.7333715
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kjeldsen</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          :
          <article-title>Filologiske studier i kongesagahåndskriftet Morkinskinna, Bibliotheca Arnamagnaeana Supplementum</article-title>
          , vol.
          <volume>8</volume>
          .
          <publisher-name>Museum Tusculanums Forlag</publisher-name>
          ,
          <publisher-loc>København</publisher-loc>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z.C.</given-names>
          </string-name>
          :
          <article-title>The mythos of model interpretability</article-title>
          .
          <source>arXiv:1602.04938v3 [cs.LG]</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Medvedeva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kroon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plank</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>When sparse traditional models outperform dense neural networks: The curious case of discriminating between similar languages</article-title>
          .
          <source>In: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</source>
          . pp.
          <fpage>156</fpage>
          -
          <lpage>163</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Pichel Campos</surname>, <given-names>J.R.</given-names></string-name>,
          <string-name><surname>Gamallo</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Alegria</surname>, <given-names>I.</given-names></string-name>
          :
          <article-title>Measuring language distance among historical varieties using perplexity. Application to European Portuguese</article-title>
          .
          <source>In: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)</source>
          . pp.
          <fpage>145</fpage>
          -
          <lpage>155</lpage>
          . Association for Computational Linguistics (
          <year>2018</year>
          ), http://aclweb.org/anthology/W18-3916
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Popescu</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Behind the times: Detecting epoch changes using large corpora</article-title>
          .
          <source>In: International Joint Conference on Natural Language Processing IJCNLP</source>
          . pp.
          <fpage>347</fpage>
          -
          <lpage>355</lpage>
          . Nagoya, Japan (October 14-18,
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Popescu</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Semeval 2015, task 7: Diachronic text evaluation</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>
          . pp.
          <fpage>870</fpage>
          -
          <lpage>878</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Štajner</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Stylistic changes for temporal text classification</article-title>
          . In:
          <string-name><surname>Habernal</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Matoušek</surname>, <given-names>V.</given-names></string-name>
          (eds.)
          <source>Text, Speech, and Dialogue</source>
          . pp.
          <fpage>519</fpage>
          -
          <lpage>526</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Szymanski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lynch</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Ucd: Diachronic text classification with character, word, and syntactic n-grams</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>
          . pp.
          <fpage>879</fpage>
          -
          <lpage>883</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wahlberg</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mårtensson</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Large scale continuous dating of medieval scribes using a combined image and language model</article-title>
          .
          <source>In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS)</source>
          . pp.
          <fpage>48</fpage>
          -
          <lpage>53</lpage>
          (
          <year>Apr 2016</year>
          ). https://doi.org/10.1109/DAS.2016.71
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wahlberg</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mårtensson</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Large scale style based dating of medieval manuscripts</article-title>
          .
          <source>In: Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing</source>
          . pp.
          <fpage>107</fpage>
          -
          <lpage>114</lpage>
          . HIP '15, ACM, New York, NY, USA (
          <year>2015</year>
          ). https://doi.org/10.1145/2809544.2809560
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wahlberg</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Historical manuscript production date estimation using deep convolutional neural networks</article-title>
          .
          <source>In: 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)</source>
          . pp.
          <fpage>205</fpage>
          -
          <lpage>210</lpage>
          (
          <year>Oct 2016</year>
          ). https://doi.org/10.1109/ICFHR.2016.0048
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dras</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Modeling language change in historical corpora: The case of Portuguese</article-title>
          . In:
          <string-name><surname>Calzolari</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Choukri</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Declerck</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Goggi</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Grobelnik</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Maegaard</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Mariani</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Mazo</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Moreno</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Odijk</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Piperidis</surname>, <given-names>S.</given-names></string-name>
          (eds.)
          <source>Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)</source>
          . pp.
          <fpage>4098</fpage>
          -
          <lpage>4104</lpage>
          . European Language Resources Association (ELRA), Paris, France (May
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glass</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherrer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samardžić</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ljubešić</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
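          <string-name><surname>Tiedemann</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>van der Lee</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Grondelaers</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Oostdijk</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>van den Bosch</surname>, <given-names>A.</given-names></string-name>,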
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lahiri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Language identification and morphosyntactic tagging: The second VarDial evaluation campaign</article-title>
          .
          <source>In: Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</source>
          . Santa Fe, USA (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>