<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>matteo-brv @ DaDoEval: An SVM-based Approach for Automatic Document Dating</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Brivio</string-name>
          <email>matteo.brivio@student.uni-tuebingen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Tübingen, Department of Linguistics</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our contribution to the EVALITA 2020 shared task DaDoEval - Dating Document Evaluation. The solution we present is based on a linear multi-class Support Vector Machine classifier trained on a combination of character and word n-grams, as well as the number of word tokens per document. Despite its simplicity, the system ranked first both in the coarse-grained classification task on same-genre data and in the one on cross-genre data, achieving macro-average F1 scores of 0.934 and 0.413, respectively. The system implementation is available at https://github.com/matteobrv/DaDoEval.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Temporal information, such as the publication date
of a document, is of major relevance in a number
of domains, like historical linguistics and digital
humanities
        <xref ref-type="bibr" rid="ref22">(Niculae et al., 2014)</xref>
        . This is arguably
even more true for a wide range of information
retrieval tasks, such as document exploration,
similarity search, summarisation and clustering, where
the temporal dimension plays a major role in
improving search results
        <xref ref-type="bibr" rid="ref11 ref12">(Alonso et al., 2007; Alonso
et al., 2011)</xref>
        .
      </p>
      <p>
        Such information, however, is not always
readily available and must therefore be inferred,
relying either on qualitative or quantitative methods,
if not both
        <xref ref-type="bibr" rid="ref2">(Ciula, 2017)</xref>
        . Nonetheless, despite
their significance, methods for temporal text
classification and automatic document dating are still
rather unexplored compared to other text
classification tasks
        <xref ref-type="bibr" rid="ref22">(Niculae et al., 2014)</xref>
        . This, however,
is most likely bound to change as the increasing
availability of large-scale, time-annotated digital
resources, such as Google n-grams<sup>1</sup>, is promoting
research in this direction. Two recent examples of
this new trend, in line with the present task, are
the Diachronic Text Evaluation shared task
organised by Popescu et al. (2015) at SemEval 2015 and
the RetroC Challenge presented by Graliński et al.
(2017).
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>In this work we propose a simple, yet effective,
approach for automatic document dating based on
a linear multi-class Support Vector Machine
classifier, trained on a combination of character and
word n-grams, as well as document length in word
tokens.</p>
      <p>
        The solution is evaluated in the context of
the DaDoEval – Dating Document Evaluation –
shared task at EVALITA 2020
        <xref ref-type="bibr" rid="ref1 ref16">(Menini et al.,
2020; Basile et al., 2020)</xref>
        . The task is based on
Alcide De Gasperi’s corpus of public
documents
        <xref ref-type="bibr" rid="ref14">(Tonelli et al., 2019)</xref>
        and is organised into
six sub-tasks: (I) coarse-grained classification on
same-genre data, (II) coarse-grained classification
on cross-genre data, (III) fine-grained
classification on same-genre data, (IV) fine-grained
classification on cross-genre data, (V) year-based
classification on same-genre data, (VI) year-based
classification on cross-genre data.
      </p>
      <p>The proposed solution tackles the first two
sub-tasks, coarse-grained classification on same-genre
and cross-genre data. Both sub-tasks require
assigning document samples to one of the
five main time periods identified in De Gasperi’s
political life, spanning a range of over fifty years
from 1901 to 1954.</p>
      <p>The paper is structured as follows: Section 2
provides a brief overview of the training data
set, Section 3 goes over the system setup and
describes the feature space, Section 4 is dedicated
to the analysis and discussion of results, Section 5
considers possible improvements, and Section 6 is
reserved for final remarks.</p>
      <p><sup>1</sup> http://books.google.com/ngrams</p>
    </sec>
    <sec id="sec-2">
      <title>2 Data</title>
      <p>
        The training data set released for the shared task
includes 2,210 document samples extracted from
Alcide De Gasperi’s corpus of public
documents, a multi-genre collection of 2,759 texts
written or transcribed between 1901 and 1954
        <xref ref-type="bibr" rid="ref14">(Tonelli
et al., 2019)</xref>
        .
      </p>
      <p>With respect to the coarse-grained
classification sub-tasks, the given samples are organised
into five classes (see Table 1) corresponding to
the main time periods historians identified in De
Gasperi’s political life: Habsburg years
(1901-1918), Beginning of political activity (1919-1926),
Internal exile (1927-1942), From fascism to the
Italian Republic (1943-1947) and Building the Italian
Republic (1948-1954).</p>
      <p>A preliminary analysis of the data set reveals an
imbalanced class distribution, with a significantly
lower number of samples in the third class,
corresponding to the 1927-1942 interval. This,
however, is partially mitigated by the markedly higher
average number of word tokens per sample
observed in this class compared to the others.</p>
    </sec>
    <sec id="sec-3">
      <title>3 System Description</title>
      <p>
        The proposed solution is based on a Support
Vector Machine (SVM) classifier implemented using
the Scikit-learn library
        <xref ref-type="bibr" rid="ref7">(Pedregosa et al., 2011)</xref>
        .
      </p>
      <p>To account for the rather imbalanced data set,
the SVM is tuned in such a way that classes are
assigned weights inversely proportional to their
frequency in the input data.</p>
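      <p>As a minimal sketch of this weighting scheme, assuming it corresponds to the "balanced" heuristic of Scikit-learn (weight = n_samples / (n_classes * class_count)), the weights can be computed as follows. The helper name and the label counts are purely illustrative and do not reflect the actual class distribution of the task data.</p>
      <preformat>
```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution (NOT the real DaDoEval counts).
toy_labels = ["1901-1918"] * 6 + ["1927-1942"] * 2 + ["1948-1954"] * 4
weights = balanced_class_weights(toy_labels)
print(weights)
```
      </preformat>
      <p>Under this scheme the rarest class receives the largest weight, so errors on under-represented periods are penalised more heavily during training.</p>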
      <p>
        Following the assumption that most text
categorisation problems are linearly
separable
        <xref ref-type="bibr" rid="ref18">(Joachims, 1998)</xref>
        , the model uses a
linear kernel implemented in terms of libsvm
        <xref ref-type="bibr" rid="ref5">(Chang and Lin, 2011)</xref>
        , while relying on a
one-versus-one decision strategy to handle
both sub-tasks as multi-class, single-label
classification problems.
      </p>
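      <p>To make the one-versus-one strategy concrete, the sketch below enumerates the binary problems implied by the five time periods and resolves a prediction by majority vote. The helper ovo_vote and its tie-breaking (first class to reach the top count) are illustrative assumptions; the internal handling in libsvm may differ.</p>
      <preformat>
```python
from collections import Counter
from itertools import combinations

PERIODS = ["1901-1918", "1919-1926", "1927-1942", "1943-1947", "1948-1954"]

# One binary classifier per unordered pair of classes: k * (k - 1) / 2 in total.
class_pairs = list(combinations(PERIODS, 2))


def ovo_vote(pairwise_winners):
    """Return the class that wins the most pairwise comparisons."""
    votes = Counter(pairwise_winners)
    return votes.most_common(1)[0][0]


print(len(class_pairs))
```
      </preformat>
      <p>With five classes this amounts to ten pairwise classifiers, each trained only on the samples of its two classes.</p>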
      <sec id="sec-3-1">
        <title>3.1 Feature space</title>
        <p>The system relies solely on the data provided by
the task organisers, which is split into a training set
(80%) and a development set (20%). No
preprocessing is applied, as measures such as case
normalisation and punctuation removal appear to
worsen rather than improve the classification results
on the development set.</p>
        <p>
          Each document in the data set is represented
using three sets of features: document length in
terms of word tokens as well as character and word
n-grams. In this respect, we explore the idea that
SVMs trained on combinations of character and
word n-grams are particularly effective in tackling
text classification tasks
          <xref ref-type="bibr" rid="ref3 ref4">(Çöltekin and Rama, 2017;
Çöltekin and Rama, 2018)</xref>
          .
        </p>
        <p>Character n-grams are extracted for n ∈ {3, 4, 5}
and span across word boundaries, thus
capturing punctuation and space characters
occurring at the beginning and at the end of each word
token. Word n-grams, on the other hand, are
extracted for n ∈ {1, 2}. Both feature sets are
weighted using term frequency-inverse document
frequency (TF-IDF) to scale down the impact of
the most frequent n-grams.</p>
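        <p>The following plain-Python sketch illustrates what the two n-gram feature sets look like before weighting. It is only an illustration: the function names are hypothetical, TF-IDF weighting is omitted, and the tokenisation performed by Scikit-learn differs in detail from the simple white-space split assumed here.</p>
        <preformat>
```python
def char_ngrams(text, sizes=(3, 4, 5)):
    """Character n-grams over the raw string, spaces and punctuation included."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


def word_ngrams(text, sizes=(1, 2)):
    """Word n-grams over white-space separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in sizes for i in range(len(tokens) - n + 1)]


sample = "La pace, oggi."
print(char_ngrams(sample, sizes=(3,))[:4])
print(word_ngrams(sample))
```
        </preformat>
        <p>Because the character n-grams are taken over the whole string, sequences such as "a p" straddle two words, which is what allows them to capture spaces and punctuation at word edges.</p>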
        <p>
          The number of word tokens per document is
computed in a naive way, splitting each sample at
every white space. Similarly to the n-gram features,
token counts are scaled to the 0-1 range in an
attempt to avoid numerical problems and prevent
features in higher numeric ranges from
dominating those in smaller ones
          <xref ref-type="bibr" rid="ref6">(Hsu et al., 2003)</xref>
          .
        </p>
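        <p>A minimal sketch of this length feature is given below. It assumes min-max scaling as the way of mapping counts to the 0-1 range, which is one plausible reading of the description rather than a confirmed detail, and it treats any run of white space as a single separator. Function names and example strings are hypothetical.</p>
        <preformat>
```python
def token_count(text):
    """Naive token count: split on any run of white space."""
    return len(text.split())


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All documents have the same length: no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


docs = ["uno due tre", "uno", "uno due tre quattro cinque"]
counts = [token_count(d) for d in docs]
scaled = min_max_scale(counts)
print(counts, scaled)
```
        </preformat>
        <p>In a train/development setting the minimum and maximum would normally be estimated on the training portion only and then reused for unseen documents.</p>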
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Optimisation and Tuning</title>
        <p>The system hyper-parameters are optimised to
obtain the best F1 score on the development set.</p>
        <p>A subset of the hyper-parameters is tuned
empirically through several experiments or on the
basis of existing literature. This is the case for kernel
type, decision strategy, class balancing, tolerance
for the stopping criterion (tol) and n-gram size.</p>
        <p>The remaining hyper-parameters considered
during optimisation are the regularisation
parameter (C) together with the maximum and minimum
document frequency (max_df, min_df), which
in the present approach are used to set an
acceptance threshold for high- and low-frequency
n-grams.</p>
      </sec>
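      <sec id="sec-3-5">
        <sec id="sec-3-5-2">
          <table-wrap id="tab-2">
            <label>Table 2</label>
            <table>
              <thead>
                <tr><th>COMPONENT</th><th>PARAMETER</th><th>VALUE</th></tr>
              </thead>
              <tbody>
                <tr><td>TfidfVectorizer</td><td>analyzer</td><td>word</td></tr>
                <tr><td></td><td>max_df</td><td>0.9</td></tr>
                <tr><td></td><td>min_df</td><td>0.004</td></tr>
                <tr><td></td><td>ngram_range</td><td>(1, 2)</td></tr>
                <tr><td></td><td>lowercase</td><td>False</td></tr>
                <tr><td>TfidfVectorizer</td><td>analyzer</td><td>char</td></tr>
                <tr><td></td><td>max_df</td><td>0.3</td></tr>
                <tr><td></td><td>min_df</td><td>0.001</td></tr>
                <tr><td></td><td>ngram_range</td><td>(3, 5)</td></tr>
                <tr><td></td><td>lowercase</td><td>False</td></tr>
                <tr><td>SVM</td><td>kernel</td><td>linear</td></tr>
                <tr><td></td><td>decision function</td><td>ovo</td></tr>
                <tr><td></td><td>tol</td><td>1e-12</td></tr>
                <tr><td></td><td>C</td><td>0.881</td></tr>
                <tr><td></td><td>class_weight</td><td>balanced</td></tr>
              </tbody>
            </table>
          </table-wrap>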
          <p>
            These hyper-parameters are tuned through
the BayesSearchCV algorithm implemented
in the scikit-optimize library
            <xref ref-type="bibr" rid="ref19">(Head et al.,
2020)</xref>
            , using shuffled 5-fold cross-validation.
BayesSearchCV relies on Bayesian
optimisation and explores the hyper-parameter search
space by exploiting the information available from
previous evaluations. This is in contrast to other
approaches, such as grid and random search,
which move across the search space in either an
exhaustive or a completely random manner.
          </p>
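          <p>For orientation, the components and values reported in Table 2 could be assembled with Scikit-learn roughly as sketched below. How the three feature sets are combined is not stated in the text, so the use of FeatureUnion, the MinMaxScaler for the length feature and the helper names count_tokens and build_model are assumptions made for illustration; the search with BayesSearchCV itself is not reproduced.</p>
          <preformat>
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler
from sklearn.svm import SVC


def count_tokens(texts):
    """White-space token count per document, as a column vector."""
    return np.array([[len(t.split())] for t in texts], dtype=float)


def build_model():
    """Assemble features and classifier with the values listed in Table 2."""
    features = FeatureUnion([
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                                 max_df=0.9, min_df=0.004, lowercase=False)),
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5),
                                 max_df=0.3, min_df=0.001, lowercase=False)),
        # Assumed way of adding the scaled document-length feature.
        ("length", make_pipeline(FunctionTransformer(count_tokens),
                                 MinMaxScaler())),
    ])
    svm = SVC(kernel="linear", decision_function_shape="ovo",
              tol=1e-12, C=0.881, class_weight="balanced")
    return Pipeline([("features", features), ("svm", svm)])


model = build_model()
```
          </preformat>
          <p>The resulting object exposes the usual fit and predict methods, so it can be trained directly on a list of raw document strings and their period labels.</p>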
          <p>Table 2 summarises the best hyper-parameter
setup obtained from the tuning process.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Results</title>
      <p>In this section we present the results for the two
sub-tasks in which the system participated. Results are
summarised in Table 3 and reported in terms of
macro-average F1 score.</p>
      <p>The system ranked first both in the same-genre
and in the cross-genre coarse-grained
classification task, obtaining a macro-average F1 score of
0.934 and 0.413, respectively.</p>
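      <p>As a reminder of how the evaluation metric behaves, the sketch below computes a macro-average F1 score by hand: F1 is computed separately for each class and the per-class values are averaged with equal weight. The function name and the toy labels are illustrative only.</p>
      <preformat>
```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
print(macro_f1(gold, pred))
```
      </preformat>
      <p>Since every class counts equally, a class for which no sample is correctly recognised contributes an F1 of 0 and lowers the average considerably, regardless of how few samples it contains.</p>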
      <sec id="sec-4-1">
        <p>[Table 3: macro-average F1 by team and run (columns TEAM, RUN, MACRO F1) for the same-genre and cross-genre sub-tasks, with rows for matteo-brv, team 1 and the baseline.]</p>
        <p>The runs submitted for the first sub-task are based
on test samples of the same genre as the ones in
the training set. The system scored well above
the baseline, which was computed with a Logistic
Regression model trained on TF-IDF-weighted
word unigrams, without performing any
preprocessing.</p>
        <p>Overall, the results registered on the test set are
in line with those observed during training. This is
confirmed by the data summarised in Table 4 and
by the confusion matrix in Figure 1.</p>
        <p>The confusion matrix depicts a run on the
development set which achieved a macro-average
F1 score of 0.95, while Table 4 reports the
per-class results of the best test run submitted for the
sub-task. In both cases 1919-1926, 1943-1947
and 1948-1954 are the classes showing the highest
number of misclassifications and, incidentally, are
also the ones corresponding to the shortest time
periods.</p>
      </sec>
      <sec id="sec-4-2">
        <p>[Table 4 and Figure 1: per-class results and confusion matrix; surviving labels: CLASS, 1919-1926, 1927-1942, 1943-1947, Predicted label.]</p>
        <p>
The runs submitted for the second sub-task are
based on samples coming from a cross-genre,
out-of-domain test data set. These samples are a
subset of the documents collected for the Epistolario
project
          <xref ref-type="bibr" rid="ref15">(Tonelli et al., 2020)</xref>
          , an ongoing effort to
create a digital archive of Alcide De Gasperi’s
private and public correspondence.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Possible improvements</title>
      <p>
        Results for the same-genre task are quite
encouraging and in line with those obtained on the
development set, where the F1 score ranges between
0.92 and 0.96. However, with the current data
and setup, there might not be much room for
further improvement. Nonetheless, additional
features like richness measures and linguistically
motivated features (e.g. POS tags) are explored in
other contributions
        <xref ref-type="bibr" rid="ref13 ref9">(Štajner and Zampieri, 2013;
Zampieri et al., 2016)</xref>
        and could help achieve more
stable results.
      </p>
      <p>
        On the other hand, results for the second
sub-task suggest a lack of generalisation to
cross-genre, out-of-domain data. In this respect, even
though SVM-based systems for text classification
should be able to perform well and take
advantage of high dimensional feature spaces
        <xref ref-type="bibr" rid="ref18">(Joachims,
1998)</xref>
        , it might still be worthwhile experimenting
with some feature selection methods. Another
angle worth considering is that the system might be
too sensitive to the shallow n-gram features used
to represent the training data. In this case,
including deeper text features, such as those
encoding syntactic information, might help the system
to abstract away from the lexical level. A first
step in this direction was taken by Szymanski
and Lynch (2015), who employ Google
Syntactic N-grams in an SVM-based system that
participated in the Diachronic Text Evaluation shared
task
        <xref ref-type="bibr" rid="ref10">(Popescu et al., 2015)</xref>
        at SemEval 2015.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions</title>
      <p>In this paper we describe a simple, yet effective,
approach for automatic document dating
implemented for the DaDoEval shared task at EVALITA
2020. The system is based on a linear Support
Vector Machine and is trained on a small set of
stylistic and lexical features, resulting in a fast and
efficient classification model.</p>
      <p>In particular, the approach achieves top scores
in both coarse-grained classification sub-tasks,
thus confirming that SVM-based systems trained
on character and word n-grams are indeed well
suited to tackle text classification problems.</p>
      <p>Nonetheless, results observed in the second task
suggest that the model does not generalise well
on cross-genre data, leaving room for further
improvements.</p>
      <p>As expected, despite scoring above the
baseline, cross-genre results are significantly lower
than those obtained in the same-genre task.
Per-class results summarised in Table 5 show that the
promising performance registered in the
same-genre task does not transfer to the cross-genre
one, suggesting a poor ability of the model to
generalise. Particularly interesting and worth
investigating are the results registered for the third class,
corresponding to the 1927-1942 interval. For
this class, precision and recall are both
equal to 0, indicating that the model did not recognise
any sample as belonging to this time period.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank Dr. Çağrı Çöltekin for his patient
encouragement and valuable suggestions throughout
this project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro and
          <string-name>
            <given-names>Lucia C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Arianna</given-names>
            <surname>Ciula</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Digital palaeography: What is digital about it? Digital Scholarship in the Humanities</article-title>
          ,
          <volume>32</volume>
          (
          <issue>2</issue>
          ):
          <fpage>ii89</fpage>
          -
          <lpage>ii105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Çağrı</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          and Taraka Rama.
          <year>2018</year>
          .
          <article-title>Tübingen-Oslo at SemEval-2018 task 2: SVMs perform better than RNNs in emoji prediction</article-title>
          .
          <source>In Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          ,
          <fpage>34</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Çağrı</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          and Taraka Rama.
          <year>2017</year>
          .
          <article-title>Tübingen system in VarDial 2017 shared task: experiments with language identification and cross-lingual parsing</article-title>
          .
          <source>In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)</source>
          ,
          <fpage>146</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Chih-Wei</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A practical guide to support vector classification</article-title>
          .
          <source>Technical report</source>
          , Department of Computer Science, National Taiwan University.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Filip</given-names>
            <surname>Graliński</surname>
          </string-name>
          , Rafał Jaworski, Łukasz Borchmann
          and Piotr Wierzchoń.
          <year>2017</year>
          .
          <article-title>The RetroC Challenge: How to Guess the Publication Year of a Text?</article-title>
          .
          <source>In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage</source>
          ,
          <fpage>29</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Shervin Malmasi and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dras</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modeling Language Change in Historical Corpora: The Case of Portuguese</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC'16)</source>
          ,
          <fpage>4098</fpage>
          -
          <lpage>4104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Octavian</given-names>
            <surname>Popescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Strapparava</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Semeval 2015, task 7: Diachronic text evaluation</article-title>
          .
          <source>In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2015</year>
          ),
          <fpage>870</fpage>
          -
          <lpage>878</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Omar</given-names>
            <surname>Alonso</surname>
          </string-name>
          , Jannik Strötgen, Ricardo Baeza-Yates and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gertz</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Temporal Information Retrieval: Challenges and Opportunities</article-title>
          .
          <source>In Proceedings of the 1st International Temporal Web Analytics Workshop</source>
          ,
          <volume>11</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Omar</given-names>
            <surname>Alonso</surname>
          </string-name>
          , Michael Gertz and
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>On the value of temporal information in information retrieval</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>41</volume>
          :
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
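          <string-name>
            <given-names>Sanja</given-names>
            <surname>Štajner</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>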
          .
          <year>2013</year>
          .
          <article-title>Stylistic Changes for Temporal Text Classification</article-title>
          .
          <source>In Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Artificial Intelligence - LNAI 8082</source>
          , Springer,
          <fpage>519</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Tonelli</surname>
          </string-name>
          , Rachele Sprugnoli and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Moretti</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain</article-title>
          . In Proceedings of CLIC-it
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Tonelli</surname>
          </string-name>
          , Rachele Sprugnoli, Giovanni Moretti, Stefano Malfatti and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Odorizzi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Epistolario De Gasperi: National Edition of De Gasperi's Letters in Digital Format</article-title>
          .
          <source>In Proceedings of AIUCD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Menini</surname>
          </string-name>
          , Giovanni Moretti, Rachele Sprugnoli and
          <string-name>
            <given-names>Sara</given-names>
            <surname>Tonelli</surname>
          </string-name>
          .
          <year>2020</year>
          . DaDoEval @ EVALITA 2020:
          <article-title>Same-Genre and Cross-Genre Dating of Historical Documents</article-title>
          .
          <source>In Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Terrence</given-names>
            <surname>Szymanski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Lynch</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>UCD: Diachronic Text Classification with Character, Word, and Syntactic N-grams</article-title>
          .
          <source>In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2015</year>
          ),
          <fpage>879</fpage>
          -
          <lpage>883</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Text categorization with support vector machines: Learning with many relevant features</article-title>
          .
          <source>In Proceedings of the 10th European Conference on Machine Learning (ECML'98)</source>
          ,
          <volume>1398</volume>
          :
          <fpage>137</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Tim</given-names>
            <surname>Head</surname>
          </string-name>
          , Manoj Kumar, Holger Nahrstaedt, Gilles Louppe and
          <string-name>
            <given-names>Iaroslav</given-names>
            <surname>Shcherbatyi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <source>scikit-optimize/scikit-optimize (Version v0.8.1)</source>
          . Zenodo. http://doi.org/10.5281/zenodo.4014775.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Vlad</given-names>
            <surname>Niculae</surname>
          </string-name>
          , Marcos Zampieri, Liviu Dinu and
          <string-name>
            <surname>Alina M. Ciobanu</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Temporal Text Ranking and Automatic Dating of Texts</article-title>
          .
          <source>In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          ,
          <volume>2</volume>
          :
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>