<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Alessio Palmero Aprosio</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Tint, the Swiss-Army Tool for Natural Language Processing in Italian</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0002</volume>
      <abstract>
        <p>In this paper we present the latest version of Tint, an open-source, fast and extendable Natural Language Processing suite for Italian based on Stanford CoreNLP. The new release includes a set of text processing components for fine-grained linguistic analysis, from tokenization to relation extraction, including part-of-speech tagging, morphological analysis, lemmatization, multi-word expression recognition, dependency parsing, named-entity recognition, keyword extraction, and much more. Tint is written in Java and freely distributed under the GPL license. Although some modules do not perform at a state-of-the-art level, Tint reaches very good accuracy in all modules, and can be easily used out-of-the-box.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Text Analysis</kwd>
        <kwd>Readability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we present Tint, a suite of ready-to-use modules for Italian NLP.
Tint is free to use, open source, and can be downloaded and used out-of-the-box
(see Section 5). Compared to the previous versions [
        <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
        ], the suite has been
enriched with several modules for fine-grained linguistic analysis that were not
available for Italian before (for example, constituency parsing, relation
extraction, temporal expression extraction). Finally, some other modules have been
trained with new data (named-entity recognition and dependency parsing).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Most of the linguistic pipelines freely available for download (such as Stanford
CoreNLP1 and OpenNLP2) are language independent and, even if they are not
available in Italian out-of-the-box, they can be trained for any language.</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <sec id="sec-2-1">
        <title>1 http://stanfordnlp.github.io/CoreNLP/</title>
      </sec>
      <sec id="sec-2-2">
        <title>2 https://opennlp.apache.org/</title>
        <p>
          Notable examples in this direction are UDpipe [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], SpaCy [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], and
Stanza [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], trainable pipelines which perform most of the common NLP tasks
and are available in almost all the languages included in the Universal
Dependencies [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], and Freeling [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], a C++ library providing language analysis functionalities
for a variety of languages. There are also some other pipelines specifically written
for Italian, such as TextPro [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and T2K [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], but none of them are released
as open source (and TextPro is the only one that can be downloaded and used
for free for research purposes).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Modules</title>
      <p>
        The Tint software is built on top of Stanford CoreNLP, a framework that helps
users to derive linguistic annotations for text. Unlike some similar tools,
such as UIMA [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and GATE [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], CoreNLP requires only basic object-oriented
programming skills to extend it. The centerpiece of CoreNLP is the pipeline, which
takes in raw text, runs a series of NLP annotators on the text, and produces
a final set of annotations. CoreNLP supports six languages (Arabic, Chinese,
English, French, German, and Spanish). In Tint, we use the CoreNLP paradigm
and bring it to Italian.
      </p>
      <p>Some of the modules have been implemented from scratch and
do not rely on the components available in Stanford CoreNLP (for example,
tokenization and morphological analysis). Other tasks, such as POS tagging
and dependency parsing, are performed using the existing modules in CoreNLP,
trained on Italian datasets available to the community. Finally, additional
modules include wrappers for existing tools written in Java or available through a
web API, such as keyword extraction and entity linking.</p>
      <p>We mark with an asterisk (*) the modules that have never been described in a
previous work, and with a dagger (†) the ones that were retrained or significantly
renovated with respect to the past articles.</p>
      <sec id="sec-3-1">
        <title>Tokenizer and sentence splitter</title>
        <p>
          This module segments text into tokens and sentences. At first, the
text is coarsely tokenized. Then, in a second step, tokens that need to be put
together are merged using two customizable lists of Italian non-breaking
abbreviations (such as "dott." or "S.p.A.") and regular expressions (for e-mail
addresses, web URIs, numbers, emoticons). This second phase uses [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to speed up
the process.
        </p>
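        <p>The two-step procedure can be illustrated with a minimal Python sketch. It is not Tint's Java implementation: the abbreviation set, the coarse regular expression and all function names are assumptions chosen for illustration, and the speed-up technique cited above is not reproduced.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical sample of non-breaking abbreviations; Tint ships its own customizable lists.
ABBREVIATIONS = {"dott.", "S.p.A."}

# Coarse first pass: runs of word characters, or single non-space symbols.
COARSE = re.compile(r"\w+|[^\w\s]")


def coarse_tokenize(text):
    """Return (token, start, end) triples from a deliberately naive first pass."""
    return [(m.group(), m.start(), m.end()) for m in COARSE.finditer(text)]


def merge_abbreviations(text, tokens, abbreviations=ABBREVIATIONS):
    """Second pass: glue adjacent coarse tokens that together spell an abbreviation."""
    merged = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate span first.
        for j in range(len(tokens), i, -1):
            start, end = tokens[i][1], tokens[j - 1][2]
            if text[start:end] in abbreviations:
                match_end = j
                break
        if match_end is None:
            merged.append(tokens[i][0])
            i += 1
        else:
            merged.append(text[tokens[i][1]:tokens[match_end - 1][2]])
            i = match_end
    return merged


def tokenize(text):
    """Run the coarse pass, then the merging pass."""
    return merge_abbreviations(text, coarse_tokenize(text))
```
]]></preformat>
        <p>Because the merge step compares slices of the original text, pieces separated by whitespace are never glued together by mistake.</p>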
      </sec>
      <sec id="sec-3-2">
        <title>Truecaser (*)</title>
        <p>The truecase module recognizes the "true" case of tokens (how they would be
capitalized in well-edited text) when this information is lost, e.g., in all upper case text.</p>
        <p>
          It is included in the CoreNLP original package,3 and relies on a discriminative
model using the CRF sequence tagger. The model shipped with Tint has been
trained on an Italian corpus (1.3 billion words) that includes texts from different
domains: legal, narrative, news, and so on [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Part-of-speech tagger</title>
        <p>
          The part-of-speech annotation is provided through the Maximum Entropy
implementation [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ] included in Stanford CoreNLP. The model is trained on the
Universal Dependencies (UD) dataset for Italian [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Token splitter (*)</title>
        <p>The tokenizer (see Section 3.1) is able to split a text into words, but the basic
unit for a good morphological annotation is the syntactic word. This means</p>
        <sec id="sec-3-4-1">
          <title>3 https://stanfordnlp.github.io/CoreNLP/truecase.html</title>
          <p>that we systematically want to split off clitics, as in "dammelo" (verb "da",
plus pronouns "me", and "lo"), and undo contractions, as in "alla", that is "a"
(preposition) plus "la" (determiner). We refer to such cases as multiword tokens
because a single orthographic token corresponds to multiple (syntactic) words.</p>
          <p>The CoreNLP pipeline performs this task immediately after segmentation.
However, in Italian tokenization alone is not enough to understand whether
a particular token needs to be split. For instance, depending on the context,
"delle" can be either a partitive article or a contraction of a preposition and
a determiner. Similarly, "porci" can be a noun or a verb plus clitic. We therefore
wrote a new module for the purpose, using the information provided by the POS
module to discriminate the ambiguous cases.</p>
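          <p>A minimal Python sketch of POS-informed splitting follows. The rule table, the tag names (in particular the combined tag "ADP+DET") and the surface pieces are assumptions made for illustration; they are not the rules or the tagset shipped with Tint.</p>
          <preformat><![CDATA[
```python
# Hypothetical lookup: (lowercased surface form, POS tag) -> syntactic words.
# Both the tags and the pieces are illustrative assumptions.
SPLIT_RULES = {
    ("delle", "ADP+DET"): ["di", "le"],   # preposition + determiner
    ("alla", "ADP+DET"): ["a", "la"],     # preposition + determiner
    ("dammelo", "VERB"): ["da", "me", "lo"],  # verb + two clitic pronouns
    ("porci", "VERB"): ["por", "ci"],     # verb + clitic; the noun reading is left whole
}


def split_tokens(tagged_tokens, rules=SPLIT_RULES):
    """Expand multiword tokens using the POS tag to resolve ambiguous forms."""
    words = []
    for form, pos in tagged_tokens:
        # Forms without a matching (form, tag) rule are kept as they are.
        words.extend(rules.get((form.lower(), pos), [form]))
    return words
```
]]></preformat>
          <p>The same surface form is split or kept depending only on the tag it received, which is why the module has to run after the part-of-speech tagger.</p>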
          <p>To ensure compatibility with the previous versions of Tint, all the modules
provided in Tint that operate after part-of-speech tagging are configured to work both
with and without the splitter. Modules that need the training of a model (such
as dependency/constituency parsing and named-entity recognition) are trained
in both setups: the two models are included in the Tint distribution (one can
activate the right one in the configuration file).</p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>Morphological analyzer</title>
        <p>
          The morphological analyzer module provides the full list of morphological
features for each annotated token. The current version of the module has been
trained using the Morph-it lexicon [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], but it is possible to extend or retrain
it with other Italian datasets. To extend the coverage of the results, especially
for complex forms such as "porta-ce-ne" or "bi-direzionale", the module
decomposes the token into prefix-root-infix-suffix and tries to recognise the root
form.
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>Lemmatizer (†)</title>
        <p>The lemmatization module is a rule-based system that works by
combining the part-of-speech output (Section 3.3) and the results of the
morphological analyzer (Section 3.5) so as to disambiguate the morphological features using
the grammatical annotation. In order to increase the accuracy of the results,
the module tries to detect the gender of noun lemmas relying on the analysis
of their preceding articles. For instance, for the correct lemmatization of "il
latte/the milk", the module uses the singular article "il" to identify the correct
gender/number of the lemma "latte" and returns "latte/milk" (masculine, singular)
instead of "latta/metal sheet" (feminine, whose plural form is "latte").</p>
        <p>In addition, we developed a morphological guesser that is activated whenever
a form cannot be linked to any lemma through the morphological analyzer
(Section 3.5). Starting from the series form/lemma/pos in the Italian UD datasets,
we trained a statistical model using decision trees and probabilities given by
frequencies of certain suffixes in the UD. For instance, starting from the
nonexistent word "insalatando" tagged as verb (probably meaning eating salad), the
guesser starts from the end of the form and, letter by letter, explores the tree of
possibilities until it reaches a result with a reasonable accuracy.</p>
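        <p>The idea of a suffix-driven guesser can be sketched in Python as follows. This is a strong simplification and not the model trained for Tint: a flat table of suffix counts stands in for the decision trees, and the toy training triples, the confidence threshold and all names are assumptions.</p>
        <preformat><![CDATA[
```python
from collections import Counter, defaultdict


def rewrite_rule(form, lemma):
    """Describe form -> lemma as (suffix to strip, suffix to add)."""
    k = 0
    while k < min(len(form), len(lemma)) and form[k] == lemma[k]:
        k += 1
    return form[k:], lemma[k:]


def train(triples, max_suffix=5):
    """Count rewrite rules per (POS, form suffix) over form/lemma/POS triples."""
    table = defaultdict(Counter)
    for form, lemma, pos in triples:
        rule = rewrite_rule(form, lemma)
        for n in range(1, min(max_suffix, len(form)) + 1):
            table[(pos, form[-n:])][rule] += 1
    return table


def guess(form, pos, table, max_suffix=5, min_confidence=0.6):
    """Grow the suffix letter by letter until one rule is confident enough."""
    for n in range(1, min(max_suffix, len(form)) + 1):
        counts = table.get((pos, form[-n:]))
        if not counts:
            break  # no evidence for this suffix, hence none for longer ones
        (strip, add), hits = counts.most_common(1)[0]
        confident = hits / sum(counts.values()) >= min_confidence
        if confident and form.endswith(strip):
            return form[:len(form) - len(strip)] + add
    return None


# Toy training data (hypothetical), in the form/lemma/pos layout described above.
TRIPLES = [
    ("mangiando", "mangiare", "VERB"),
    ("parlando", "parlare", "VERB"),
    ("cantando", "cantare", "VERB"),
    ("vedo", "vedere", "VERB"),
]
TABLE = train(TRIPLES)
```
]]></preformat>
        <p>With this toy table, calling guess("insalatando", "VERB", TABLE) applies the rule learned from the three gerunds, while a form whose final letter was never observed yields no guess at all.</p>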
        <p>The guesser is active by default, but can be deactivated when needed. When
active, the guessed lemmas are tagged as such, so that the researcher (or the
tool calling Tint) can use this information.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Verbal tenses classifier</title>
        <p>The part-of-speech tagger and morphological analyzer released with Tint can identify
and classify verbs at token level, but sometimes the modality, form and tense
of a verb are the result of a sequence of tokens, as in compound tenses such as
participio passato, or passive verb forms. For example, in Italian the word siamo,
taken as a single token, is the simple present form of the verb essere; if we look at
the surrounding words, we can have forms such as siamo andati (present perfect
of verb andare, active) or siamo mangiati (simple present of verb mangiare,
passive). For this reason, we include in Tint a tense module to provide a more
complete annotation of multi-token verbal forms. The module also supports the
analysis of discontinuous expressions, such as ho sempre mangiato.</p>
        <p>Affixes annotator</p>
        <p>
          This module provides a token-level annotation about word derivatives, based
on derIvaTario [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ],4 a resource manually created to achieve a high accuracy
and overcome errors coming from resources developed in a semi-automatic way
[
          <xref ref-type="bibr" rid="ref36 ref42">42, 36</xref>
          ]. The dataset was built by segmenting about 11,000 derivatives into
derivational cycles and annotating them with a wide array of features. The module uses
this resource to segment a token into root and affixes; for example,
visione is analysed as baseLemma=vedere, affix=zione and allomorph=ione.
        </p>
      </sec>
      <sec id="sec-3-8">
        <title>Multi-word expressions extractor</title>
        <p>
          A specific multi-token annotator has been implemented to recognize more than
13,450 multi-word expressions, the so-called "polirematiche" [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ], manually
collected from various online resources. The list includes verbal, nominal, adjectival
and prepositional expressions (e.g. lasciar perdere, società per azioni, nei
confronti di, mezzo morto). This annotator can also identify discontinuous
multiword expressions. For example, in the expression andare a genio (Italian phrase that means
"to like") an adverb can be included, as in andare troppo a genio. Similarly, in
such phrases one can find nouns and adjectives (e.g. lasciare Antonio a piedi,
where lasciare a piedi is an Italian multiword expression for "leave stranded").
        </p>
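        <p>Matching a discontinuous expression can be sketched in Python as an in-order search that tolerates a few intervening tokens. The function name, the gap limit and the assumption that tokens are already normalized (for example lemmatized) are illustrative choices, not a description of Tint's annotator.</p>
        <preformat><![CDATA[
```python
def find_multiword(tokens, expression, max_gap=2):
    """Return the token indices realizing `expression` in order, or None.

    Up to `max_gap` extra tokens may appear between consecutive components.
    """
    for start, token in enumerate(tokens):
        if token != expression[0]:
            continue
        indices = [start]
        for component in expression[1:]:
            last = indices[-1]
            # The next component may sit at last+1 .. last+1+max_gap.
            window = tokens[last + 1:last + 2 + max_gap]
            if component not in window:
                break
            indices.append(last + 1 + window.index(component))
        else:
            return indices
    return None
```
]]></preformat>
        <p>The returned indices skip the inserted adverb or noun, so the expression can be annotated without absorbing the intervening material.</p>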
        <sec id="sec-3-8-1">
          <title>4 http://derivatario.sns.it/</title>
        </sec>
      </sec>
      <sec id="sec-3-9">
        <title>Named-entities recognizer (†)</title>
        <p>
          The NER module recognizes persons, locations and organizations in the text. It
uses a CRF sequence tagger [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] included in Stanford CoreNLP and it is trained
on KIND [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], a dataset containing around 340K words taken from Wikinews.
        </p>
        <p>
          To enhance the classification, Stanford NER also accepts gazetteers of names
labelled with the corresponding tag. We collect a list of persons, organizations
and locations from the Italian Wikipedia using some classes in DBpedia [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]:
Person, Organisation, and Place, respectively. In addition to this, we collect
the list of streets from OpenStreetMap [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], limiting the extraction to Italian
names.
        </p>
      </sec>
      <sec id="sec-3-10">
        <title>Temporal expressions extractor and normalizer (*)</title>
        <p>
          Since the first version of Tint, the task of temporal expression extraction was
provided as a wrapper to HeidelTime [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], a rule-based state-of-the-art temporal
tagger developed at Heidelberg University.
        </p>
        <p>
          The original English version of CoreNLP uses SUTime [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a powerful library
for processing temporal expressions, built on top of TokensRegex, a framework
for defining regular expressions over text and tokens, and mapping matched text
to semantic objects. The current version of Tint uses SUTime and a new set
of rules written for Italian. It also normalizes the expressions according to the
TIMEX3 annotation standard. SUTime is generally run as a subcomponent of
the named-entities recognizer annotator (Section 3.10) and is active by default
(it can be disabled if not needed).
        </p>
        <p>Recognized temporal expressions can be resolved relative to the document
date. For instance, the expression "mercoledì scorso" will be resolved to the
Wednesday that is immediately before the document date, be it the current
date or any other date. The document date can be set when Tint is launched,
otherwise the current date and time are used.</p>
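        <p>The resolution of a relative expression against a document date can be illustrated with a small Python sketch. It covers only the "weekday + scorso" pattern and does not use SUTime or the Italian rules shipped with Tint; the function name and the ISO date output are assumptions.</p>
        <preformat><![CDATA[
```python
from datetime import date, timedelta

# Monday = 0, matching date.weekday().
WEEKDAYS = {
    "lunedì": 0, "martedì": 1, "mercoledì": 2, "giovedì": 3,
    "venerdì": 4, "sabato": 5, "domenica": 6,
}


def resolve_last_weekday(expression, document_date=None):
    """Resolve '<weekday> scorso/scorsa' to the closest such day strictly
    before the document date, returned as an ISO 8601 string."""
    document_date = document_date or date.today()  # fall back to the current date
    parts = expression.lower().split()
    if len(parts) != 2 or parts[0] not in WEEKDAYS or parts[1] not in {"scorso", "scorsa"}:
        return None
    # Days to go back: 1..7, never 0, so the same weekday maps to a week earlier.
    delta = (document_date.weekday() - WEEKDAYS[parts[0]]) % 7 or 7
    return (document_date - timedelta(days=delta)).isoformat()
```
]]></preformat>
        <p>For a document dated Friday 18 June 2021, "mercoledì scorso" resolves to 2021-06-16; for a document dated on that Wednesday it resolves to 2021-06-09.</p>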
      </sec>
      <sec id="sec-3-11">
        <title>Constituency parser (*)</title>
        <p>
          A constituency parser is a program that works out the grammatical structure
of sentences, for instance, which groups of words go together (as "phrases") and
which words are the subject or object of a verb. In Tint this task is performed
by the shift-reduce [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] parser module included in Stanford CoreNLP.
        </p>
        <p>
          Data used for training is taken from both the Turin University Treebank [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
and the Parallel TUT [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Their licence permits use for research purposes.
        </p>
        <p>These treebanks cannot be used as is, because the multiword tokens (see
Section 3.4) are denoted by doubling the token. In addition, part-of-speech tags
and some constituency labels need to be replaced to make the dataset compatible
with the CoreNLP parser. Conversion rules for both tagsets are included in the
Tint release. A script to apply the conversion to the dataset is also included.</p>
      </sec>
      <sec id="sec-3-12">
        <title>Dependency parser (†)</title>
        <p>
          This module provides syntactic analysis of the text and uses a transition-based
parser (included in Stanford CoreNLP) which produces typed dependency parses
of natural language sentences [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The parser is powered by a neural network
which accepts word embedding inputs: the model is trained on the UD dataset
and the word embeddings are built on the corpus described in Section 3.2.
        </p>
      </sec>
      <sec id="sec-3-13">
        <title>Relation extraction</title>
        <p>New regulations on transparency and the recent policy for privacy force the
public administration (PA) to make its documents available, but also to limit
the diffusion of personal data. The relation extraction module represents a first
approach to the extraction of sensitive data from PA documents in terms of
named entities and semantic relations among them.</p>
        <p>
          For this task, we rely on the Relation Extraction module [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] included in
Stanford CoreNLP. For this module to work, a relation must connect two entities.
For instance, address is used to link a LOC entity representing an
address to the person or company to which the address belongs, while birthDate and
birthLoc link a person to the date and location of birth, respectively.
        </p>
        <p>Some entities are extracted using the named-entities recognizer (Section 3.10)
and the temporal expression extractor (Section 3.11). To deal with all the
requested relations, some additional entity types are manually added and annotated
using the TokensRegex CoreNLP module (see Section 3.11). Additional entities
include, for example, NUMBER for numbers (such as VAT), CF for the Italian
"codice fiscale" sequence of characters, ROLE for personal and organisation roles,
and so on.</p>
        <p>
          To train the relation extraction module, we use the REDIT dataset,
containing documents taken from the PA domain and manually annotated with 19
relations [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-14">
        <title>Text reuse</title>
        <p>Detecting text reuse is useful when we want to measure how much a document
overlaps with a given corpus. This is needed in a number of applications, for example
plagiarism detection, stylometry, authorship attribution, and citation analysis.
Tint includes a component for this task, i.e. identifying parts of an
input text that overlap with a given corpus. First, each sentence of the
corpus is compared with the sentences in the processed text using the FuzzyWuzzy
package5, a Java fuzzy string matching implementation: this allows the system
not to miss expressions that differ slightly from the texts in
the original corpus. In this phase, only long spans of text are considered,
as the probability of an incorrect fuzzy match grows as
the text length decreases. A second step checks whether the overlap involves the
whole sentence and, if not, analyzes the two texts and identifies the number
of overlapping tokens. Finally, the Stanford CoreNLP quote annotator6 is used
to catch text reuse that appears between quotes, ignoring the length limitation of
the fuzzy comparison.</p>
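        <p>The first two steps can be approximated in Python with the standard library. Note the substitution: difflib.SequenceMatcher stands in for the FuzzyWuzzy Java package, and the similarity threshold, the minimum length and the function names are assumptions rather than Tint's settings.</p>
        <preformat><![CDATA[
```python
from difflib import SequenceMatcher


def find_reuse(text_sentences, corpus_sentences, threshold=0.85, min_chars=40):
    """Return (text sentence, corpus sentence, score) for long, near-identical pairs."""
    matches = []
    for sentence in text_sentences:
        if len(sentence) < min_chars:
            continue  # short spans produce too many spurious fuzzy matches
        for reference in corpus_sentences:
            score = SequenceMatcher(None, sentence.lower(), reference.lower()).ratio()
            if score >= threshold:
                matches.append((sentence, reference, score))
    return matches


def overlapping_tokens(a, b):
    """Count the tokens covered by the matching blocks of two sentences
    (whitespace tokenization, case-insensitive)."""
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    blocks = SequenceMatcher(None, tokens_a, tokens_b).get_matching_blocks()
    return sum(block.size for block in blocks)
```
]]></preformat>
        <p>A pair that passes the fuzzy threshold without being identical can then be passed to overlapping_tokens to quantify how much of the sentence is actually shared.</p>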
      </sec>
      <sec id="sec-3-15">
        <title>Readability and corpus statistics</title>
        <p>
          In this module, we compute some metrics that can be useful to assess the
readability of a text, partially inspired by [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]. In particular, we include the
following indices:
        </p>
        <list list-type="bullet">
          <list-item><p>Number of content words, hyphens (using the iText Java Library7), sentences
having fewer than a fixed number of words, distribution of tokens based on
part-of-speech.</p></list-item>
          <list-item><p>Type-token ratio (TTR), i.e. the ratio between the number of different
lemmas and the number of tokens; high TTR indicates a high degree of lexical
variation.</p></list-item>
          <list-item><p>Lexical density, i.e. the number of content words divided by the total number
of words.</p></list-item>
        </list>
        <sec id="sec-3-15-1">
          <title>5 https://github.com/xdrop/fuzzywuzzy</title>
        </sec>
        <sec id="sec-3-15-2">
          <title>6 https://stanfordnlp.github.io/CoreNLP/quote.html</title>
        </sec>
        <sec id="sec-3-15-3">
          <title>7 https://github.com/itext/itextpdf</title>
          <list list-type="bullet">
            <list-item><p>Amount of coordinate and subordinate clauses, along with the ratio between
them.</p></list-item>
            <list-item><p>Depth of the parse tree for each sentence: both average and max depth are
calculated on the whole text.</p></list-item>
            <list-item><p>Gulpease formula [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] to measure the readability at document level.</p></list-item>
            <list-item><p>Text difficulty based on word lists from De Mauro's Dictionary of Basic
Italian8.</p></list-item>
          </list>
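          <p>Three of the indices above reduce to simple arithmetic, sketched here in Python. The Gulpease index is given in its commonly cited form, 89 + (300 × sentences − 10 × letters) / words; the set of content part-of-speech tags and the function names are assumptions and may differ from Tint's implementation.</p>
          <preformat><![CDATA[
```python
# Assumed set of content-word tags (UD-style); Tint's own definition may differ.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def type_token_ratio(lemmas):
    """Number of distinct lemmas divided by the number of tokens."""
    return len(set(lemmas)) / len(lemmas) if lemmas else 0.0


def lexical_density(pos_tags, content_pos=CONTENT_POS):
    """Number of content words divided by the total number of words."""
    if not pos_tags:
        return 0.0
    return sum(tag in content_pos for tag in pos_tags) / len(pos_tags)


def gulpease(n_letters, n_words, n_sentences):
    """Gulpease readability index; higher values indicate easier text."""
    return 89 + (300 * n_sentences - 10 * n_letters) / n_words
```
]]></preformat>
          <p>For a text with 500 letters, 100 words and 5 sentences, the Gulpease index computed this way is 54.</p>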
          <p>
            In addition, a set of extractors described in [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ] and mainly defining the
neostandard Italian used by high school students (such as anglicisms, euphonic "D",
and much more) is available out-of-the-box in Tint.
          </p>
          <p>Finally, a collection of CoreNLP annotators has been developed to extract
statistics that can be used, for instance, to analyse traits of interest in texts. More
specifically, the provided modules can mark and count words and sentences
based on token, lemma, part-of-speech and word position in the sentence.</p>
        </sec>
      </sec>
      <sec id="sec-3-16">
        <title>Keywords extraction</title>
        <p>
          Keyword extraction in Tint is performed by Keyphrase Digger9 [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], a rule-based
system for keyphrase extraction. It combines statistical measures with linguistic
information given by part-of-speech patterns to identify and extract weighted
keyphrases from texts.
        </p>
      </sec>
      <sec id="sec-3-17">
        <title>Entity linking</title>
        <p>
          The entity linking task consists of disambiguating a word (or a set of words)
and linking it to a knowledge base (KB). The biggest (and most used) available
KB is Wikipedia, and almost every linking tool relies on it. The Tint pipeline
provides a wrapper annotator that can connect to DBpedia Spotlight10 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and
The Wiki Machine11 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Both tools are distributed as open source software
and can be used by the annotator either as external services or through a local
installation.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>Tint is a complex system that relies on a variety of modules interacting with each
other. For this reason, the accuracy on the single tasks does not reach
state-of-the-art levels. Nevertheless, Tint performs at a reasonable level in all the tasks it
supports.</p>
      <p>
        An accurate evaluation of most of its modules (especially the ones that use
machine learning techniques), with a comparison with other NLP Italian tools,
is available in the previous papers [
        <xref ref-type="bibr" rid="ref24 ref26 ref27">26, 27, 24</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>8 http://bit.ly/nuovo-demauro</title>
      </sec>
      <sec id="sec-4-2">
        <title>9 https://dh.fbk.eu/2015/12/kd-keyphrase-digger/</title>
        <p>10 https://www.dbpedia-spotlight.org/
11 https://bitbucket.org/fbk/airpedia/wiki/Tutorial</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Tint distribution</title>
      <p>The Tint pipeline is released as open source software under the GNU General
Public License (GPL), version 3. It can be downloaded from the Tint website12 as
a standalone package, or it can be integrated into an existing application as a
Maven dependency. The source code is available on GitHub.13</p>
      <p>The tool is written using the Stanford CoreNLP paradigm, therefore
third-party software can be integrated easily into the pipeline.</p>
      <p>
        Along with Tint, one can also try Tintful14 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], an NLP annotation software
that can be used both to manually annotate texts and to fix mistakes in NLP
pipelines (and, in particular, in Tint). Using a paradigm similar to wiki-like
systems, a user who notices a wrong annotation can easily fix it and submit
the corrected entry back to the tool developers. The Tint online
demo, linked from the project website, uses Tintful as graphical interface and is
configured to show most of the modules described in this paper. Therefore the
annotation provided by the modules working on machine learning algorithms
that need to be trained over annotated data (named-entity recognizer,
part-of-speech tagger, dependency parser) can be edited by the occasional user. The
resulting annotation will be manually checked by linguists and added to the
next training session. Figure 3 shows the web interface of Tintful.
12 http://tint.fbk.eu/
13 https://github.com/dhfbk/tint/
14 https://github.com/dhfbk/tintful
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>In this paper, we presented the latest release of Tint, a simple, fast and accurate
NLP pipeline for Italian, based on Stanford CoreNLP. In the new version, we
fixed some bugs and improved some of the existing modules. We also added a
set of components for fine-grained linguistic analysis that were not available so
far.</p>
      <p>
        In the future, we plan to improve the suite and extend it with additional
modules, in particular Word Sense Disambiguation (WSD) based on linguistic
resources such as MultiWordNet [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and Semantic Role Labelling, by porting
to Italian resources such as FrameNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], now available only in English.
      </p>
      <p>
        We also plan to increase the accuracy of the trained modules (such as
part-of-speech tagger and named-entity recognizer) using deep learning techniques
and including a pretrained language model at a different granularity (words,
characters) into the process [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akbik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blythe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vollgraf</surname>
          </string-name>
          , R.:
          <article-title>Contextual string embeddings for sequence labeling</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on Computational Linguistics</source>
          . pp.
          <volume>1638</volume>
          –
          <fpage>1649</fpage>
          . Association for Computational Linguistics, Santa Fe, New Mexico, USA (Aug
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aprosio</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The wiki machine: an open source software for entity linking and enrichment</article-title>
          . ArXiv e-prints (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          . In: Aberer,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.S.</given-names>
            ,
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.I.</given-names>
            ,
            <surname>Nixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Golbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Mizoguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Cudré-Mauroux</surname>
          </string-name>
          , P. (eds.)
          <article-title>The Semantic Web</article-title>
          . pp.
          <volume>722</volume>
          –
          <fpage>735</fpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>The Berkeley FrameNet project</article-title>
          .
          <source>In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics-Volume</source>
          <volume>1</volume>
          . pp.
          <volume>86</volume>
          –
          <fpage>90</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lombardo</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassallo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lesmo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Building a treebank for Italian: a data-driven annotation schema</article-title>
          .
          <source>In: Proceedings of the Second International Conference on Language Resources and Evaluation (LREC'00)</source>
          .
          <source>European Language Resources Association (ELRA)</source>
          , Athens, Greece (May
          <year>2000</year>
          ), http://www.lrec-conf.org/proceedings/lrec2000/pdf/220.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Converting Italian treebanks: Towards an Italian Stanford dependency treebank</article-title>
          .
          <source>In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse</source>
          . pp.
          <volume>61</volume>
          –
          <fpage>69</fpage>
          .
          <article-title>Association for Computational Linguistics, Sofia</article-title>
          ,
          <source>Bulgaria (Aug</source>
          <year>2013</year>
          ), https://aclanthology.org/W13-2308
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>A.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>SUTime: A library for recognizing and normalizing time expressions</article-title>
          .
          <source>In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          . pp.
          <volume>3735</volume>
          –
          <fpage>3740</fpage>
          .
          <string-name>
            <surname>European Language Resources Association</surname>
          </string-name>
          (ELRA), Istanbul, Turkey (May
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>A fast and accurate dependency parser using neural networks</article-title>
          .
          <source>In: EMNLP</source>
          . pp.
          <volume>740</volume>
          –
          <issue>750</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tablan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>GATE: An architecture for development of robust HLT applications</article-title>
          .
          <source>In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</source>
          . pp.
          <volume>168</volume>
          –
          <fpage>175</fpage>
          . ACL '02, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2002</year>
          ). https://doi.org/10.3115/1073083.1073112, http://dx.doi.org/10.3115/1073083.1073112
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Daiber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hokamp</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          :
          <article-title>Improving efficiency and accuracy in multilingual entity extraction</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Semantic Systems (I-Semantics)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>De La Briandais</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>File searching using variable length keys</article-title>
          .
          <source>In: Papers Presented at the the March 3-5</source>
          ,
          <year>1959</year>
          , Western Joint Computer Conference. pp.
          <volume>295</volume>
          –
          <fpage>298</fpage>
          .
          IRE-AIEE-ACM '59 (Western), ACM, New York, NY, USA (
          <year>1959</year>
          ). https://doi.org/10.1145/1457838.1457895, http://doi.acm.org/10.1145/1457838.1457895
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
          </string-name>
          , G.:
          <article-title>READ-IT: Assessing readability of Italian texts with a view to text simplification</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies</source>
          . pp.
          <volume>73</volume>
          –
          <fpage>83</fpage>
          . SLPAT '11, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2011</year>
          ), http://dl.acm.org/citation.cfm?id=2140499.2140511
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>T2K^2: a system for automatically extracting and organizing knowledge from texts</article-title>
          .
          <source>In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC2014)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Pianta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.G.</given-names>
            ,
            <surname>Zanoli</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.:</surname>
          </string-name>
          <article-title>The TextPro tool suite</article-title>
          . In: Chair),
          <string-name>
            <given-names>N.C.C.</given-names>
            ,
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Piperidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Tapias</surname>
          </string-name>
          ,
          <string-name>
            <surname>D</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)</source>
          .
          <source>European Language Resources Association (ELRA)</source>
          , Marrakech, Morocco (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ferrucci</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lally</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>UIMA: An architectural approach to unstructured information processing in the corporate research environment</article-title>
          .
          <source>Nat. Lang. Eng</source>
          .
          <volume>10</volume>
          (
          <issue>3-4</issue>
          ),
          <volume>327</volume>
          –348 (Sep
          <year>2004</year>
          ). https://doi.org/10.1017/S1351324904003523, http://dx.doi.org/10.1017/S1351324904003523
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Finkel</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grenager</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)</source>
          . pp.
          <volume>363</volume>
          –
          <fpage>370</fpage>
          . Association for Computational Linguistics, Ann Arbor,
          <source>Michigan (Jun</source>
          <year>2005</year>
          ). https://doi.org/10.3115/1219840.1219885, https://aclanthology.org/P05-1045
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Frasnelli</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bocchi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmero</surname>
            <given-names>Aprosio</given-names>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Erase and rewind: Manual correction of NLP output through a web interface</article-title>
          .
          <source>In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations</source>
          . pp.
          <volume>107</volume>
          –
          <fpage>113</fpage>
          . Association for Computational Linguistics,
          <source>Online (Aug</source>
          <year>2021</year>
          ). https://doi.org/10.18653/v1/2021.acl-demo.13, https://aclanthology.org/2021.acl-demo.13
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>An improved non-monotonic transition system for dependency parsing</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>1373</volume>
          –
          <fpage>1378</fpage>
          . Association for Computational Linguistics, Lisbon, Portugal (
          <year>September 2015</year>
          ), https://aclweb.org/anthology/D/D15/D15-1162
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Lucisano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piemontese</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>GULPEASE: una formula per la predizione della difficoltà dei testi in lingua italiana</article-title>
          .
          <source>Scuola e città</source>
          <volume>3</volume>
          (
          <issue>31</issue>
          ),
          <volume>110</volume>
          –
          <fpage>124</fpage>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. de Marneffe, M.C.,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Universal Dependencies.
          <source>Computational Linguistics</source>
          <volume>47</volume>
          (
          <issue>2</issue>
          ),
          <volume>255</volume>
          –
          <volume>308</volume>
          (07
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Moretti</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sprugnoli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Digging in the dirt: Extracting keyphrases from texts with KD</article-title>
          .
          <source>CLiC-it</source>
          p.
          <volume>198</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <article-title>OpenStreetMap contributors: Planet dump retrieved from https://planet.osm.org</article-title>
          . https://www.openstreetmap.org (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Paccosi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmero</surname>
            <given-names>Aprosio</given-names>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>KIND: an Italian Multi-Domain Dataset for Named Entity Recognition</article-title>
          . In: arXiv preprint (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Paccosi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmero</surname>
            <given-names>Aprosio</given-names>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>REDIT: a Tool and Dataset for Extraction of Personal Data in Documents of the Public Administration Domain</article-title>
          . In: CLiC-it
          <source>2021 Italian Conference on Computational Linguistics</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stanilovsky</surname>
          </string-name>
          , E.:
          <article-title>FreeLing 3.0: Towards wider multilinguality</article-title>
          .
          <source>In: LREC2012</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>Palmero</given-names>
            <surname>Aprosio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Moretti</surname>
          </string-name>
          , G.:
          <article-title>Italy goes to Stanford: a collection of CoreNLP modules for Italian</article-title>
          . ArXiv e-prints (
          <year>Sep 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>Palmero</given-names>
            <surname>Aprosio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Moretti</surname>
          </string-name>
          , G.:
          <article-title>Tint 2.0: an all-inclusive suite for NLP in Italian</article-title>
          .
          <source>In: Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it</source>
          . vol.
          <volume>10</volume>
          , p.
          <volume>12</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Pianta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girardi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Developing an aligned multilingual database</article-title>
          .
          <source>In: Proc. 1st Int'l Conference on Global WordNet. Citeseer</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Bolton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>Stanza: A Python natural language processing toolkit for many human languages</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          (
          <year>2020</year>
          ), https://nlp.stanford.edu/pubs/qi2020stanza.pdf
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Sanguinetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : PartTUT: The Turin University Parallel Treebank, pp.
          <volume>51</volume>
          –
          <fpage>69</fpage>
          . Springer International Publishing, Cham (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Sprugnoli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aprosio</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moretti</surname>
          </string-name>
          , G.:
          <article-title>Analysing the evolution of students' writing skills and the impact of neo-standard Italian with the help of computational linguistics</article-title>
          .
          <source>In: Proceedings of the Sixth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2018</year>
          ). Torino, Italy (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Straka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straková</surname>
          </string-name>
          , J.:
          <article-title>Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe</article-title>
          .
          <source>In: Proceedings of the CoNLL</source>
          <year>2017</year>
          <article-title>Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</article-title>
          . pp.
          <volume>88</volume>
          –
          <fpage>99</fpage>
          . Association for Computational Linguistics, Vancouver, Canada (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33. Strotgen, J.,
          <string-name>
            <surname>Armiti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Canh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gertz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Time for more languages: Temporal tagging of Arabic, Italian, Spanish, and Vietnamese</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing (TALIP) 13(1)</source>
          ,
          <volume>1</volume>
          –
          <fpage>21</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McClosky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gusev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Customizing an information extraction system to a new domain</article-title>
          .
          <source>In: Proceedings of the ACL 2011 Workshop on Relational Models of Semantics</source>
          . pp.
          <volume>2</volume>
          –
          <fpage>10</fpage>
          . Association for Computational Linguistics, Portland, Oregon, USA (Jun
          <year>2011</year>
          ), https://aclanthology.org/W11-0902
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Talamo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celata</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertinetto</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          :
          <article-title>DerIvaTario: An annotated lexicon of Italian derivatives</article-title>
          .
          <source>Word Structure</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>72</fpage>
          –
          <lpage>102</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Tamburini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melandri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>AnIta: a powerful morphological analyser for Italian</article-title>
          . In:
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (Conference Chair),
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogan</surname>
            ,
            <given-names>M.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Odijk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piperidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          . European Language Resources Association (ELRA), Istanbul, Turkey (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Tonelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmero Aprosio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazzon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The impact of phrases on Italian lexical simplification</article-title>
          .
          In:
          <source>Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)</source>
          . pp.
          <fpage>316</fpage>
          –
          <lpage>320</lpage>
          (
          <year>2017</year>
          )
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Tonelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran Manh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pianta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Making readability indices readable</article-title>
          .
          In:
          <source>Proceedings of the First Workshop on Predicting and Improving Text Readability for Target Reader Populations</source>
          . pp.
          <fpage>40</fpage>
          –
          <lpage>48</lpage>
          . Association for Computational Linguistics, Montreal, Canada (Jun
          <year>2012</year>
          ), http://www.aclweb.org/anthology/W12-2206
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Feature-rich part-of-speech tagging with a cyclic dependency network</article-title>
          .
          In:
          <source>Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1</source>
          . pp.
          <fpage>173</fpage>
          –
          <lpage>180</lpage>
          . NAACL '03, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2003</year>
          ). https://doi.org/10.3115/1073445.1073478
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Voghera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Polirematiche</article-title>
          . In:
          <source>La formazione delle parole in italiano</source>
          , pp.
          <fpage>56</fpage>
          –
          <lpage>69</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Zanchetta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Morph-it! A free corpus-based morphological resource for the Italian language</article-title>
          .
          <source>Corpus Linguistics 2005</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Zanchetta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Morph-it! A free corpus-based morphological resource for the Italian language</article-title>
          .
          In:
          <source>Proceedings of Corpus Linguistics 2005</source>
          . University of Birmingham, UK (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Fast and accurate shift-reduce constituent parsing</article-title>
          .
          In:
          <source>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <fpage>434</fpage>
          –
          <lpage>443</lpage>
          . Association for Computational Linguistics, Sofia, Bulgaria (Aug
          <year>2013</year>
          ), https://aclanthology.org/P13-1043
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>