<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Crowdsourcing for Building Knowledge Graphs at Scale from the Vatican Archives (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Donatella Firmani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Merialdo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Nieddu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Rossi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Torlone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Roma Tre University</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Project In Codice Ratio is developing tools to extract the knowledge contained in the ancient manuscripts of the Vatican Archives. The scarcity of datasets suitable for our setting has led us to rely on crowdsourcing in all phases of our project. In this paper we discuss our approaches for leveraging inexpensive non-expert workers to fruitfully perform labelling operations on challenging manuscripts. We describe the range of different tasks we are devising, as well as the corresponding priority and redundancy policies we are employing. We describe the datasets collected thus far and the corresponding results.</p>
      </abstract>
      <kwd-group>
        <kwd>Crowdsourcing</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Digital Humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recent advancements in Knowledge Extraction from unstructured data have
opened up new, exciting possibilities in digital humanities. These novel
methodologies typically require large amounts of labelled data to reach satisfactory
performance; however, when dealing with specific tasks in vertical domains,
the lack of suitable datasets makes such approaches infeasible. As a
matter of fact, in many scenarios the only available resources are raw and
unlabelled. This has led several projects to rely on crowdsourcing techniques, having
human operators, or workers, manually process and label part of the available
data. By gathering the outputs produced by the workers it is possible to build
new, extensive datasets on which to train automatic systems.</p>
      <p>
        Our Digital Humanities project In Codice Ratio (ICR) [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ] falls within this
scenario. ICR aims at extracting and harnessing the knowledge contained in the
manuscripts of the Vatican Apostolic Archive (VAA). The VAA is one of the
largest historical libraries in the world, with more than 85 linear kilometres of
shelving. We are currently focusing on its "Vatican Registers" corpus, consisting
of 43 parchment registers for a total of 18650 pages; these documents date to
the XIII century, under the papacy of Honorius III, and contain the official
correspondence of the Roman Curia, including legal or political letters from
and to kings, sovereigns and institutions throughout Europe. The manuscripts
are written in Chancery script (also called Cancelleresca) and have recently been
digitized as high-definition images but, despite their historical relevance,
they have never been transcribed in full.
      </p>
      <p>
Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). This volume is published
and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.
      </p>
      <p>ICR aims at (i) obtaining a textual transcription of the manuscripts in the
Archive, and (ii) extracting entities, relations and facts from the obtained text to
construct an extensive Knowledge Graph (KG). Automatically transcribing
the high-definition images of the document pages amounts to performing
Handwritten Text Recognition (HTR), while extracting structured knowledge
from the resulting transcription is a well-known Text Mining task.</p>
      <p>Machine Learning (ML) approaches have been shown to reach
state-of-the-art performance in both tasks. However, since our documents are written in Medieval
scripts and languages, no comprehensive datasets could be found to
adequately train ML models. In this paper we discuss how we are leveraging
crowdsourcing methodologies to process, label and enrich data in order to
collect suitable datasets. We describe the crowdsourcing tasks we have devised and
the assignment policies we have implemented.</p>
      <p>Our pool of workers counts over 700 members and consists entirely
of high-school students from the city of Rome, who have joined the project for free
as a part of their work-related learning program. In addition to performing tasks,
students also attended lectures on a variety of topics, from paleography to
machine learning, held by both the engineering and humanities departments
of Roma Tre University. As a consequence, they are provided with an opportunity
for personal growth, and they receive guidance for their future studies and careers.</p>
      <p>So far we have focused on the transcription phase: with the data labelled
by our workers, we were able to successfully generate datasets large enough for
training and developing very promising HTR models. These results allow us to
start assigning to our workers the first batches of Text Mining tasks, with the
goal of extracting knowledge from the transcribed text.</p>
      <p>The rest of this paper is structured as follows. In Section 2 we report the
HTR-oriented tasks we have employed so far, and we describe the obtained
datasets and the corresponding transcription results. In Section 3 we discuss the
types of tasks we have devised for text mining and knowledge extraction. We
discuss related works in Section 4, and provide concluding remarks in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Crowdsourcing for automatic transcription</title>
      <p>The extraction of textual transcriptions from images is nowadays generally
performed with automatic HTR models. As already mentioned in the Introduction,
the lack of representative labelled datasets for the Papal Registers (i.e., similar
script, abbreviations, and layout) makes it difficult to train a full-fledged HTR
system without dealing with a very expensive data preparation effort.</p>
      <p>[Figure 1: the crowdsourcing task interface, with the prompt "Mark the segments that form symbols similar to these:" and a "Next" button.]</p>
      <p>
        We have thus employed a crowdsourcing approach to build our own dataset,
with workers manually identifying portions of the original images. The
obtained labelled segments are then gathered into datasets large enough to enable
the training of HTR models. The registers are strikingly hard to decipher, and
the extensive use of abbreviations makes them even less understandable.
Therefore, unlike other transcription projects [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in our case a non-expert worker is
generally unable to directly transcribe entire paragraphs, lines or even words.
We thus ask our workers just to recognize specific symbols inside words: the
resulting task is more akin to pattern matching than to text transcription.
      </p>
      <p>In a preparatory step, we apply a pipeline of computer vision operations
to the images of the manuscript pages (i.e. the written sides). We first run a
custom binarization algorithm to transform them into black and white images.
We then use the alternation between black and white sections to perform line
segmentation and word segmentation. In each of the obtained word images, we
analyse the upper and lower profile of the writing, and cut the word wherever
the stroke is absent or exceedingly thin. This is an over-segmentation, because
the obtained segments are finer-grained than the actual characters in the word.</p>
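      <p>The over-segmentation step can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the binarized word image is assumed to be a list of rows of 0/1 values (1 for ink), the stroke thickness of a column is approximated by the distance between its upper and lower ink profile, and the function names and the <monospace>thin</monospace> threshold are hypothetical.</p>
      <preformat>
```python
def column_thickness(image):
    """Per column, the extent between upper and lower ink profile (0 if empty)."""
    height = len(image)
    width = len(image[0]) if height else 0
    thickness = []
    for x in range(width):
        ink_rows = [y for y in range(height) if image[y][x] == 1]
        thickness.append(ink_rows[-1] - ink_rows[0] + 1 if ink_rows else 0)
    return thickness


def over_segment(image, thin=1):
    """Split a word image into column ranges, cutting at empty or thin columns.

    `thin` is an assumed threshold: columns whose stroke extent does not
    exceed it are treated as cut points and belong to no segment.
    """
    segments = []
    start = None
    for x, extent in enumerate(column_thickness(image)):
        if extent > thin:
            if start is None:
                start = x  # a new segment begins here
        elif start is not None:
            segments.append((start, x - 1))  # close the current segment
            start = None
    if start is not None:
        segments.append((start, len(image[0]) - 1))
    return segments


# Toy 3x7 word image: two thick strokes joined by a thin horizontal ligature.
word = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
]
print(over_segment(word))
```
      </preformat>
      <p>With the default threshold the thin ligature is treated as a cut point, so the two strokes become separate segments; with <monospace>thin=0</monospace> only empty columns would separate segments.</p>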
      <p>We then use this setting to generate our crowdsourcing tasks for character
recognition. In each task, we take as input the over-segmented image of a word,
and we ask our workers whether a specific symbol is present within the word (e.g. the symbol
"⁊", a Tironian note meaning "et"). If it is, the worker should highlight
the set of consecutive segments that belong to that character. If the symbol
occurs multiple times in the word, the worker can highlight multiple sets of
segments. To facilitate the selection, our UI displays each segment
in a different color, as shown in Figure 1, in which the worker is asked to identify
the symbols for character 'a'.</p>
      <p>
        Our approach to task design was inspired by [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], in which the authors
address the problem of extracting a number of items satisfying certain properties
from a larger set, and design optimal algorithms for various settings in terms of
cost and time. In our tasks, if the worker has identified the character in the word,
this corresponds to a positive vote; otherwise it is a negative one. We assume that
workers can make mistakes and employ redundancy to address this issue.
      </p>
      <p>
        Specifically, we use the rectangular policy in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to aggregate votes and
the cost-optimal paradigm in the same paper to schedule tasks: for each class, we
propose the same task to different workers until either positive or negative votes
exceed the respective thresholds N and M.<sup>1</sup> As soon as the crowd finds a
predefined target amount of items for a class, we move to the next one sequentially.
The same over-segmented word image can be re-used in multiple tasks, asking
workers to recognize different symbols. In this regard, we prioritize the symbols that
we lack the most in our datasets. The extracted sets of segments are used to
generate labelled images, which are finally added to our transcription datasets. So
far, with the contributions gathered from our workers, we were able to build
various datasets, the latest and largest of which includes around 50k samples.
We currently cover 32 classes; among these, 21 correspond to character symbols,
10 correspond to abbreviation symbols, and the remaining class identifies
"not-character" elements, i.e. selections not corresponding to any valid item in the
alphabet.<sup>2</sup> All our datasets are publicly available at our repository.<sup>3</sup>
      </p>
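      <p>The vote aggregation and symbol prioritization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a decision is taken as soon as the positive or negative count reaches its threshold (whether the original policy requires reaching or strictly exceeding it is not specified here), the defaults mirror the N = M = 3 configuration, and the function names and the dictionaries of collected and target sample counts are hypothetical.</p>
      <preformat>
```python
def rectangular_decision(votes, n_pos=3, n_neg=3):
    """Consume boolean votes in order and stop at the first threshold reached.

    Returns (decision, votes_used); decision is True, False, or None when
    neither threshold has been reached and more workers must be asked.
    """
    pos = neg = 0
    for used, vote in enumerate(votes, start=1):
        if vote:
            pos += 1
        else:
            neg += 1
        if pos >= n_pos:
            return True, used
        if neg >= n_neg:
            return False, used
    return None, len(votes)


def next_symbol(collected, target):
    """Pick the symbol class whose dataset is furthest from its target size.

    Both arguments are hypothetical dicts mapping a symbol to a sample count.
    """
    return max(target, key=lambda s: target[s] - collected.get(s, 0))


print(rectangular_decision([True, False, True, True, False]))
print(next_symbol({"a": 900, "et": 100}, {"a": 1000, "et": 1000}))
```
      </preformat>
      <p>In the first call the third positive vote arrives with the fourth worker, so the fifth vote is never requested; this early stopping is what keeps the redundancy cost bounded.</p>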
      <p>
        We then used the produced datasets to train our HTR model, based on
Deep Convolutional Neural Networks. Our model is described in greater detail
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. As shown in Figure 2, character-level transcription results are very promising:
in each character class our prediction accuracy exceeds 90%, even in the
"not-character" class. The average accuracy is 97%.
      </p>
      <p><sup>1</sup> We tried numerous configurations for these thresholds, and found that in our scenario
using N = M = 3 is the best option.</p>
      <p><sup>2</sup> Labelled samples for the "not-character" class are generated by combining or removing
segments from samples belonging to the other classes.</p>
      <p><sup>3</sup> http://www.inf.uniroma3.it/db/icr</p>
      <p>The whole process that generates tasks, assigns them to the workers, and
gathers the corresponding results has been implemented as a Web Application
hosted in the engineering department at Roma Tre University. We used a
standard technology stack, based on Java and the Spring Boot framework, and
store data in a centralized relational DB.</p>
    </sec>
    <sec id="sec-3">
      <title>Crowdsourcing for KG generation</title>
      <p>After the correct transcription of a manuscript has been obtained, it is
possible to extract its entities and relations, and to use them to populate a KG. In
modern languages, these tasks are usually tackled by automatic classifiers based
on machine learning and NLP approaches. However, similarly to the HTR step,
when it comes to Medieval Latin texts the scarcity of labelled datasets makes it
hard to train automatic models. Therefore, we are going to approach this phase
too with crowd-based techniques. We are currently engineering two main
categories of tasks for our workers: Entity Classification and Relation Classification.
Each category may contain multiple types of tasks, with different requirements.</p>
      <p>Entity Classification. We have verified that, in our manuscripts, it is
possible to automatically select almost all named entities by just applying simple
heuristics, e.g. searching for capitalized words. Nonetheless, human intervention
is still required to classify the selected entities into semantic categories, such as
"people", "locations", "organizations", "temporal data" etc.</p>
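      <p>As a rough illustration of such a heuristic, the Python sketch below selects capitalized words as entity candidates. The regular expression, the function name and the sample sentence are assumptions made for the example; a real transcription would also need to handle abbreviations and sentence-initial words.</p>
      <preformat>
```python
import re

# Assumed pattern: one uppercase letter followed by lowercase letters.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")


def candidate_entities(text):
    """Return (offset, word) pairs for every capitalized word in the text."""
    return [(m.start(), m.group()) for m in CAPITALIZED.finditer(text)]


sample = "Honorius episcopus servus servorum Dei"
print(candidate_entities(sample))
```
      </preformat>
      <p>The selected candidates still carry no semantic category, which is exactly the part left to the workers.</p>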
      <p>In a basic Entity Classification task, each worker may receive the text of an
entire transcribed page, with the named entities to classify already highlighted;
the task would consist in labelling each entity with the corresponding class. Note
that, once the correct class for an entity has been identified, it is possible to
propagate it to all the other occurrences of the same entity across the manuscript. For
this task, we assume that workers can make mistakes, and manage redundancy
with a policy similar to the Rectangular strategy employed for the transcription
task, extending it to the case in which multiple classes are available. The order
in which tasks will be handed to the workers will depend on their priority value,
that is, the number of entities each task contains that have not been classified yet.</p>
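      <p>The priority value and a possible multi-class extension of the redundancy policy can be sketched in Python as follows. This is an illustrative assumption rather than the policy actually adopted: a class is accepted as soon as it collects a fixed number of votes, labels are stored per entity name so that they implicitly propagate to every occurrence, and all names and the threshold value are hypothetical.</p>
      <preformat>
```python
from collections import Counter


def page_priority(page_entities, labels):
    """Number of entity mentions on the page that have no agreed class yet."""
    return sum(1 for entity in page_entities if entity not in labels)


def next_page(pages, labels):
    """Pick the id of the page with the highest priority value."""
    return max(pages, key=lambda pid: page_priority(pages[pid], labels))


def agreed_class(votes, threshold=3):
    """Return the first class whose vote count reaches the threshold, else None."""
    counts = Counter()
    for vote in votes:
        counts[vote] += 1
        if counts[vote] >= threshold:
            return vote
    return None


# Toy data: labels are keyed by entity name, so a class agreed on one page
# is automatically shared by all occurrences of the same entity.
pages = {
    "p1": ["Honorius", "Roma"],
    "p2": ["Honorius", "Fredericus", "Sicilia"],
}
labels = {"Honorius": "people"}
print(next_page(pages, labels))
print(agreed_class(["people", "locations", "people", "people"]))
```
      </preformat>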
      <p>Entity Classification can be further optimized by devising automatic labelling
rules. For instance, if a named entity lies in close proximity to special keywords,
such as the title "episcopus" (meaning bishop), one can infer that it belongs to
the "people" category. Modelling these rules requires knowledge of the specific
syntactic patterns that occur most often in the manuscripts under analysis. Our
workers, after they have become accustomed to the basic Entity Classification
task, should ideally possess this kind of knowledge. Therefore, we plan to design
advanced Entity Classification tasks based on rule building. We will provide
students with the basics of rule-based logic, and divide them into groups. Each
group will receive, as input, sets of several already labelled pages, from
which they will be asked to build entity-classifying rules. We will evaluate the
resulting clauses on standard metrics such as precision, recall and F-measure,
and automatically select the combinations that maximize performance. We will
finally apply the best performing mix of rules to pages in which the entities have
not been classified yet.</p>
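      <p>A keyword-proximity rule and its evaluation can be sketched in Python as follows. The rule representation (a function over a token list and an entity position), the window size and the toy labelled examples are assumptions made for illustration; the metrics are the standard precision, recall and F1 computed for a single class.</p>
      <preformat>
```python
def keyword_rule(keyword, label, window=1):
    """Build a rule that assigns `label` when `keyword` is within `window` tokens."""
    def rule(tokens, index):
        lo = max(0, index - window)
        nearby = tokens[lo:index] + tokens[index + 1:index + 1 + window]
        return label if keyword in nearby else None
    return rule


def evaluate(rule, label, examples):
    """Precision, recall and F1 of `rule` for `label`.

    `examples` is a list of (tokens, entity_index, gold_label) triples.
    """
    tp = fp = fn = 0
    for tokens, index, gold in examples:
        predicted = rule(tokens, index)
        if predicted == label and gold == label:
            tp += 1
        elif predicted == label:
            fp += 1
        elif gold == label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labelled examples (invented for illustration only).
examples = [
    (["Honorius", "episcopus", "servus"], 0, "people"),
    (["ecclesia", "Romana", "episcopus"], 1, "organizations"),
    (["rex", "Fredericus", "imperator"], 1, "people"),
]
bishop_rule = keyword_rule("episcopus", "people")
print(evaluate(bishop_rule, "people", examples))
```
      </preformat>
      <p>On these toy examples the rule yields one true positive, one false positive and one false negative, which shows why rules built by different groups need to be compared and combined before being applied to unlabelled pages.</p>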
      <p>Relation Classification. In order to build a KG, after extracting entities and
their classes it is also necessary to identify the relations linking them. Once again,
we will model basic Relation Classification tasks as tasks of manual labelling.
In each task, the worker will receive as input a single sentence containing at
least two distinct entities, and will be asked to identify the relations occurring
among such entities. The relation class can be chosen from a fixed vocabulary,
which can be manually extended if necessary. Once again, we will assume that
workers can make mistakes, and make the crowd results more robust through
redundancy, employing policies similar to those described for the basic Entity
Classification task.</p>
      <p>
        The Entity Classification and Relation Classification tasks will yield large
amounts of labelled data that we will use to train automatic classifiers, similarly
to the HTR phase. Once the KG has been populated, further facts can be inferred
by leveraging recent Link Prediction techniques based on KG embedding (see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
for a comparative analysis). In terms of general prerequisites, our workers in
this phase will only require a very basic knowledge of Latin, which
is generally taught in Italian high schools. Furthermore, in case of doubts
or unclear transcriptions, they will be able to interact with domain
experts and paleographers. In order to assign KG extraction tasks to our workers,
as well as to gather the corresponding results, we are planning to expand the
web application employed for the transcription phase (see Section 2).
      </p>
    </sec>
    <sec id="sec-4">
      <title>Related Works</title>
      <p>
        Works related to ours can be roughly divided into Text Recognition and KG
Generation works. We summarize these works in the following paragraphs, focusing
on those based on crowdsourcing approaches, and refer the reader to the recent
works [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] for further discussion on crowdsourcing in the Text Recognition area.
      </p>
      <p>
Text Recognition Projects. Relying on crowd-based
solutions is not unusual in text recognition projects. In this area general purpose
crowdsourcing platforms may not be flexible enough, therefore the most common
approach involves building specialized applications that follow the entire lifecycle
of each task. Project Transcribe Bentham [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], for instance, aims at crowdsourcing
the entire transcription of Jeremy Bentham's unpublished works. The writing is
in modern English and relatively easy to read, so workers are asked to
transcribe whole paragraphs or even pages. Project Read [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], on the other hand, has
developed a mobile application in order to expand the set of potential volunteers;
in each task the worker is asked to read aloud a handwritten text line, thus
relying on speech dictation. Project Monk [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], finally, focuses on word search in
handwritten manuscripts; workers are typically asked to transcribe entire
words, and the resulting labelled images are used to train automatic systems for word
spotting and other word-level operations.
      </p>
      <p>Our work is fundamentally different from these projects, as the tasks we
propose to workers do not require any transcription skills: each task only requires
identifying specific symbols or characters inside the image of a word, and thus it
can be solved by non-experts such as high-school students.</p>
      <p>
        KG Generation Projects. Many open KGs nowadays are built in a
collaborative fashion, meaning that they rely on contributions to add missing information
or correct wrong pieces of the current contents. This collaborative effort can be
seen as a very broad form of crowdsourcing, in which any user can temporarily
become a volunteer worker. In this setting each task has a very loose
formulation, with the worker herself choosing the entities, relations or facts to add or
update. Examples of KGs built this way include Freebase [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Wikidata [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>While these approaches are viable for general purpose KGs, they do not work
well when handling vertical topics that the workers may not have knowledge of,
or when facts must actually be extracted from text. This has led researchers to
devise more structured approaches for these scenarios.</p>
      <p>
        In a completely different domain (drugs and their side effects), the work in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
has a similar objective to ours, as it aims at building a KG from a collection
of articles (from the PubMed archive) by leveraging a crowdsourced dataset of
annotations: in each task workers receive a sentence and a relation, and are
asked whether the sentence actually conveys that relation or not. A similar
approach is followed by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], whose goal is to build a scientific KG by extracting
entities, relations and facts from unstructured texts of research publications.
Their approach is to perform fact extraction, integration, and analysis in a
semi-supervised way. On the one hand, automatic tools are employed in each step;
on the other, users are involved in all activities through visual interfaces that
allow them to perform quality control, data enrichment and discovery.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we have described how crowdsourcing methodologies are allowing
us to tackle the extraction of knowledge from Medieval handwritten manuscripts
in our In Codice Ratio project.</p>
      <p>We have shown that the lack of datasets suitable for our scenario affects
both the transcription step and the text mining step. We have thus discussed
how a pool of non-expert workers can be enabled to perform very simple
tasks, yielding enough labelled data to train effective automatic systems. We have
described in detail our strategies to assign tasks to specific workers, including the
policies we employ to increase robustness to potential mistakes made by workers.
We have reported our very promising results in character-level transcription, as
well as our plans for the upcoming KG construction phase.</p>
      <p>As future work, we plan to introduce data provenance methods into the whole
process of knowledge extraction from ancient manuscripts, with the goal of
improving the understanding of the results and making it easier to trace
errors back to their root cause.</p>
      <p>Work funded by Regione Lazio LR 13/08 Project "In Codice Ratio" (14832).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ammirati</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maiorino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nieddu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>In codice ratio: Machine transcription of medieval manuscripts</article-title>
          .
          <source>In: Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, IRCDL</source>
          <year>2019</year>
          , Pisa, Italy,
          <source>January 31 - February 1</source>
          ,
          <year>2019</year>
          , Proceedings. pp.
          <volume>185</volume>
          –
          <issue>192</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ammirati</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maiorino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nieddu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>In codice ratio: Scalable transcription of historical handwritten documents</article-title>
          .
          <source>In: SEBD</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bollacker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paritosh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Freebase: a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bravo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Good</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furlong</surname>
            ,
            <given-names>L.I.</given-names>
          </string-name>
          :
          <article-title>Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text</article-title>
          .
          <source>Database</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Causer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Transcription maximized; expense minimized? Crowdsourcing and editing the collected works of Jeremy Bentham</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maiorino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nieddu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Towards knowledge discovery from the Vatican Secret Archives. In Codice Ratio - episode 1: Machine transcription of the manuscripts</article-title>
          .
          <source>In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining. KDD '18</source>
          ,
          <publisher-name>ACM</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maiorino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>In codice ratio: Scalable transcription of Vatican Registers</article-title>
          .
          <source>ERCIM News</source>
          (
          <volume>111</volume>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nieddu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scardapane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>In codice ratio: OCR of handwritten Latin documents using deep convolutional networks</article-title>
          .
          <source>In: AI* CH@ AI* IA</source>
          . pp.
          <volume>9</volume>
          –
          <issue>16</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Granell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Hinarejos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Multimodal crowdsourcing for transcribing handwritten documents</article-title>
          .
          <source>IEEE/ACM Transactions on Audio, Speech, and Language Processing</source>
          <volume>25</volume>
          (
          <issue>2</issue>
          ),
          <volume>409</volume>
          –419 (Feb
          <year>2017</year>
          ). https://doi.org/10.1109/TASLP.2016.2634123
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matinata</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbosa</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph embedding for link prediction: A comparative analysis</article-title>
          .
          <source>CoRR</source>
          (
          <year>2020</year>
          ), https://arxiv.org/abs/2002.00819
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sarma</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parameswaran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Crowd-powered find algorithms</article-title>
          .
          <source>In: 2014 IEEE 30th International Conference on Data Engineering</source>
          . pp.
          <fpage>964</fpage>–<lpage>975</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Seifert</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granitzer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Ho er,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Mutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Sabol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Bayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Stegmaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Zwicklbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Kern</surname>
          </string-name>
          , R.:
          <article-title>Crowdsourcing fact extraction from scientific literature</article-title>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Krotzsch, M.:
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ameryan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolstencroft</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stork</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heerlien</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Towards a digital infrastructure for illustrated handwritten archives</article-title>
          .
          <source>In: Digital Cultural Heritage</source>
          , pp.
          <fpage>155</fpage>–<lpage>166</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>