<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Linked Open Data to Bootstrap a Knowledge Base of Classical Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Romanello</string-name>
          <email>matteo.romanello@epfl.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Pasin</string-name>
          <email>michele.pasin@springernature.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>École Polytechnique Fédérale de Lausanne</institution>
          ,
          <addr-line>Route Cantonale, 1015 Lausanne</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Springer Nature</institution>
          ,
          <addr-line>The Campus, 4 Crinan Street, London N1 9XW</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>3</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>We describe a domain-specific knowledge base aimed at supporting the extraction of bibliographic references in the domain of Classics. In particular, we deal with references to canonical works of Greek and Latin literature by providing a model that represents key aspects of this domain, such as names and abbreviations of authors, the canonical structure of classical works, and links to related web resources. Finally, we show how the availability of linked data in the emerging Graph of Ancient World Data has helped bootstrap the creation of our knowledge base.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Knowledge bases are essential resources for many Natural Language Processing
tasks, such as the disambiguation of named entities or of word
senses, as they provide algorithms with a surrogate of the knowledge needed
to handle and capture certain aspects of natural language.</p>
      <p>The resource discussed in this paper is a domain-specific knowledge base
aimed at supporting the extraction of bibliographic references in the domain of
Classics. In particular, we deal with references to canonical works of Greek
and Latin literature, which are only one of the many kinds of references found in
publications in this field (others include references to fragmentary texts, inscriptions, papyri,
manuscripts, coins and museum objects). One peculiarity of canonical references
is that, by definition, they transcend (i.e. abstract from) specific editions or
translations of a text.</p>
      <p>This knowledge base contains various types of information that are needed
to extract and disambiguate canonical references, such as:</p>
      <sec id="sec-1-1">
        <p>1. names (and abbreviations) of ancient authors;
2. titles (and abbreviations) of ancient works;
3. unique identifiers of authors, works and citable passages of these works;
4. links to the Wikipedia pages of ancient authors;
5. information about the canonical citation structure of ancient works.</p>
        <p>Although several online resources exist from which this sort of
information can be gathered, such as the Perseus Catalog<sup>3</sup> and the Classical Works
Knowledge Base (CWKB),<sup>4</sup> there is no single resource that supports this information
extraction task and is suitable both for use in NLP applications and for publishing
the extracted references using Semantic Web standards. Our knowledge base makes up
for this lack.</p>
        <p>The creation of our knowledge base was informed by the following principles:
1. it should be based on interoperable standards so as to increase the chances
of being reused in other contexts;
2. it should be easy to use programmatically;
3. it should be linked as much as possible to other available resources, as they
provide complementary information about other facets of the data;
4. it should be easy to edit, maintain and update in the future.</p>
        <p>
          Devising a technical solution that fulfils all of these principles is not entirely
trivial, as some of them may seem to contradict each other (e.g. "based
on interoperable standards" and "easy to use programmatically"). In this paper
we present our proposed solution, describe how it was implemented and explain
how it is used in practice.<sup>5</sup> We also show how the availability of linked data in
the emerging Graph of Ancient World Data [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] has helped bootstrap the
creation of our knowledge base.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Motivation and Background</title>
      <p>
        Much of the Graph of Ancient World Data (GAWD) is emerging because a community
of practice has developed that values the use of shared, URI-based controlled
vocabularies to refer to 'things' [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>As the GAWD cloud diagram<sup>6</sup> also shows, the Pleiades gazetteer
has played a key role in the growth of LOD for the ancient world, with Pelagios
building upon it to connect even more resources. In its current phase, Pelagios has
de facto become a LOD facilitator: its community platform, Pelagios Commons,
is the go-to place for those who are keen to apply Pelagios' model and philosophy to
other areas of the study of the ancient world.</p>
      <p>After geographical data, something similar is now taking place with the
recently developed time gazetteers PeriodO and ChronOntology, which have
started to enable the interlinking of datasets based on shared references to time
periods.<sup>7</sup></p>
      <p>3 Perseus Catalog, http://catalog.perseus.org.</p>
      <p>4 Classical Works Knowledge Base, http://www.cwkb.org.</p>
      <p>5 The HuCit Knowledge Base can be explored via a Linked Open Data (LOD)
frontend available at purl.org/hucit/kb/.</p>
      <p>6 Graph of Ancient World Data by Regis Robineau (19/06/2012), http://bsa.biblio.univ-lille3.fr/doc/gawd/gawd.html.</p>
      <p>7 Pelagios Commons, Time Working Group, http://commons.pelagios.org/groups/time-events-working-group/.</p>
      <p>
        In addition to space and time, references to ancient texts are undoubtedly
another dimension that could be leveraged to expand the GAWD cloud, as the cited
primary sources are often an area where existing datasets overlap. However,
what is still missing to realise this potential are resolvable URIs for all citable
sections of canonical texts. Thanks to the CTS protocol developed
for the Homer Multitext project, we do have a scheme of unique identifiers that can be used
to identify those citable units of texts, i.e. the CTS Uniform Resource Names
(URNs). This protocol was also implemented by the Perseus Catalog, meaning
that each canonical author and work can now be looked up by its CTS URN in
the catalog [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
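      <p>To make the structure of CTS URNs concrete, the following Python sketch splits a URN such as urn:cts:latinLit:phi0690.phi003:12.10-12.11 (Virgil, Aeneid 12.10-11) into its components. It is a simplified illustration, assuming the layout urn:cts:namespace:textgroup.work[.version][:passage] and ignoring finer features of the CTS URN syntax such as exemplars and subreferences; the function and class names are hypothetical and belong neither to the CTS reference implementations nor to our libraries.</p>
      <preformat>
```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    """Components of a CTS URN (simplified: exemplars and subreferences are ignored)."""
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split 'urn:cts:NAMESPACE:TEXTGROUP[.WORK[.VERSION]][:PASSAGE]' into its parts."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else None
    # The work component is dot-separated: textgroup, then optional work and version.
    ids = work_component.split(".")
    textgroup = ids[0]
    work = ids[1] if len(ids) > 1 else None
    version = ids[2] if len(ids) > 2 else None
    return CtsUrn(namespace, textgroup, work, version, passage)


urn = parse_cts_urn("urn:cts:latinLit:phi0690.phi003:12.10-12.11")
print(urn.textgroup, urn.work, urn.passage)  # phi0690 phi003 12.10-12.11
```
      </preformat>
      <p>The same function also accepts author-level URNs such as urn:cts:latinLit:phi0690, in which case the work, version and passage fields are empty.</p>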
      <p>
        What is currently missing, and what our knowledge base aims to provide, are
fully resolvable URIs for all citable passages of
classical texts, linked to other resources like Perseus (catalog and library) and
CWKB. One of the advantages of having such URIs in place is that we will
then be able to use them in combination with ontologies like the FRBR-aligned
Bibliographic Ontology (FaBiO) and the Citation Typing Ontology (CiTO) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
for publishing citation data on the Semantic Web.
      </p>
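      <p>As an illustration of how resolvable passage URIs could be combined with CiTO, the sketch below mints an HTTP URI for a cited passage from its CTS URN and expresses a citation as a (subject, predicate, object) triple using cito:cites, whose namespace is http://purl.org/spar/cito/. The base URL, the URI pattern and the article URI are hypothetical placeholders chosen for illustration, not the URI scheme of our knowledge base.</p>
      <preformat>
```python
from urllib.parse import quote

CITO = "http://purl.org/spar/cito/"
# Hypothetical base URL for passage records; not the knowledge base's actual URI scheme.
PASSAGE_BASE = "http://example.org/passages/"


def passage_uri(cts_urn: str) -> str:
    """Mint an HTTP URI for a citable passage by appending its (escaped) CTS URN."""
    # ':' '.' and '-' are kept unescaped so that the URN stays readable in the URI.
    return PASSAGE_BASE + quote(cts_urn, safe=":.-")


def cites(citing_uri: str, cts_urn: str) -> tuple:
    """Return a (subject, predicate, object) triple stating that a document cites a passage."""
    return (citing_uri, CITO + "cites", passage_uri(cts_urn))


triple = cites("http://example.org/articles/42", "urn:cts:latinLit:phi0690.phi003:12.10-12.11")
print(triple[2])  # http://example.org/passages/urn:cts:latinLit:phi0690.phi003:12.10-12.11
```
      </preformat>
      <p>Serialising such tuples as RDF, for instance with an RDF library, is left out to keep the example dependency-free.</p>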
    </sec>
    <sec id="sec-3">
      <title>Uses of the Knowledge Base</title>
      <p>
        The knowledge base is one of four components of the system described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to
automatically extract canonical references from text (see Fig. 1). The other
components are a Citation Extractor (2), which identifies the citation
components within the stream of text; a Citation Matcher (3), which attempts to
disambiguate the cited ancient work against the knowledge base and, in turn,
relies on a Citation Parser (4) to normalise the reference scope into a format
that can be readily embedded into a CTS URN.<sup>8</sup>
      </p>
      <p>The extraction and disambiguation of canonical references was modelled as
a three-step process:
1. Extraction of named entities: a) names of ancient authors (e.g. Virgilio);
b) titles of works (e.g. Aeneid) and c) references to specific text passages (e.g.
Virg., Aen. 12.10 f.).</p>
      <sec id="sec-3-1">
        <p>2. Detection of relations between entities: since a reference is represented
as a relation between two entities (i.e. the author name/work title and the
reference scope), the canonical references are reconstructed from the entities
found in the text. For example, the reference "Virg., Aen. 12.10 f." is expressed as a
relation between the entity identifying the cited text (in this case "Virg.
Aen.") and the entity indicating the citation scope ("12.10 f."), namely the
precise text passage being cited.
3. Disambiguation of named entities and relations: determining which
authors, works and passages are referred to in the text is done by assigning
a unique identifier to each entity and relation. The reference in the example
above, for instance, will be assigned the URN
"urn:cts:latinLit:phi0690.phi003:12.10-12.11". This identifier is built by concatenating the URN for the cited work
(urn:cts:latinLit:phi0690.phi003 for Virgil's Aeneid) with a normalised value
representing the cited passage (12.10-12.11, which stands for book 12, lines
10 and 11).</p>
        <p>8 The Knowledge Base, Citation Extractor and CitationParser are openly
available as Python libraries at https://github.com/mromanello/hucit_kb,
https://github.com/mromanello/CitationExtractor and https://github.com/mromanello/CitationParser
respectively (the CitationMatcher component is part of the CitationExtractor).</p>
        <p>The first use of the knowledge base, and perhaps the most important one, at least
from an end-user perspective, is the linking of extracted references to their
corresponding full-text passages. Since canonical references by definition
transcend (i.e. abstract from) specific editions or translations of the text, the linking
of a reference to its full text needs to enable the reader to select the very edition
or translation she is after from those available online. To this end, we rely on
two external services: the Perseus Digital Library for openly available editions
and translations, and CWKB for texts whose access requires an institutional
subscription (e.g. the Thesaurus Linguae Graecae). In particular, the latter
can resolve links in a context-aware fashion: if the user chooses
to read the TLG text of a passage and her institution has access to it, the service
will redirect the browser directly to the full text.</p>
      </sec>
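      <p>Step 3 of the process described above, turning a scope like 12.10 f. into a normalised range and appending it to the work URN, can be sketched as follows. This is a minimal illustration under explicit assumptions, not the CitationParser library: it only handles two-level book.line references written as a single line, a line followed by f. (the following line), or an a-b range, and returns any other input unchanged; the function names are hypothetical.</p>
      <preformat>
```python
def normalise_scope(scope: str) -> str:
    """Normalise a two-level (book.line) scope into a CTS passage string.

    Simplified patterns handled by this hypothetical helper:
      '12.10'    -> '12.10'
      '12.10 f.' -> '12.10-12.11'  ('f.' = and the following line)
      '12.10-15' -> '12.10-12.15'  (end line within the same book)
    Anything else (e.g. 'ff.') is returned unchanged.
    """
    scope = scope.strip()
    if scope.endswith(" f."):
        start = scope[: -len(" f.")].strip()
        book, line = start.split(".")
        return f"{start}-{book}.{int(line) + 1}"
    if "-" in scope:
        start, end = scope.split("-", 1)
        # Expand an abbreviated end ('15') into a full reference ('12.15').
        if "." not in end:
            book = start.split(".")[0]
            end = f"{book}.{end}"
        return f"{start}-{end}"
    return scope


def build_passage_urn(work_urn: str, scope: str) -> str:
    """Concatenate a work-level CTS URN with a normalised passage reference."""
    return f"{work_urn}:{normalise_scope(scope)}"


print(build_passage_urn("urn:cts:latinLit:phi0690.phi003", "12.10 f."))
# urn:cts:latinLit:phi0690.phi003:12.10-12.11
```
      </preformat>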
      <sec id="sec-3-2">
        <title>Generation of Dictionaries</title>
        <p>The second use case is the generation of dictionaries (i.e. lists) of names, titles
and their abbreviations, which are used at various stages of the extraction of
canonical references. The dictionaries of abbreviations are employed when
splitting texts into sentences and then into tokens: they help prevent some errors
that are commonly caused by the presence of punctuation within abbreviations
(e.g. the full stop in "Virg."). Such dictionaries are of particular importance for the
extraction of information (i.e. author names, work titles and canonical references)
from texts written in several European languages, as they enable the citation
extraction system to relate different spelling variants to the same entity.</p>
      </sec>
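      <p>The role of abbreviation dictionaries in sentence splitting can be illustrated with a deliberately naive splitter, in which a token ending in a full stop closes a sentence unless it is a known abbreviation; a second dictionary maps spelling variants to one entity. Both dictionaries are small hypothetical samples (the Virgil names and abbreviations come from the record shown in Fig. 2, the other entries are illustrative), and real tokenisers rely on many more cues than this single rule.</p>
      <preformat>
```python
# Hypothetical sample of abbreviation entries; real dictionaries are generated
# from the knowledge base and are much larger.
ABBREVIATIONS = {"Verg.", "Virg.", "Aen.", "f.", "ff.", "cf."}

# Spelling variants (case-folded) mapped to a single entity URI.
VIRGIL = "http://purl.org/hucit/kb/authors/678"
NAME_VARIANTS = {name.casefold(): VIRGIL for name in ["Virgil", "Vergil", "Virgile", "Verg.", "Virg."]}


def split_sentences(text: str, abbreviations: set) -> list:
    """Split on tokens ending with '.', unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # A full stop ends a sentence only if it does not belong to an abbreviation.
        if token.endswith(".") and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


text = "See Virg. Aen. 12.10 f. for the duel. Compare Verg. Aen. 1.1."
print(split_sentences(text, ABBREVIATIONS))
print(NAME_VARIANTS["virgile"])
```
      </preformat>
      <p>Without the abbreviation list, the same text would be cut after every full stop, including those in Virg., Aen. and f.</p>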
      <sec id="sec-3-3">
        <title>Disambiguation of References</title>
        <p>
          Information contained in the Wikipedia page of a given ancient author can be
used to help the automatic disambiguation of canonical references, a technique
that is used in almost every Named Entity Disambiguation system (cf. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). The
rationale behind this is that the words and entities contained in the Wikipedia
page of a given author overlap to some extent with the context in which a reference
to that author appears. Typically, this information is leveraged by computing
a similarity score between the document where the reference is found and the
Wikipedia page of every disambiguation candidate, the candidates being usually extracted
from a knowledge base by means of heuristics. This score is then used in combination
with other features to rank the disambiguation candidates, with the aim of
ranking the correct entity first.
        </p>
      </sec>
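      <p>A common way of computing such a score, shown here only as a generic illustration rather than as the feature set of our system, is the cosine similarity between bag-of-words vectors of the citing context and of each candidate's Wikipedia page. The candidate names and page snippets below are hypothetical.</p>
      <preformat>
```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts (a deliberately simple tokeniser)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(context: str, candidates: dict) -> list:
    """Rank candidate entities by similarity between the context and their page text."""
    ctx = bag_of_words(context)
    scores = {name: cosine(ctx, bag_of_words(page)) for name, page in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


context = "Aeneas fights Turnus in the last book of the epic"
candidates = {
    "Virgil (poet)": "Roman poet, author of the Aeneid, the epic of Aeneas",
    "Virgil of Salzburg": "bishop of Salzburg and astronomer",
}
ranking = rank_candidates(context, candidates)
print(ranking)  # ['Virgil (poet)', 'Virgil of Salzburg']
```
      </preformat>
      <p>In a full system this score would be one feature among several, as noted above, rather than the sole ranking criterion.</p>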
    </sec>
    <sec id="sec-4">
      <title>Knowledge Base Implementation</title>
      <sec id="sec-4-1">
        <title>Data Model</title>
        <p>The rationale for developing the knowledge base's data model was to re-use as
much as possible already existing and widely adopted ontologies, and to extend
them by means of new classes and properties only when absolutely necessary.</p>
        <p>The first two ontologies that form the backbone of the HuCit knowledge base
are CIDOC-CRM and FRBRoo.<sup>9</sup> The CIDOC-CRM is a conceptual model that
was born as a metadata standard for the archive and museum world, and has proved
to be suitable for representing information in many different domains. The subset
of CIDOC-CRM classes and properties used by the knowledge base is limited
to those that represent things like names, titles and abbreviations of ancient
authors and works (for an example, see Fig. 2). It is worth noting, however,
that we try as much as possible to harmonise our use of CIDOC-CRM with the
adoption of other essential standards, like the CTS protocol, that exist outside
of the CRM world. For instance, we make extensive use of CTS URNs, which
are declared as instances of CIDOC-CRM's E42 Identifier having a specific
E55 Type.</p>
        <p>9 See respectively http://www.cidoc-crm.org/ and http://www.cidoc-crm.org/frbroo/.</p>
        <preformat>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix ecrm: &lt;http://erlangen-crm.org/current/&gt; .
@prefix efrbroo: &lt;http://erlangen-crm.org/efrbroo/&gt; .
@prefix hucit: &lt;http://purl.org/net/hucit#&gt; .

&lt;http://purl.org/hucit/kb/authors/678&gt;
    a efrbroo:F10_Person ;
    ecrm:P1_is_identified_by
        &lt;http://purl.org/hucit/kb/authors/678#cts_urn&gt;,
        &lt;http://purl.org/hucit/kb/authors/678#name&gt; ;
    owl:sameAs
        &lt;http://data.perseus.org/catalog/urn:cts:latinLit:phi0690/&gt;,
        &lt;http://cwkb.org/author/id/678/&gt;,
        &lt;http://viaf.org/viaf/8194433&gt; .

&lt;http://purl.org/hucit/kb/authors/678#name&gt;
    a efrbroo:F12_Name ;
    ecrm:P139_has_alternative_form &lt;http://purl.org/hucit/kb/authors/678#abbr&gt; ;
    rdfs:label "P. Vergilius Maro"@la, "P. Virgilius Maro"@la,
        "Publio Virgilio Marone"@it, "Publio Virgilio Maron"@es,
        "Publius Vergilius Maro"@la, "Publius Virgilius Maro"@la,
        "Vergil", "Virgil"@en, "Virgile"@fr .

&lt;http://purl.org/hucit/kb/authors/678#abbr&gt;
    a ecrm:E41_Appellation ;
    ecrm:P2_has_type &lt;http://purl.org/hucit/kb/types/abbreviation&gt; ;
    rdfs:label "Verg.", "Virg." .

&lt;http://purl.org/hucit/kb/authors/678#cts_urn&gt;
    a ecrm:E42_Identifier ;
    ecrm:P2_has_type &lt;http://purl.org/hucit/kb/types/CTS_URN&gt; ;
    rdfs:label "urn:cts:latinLit:phi0690" .
</preformat>
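        <p>To show how the dictionaries of names and abbreviations mentioned earlier can be derived from records like the one in Fig. 2, the sketch below stores a subset of Virgil's triples as plain Python tuples and follows ecrm:P1_is_identified_by and ecrm:P139_has_alternative_form to collect labels. The kb: prefix is a hypothetical shorthand, language tags are omitted, and a real implementation would query the triple store instead of scanning a list.</p>
        <preformat>
```python
from collections import defaultdict

# A subset of the triples for Virgil, written with a hypothetical "kb:" prefix.
TRIPLES = [
    ("kb:authors/678", "ecrm:P1_is_identified_by", "kb:authors/678#name"),
    ("kb:authors/678", "ecrm:P1_is_identified_by", "kb:authors/678#cts_urn"),
    ("kb:authors/678#name", "rdf:type", "efrbroo:F12_Name"),
    ("kb:authors/678#name", "ecrm:P139_has_alternative_form", "kb:authors/678#abbr"),
    ("kb:authors/678#name", "rdfs:label", "Virgil"),
    ("kb:authors/678#name", "rdfs:label", "Vergil"),
    ("kb:authors/678#name", "rdfs:label", "Virgile"),
    ("kb:authors/678#abbr", "rdf:type", "ecrm:E41_Appellation"),
    ("kb:authors/678#abbr", "rdfs:label", "Verg."),
    ("kb:authors/678#abbr", "rdfs:label", "Virg."),
    ("kb:authors/678#cts_urn", "rdf:type", "ecrm:E42_Identifier"),
    ("kb:authors/678#cts_urn", "rdfs:label", "urn:cts:latinLit:phi0690"),
]


def objects(subject: str, predicate: str) -> list:
    """Objects of all triples matching (subject, predicate, ?object)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def name_dictionary(author: str) -> dict:
    """Collect name variants and abbreviations of an author by following the links."""
    entries = defaultdict(list)
    for appellation in objects(author, "ecrm:P1_is_identified_by"):
        # Keep only name nodes (F12_Name); identifiers such as CTS URNs are skipped.
        if "efrbroo:F12_Name" not in objects(appellation, "rdf:type"):
            continue
        entries["names"].extend(objects(appellation, "rdfs:label"))
        for alternative in objects(appellation, "ecrm:P139_has_alternative_form"):
            entries["abbreviations"].extend(objects(alternative, "rdfs:label"))
    return dict(entries)


print(name_dictionary("kb:authors/678"))
```
        </preformat>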
        <p>
          FRBRoo is an implementation of the FRBR model, aligned with the
CIDOC-CRM [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The FRBR model has been widely adopted in the field of Digital
Classics, as its hierarchy is suitable to describe the kind of bibliographic information
scholars in this field deal with [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The FRBRoo classes used by our knowledge
base are those concerned with the representation of authorship, namely the fact
that a given conceptual work was created by someone at a certain point in time
(e.g. Ovid's creation of the Metamorphoses).
        </p>
        <p>
          The third and last ontology involved is the Humanities Citation Ontology
(HuCit). This ontology was developed as a lightweight extension of CIDOC-CRM
and FRBRoo, aimed specifically at formalising the canonical text structures that
are used to cite classical texts (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]; [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], pp. 85-100). This ontology allows us
to instantiate every single citable unit of a canonical text (e.g. all lines in all books
of Homer's Iliad), an ability of essential importance when representing canonical
citations. Fig. 3 shows how the canonical text structure of Virgil's Aeneid can
be modelled, and how it relates to a canonical citation.
        </p>
        <p>Our knowledge base is populated from (and linked to) three main data sources:</p>
        <sec id="sec-4-1-1">
          <p>1. the Classical Works Knowledge Base (CWKB);
2. the Perseus Digital Library and Catalog;
3. Wikidata.</p>
          <p>Firstly, the CWKB was used to perform the initial import of ancient authors
and works into the knowledge base. Since CWKB has recently begun to
publish its data as LOD, it was possible to harvest its content programmatically.
Each author and work in our resource is aligned to CWKB by means of an
owl:sameAs property; by doing so, it will be possible to benefit from new data
that will be added to this resource in the future. Thanks to the CWKB it was
possible to add a substantial amount of authority data to the knowledge base
(see Table 1), to which more was added at later stages. The distribution of
author names and work titles is rather uneven, as the high variance
reported in Table 1 confirms: alongside authors and works with only
one name or title variant each, there are others with many more variants attached.</p>
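          <table-wrap>
            <label>Table 1</label>
            <table>
              <thead>
                <tr><th/><th>Total</th><th>Min</th><th>Max</th><th>Mean</th><th>Variance</th></tr>
              </thead>
              <tbody>
                <tr><td>Author names</td><td>4842</td><td>1</td><td>27</td><td>3.13</td><td>9.81</td></tr>
                <tr><td>Author abbreviations</td><td>774</td><td>0</td><td>2</td><td>0.50</td><td>0.26</td></tr>
                <tr><td>Work titles</td><td>10354</td><td>1</td><td>31</td><td>1.99</td><td>6.42</td></tr>
                <tr><td>Work abbreviations</td><td>2377</td><td>0</td><td>3</td><td>0.46</td><td>0.57</td></tr>
              </tbody>
            </table>
          </table-wrap>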
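          <p>The columns of Table 1 summarise per-record counts: for each author (or work) one counts its names (or abbreviations), then reports the total, minimum, maximum, mean and variance of these counts. The sketch below shows this computation on made-up counts, not on our data; it assumes population variance, since the estimator used for Table 1 is not stated.</p>
          <preformat>
```python
import statistics


def summarise(counts: list) -> dict:
    """Total, min, max, mean and population variance of per-record variant counts."""
    return {
        "total": sum(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": round(statistics.mean(counts), 2),
        "variance": round(statistics.pvariance(counts), 2),
    }


# Hypothetical numbers of name variants for five authors (not taken from the knowledge base).
summary = summarise([1, 1, 2, 3, 8])
print(summary)
```
          </preformat>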
          <p>Secondly, the Perseus Digital Library was used to import information
concerning the canonical text structures according to which ancient works are cited
in the scholarship (e.g. the division of Homer's Iliad into books and lines). CWKB
records are linked to Perseus in two ways: first, by means of owl:sameAs links
pointing to author or work records in the Catalog; and, second, by means of a
dcterms:identifier (from the Dublin Core vocabulary) recording the CTS URN
of an author or work (e.g. urn:cts:greekLit:tlg0012.tlg001 for Homer's Iliad).
It is worth noting that, currently, links to Perseus are available only for a subset
of the CWKB records, and thus of our knowledge base. Increasing this coverage
as much as possible is one of the goals of our project for the near future.</p>
          <p>Since the canonical divisions of texts are encoded as markup elements within
the digital editions and translations contained in Perseus, it is possible to leverage
such information in order to instantiate the relevant HuCit classes (i.e.
TextStructure and TextElement). This operation can be fully automated, given that the
content in Perseus is accessible programmatically through its CTS API. The
process involves gathering two pieces of information for each work contained
both in the knowledge base and in Perseus: first, information about the hierarchical
structure of the canonical text divisions (e.g. the book/line structure of the
Iliad); second, a list of all the citable elements that make up such a structure
(e.g. a list of all the books and lines in the Iliad). Once all citable elements of a
text have been instantiated and imported into the knowledge base, they can be
used, for example, as the subject or the object of RDF statements.</p>
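          <p>Once the list of citable elements is available, turning it into a hierarchy is straightforward. The sketch assumes that the references have already been retrieved from a CTS endpoint (for instance with the protocol's GetValidReff request) as dot-separated strings, with parents listed before children; the TextElement class is a hypothetical Python stand-in named after the HuCit class, not its actual definition, and the list of references is truncated for brevity.</p>
          <preformat>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextElement:
    """Hypothetical stand-in for a citable unit such as a book or a line."""
    reference: str       # e.g. "12.10"
    element_type: str    # e.g. "line"
    # Parent/children links are excluded from repr and comparison to avoid recursion.
    parent: Optional["TextElement"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list, repr=False, compare=False)


def build_structure(work_urn: str, levels: list, references: list) -> dict:
    """Create one TextElement per reference, linked to its parent, keyed by passage URN.

    levels:     names of the citation levels, outermost first (e.g. ["book", "line"])
    references: dot-separated references, each parent listed before its children
    """
    by_reference = {}
    for ref in references:
        depth = ref.count(".")  # 0 for "12", 1 for "12.10"
        parent_ref = ref.rsplit(".", 1)[0] if depth else None
        parent = by_reference.get(parent_ref)
        element = TextElement(ref, levels[depth], parent)
        if parent is not None:
            parent.children.append(element)
        by_reference[ref] = element
    return {f"{work_urn}:{ref}": element for ref, element in by_reference.items()}


# A hypothetical, truncated list of references for book 12 of the Aeneid.
structure = build_structure(
    "urn:cts:latinLit:phi0690.phi003", ["book", "line"], ["12", "12.1", "12.2", "12.3"]
)
book12 = structure["urn:cts:latinLit:phi0690.phi003:12"]
print([child.reference for child in book12.children])  # ['12.1', '12.2', '12.3']
```
          </preformat>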
          <p>Thirdly, we have been adding, whenever possible, owl:sameAs links
pointing to the VIAF and Wikidata records of ancient authors. We started with those
authors that are linked to the Perseus Catalog and for which the Catalog
provides a VIAF identifier. The main reason for doing this is to support the
disambiguation of references, as described above. By following the chain of links
from Perseus to VIAF and then to Wikidata, we were able to query Wikidata's
SPARQL endpoint to get the links to the Wikipedia pages in the languages we
are interested in (French, German, English, Italian and Spanish).</p>
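          <p>A query of this kind could look like the one built below. It is an illustrative sketch rather than the exact query we used: it finds the Wikidata item whose VIAF ID (property P214) matches a given value, here the VIAF identifier recorded for Virgil in Fig. 2, and returns the titles of its Wikipedia articles in the five languages of interest, relying on the wdt: and schema: prefixes predefined by the Wikidata Query Service. No request is sent; the code only builds the query and the GET URL.</p>
          <preformat>
```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """
SELECT ?item ?lang ?title WHERE {{
  ?item wdt:P214 "{viaf_id}" .
  ?article schema:about ?item ;
           schema:inLanguage ?lang ;
           schema:name ?title .
  FILTER(CONTAINS(STR(?article), ".wikipedia.org/"))
  FILTER(?lang IN ({languages}))
}}
"""


def wikipedia_query(viaf_id: str, languages: list) -> str:
    """Build a SPARQL query for the Wikipedia titles of the item with a given VIAF ID."""
    langs = ", ".join(f'"{code}"' for code in languages)
    return QUERY_TEMPLATE.format(viaf_id=viaf_id, languages=langs)


query = wikipedia_query("8194433", ["fr", "de", "en", "it", "es"])
# URL for an HTTP GET request to the endpoint, asking for JSON results.
url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(query)
```
          </preformat>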
          <p>Finally, the knowledge base is constantly being updated with new variant
forms and abbreviations as they are encountered while extracting canonical
references from text corpora like JSTOR or L'Année Philologique.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Interfaces</title>
        <p>Although the choice of CIDOC-CRM and FRBRoo for the data model certainly
makes our data more interoperable and increases the chances of them being
reused in other contexts, it imposes certain constraints when it comes to using the
knowledge base from within an NLP application.</p>
        <p>An apt example of these constraints is provided by authorship, a concept
that CIDOC-CRM treats as an event: an author is the subject involved in the
event that leads to the creation of a given work. As a result, retrieving the works
by a certain author implies a) finding all creation events in which this author
was involved and b) finding all the works that were created in such events. Since
the knowledge base is stored in a triple store, this and many other queries
have to be written in SPARQL, which makes the application grow in complexity.</p>
        <p>In order to keep the knowledge base as easy as possible to use
programmatically, without having to give up the advantages of CIDOC-CRM
mentioned above, we used a Python Object RDF Mapper library called SuRF.<sup>10</sup>
SuRF works similarly to an Object-Relational Mapper, with the difference that,
instead of mapping a relational database to instances of Python objects, it maps
a triple store to such objects. This allows us to interact programmatically with
the knowledge base (see Fig. 4) and to hide away certain complexities of the
underlying data model.</p>
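        <p>The two-step lookup of an author's works, and the way an object mapper can hide it, can be shown in plain Python without SuRF. The Author class below is a hypothetical wrapper that reproduces neither SuRF's API nor our library's classes; the property names (ecrm:P14_carried_out_by from the creation event to its actor, efrbroo:R16_initiated from the event to the work) follow CIDOC-CRM and FRBRoo naming but are assumptions, as are the kb: URIs, since the exact authorship pattern of our knowledge base is not shown here.</p>
        <preformat>
```python
# Hypothetical triples: each creation event links an actor to the work it initiated.
TRIPLES = [
    ("kb:events/1", "ecrm:P14_carried_out_by", "kb:authors/678"),
    ("kb:events/1", "efrbroo:R16_initiated", "kb:works/aeneid"),
    ("kb:events/2", "ecrm:P14_carried_out_by", "kb:authors/678"),
    ("kb:events/2", "efrbroo:R16_initiated", "kb:works/georgics"),
]


class Author:
    """Hypothetical object view of an author record stored as triples."""

    def __init__(self, uri: str, triples: list):
        self.uri = uri
        self._triples = triples

    @property
    def works(self) -> list:
        """Works by this author; the intermediate creation events stay hidden."""
        # a) creation events in which the author was involved
        events = {s for s, p, o in self._triples
                  if p == "ecrm:P14_carried_out_by" and o == self.uri}
        # b) works that were created (initiated) in those events
        return sorted(o for s, p, o in self._triples
                      if s in events and p == "efrbroo:R16_initiated")


virgil = Author("kb:authors/678", TRIPLES)
print(virgil.works)  # ['kb:works/aeneid', 'kb:works/georgics']
```
        </preformat>
        <p>In SPARQL the same lookup needs two triple patterns joined on the event variable; the wrapper keeps that join out of application code.</p>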
        <p>The following interfaces are currently available:
- a SPARQL endpoint to a Virtuoso triple store;
- a LOD interface, provided by the package Pubby, which makes the URIs of
resources contained in the triple store resolvable to various formats (HTML,
RDF/XML, RDF/Turtle);
- a Command Line Interface (CLI), aimed at easing the task of adding new
information to the knowledge base.<sup>11</sup></p>
        <p>10 SuRF, Object RDF Mapper, http://pythonhosted.org/SuRF/.</p>
        <p>11 All code and data for the knowledge base can be found at https://github.com/mromanello/hucit_kb.</p>
        <p>In this paper we have presented a domain-specific knowledge base aimed at
supporting the extraction of bibliographic references in the domain of Classics.
In addition to lexical information about ancient authors and works (variant
spellings, abbreviations), this knowledge base will contain a record for every citable
passage of canonical texts, thus making it possible to use it also to
publish the extracted citation data by means of existing ontologies such as CiTO.</p>
        <p>Since the main intended use of the knowledge base is within NLP
applications, we developed a solution that neatly separates the data model (i.e. the
ontologies used to represent the data) from the code library used to access and
query the knowledge base. Such a solution has the advantage of hiding the
complexities of the data model when accessing the contents of the knowledge base.
In our specific case, this allowed us to build upon the CIDOC-CRM to model
the data, while hiding its complexity, or even just the specific design patterns it
enforces, at the level of the code interfaces.</p>
        <p>Furthermore, we have shown how the availability of LOD about classical
texts, part of the so-called Graph of Ancient World Data, has enabled us to
bootstrap the creation of our knowledge base. Existing links between CWKB
and Perseus, as well as between Perseus and VIAF, greatly eased our task of
populating the knowledge base for a limited number of ancient authors and works.
At the same time, the external resources we are linking to will potentially be
able to aggregate information from our knowledge base. For example, by using
the VIAF URI for Homer, an external service could derive a list of
publications where Homeric works are cited, simply by following chains of owl:sameAs
relations.</p>
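        <p>Following chains of owl:sameAs links amounts to computing the set of URIs connected to a starting URI through sameAs statements, treated as symmetric and transitive. The sketch below does this with a breadth-first search, using Virgil's links as listed in Fig. 2 (Homer would be handled in the same way); the citation data, the publication URIs and the pairing of publications with author URIs are hypothetical.</p>
        <preformat>
```python
from collections import deque

# owl:sameAs links for Virgil, as recorded in the knowledge base.
SAME_AS = [
    ("http://purl.org/hucit/kb/authors/678", "http://viaf.org/viaf/8194433"),
    ("http://purl.org/hucit/kb/authors/678", "http://cwkb.org/author/id/678/"),
    ("http://purl.org/hucit/kb/authors/678", "http://data.perseus.org/catalog/urn:cts:latinLit:phi0690/"),
]

# Hypothetical citation data: (publication, URI of the cited author).
CITATIONS = [
    ("http://example.org/articles/1", "http://purl.org/hucit/kb/authors/678"),
    ("http://example.org/articles/2", "http://example.org/some-other-author"),
]


def equivalents(uri: str, links: list) -> set:
    """All URIs connected to uri by owl:sameAs (treated as symmetric and transitive)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {uri}, deque([uri])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def publications_citing(uri: str) -> list:
    """Publications citing the entity identified by uri or by any equivalent URI."""
    same = equivalents(uri, SAME_AS)
    return sorted(pub for pub, cited in CITATIONS if cited in same)


print(publications_citing("http://viaf.org/viaf/8194433"))  # ['http://example.org/articles/1']
```
        </preformat>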
        <p>Further developments of the knowledge base in the near future will be aimed
at increasing its coverage both in breadth and depth. Although the number of
classical authors and works does not grow, we aim to add links to VIAF, Wikidata
and Perseus for as many entries as possible, and to continue to enrich the
knowledge base with lexical information. Also, a web user interface is planned so as to
make it easier to engage a wider community of users in collaboratively growing
the knowledge base.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Babeu</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Bamman</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Crane</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Kummer</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Weaver</surname>, <given-names>G.</given-names></string-name>:
          <article-title>Named Entity Identification and Cyberinfrastructure</article-title>.
          In: Kovacs, L., Fuhr, N., Meghini, C. (eds.) <source>Research and Advanced Technology for Digital Libraries</source>,
          pp. <fpage>259</fpage>-<lpage>270</lpage>. Springer (<year>2007</year>),
          http://dx.doi.org/10.1007/978-3-540-74851-9_22
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Crane</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Almas</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Babeu</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Cerrato</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Krohn</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Baumgart</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Berti</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Franzini</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Stoyanova</surname>, <given-names>S.</given-names></string-name>:
          <article-title>Cataloging for a Billion Word Library of Greek and Latin</article-title>.
          In: <source>Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage</source>.
          pp. <fpage>83</fpage>-<lpage>88</lpage>. DATeCH '14, ACM, New York, NY, USA (<year>2014</year>),
          http://doi.acm.org/10.1145/2595188.2595190
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Elliott</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Heath</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Muccigrosso</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Prologue and Introduction</article-title>.
          <source>ISAW Papers</source> <volume>7</volume>(<issue>1</issue>) (<year>2014</year>),
          http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/elliott-heath-muccigrosso/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Isaksen</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Simon</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Barker</surname>, <given-names>E.T.</given-names></string-name>,
          <string-name><surname>de Soto Cañamares</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Pelagios and the emerging graph of ancient world data</article-title>.
          In: <source>Proceedings of the 2014 ACM conference on Web science - WebSci '14</source>.
          pp. <fpage>197</fpage>-<lpage>201</lpage>. ACM Press, New York, New York, USA (<year>2014</year>),
          http://dl.acm.org/citation.cfm?doid=2615569.2615693
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Le Boeuf</surname>, <given-names>P.</given-names></string-name>:
          <article-title>A Strange Model Named FRBROO</article-title>.
          <source>Cataloging &amp; Classification Quarterly</source> <volume>50</volume>(<issue>5-7</issue>),
          <fpage>422</fpage>-<lpage>438</lpage> (<year>2012</year>),
          http://dx.doi.org/10.1080/01639374.2012.679222
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Peroni</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Shotton</surname>, <given-names>D.</given-names></string-name>:
          <article-title>FaBiO and CiTO: Ontologies for describing bibliographic resources and citations</article-title>.
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source> <volume>17</volume>,
          <fpage>33</fpage>-<lpage>43</lpage> (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Romanello</surname>, <given-names>M.</given-names></string-name>:
          <article-title>From Index Locorum to Citation Network: an Approach to the Automatic Extraction of Canonical References and its Applications to the Study of Classical Texts</article-title>.
          Ph.D. thesis, King's College London (<year>2015</year>),
          http://hdl.handle.net/11858/00-1780-0000-002A-4537-A
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Romanello</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Pasin</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Citations and Annotations in Classics: Old Problems and New Perspectives</article-title>.
          In: <source>DH-CASE '13 Proceedings of the 1st International Workshop on Collaborative Annotations in Shared Environment: metadata, vocabularies and techniques in the Digital Humanities</source>.
          ACM, New York, NY, USA (<year>2013</year>),
          http://dx.doi.org/10.1145/2517978.2517981
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Shen</surname>, <given-names>W.</given-names></string-name>,
          <string-name><surname>Wang</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Han</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Entity linking with a knowledge base: Issues, techniques, and solutions</article-title>.
          <source>IEEE Transactions on Knowledge and Data Engineering</source> <volume>27</volume>(<issue>2</issue>),
          <fpage>443</fpage>-<lpage>460</lpage> (Feb <year>2015</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>