Describing bibliographic references in RDF

          Angelo Di Iorio1 , Andrea Giovanni Nuzzolese1,2 , Silvio Peroni1,2 ,
                         David Shotton3 , and Fabio Vitali1
 1
     Department of Computer Science and Engineering, University of Bologna (Italy)
              2
                  STLab-ISTC, Consiglio Nazionale delle Ricerche (Italy)
                3
                   Oxford e-Research Centre, University of Oxford (UK)
    angelo.diiorio@unibo.it, nuzzoles@cs.unibo.it, silvio.peroni@unibo.it,
                  david.shotton@oerc.ox.ac.uk, fabio@cs.unibo.it


         Abstract. In this paper we present two ontologies, i.e., BiRO and C4O,
         that allow users to describe bibliographic references in an accurate way,
         and we introduce REnhancer, a proof-of-concept implementation of a
         converter that takes as input a raw-text list of references and produces
         an RDF dataset according to the BiRO and C4O ontologies.

         Keywords: BiRO, C4O, REnhancer, SPAR, Semantic Publishing, bib-
         liographic references, citation network


1      Introduction
Within the scholarly domain, the Semantic Publishing is the use of Semantic Web
technologies to enhance published scholarly articles. In the seminal paper [17],
Shotton suggests that the road to the Semantic Publishing will proceed through
incremental steps, starting simply and then improving the whole functionality
of the system. The (semantic) management of bibliographic references is one of
the areas where such an incremental approach is most feasible and effective.
    In [15] we presented a “manifesto” on liberating bibliographic references. The
bibliographic references are core elements of scholarly communication – since
they permit the attribution of credit and integrate our independent research
endeavours – and must be freely available for use by scholars. Citation data
“should be recognised as a part of the Commons” [15] – i.e., data that are freely
and legally available for sharing – and placed in an open repository, stored in
appropriate machine-readable formats, so as to be easily reused by machines to
assist people in producing novel services.
    Creating such open repositories of bibliographic references is not an easy task.
First of all, one should create an appropriate (and unique) description of each ref-
erenced document, probably starting from several bibliographic references that
are written according to different schemas4 . Also, bibliographic references are
difficult to normalise as they could contain typos on all elements of the refer-
ence (article title, authors’ information, etc.). The format these references are
4
     A large repository of over 6000 styles for references is available (in CSL 1.0 format)
     at https://www.zotero.org/styles.
2       Di Iorio et al.

exported is also an issue. The bibliographic entries can already be exported as
textual content or as record-like structures, but only a few repositories export
their data using Semantic Web technologies, such as Nature with its Linked Data
Platform5 and the Open Citation Corpus6 [18].
    Thus, ontologies to model references and reference lists are needed. It is cru-
cial that these ontologies cover all traits of bibliographic references and handle
them appropriately. For instance, a critical point is the distinction between the
references in the reference lists and the actual cited articles, as well as the dis-
tinction between the in-line pointers to the reference and the actual reference
(which, in turn, is different from the cited article).
    In this paper we present two ontologies – i.e., the Bibliographic Reference
Ontology (BiRO) and the Context Characterisation and Citation Counting On-
tology (C4O) – addressing these issues. They are two of the Semantic Publishing
And Referencing (SPAR) Ontologies7 , a suite of ontologies that describe the dif-
ferent aspects of the scholarly publishing domain. In particular, BiRO and C4O
have been developed for describing bibliographic lists, bibliographic references,
in-text reference pointers, citation contexts and a mechanism for counting cita-
tions locally (within an article) or globally (by means of particular platforms).
    The definition of ontologies for bibliographic resources is not enough, if it is
not combined with tools for populating these ontologies (cf. [15]). In order for
more actors to fully participate to the Semantic Publishing evolution and, in par-
ticular, to allow authors and readers to give a strong contribution (“[publishers,]
expect greater things of your author” [17]), we implemented a proof-of-concept
tool that helps users to convert textual lists of references into RDF descrip-
tions, compliant with BiRO and C4O. The prototype is named REnhancer and
is briefly presented at the end of the paper.
    The rest of the paper is structured as follows. In Section 2 we provide some
clarifications about the nomenclature related to bibliographic references, as well
as some related works on this topic. In Section 3 and Section 4 we introduce
BiRO and C4O respectively, and we show how they can be used to describe bib-
liographic references and related objects. In Section 5 we introduce REnhancer
and how to use its Web interface and its REST API. We conclude the paper in
Section 6 sketching out our future works on this topic.


2   What is a citation really?
The word “citation” is often subject of misinterpretations and misuse. The reason
being that the word can be used to identify objects which have different purposes,
at least in scientific literature. For instance, we often identify as “citation” (a)
the act of citing another work, (b) a bibliographic reference put at the end of a
paper (usually in a list), as well as (c) particular pointers (e.g., “[3]”) denoting
that bibliographic reference.
5
  Nature.com Linked Data: http://data.nature.com.
6
  Open Citation Corpus: http://opencitations.net.
7
  Semantic Publishing And Referencing (SPAR) Ontologies: http://purl.org/spar.
                                   Describing bibliographic references in RDF           3


Fig. 1. An excerpt of [12] highlighting the different roles of the various parts of text.


    In order to expand more on this topic, let us consider the excerpt from the
article “Intertextual semantics: a semantics for information design” [12] shown
in Fig. 1. That excerpt contains a particular sentence from the section “Related
Works” of the paper and a list item from the final “References” section. In
[15], we identified three different kinds of objects that are used to express any
citation (i.e., the attribution link between a citing work and the cited work) that
are relevant for this work, all of them having different purposes:
 – bibliographic reference, i.e., the textual entity within a citing work that ref-
   erences a cited work. A bibliographic reference is the realisation of some
   bibliographic record for the cited work, arranged in a specific format deter-
   mined by the house style of the citing publication;
 – in-text reference pointer, i.e., the entity present in the body text of a citing
   work that denotes a particular bibliographic reference in the reference list
   or a footnote. In scientific literature, this in-text reference pointer can be
   presented in different forms;
 – citation context, i.e., the textual content of that component of the published
   paper (e.g., sentence, paragraph) within which an in-text reference pointer
   appears, which provides the rhetorical rationale for the existence of that
   citation. In can includes the sentence where it appears (i.e., citation sentence
   [2]), a sequence of consecutive sentences referring (explicitly or implicitly)
   to the same cited work (i.e., context window [16]), and the main structure
   containing the in-text reference pointer (e.g., the paragraph).
    However, to our knowledge, there are no ontologies that have been devel-
oped to model all these objects, since most of them usually focus on describing
the whole citing/cited entity. In the following we will briefly describe the most
relevant ontologies in this context and their capabilities.
    Dublin Core Metadata Terms (DCTerms) [5] is, as far as we know, the most
widely used vocabulary for describing and cataloguing resources. While very
useful for the creation of basic metadata for resource discovery, the main lim-
itation of DCTerms is a direct consequence of the generic nature of its terms.
For example, using DCTerms one can identify a creator but not an author; a
4       Di Iorio et al.

bibliographic resource but not a journal article; an identifier but not an ISSN,
and a date but not a publication date. However, it makes available a particular
property, i.e., dcterms:bibliographicCitation, to define the textual string describ-
ing a reference, and another property, i.e., dcterms:references, to indicate that
an entity A cites/points another entity B.
    Similar to DCTerms, the RDF specification of the Publishing Requirements
for Industry Standard Metadata (PRISM) [10] has an extensive set of terms for
the description of bibliographic entities that is richer than DCTerms (its main
limitation is that it is a flat structure, lacking hierarchies). It makes available
the property prism:references (and its inverse prism:isReferencedBy) to define
citations between entities.
    The Bibliographic Ontology (BIBO) [8], is an OWL Full ontology that allows
one to write descriptions of documents (bibo:Document is the core class of that
model) for publication on the Semantic Web. It includes both DCTerms and
PRISM properties to cover common needs, and it adds other classes and proper-
ties to describe in more detail the publishing domain. In particular, it explicitly
defines the property bibo:cites to express citations between documents.
    Among the SPAR ontologies, FaBiO, CiTO [13] and DoCO [7] are ontologies
that make available a first infrastructure to organise the citation network be-
tween scholarly articles. In particular, FaBiO has a class, i.e., fabio:Bibliographic
Metadata, that enable the description of usual metadata associated to schol-
arly articles (e.g., authors’ names, title, journal name, DOI); CiTO is basi-
cally a list of properties defining citation acts between generic entities (e.g.,
cito:extends, cito:uses MethodIn, cito:disagreesWith); and, finally, DoCO makes
available classes defining document components for the characterisation of bib-
liographic references, i.e., deo:BibliographicReference and doco:BibliographicRefe
renceList.
    In conclusion, none of these ontologies provides all entities (classes and prop-
erties) useful to prevent or minimise ambiguities when modelling citing acts in
documents. In the next sections we will go into details of two ontologies that can
be combined to overcome these limitations: BiRO and C4O.


3    Describing bibliographic references: BiRO
The Bibliographic Reference Ontology8 (BiRO) describes reference lists and ref-
erences by using Semantic Web technologies. In particular, BiRO uses an OWL-
based definition of the FRBR model (prefix frbr)9 to define bibliographic ref-
erences and their compilation into ordered bibliographic lists, by means of the
Collections Ontology (prefix co)10 [3], as shown in Fig. 2.
   An individual bibliographic reference, such as one in the reference list of
a published journal article, may exhibit varying degrees of incompleteness, de-
pending on the formatting rules of the journal. For example, it may lack the title
8
   BiRO, the Bibliographic Reference Ontology: http://purl.org/spar/biro.
9
   FRBR ontology: http://purl.org/spar/frbr.
10
   CO, the Collections Ontology: http://purl.org/co.
                                            Describing bibliographic references in RDF                       5


Fig. 2. Graffoo diagram (http://www.essepuntato.it/graffoo) summarising the Biblio-
graphic Reference Ontology (BiRO).


of the cited article, the full names of the listed authors, or indeed a full listing
of the authors.
    BiRO provides a logical system for relating such incomplete bibliographic
reference to:
 – the full bibliographic record for that cited article, which, in addition to any
   author and title fields missing from the reference, may also be expected to
   include the name of the publisher, and the ISSN or ISBN of the publication;
 – collections of bibliographic records, such as library catalogues; and
 – ordered bibliographic lists, such as reference lists.
    In order to understand how to use BiRO to describe reference lists, let us
take into account again the reference introduced in Fig. 1 referring to Renear et
al.’s paper.
    A first way for defining a simple machine-readable representation of that
reference using BiRO with FRBR and DCTerms is as follows:
: intertextual - semantics frbr : part : reference - list .
: reference - list a biro : ReferenceList ;
   co : firstItem [ co : itemContent : barwise83 ;
   co : nextItem [ co : itemContent : black37 ; ...
   co : nextItem [ co : itemContent : renear02 ; ... ] ... ] ] . ...
: renear02 a biro : B i b l i o g r a p h i c R e f e r e n c e ;
   dcterms : b i b l i o g r a p h i c C i t a t i o n " Renear , A . , Dubin , D . & Sperberg - McQueen , C .
         M . (2002) . Towards a semantics for XML markup . In E . Mudson ( Chair ) ,
          Proceedings of the ACM Symposium on Document Engineering , ( pp . 119 -126)
         . New York : ACM Press ." . ...

    This formal description is not fully expressive as we only assigned an IRI to
the reference list and to each of its references, and the semantics of the string
representing the reference is still obscure. For instance, there is no explicit state-
ment saying that the strings “Renear, A.”, “2002” and “Towards a semantics
for XML markup” are, respectively, the name of one of the authors, the year of
publication and the title of the article. On the other hand, this is a first necessary
step to release bibliographic references in RDF. We also discuss below two main
approaches to associate a meaning to the strings composing the reference.
6       Di Iorio et al.

    Note also that if a complete RDF description of an article has already been
created, even according to a different ontological model, we can use the property
biro:references to create an explicit link between a reference citing that article
and the article itself, or better its description. The following excerpt, for instance,
shows how to say that the reference whose IRI is :renear02 references the article
whose IRI is :towards-a-semantics:
: renear02 biro : references : towards -a - semantics .


3.1   Semantic enhancement of strings: literal reification
A way to enable the semantic enhancement of strings, and to solve the above
mentioned limitations, is to use literals as subjects of assertions, by promoting
them as “first class object” in OWL. The pattern literal reification (prefix lit-
eral)11 [9] fulfils this scenario by reifying literals as proper individuals of the class
litre:Literal. Individuals of this class express literal values through the functional
data property litre:hasLiteralValue and can be connected to other individuals
that share the same literal value by using the property litre:hasSameLiteralValueAs.
Moreover, a literal may refer to, and may be referred by, any OWL individual
through litre:isLiteralOf and litre:hasLiteral respectively.
     This pattern allows one to describe each string of a bibliographic reference
as item of an ordered list of strings, by means of the Collections Ontology [3].
By means of this pattern and of the OWL 2 capabilities in meta-modelling, it
becomes possible to link specific strings in the references and to enhance them
through semantic assertions according to specific vocabularies, as shown in the
following excerpt:
: renear02 a biro : B i b l i o g r a p h i c R e f e r e n c e ;
   co : firstItem [ co : itemContent : first - author - name ; ...
   co : nextItem [ co : itemContent : publication - year ;
   co : nextItem [ co : itemContent : paper - title ; ... ] ] ] .
: first - author - name a literal : Literal , foaf : name ;
   literal : h a sL it er a lV al u e " Renear , A ."^^ xsd : string ;
   # it is the URL identifying the person referred by the above string
   literal : isLiteralOf : renear . ...

    As shown above, now the bibliographic reference under consideration is de-
scribed as a list of literals, each of them having a particular semantic connotation.

3.2   Semantic enhancement of strings: EARMARK ranges
Another approach to deal with the semantic enhancement of bibliographic refer-
ences is to use EARMARK12 ranges for associating appropriate semantic state-
ments to textual fragments, as illustrated in [14]. For instance, let us encode the
document cited in our example as an EARMARK document. We first need a
particular string container (called docuverse) defining the text of the reference:
11
   Literal reification pattern: http://www.essepuntato.it/2010/06/literalreification.
   The prefix literal refers to entities defined in it.
12
   EARMARK ontology: http://www.essepuntato.it/2008/12/earmark. The prefix ear-
   mark refers to entities defined in it.
                                          Describing bibliographic references in RDF           7

: renear02 - reference a earmark : St ri n gD oc uv e rs e ;
   earmark : hasContent " Renear , A . , Dubin , D . & Sperberg - McQueen , C . M . (2002) .
        Towards a semantics for XML markup . In E . Mudson ( Chair ) , Proceedings
        of the ACM Symposium on Document Engineering , ( pp . 119 -126) . New York :
        ACM Press ." .

    Then, we define ranges for each string we want to use in order to describe
the bibliographic reference according to BiRO. These ranges can be defined as
follows:
: renear02 a biro : B i b l i o g r a p h i c R e f e r e n c e ;
   co : firstItem [ co : itemContent : first - author - name ;
   co : nextItem [ co : itemContent : publication - year ;
   co : nextItem [ co : itemContent : paper - title ; ... ] ] ] ] .
: first - author - name a earmark : PointerRange ; # the string " Renear , A ."
   earmark : refersTo : renear02 - reference ;
   earmark : begins "0"^^ xsd : n o n N e g a t i v e I n t e g e r ;
   earmark : ends "9"^^ xsd : n o n N e g a t i v e I n t e g e r . ...

     Furthermore, using the Linguistic Act ontology [14], it is possible to link
EARMARK ranges to their formal meaning and to the particular object refer-
enced by such strings, as described in [1]. For instance, considering the range
:first-author-name, we can say that:

 – this range denotes a particular concrete object, i.e., a particular person iden-
   tified by :renear;
 – this range expresses a particular meaning, i.e., the fact that the string (as
   well as the denoted object) refers to something being an author of that paper.

    Thus, we can express these additional assertions:
: first - author - name la : denotes : renear ; la : expresses
   [ a owl : Restriction ;
     owl : onProperty pro : h o ld sR ol e In Ti m e ;
     owl : s omeValue sFrom [ owl : interse ctionOf (
        [ a owl : Restriction ; owl : onProperty pro : withRole ;
           owl : hasValue pro : author ]
        [ a owl : Restriction ; owl : onProperty pro : r e f e r s T o D o c u m e n t ;
           owl : hasValue : towards -a - semantics ] ) ] .

    In this way we are able to identify in RDF the various part that form the
reference and their specific meaning, and to link them to other entities.


4     What, where and how many times is cited: C4O

Besides defining reference lists and bibliographic references in a machine-readable
form, it is also useful to describe how these references are used in the citing paper.
In particular, we would need entities that describe:

 – in-text reference pointers within the citing paper;
 – links to the bibliographic references denoted by in-text reference pointers;
 – how much a particular document is locally cited by the citing document –
   i.e., the total number of in-text reference pointers within the citing paper
   denoting the same bibliographic reference (that is useful for certain studies
   on citations [11]);
8            Di Iorio et al.

    – how much an article is globally cited (according to particular bibliographic
      citation service, e.g., Google Scholar);
    – the contexts involved in a citation – i.e., the part Pciting of the citing article
      containing a particular in-text reference pointer and the part Pcited of the
      cited article that is relevant to Pciting (useful for some browsing tools of
      articles and citation services, e.g., CSIBS [19] and CiTalO [6]).
    The Citation Counting and Context Characterization Ontology13 (C4O) has
been developed to allow the description of the above entities. This ontology
enables the characterisation of bibliographic citations in terms of their presence
in an article by means of the following classes (shown in Fig. 3):
    – class c4o:InTextReferencePointer. An in-text reference pointer is a textual
      device denoting (property c4o:denotes) a single bibliographic reference that
      is embedded in the text of a document within the context of a particular
      sentence;
    – class c4o:InTextReferencePointerList. A list containing (through the chain
      co:item and co:itemContent) only in-text reference pointers denoting the spe-
      cific bibliographic references to which the list pertains (property c4o:pertains).
      Such a list cannot contain more than one item containing the same in-text
      reference pointer;
    – class c4o:SingleReferencePointerList. Defined as subclass of the previous
      class, it is an in-text reference pointer list that pertains to exactly one bib-
      liographic reference;
    – class c4o:GlobalCitationCount. The number of times a work has been cited
      globally (property c4o:hasGlobalCountValue), as determined from a partic-
      ular bibliographic information source (property c4o:hasGlobalCountSource)
      on a particular date (property c4o:hasGlobalCountDate).
    C4O provides the ontological structures which allow one to record the number
of in-text citations (property c4o:hasInTextCitationFrequency, i.e., the number
of in-text reference pointers to a single reference in the reference list of the cit-
ing article), and also the number of citations a cited entity has received globally
(property c4o:hasGlobalCitationFrequency), as determined by a bibliographic in-
formation resource such as Google Scholar14 , Scopus15 or Web of Knowledge16
on a particular date.
    Considering again the example in Section 3, we can write a set of assertions
according to C4O that describe how many times a reference is used within the
citing article and how much the cited article is globally cited (according to Google
Scholar):
: renear02 a biro : B i b l i o g r a p h i c R e f e r e n c e ;
   c4o : h a s I n T e x t C i t a t i o n F r e q u e n c y "1"^^ xsd : n o n N e g a t i v e I n t e g e r .

13
   C4O, the Citation Counting and Context Characterization                                                       Ontology:
   http://purl.org/spar/c4o. The prefix c4o refers to entities defined in it.
14
   Google Scholar: http://scholar.google.it.
15
   Scopus: http://www.info.sciverse.com/scopus/.
16
   Web of Knowledge: http://apps.isiknowledge.com.
                                                       Describing bibliographic references in RDF                          9


Fig. 3. Graffoo diagram summarising the C4O entities used for counting citations and
references.


: towards -a - semantics c4o : h a s G l o b a l C i t a t i o n F r e q u e n c y [
   a c4o : G l o b a l C i t a t i o n C o u n t ;
   c4o : h a s G l o b a l C o u n t D a t e "2014 -03 -17"^^ xsd : date ;
   c4o : h a s G l o b a l C o u n t S o u r c e [ a c4o : B i b l i o g r a p h i c I n f o r m a t i o n S o u r c e ;
     foaf : homepage < http :// scholar . google . com > ] ;
   c4o : h a s G l o b a l C o u n t V a l u e "5"^^ xsd : n o n N e g a t i v e I n t e g e r ] .

    Moreover, C4O enables ontological descriptions of the context where an in-
text reference pointer appears in the citing document (modelled as shown in
Fig. 4), and allows one to relate that context to relevant textual passages in the
cited document.
    Considering the previous bibliographic reference example, a possible C4O
formalisation of the contexts involved by that citing act is:
: intertextual - semantics frbr : part : in - text - renear02 .
: in - text - renear02 a c4o : I n T e x t R e f e r e n c e P o i n t e r ;
   c4o : denotes : renear02 ; c4o : hasContext : citation - sentence .
: citation - sentence a doco : Sentence ;
   c4o : hasContent " Renear , Dubin , and Sperberg - McQueen (2002 , pp . 121 -122)
          proposed a formal semantic approach for structured documents ." .
: sentence - in - towards -a - semantics a doco : Sentence ;
   frbr : partOf : towards -a - semantics ;
   c4o : hasContent " Markup semantics are modeled c om p ut at i on al ly by applying
          knowledge repr esentati on technologies to the problem of making those
          structures , relationships , and properties explicit ." ;
   c4o : isRelevantTo : citation - sentence .

    C4O, thus, completes the basic notions behind bibliographic references we
introduced in Section 2.
10        Di Iorio et al.


Fig. 4. Graffoo diagram summarising the C4O entities for describing citation contexts.


5      Converting references with REnhancer

In this section we describe a prototype that produces a BiRO-compliant RDF
from a textual reference list. We have named the tool REnhancer, which stands
for Reference list Enhancer. The tool is freely available online17 and can be
invoked through a Web interface or as a REST API. The input reference list
can be provided as formatted in the source article: it can be simply copied and
pasted from an article into the text area of the Web interface or passed directly to
the Web service. The list could be automatically extracted from PDF or HTML
articles by using other tools (e.g., PDFX [4]) and passed to REnhancer.
    The tool accepts two optional parameters, i.e., the namespace IRI to use for
the generation of identifiers for new generated objects, and the output format
for the RDF serialization (RDF/XML, Turtle or N-Triples).
    For example, given the following reference list:
 1. Di Iorio, A., Nuzzolese, A. G., Peroni, S. (2013). Towards the automatic identification of the
    nature of citations. In Proceedings of 3rd Workshop on Semantic Publishing (SePublica 2013):
    63-74.
 2. Garcia Castro, L. J., Berlanga, R., Rebholz-Schuhmann, D., Garcia, A. (2013). Connections
    across scientific publications based on semantic annotations. In Proceedings of 3rd Workshop
    on Semantic Publishing (SePublica 2013): 51-62.

the following RDF is returned:
: reference - list a biro : ReferenceList ; co : firstItem : reference - list - item -1 ;
   co : item : reference - list - item -1 , : reference - list - item -2 ;
   co : lastItem : reference - list - item -2 ; co : size "2"^^ xsd : n o n N e g a t i v e I n t e g e r .
: reference - list - item -1 rdf : type co : ListItem ;
   co : index "1"^^ xsd : n o n N e g a t i v e I n t e g e r ; co : itemContent : reference -1 ;
   co : nextItem : reference - list - item -2 .
: reference -1 rdf : type biro : B i b l i o g r a p h i c R e f e r e n c e ;
   dcterms : b i b l i o g r a p h i c C i t a t i o n " Di Iorio , A . , Nuzzolese , A . G . , Peroni , S .
          (2013) . Towards the automatic ident ificatio n of the nature of citations
         . In Proceedings of SePublica 2013: 63 -74."^^ xsd : string . ...

    Future releases of the tool will also expand information about authors, pub-
lication venue and year.
17
     Renhancer homepage: http://www.cs.unibo.it/˜nuzzoles/renhancer.
                                                Describing bibliographic references in RDF   11

   The following command shows how to use cURL to invoke REnhancer via
REST API and how to pass parameters. The textual reference list is passed
through the ref-list parameter (by substituting “TEXTUAL REFERENCE LIST”
with the actual references) that is mandatory for finalising a request. An optional
namespace for new generated items and bibliographic references can be specified
through the “namespace” parameter:
curl -v -X POST -H " Accept : text / turtle " -d
  namespace =" http :// foo . org / referenes#"
  -- data - urlencode ref - list =" T E X T U A L _ R E F E R E N C E _ L I S T "
  http :// www . cs . unibo . it /~ nuzzoles / renhancer /

   REnhancer is implemented as a PHP application. The recognition of individ-
ual bibliographic references within the reference list is performed by means of
regular expressions, which enable to distinguish commonly used syntactic pat-
terns, such as numbered lists, lists based on authors’ initials or brackets.


6     Conclusions
The main goal of this paper was to present BiRO and C4O, two OWL ontologies
for the in-depth description of citations in scientific papers. A more accurate
and expressive representation of citations makes it possible to better integrate
bibliographic information into the Linked Data universe, and to enable sophisti-
cated reasoning and applications. Yet, open datasets about scientific papers and
citation networks are already available – such as DBLP and ACM – but they
mainly contain bibliographic records and do not describe other details related
to citations between papers, such as in-text reference pointers, citation contexts,
citation counting, etc. The two ontologies presented here aim at introducing a
vocabulary to describe such citation-related entities.
    While these ontologies are quite stable, there is a long way to go on the
tools for extracting such semantic bibliographic information. In this paper we
presented a prototype called REnhancer that takes as input a raw-text list of ref-
erences and produces a BIRO and C4O translation of these entries. REnhancer
is planned to be extended to extract more information and integrated with tools
for the automatic extraction of content from PDF and XHTML. A parallel re-
search line we are following and we plan to integrate with REnhancer is on the
automatic extraction and characterisation of citations from XML documents.


References
1. Barabucci, G., Di Iorio, A., Peroni, S., Poggi, F., & Vitali, F. (2013). Annotations
   with EARMARK in practice: a fairy tale. In Proceedings of DH-CASE 2013. DOI:
   10.1145/2517978.2517990
2. Ciancarini, P., Di Iorio, A., Nuzzolese, A. G., Peroni, S., & Vitali, F.
   (2014). Evaluating citation functions in CiTO: cognitive issues. To ap-
   pear in Proceedings of ESWC 2014. Retrieved April 9, 2014, from
   http://speroni.web.cs.unibo.it/publications/ciancarini-in-press-evaluating-citation-
   functions.pdf
12      Di Iorio et al.

3. Ciccarese, P., & Peroni, S. (2013). The Collections Ontology: creating and handling
   collections in OWL 2 DL frameworks. To appear in Semantic Web – Interoperability,
   Usability, Applicability. DOI: 10.3233/SW-130121
4. Constantin, A., Pettifer, S., & Voronkov, A. (2013). PDFX: fully-automated PDF-
   to-XML conversion of scientific literature. In Proceedings of DocEng 2013: 177–180.
   DOI: 10.1145/2494266.2494271
5. DCMI Usage Board. (2012). DCMI Metadata Terms. DCMI Recommendation,
   14 June 2012. Dublin Core Metadata Initiative. Retrieved April 9, 2014, from
   http://dublincore.org/documents/dcmi-terms/
6. Di Iorio, A., Nuzzolese, A. G., & Peroni, S. (2013). Towards the automatic identi-
   fication of the nature of citations. In Proceedings of SePublica 2013. http://ceur-
   ws.org/Vol-994/paper-06.pdf
7. Di Iorio, A., Peroni, S., Poggi, F., Vitali, F., & Shotton, D. (2013). Recognising
   document components in XML-based academic articles. In Proceedings of DocEng
   2013: 181–184. DOI: 10.1145/2494266.2494319
8. D’Arcus, B., & Giasson, F. (2009). Bibliographic Ontology Specification.
   Specification Document, 4 November 2009. Retrieved April 9, 2014, from
   http://bibliontology.com/
9. Gangemi, A., Peroni, S., & Vitali, F. (2010). Literal Reification. In Proceedings of
   WOP 2010: 65–66. http://ceur-ws.org/Vol-671/pat04.pdf
10. Hammond, T. (2008). RDF Site Summary 1.0 Modules: PRISM. Retrieved April
   9, 2014, from http://nurture.nature.com/rss/modules/mod prism.htm
11. Hou, W.-R., Li, M., & Niu, D.-K. (2011). Counting citations in texts rather than
   reference lists to improve the accuracy of assessing scientific contribution: Citation
   frequency of individual articles in other papers more fairly measures their scientific
   contribution than mere presence in reference lists. BioEssays, 33(10): 724–727. DOI:
   10.1002/bies.201100067
12. Marcoux, Y., & Rizkallah, E. (2009). Intertextual semantics: A semantics for in-
   formation design. Journal of the American Society for Information Science and
   Technology, 60(9): 1895–1906. DOI: 10.1002/asi.21134
13. Peroni, S., & Shotton, D. (2012). FaBiO and CiTO: Ontologies for describing
   bibliographic resources and citations. Web Semantics: Science, Services and Agents
   on the World Wide Web, 17: 33–43. DOI: 10.1016/j.websem.2012.08.001
14. Peroni, S., Gangemi, A., & Vitali, F. (2011). Dealing with markup semantics. In
   Proceedings the 7th International Conference on Semantic Systems: 111–118. DOI:
   10.1145/2063518.2063533
15. Peroni, S., Gray, T., Dutton, A., & Shotton, D. (2015). Setting our
   bibliographic references free: towards open citation data. To appear
   in Journal of Documentation, 71(2). Retrieved April 9, 2014, from
   http://speroni.web.cs.unibo.it/publications/peroni-in-press-setting-bibliographic-
   references.pdf
16. Qazvinian, V., & Radev, D. R. (2010). Identifying Non-explicit Citing Sentences
   for Citation-based Summarization. In Proceedings of ACL 2010: 555–564.
17. Shotton, D. (2009). Semantic publishing: the coming revolution in scientific journal
   publishing. Learned Publishing, 22(2): 85–94. DOI: 10.1087/2009202
18. Shotton, D. (2013). Publishing: Open citations. Nature, 502(7471): 295–297. DOI:
   10.1038/502295a
19. Wan, S., Paris, C., & Dale, R. (2010). Supporting browsing-specific information
   needs: Introducing the Citation-Sensitive In-Browser Summariser. Web Seman-
   tics: Science, Services and Agents on the World Wide Web, 8(2-3): 196–202. DOI:
   10.1016/j.websem.2010.03.002