<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>in the OpenCitations accepted format</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arcangelo Massari</string-name>
          <email>arcangelo.massari@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Heibi</string-name>
          <email>ivan.heibi2@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna</institution>
          ,
          <addr-line>Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>2</volume>
      <fpage>4</fpage>
      <lpage>06</lpage>
      <abstract>
        <p>The OpenCitations organization is working on ingesting citation data and bibliographic metadata directly provided by the community (e.g., scholars and publishers). The aim is to improve the general coverage of open citations, which is still far from being complete, and to use the provided metadata to enrich the characterization of the citing and cited entities. This paper illustrates how the citation data and bibliographic metadata should be structured to comply with the OpenCitations accepted format.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Declaration on Research Assessment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the Leiden Manifesto for Research Metrics [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and the Initiative for Open Citations (I4OC, https://i4oc.org/) have successfully convinced almost all major academic publishers to release their publication reference lists. To date, more than 1.2 billion citations are available through the Crossref REST API [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and distributed by OpenCitations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as structured data, separated from the original bibliographic source and released under the CC0 license [5].
      </p>
      <p>Nevertheless, the coverage of open citations is still far from complete [6]. On the one hand, some publishers have not yet made their citations public. On the other hand, many citations are lost because they are only present in unstructured format within PDF files, especially in the social sciences.</p>
      <p>OpenCitations is working on ingesting citations and bibliographic metadata coming directly from the community (e.g., scholars and publishers). In this way, projects like EXCITE [7], aimed at extracting citations from PDFs, could significantly contribute to increasing the data coverage.</p>
      <p>The following section illustrates how to structure the citation data and bibliographic metadata in the accepted format of OpenCitations. We conclude this paper with a description of the upcoming related work.</p>
      <p>JCDL’22: ULITE-ws, Understanding LIterature references in academic full TExt, June 2022. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.</p>
      <p>2. Metadata and citations</p>
      <p>OpenCitations manages and processes two different CSV files to separately characterize the ingested documents: one containing their metadata (META-CSV), and a second one holding their citations (CITS-CSV). In this section we discuss how these files should be structured and defined before providing them to OpenCitations. The discussion presented in this section is based on a more exhaustive documentation [8].</p>
      <p>CSV files are logically structured as tables. In META-CSV each document (row) is characterised by 11 attributes (columns):</p>
      <p>• id. The ID(s) of the corresponding document. A document can have more than one ID; each ID is defined by its type (using an acronym) and its value, as follows:</p>
      <p>ID abbreviation + “:” + ID value</p>
      <p>For example, “doi:10.3233/ds-170012” indicates a DOI identifier having the value “10.3233/ds-170012”. Multiple IDs must be separated using a single white space.</p>
      <p>• title. The title of the document.</p>
      <p>• author and editor. Each character (author/editor) is defined by several attributes, e.g., the family name or an ID. Multiple characters are separated by a semicolon followed by a white space character. Generally, the definition of an author/editor follows this format:</p>
      <p>Family Name + “,” + “ ” + Given Name + “ ” + “[” + IDs + “]”</p>
      <p>The IDs of the authors/editors are specified in square brackets and follow the format used for the “id” attribute, e.g. “Peroni, Silvio [orcid:0000-0003-0530-4305]”. If there are no IDs, the square brackets are omitted from the character description as well. The given name is not mandatory; however, the description of the character should still contain a comma to indicate such absence (e.g. “Peroni, [orcid:0000-0003-0530-4305]”).</p>
      <p>• pub_date. The date of publication of the document. The date is defined according to ISO 8601 [9], the ISO standard for “Representation of dates and times”:</p>
      <p>YYYY-MM-DD</p>
      <p>It is mandatory to specify at least the publication year. The values of the month and day are not required. However, if the day is specified, the month must be specified as well.</p>
      <p>• venue. Data regarding the venue of the document.</p>
      <p>For example, if the document is a journal article, the venue defines the journal where the document has been published. Each venue is described as follows:</p>
      <p>Venue Title + “ ” + “[” + IDs + “]”</p>
      <p>The IDs of a venue are described using the same format used previously. In case of no identifiers, the square brackets are omitted.</p>
      <p>• volume and issue. These values are required only if the document is contained in a journal volume or a journal issue.</p>
      <p>• page. The page range of the corresponding document, defined through the specification of the first and the last page, divided by a hyphen “-”.</p>
      <p>• type. A textual value to identify the document. This value is taken from the list of the currently supported bibliographic resource types: book, book chapter, book part, book section, book series, book set, book track, component, dataset (or data file), dissertation, edited book, journal, journal article, journal issue, journal volume, monograph, other, peer review, posted content (or web content), proceedings, proceedings article, proceedings series, reference book, reference entry, report, report series, standard, and standard series.</p>
      <p>• publisher. The publisher name of the corresponding document. To define a publisher we apply the same format used in the definition of the venue.</p>
      <p>If the resource identifier is specified in the “id” field, all the other fields are optional. Conversely, if the “id” field is empty, there are mandatory fields that vary depending on the resource type:</p>
      <p>• The fields “title”, “pub_date”, and “author” (or “editor”) are mandatory for the resources of type book, dataset (or data file), dissertation, edited book, journal article, monograph, other, peer review, posted content (or web content), proceedings article, report, and reference book. Moreover, this information is compulsory if the “type” field is empty.</p>
      <p>• The “title” and “venue” fields are required for the resources of type book chapter, book part, book section, book track, component, and reference entry.</p>
      <p>• Only the “title” field is required for the resources of type book series, book set, journal, proceedings, proceedings series, report series, standard, and standard series.</p>
      <p>• For resources of type journal volume, the fields “venue” and “volume”, or “venue” and “title”, are mandatory. Likewise, for resources of type journal issue, the fields “venue” and “issue”, or “venue” and “title”, are mandatory.</p>
      <p>Table 1 shows an example of a well-formed META-CSV representation. The table contains a small sample of ten documents (rows) and their corresponding attributes (columns).</p>
      <p>On the other hand, in the CITS-CSV each entity (row) represents a citation. A citation is characterised by 4 attributes (columns): citing_id, citing_publication_date, cited_id, and cited_publication_date. The citing_id and cited_id values represent the identifiers of the citing and cited document, respectively. These values are both mandatory, and they are structured following the same scheme used for the id definition in META-CSV. The citing_publication_date and cited_publication_date represent the date of publication of the citing and cited document, respectively. Both these values are optional, and follow the same structural scheme used for the pub_date definition in META-CSV.</p>
      <p>Table 2 shows an example of a well-formed CITS-CSV representation. The table contains a small sample of ten different citations (rows) and their corresponding attributes (columns).</p>
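      <p>The field formats described in this section (IDs, author/editor strings, publication dates, and CITS-CSV rows) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the validator used by OpenCitations: the regular expressions and the function names parse_ids, parse_agents, is_valid_date, and check_citation are hypothetical and were introduced only for this example.</p>
      <preformat>
```python
import re
from datetime import date

# Assumed patterns for the formats described above (not an official grammar).
ID_RE = re.compile(r"^[a-z]+:\S+$")                  # ID abbreviation + ":" + ID value
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")   # YYYY, YYYY-MM or YYYY-MM-DD
AGENT_RE = re.compile(r"^([^,\[\]]+),\s*([^\[\]]*?)\s*(?:\[([^\]]*)\])?$")


def parse_ids(field):
    """Split an id field on single white spaces into (type, value) pairs."""
    pairs = []
    for token in field.split(" "):
        if not ID_RE.match(token):
            raise ValueError(f"malformed ID: {token!r}")
        id_type, value = token.split(":", 1)
        pairs.append((id_type, value))
    return pairs


def parse_agents(field):
    """Parse an author/editor field whose entries are separated by '; '."""
    agents = []
    for chunk in field.split("; "):
        match = AGENT_RE.match(chunk)
        if match is None:
            raise ValueError(f"malformed author/editor: {chunk!r}")
        family, given, ids = match.groups()
        agents.append({
            "family": family.strip(),
            "given": given or None,          # the given name may be absent
            "ids": parse_ids(ids) if ids else [],
        })
    return agents


def is_valid_date(value):
    """Accept YYYY, YYYY-MM or YYYY-MM-DD; a day never appears without a month."""
    if not DATE_RE.match(value):
        return False
    parts = [int(part) for part in value.split("-")]
    padded = parts + [1] * (3 - len(parts))  # fill a missing month/day with 1
    try:
        date(*padded)
    except ValueError:
        return False
    return True


def check_citation(row):
    """Return the problems found in one CITS-CSV row given as a dict."""
    problems = []
    for key in ("citing_id", "cited_id"):    # both identifiers are mandatory
        try:
            parse_ids(row.get(key, ""))
        except ValueError as err:
            problems.append(f"{key}: {err}")
    for key in ("citing_publication_date", "cited_publication_date"):
        value = row.get(key, "")             # dates are optional
        if value and not is_valid_date(value):
            problems.append(f"{key}: malformed date {value!r}")
    return problems
```
      </preformat>
      <p>Under these assumptions, parse_ids applied to “doi:10.3233/ds-170012” yields the single pair (doi, 10.3233/ds-170012), and parse_agents applied to “Peroni, [orcid:0000-0003-0530-4305]” yields one entry with family name Peroni, no given name, and one ORCID identifier.</p>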
      <p>3. Discussion and conclusion</p>
      <p>This paper described how to define well-formed CSV files storing citations and metadata of bibliographic resources, ready to be provided to and later processed by OpenCitations.</p>
      <p>The ingestion of bibliographic metadata will be possible starting from the release of OpenCitations Meta (OC-Meta), expected by the end of 2022. OC-Meta will store bibliographic metadata for the documents involved (as citing or cited entities) in OpenCitations citation indexes.</p>
      <p>The ingestion of the citations is possible thanks to CROCI, the Crowdsourced Open Citations Index, which allows individuals identified by ORCIDs to deposit the citation data that they have legal right to submit [10]. Citation data are submitted to either Figshare (https://figshare.com) or Zenodo (https://zenodo.org), accompanied by the ORCID of the contributor. Afterwards, the submitter can inform OpenCitations using the GitHub issue tracker on the CROCI repository (https://github.com/opencitations/croci/issues).</p>
      <p>Future work includes implementing an interface that simplifies and automates the entire publication process via CROCI, also providing input data validation and modification suggestions.</p>
      <p>Moreover, CROCI currently handles only DOI-to-DOI citations. The upcoming plan is to let CROCI also manage any-to-any citations.</p>
      <p>Acknowledgments</p>
      <p>This work was funded by the European Union’s Horizon 2020 research and innovation program under grant agreement No 101017452 (OpenAIRE-Nexus Project). We want to thank Silvio Peroni for supervising the entire work on OpenCitations, Philipp Mayr-Schlegel and Ahsan Shahid for the feedback on the documentation from which this demo paper is drawn, and Davide Brambilla for the valuable insights about CROCI and its future developments.</p>
      <p>[5] S. Peroni, D. Shotton, Open citation: Definition, figshare, 2018. doi:10.6084/M9.FIGSHARE.6683855.V1.</p>
      <p>[6] A. Martín-Martín, Coverage of open citation data approaches parity with web of science and scopus, OpenCitations blog (2021).</p>
      <p>[7] A. Hosseini, B. Ghavimi, Z. Boukhers, P. Mayr, Excite–a toolchain to extract, match and publish open literature references, in: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), IEEE, 2019, pp. 432–433.</p>
      <p>[8] A. Massari, How to produce well-formed CSV files for OpenCitations, 2022. URL: https://doi.org/10.5281/zenodo.6597141. doi:10.5281/zenodo.6597141.</p>
      <p>[9] M. Wolf, C. Wicksteed, Date and time formats, https://www.w3.org/TR/NOTE-datetime, 1997.</p>
      <p>[10] I. Heibi, S. Peroni, D. M. Shotton, Crowdsourcing open citations with CROCI - an analysis of the current status of open citations, and a proposal, CoRR abs/1902.02534 (2019). URL: http://arxiv.org/abs/1902.02534. arXiv:1902.02534.</p>
    </sec>
    <sec id="sec-2">
      <title>A. Appendix</title>
      <p>Table 1: A sample of ten documents (rows) and their corresponding attributes (columns) in a well-formed META-CSV representation.</p>
      <p>Table 2: A sample of ten citations (rows) and their corresponding attributes (citing_id, citing_publication_date, cited_id, cited_publication_date) in a well-formed CITS-CSV representation.</p>
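      <p>The mandatory-field rules that apply to META-CSV rows with an empty “id” field can be checked mechanically. The sketch below is one possible reading of those rules, not an official OpenCitations tool: the column order in COLUMNS, the function name missing_fields, and the sample title are assumptions made for illustration, and resource types outside the listed rules are simply left unchecked.</p>
      <preformat>
```python
import csv
import io

# The 11 META-CSV attributes; their order here is an assumption.
COLUMNS = ["id", "title", "author", "pub_date", "venue", "volume",
           "issue", "page", "type", "publisher", "editor"]

# Types requiring title, pub_date and author (or editor); "" is an empty type.
TITLE_DATE_AGENT = {
    "book", "dataset", "data file", "dissertation", "edited book",
    "journal article", "monograph", "other", "peer review", "posted content",
    "web content", "proceedings article", "report", "reference book", "",
}
# Types requiring title and venue.
TITLE_VENUE = {"book chapter", "book part", "book section", "book track",
               "component", "reference entry"}
# Types requiring only the title.
TITLE_ONLY = {"book series", "book set", "journal", "proceedings",
              "proceedings series", "report series", "standard",
              "standard series"}


def missing_fields(row):
    """Return the mandatory fields that are empty in one META-CSV row."""
    def empty(name):
        return not (row.get(name) or "").strip()

    if not empty("id"):
        return []  # with an ID, every other field is optional
    kind = (row.get("type") or "").strip()
    missing = []
    if kind in TITLE_DATE_AGENT:
        missing = [name for name in ("title", "pub_date") if empty(name)]
        if empty("author") and empty("editor"):
            missing.append("author or editor")
    elif kind in TITLE_VENUE:
        missing = [name for name in ("title", "venue") if empty(name)]
    elif kind in TITLE_ONLY:
        missing = [name for name in ("title",) if empty(name)]
    elif kind in ("journal volume", "journal issue"):
        number = "volume" if kind == "journal volume" else "issue"
        if empty("venue"):
            missing.append("venue")
        if empty(number) and empty("title"):
            missing.append(f"{number} or title")
    return missing


# Build a two-row META-CSV in memory and check each row.
SAMPLE = io.StringIO()
writer = csv.DictWriter(SAMPLE, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow({"id": "doi:10.3233/ds-170012"})  # ID only
writer.writerow({"title": "A hypothetical title", "type": "journal article"})
SAMPLE.seek(0)
REPORT = [missing_fields(row) for row in csv.DictReader(SAMPLE)]
```
      </preformat>
      <p>Here the first row passes because it carries an ID, while the second row, a journal article without an ID, is reported as missing “pub_date” and an author or editor.</p>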
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cagan</surname>
          </string-name>
          ,
          <article-title>San Francisco declaration on research assessment</article-title>
          ,
          <source>Disease Models &amp; Mechanisms</source>
          (
          <year>2013</year>
          ) dmm.012955. URL: https://journals.biologists.com/dmm/article/doi/10.1242/dmm.012955/261854/San-Francisco-Declaration-on-Research-Assessment. doi:10.1242/dmm.012955.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wouters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Waltman</surname>
          </string-name>
          , S. de Rijcke, I. Rafols,
          <article-title>Bibliometrics: The leiden manifesto for research metrics</article-title>
          ,
          <source>Nature</source>
          <volume>520</volume>
          (
          <year>2015</year>
          )
          <fpage>429</fpage>
          -
          <lpage>431</lpage>
          . URL: https://www.nature.com/articles/520429a. doi:10.1038/520429a.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hendricks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Feeney</surname>
          </string-name>
          ,
          <article-title>Crossref: The sustainable source of community-owned scholarly metadata</article-title>
          ,
          <source>Quantitative Science Studies</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>414</fpage>
          -
          <lpage>427</lpage>
          . doi:10.1162/qss_a_00022.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Peroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shotton</surname>
          </string-name>
          ,
          <article-title>OpenCitations, an infrastructure organization for open scholarship</article-title>
          ,
          <source>Quantitative Science Studies</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>428</fpage>
          -
          <lpage>444</lpage>
          . doi:10.1162/qss_a_00023.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>