<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Breaking Boundaries in Citation Parsing: A Comparative Study of Generative LLMs and Traditional Out-of-the-box Citation Parsers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iana Atanassova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Bertin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ELICO</institution>
          ,
          <addr-line>Université Claude Bernard Lyon 1</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Franche-Comté</institution>
          ,
          <addr-line>CRIT</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>38</fpage>
      <lpage>52</lpage>
      <abstract>
        <p>The task of citation string parsing has been the focus of many efforts. Traditional tools explicitly designed to parse bibliographic information, such as Bilbo, Grobid, and Parscit, have long been established in the academic landscape. Recently, with the emergence of general conversational LLMs (Large Language Models) such as OpenAI's ChatGPT and Llama, an interesting question arises: can such language models, originally developed for natural language understanding (NLU), be employed to efficiently process bibliographies, and how would their performance for this task compare to that of dedicated bibliographic parsing tools? In this article, we propose an experiment to measure the ability of LLMs to analyse citation strings in different citation styles. We use a synthetic dataset with 12 different citation styles. We evaluate the output of two generative LLMs, ChatGPT 3.5 and Llama 2 7B, and two out-of-the-box citation parsers, CERMINE and Neural ParsCit. The results show that the LLMs tend to outperform the citation parsers for all citation styles and labels.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative LLMs</kwd>
        <kwd>Citation string parsing</kwd>
        <kwd>Reference parsing</kwd>
        <kwd>BibTEX</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>Neural ParsCit</kwd>
        <kwd>CERMINE</kwd>
        <kwd>Llama</kwd>
        <kwd>References</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Access to scholarly publications becomes especially critical in times of crisis. For example, the COVID-19 Open Research Dataset (CORD-19) Database [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is
a free resource1 of tens of thousands of scholarly articles about COVID-19, SARS-CoV-2, and
related coronaviruses for use by the global research community.
      </p>
      <p>
        PDF is currently the most widely used format for publishing scientific articles, although
some publishers offer HTML access to their articles. However, obtaining structured text and
bibliographic data from PDFs is a complex and error-prone process. The XML format offers
specific tagsets for representing journal articles, the JATS (Journal Article Tag Suite) and NLM
DTD. They are used, for example, by PubMed2 and PLOS3, which provide direct access to the
articles in XML. The LATEX format is also widely used in scientific publishing by many journals
and preprint databases, such as arXiv. ArXiv hosts over two million scientific articles in eight
fields, mostly in the Natural and Applied Sciences. The unarXive corpus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] was constructed
using the arXiv data in LATEX format, using a method that avoids the distortions introduced by
PDF processing. Processing peer-reviewed publications, beyond Pubmed and PLOS datasets,
poses a considerable challenge, particularly in parsing citation strings from PDF files. This issue
is also confronting the scientific publishing sector [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>Conversational LLMs (Large Language Models) have recently had a significant impact in
many domains, particularly in coding. Thus, with the emergence of general chatbots such as
OpenAI’s GPT-3.5, which were initially developed for natural language understanding (NLU),
an intriguing question arises:</p>
      <p>Can generative LLMs be employed to efficiently process bibliographies, and how
would their performance for this particular task compare to that of dedicated
citation string parsing tools?</p>
      <p>This question is relevant for two reasons. First, the existence and easy access to conversational
LLMs might render task-specific tools obsolete in the near future. Are we approaching this
point? Second, the accessibility of conversational LLMs to a broad audience, including
nontechnical users, impacts academics and information science. Researchers, bibliometricians,
librarians, and students could leverage these models’ advanced parsing abilities through simple
natural language prompts, thus democratizing the access to sophisticated bibliographic data
management.</p>
      <p>In this paper, we evaluate the effectiveness of conversational generative LLMs, specifically
ChatGPT 3.5 and Llama 2 7B, in citation string parsing, by comparing their performance against
traditional tools like CERMINE and Neural ParsCit. These tools were employed directly, with
no additional training, to ensure accessibility and ease of use for non-specialists.</p>
      <p>Our objective is to assess these parsers across a large variety of citation styles which reflect
different academic disciplines. Existing datasets typically cover one or two disciplines with
limited variation in citation styles, and with the Humanities notably underrepresented. To
address this, we have developed a synthetic dataset utilizing the BibTEX format and the LATEX
biber package, allowing for a comprehensive representation of citation styles.
1https://github.com/allenai/cord19
2https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html
3https://plos.org/text-and-data-mining/</p>
    </sec>
    <sec id="sec-2">
      <title>2. Citation String Parsing: State of the Art and Limitations</title>
      <p>
        Over the last decade, many tools have been developed to carry out the task of citation string
parsing, i.e. to produce structured bibliographic metadata from character strings that represent
bibliographic references. The two main categories of approaches, as described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], are
Non-machine Learning based and Machine Learning (ML) based approaches. Non-machine Learning
based approaches include rule-based approaches, knowledge-based approaches, and template
matching. Machine Learning based approaches include Support Vector Machines (SVMs), Hidden
Markov Models (HMMs), Conditional Random Fields (CRFs), and Deep Learning based approaches.
The work of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposes a state of the art and a study to compare out-of-the-box and re-trained
ML and rule-based approaches. The results showed that ML approaches tend to outperform
non-ML approaches. However, the study was limited to a specific set of metadata and did not
include an in-depth evaluation of essential fields of the bibliographic references, such as title or
authors.
      </p>
      <p>
        There are several datasets available for training and evaluating citation parsers, but they
are often limited to specific disciplines (see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for a complete analysis). For instance, Cora
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], CiteSeer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and Flux-CIM [9] are designed for use in Computer Science and Artificial
Intelligence, while CS-SW [10] is intended for use in the Semantic Web. GROTOAP2 [11] is based
on articles from the PubMed Central Open Access Subset, and was used for training the CERMINE
citation parser [12].
      </p>
      <p>There are two multi-domain datasets available: GROBID [13] and GIANT [14, 15]. GROBID
was developed using the datasets cited above, but its evaluation is essentially based on life
sciences and prepublications4. On the other hand, the GIANT dataset is a synthetic corpus of
generated citation strings, designed to cover a wide range of citation styles5.</p>
      <p>The task of citation string parsing is an integral part of building large full-text annotated
corpora of publications, such as The Semantic Scholar Open Research Corpus (S2ORC) [16] or
ISTEX [17, 18]. S2ORC is a large corpus that contains 81.1 million English language academic
papers from a wide range of disciplines. ISTEX is the largest repository of standardized scientific
archives in France, serving the research community for documentary and TDM use. It contains
over 27 million scientific publications spanning 700 years in all disciplines and in several
languages. GROBID is a key component in both ISTEX and S2ORC’s processing pipelines.</p>
      <p>
        The diversity of scientific fields and citation practices plays an important role in citation
string parsing. Current ML methods require large annotated corpora for model training. The
tools perform well when trained on corpora adapted to their task. However, as noted by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
the IEEE and ACM citation styles differ significantly from MLA, which is primarily used in
the Humanities. The existence of numerous citation styles across various disciplines makes it
difficult to identify and parse citation strings independently of the styles. At the same time,
it appears that the datasets may not be large enough to encompass all styles required for the
efficient training of the models. To address this limitation, [19] conducted a study comparing
the performance of tools for citation parsing using synthetic and real citation strings. The study
found that training models with synthetic data did not result in decreased performance compared
to real data, confirming that synthetic citation strings can be generated as an alternative to
corpus-based training.
4https://grobid.readthedocs.io/en/latest/Principles/.
5https://github.com/BeelGroup/
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <sec id="sec-3-1">
        <title>3.1. Building a synthetic dataset of citation strings in various styles</title>
        <p>The benchmark dataset required for our task consists of citation strings and their corresponding
parsed structures. To obtain a high-quality dataset that covers the most common citation styles,
we followed these steps:
1. We processed our BibTEX database of 100 references using LATEX with the biber package, applying
12 different citation styles. The list of the citation styles that we used is: apa, mla,
chem-acs, phys, nature, science, ieee, chicago-authordate, numeric,
alphabetic, authoryear, authortitle6.
2. To ensure that the PDF-to-text conversion does not interfere with the quality of the
citation strings, they were manually extracted from the produced PDF articles and stored
in text files, with one citation string per line. The extracted citation strings were then
used as input for both Neural ParsCit and the LLMs.
6BibLaTeX allows some variants of these styles, e.g. alphabetic-verb, authoryear-comp,
authortitle-ibid, but they did not produce any modification in the generated citation strings.</p>
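        <p>The generation step above can be sketched as a minimal LaTeX document (a sketch under assumptions: the actual preamble and file names used by the authors are not given; styles such as chem-acs and phys require the corresponding biblatex style packages, e.g. biblatex-chem and biblatex-phys):</p>

```latex
% Minimal sketch: render one bibliography in a given citation style.
% Compile with: pdflatex -> biber -> pdflatex.
\documentclass{article}
% Swap style=apa for mla, chem-acs, phys, nature, science, ieee,
% chicago-authordate, numeric, alphabetic, authoryear, authortitle.
\usepackage[backend=biber, style=apa]{biblatex}
\addbibresource{references.bib}  % hypothetical name for the 100-entry database
\begin{document}
\nocite{*}            % cite every entry so all 100 citation strings appear
\printbibliography
\end{document}
```

        <p>Repeating the compilation once per style yields one bibliography per style, from which the citation strings were extracted by hand.</p>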
        <p>Using the above steps, we obtained 1,200 citation strings in 12 different styles that correspond
to the BibTEX entries in our database of 100 references. Since this dataset only includes strings
produced by the LATEX biber package, we consider that they do not contain any formatting or
punctuation errors. As this procedure follows the typical method for producing a bibliography
in a paper, we believe that this type of dataset accurately reflects citation string structures that
are commonly found in real articles, while also encompassing a wide range of citation styles
used by various disciplines and journals.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Test protocol for the generative LLMs</title>
        <p>Our task was to test two readily available generative LLMs:
1. OpenAI's ChatGPT: free online version 3.5, January 2024;
2. Llama 2 7B, which we loaded locally using the LM Studio server.</p>
        <sec id="sec-3-2-2">
          <p>We divided the dataset of 1,200 citation strings into sets and submitted them to the two
models preceded by the following prompt:
"As an academic researcher, I would like to obtain a BibTeX file for my bibliographic
references. Here is the list of references. Can you please generate a BibTeX file
from these references to facilitate integration into my LaTeX document?"</p>
          <p>The sets submitted to ChatGPT were of 10 citation strings, while for Llama we had to reduce
this size to 5 because we found that the quality of the output for this model deteriorated rapidly
after the first 6 or 7 BibTEX entries that were generated. Also, for both models we cleared the
conversation history after every 10 sets of citations, so as to prevent too long a conversation
history from affecting the quality of the responses.</p>
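          <p>This submission protocol can be sketched as follows (a sketch, not the authors' script: the endpoint URL and payload shape assume LM Studio's OpenAI-compatible local server; ChatGPT batches were in fact submitted through the web interface):</p>

```python
import json
import urllib.request

# The prompt used in the experiment, quoted from the paper.
PROMPT = (
    "As an academic researcher, I would like to obtain a BibTeX file for my "
    "bibliographic references. Here is the list of references. Can you please "
    "generate a BibTeX file from these references to facilitate integration "
    "into my LaTeX document?"
)

def batch(strings, size):
    """Split citation strings into batches (10 for ChatGPT, 5 for Llama)."""
    return [strings[i:i + size] for i in range(0, len(strings), size)]

def query_lm_studio(citations, url="http://localhost:1234/v1/chat/completions"):
    """Send one batch to a local LM Studio server (assumed OpenAI-style API)."""
    payload = {
        "messages": [
            {"role": "user", "content": PROMPT + "\n\n" + "\n".join(citations)}
        ],
        "temperature": 0,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The assistant's reply should contain the BibTeX entries.
        return json.load(resp)["choices"][0]["message"]["content"]
```

          <p>Clearing the conversation history every 10 batches corresponds, in this sketch, to starting each request with a fresh message list rather than accumulating prior turns.</p>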
          <p>Both models produce BibTEX entries in response to the prompt, most of which follow the
correct BibTEX syntax. Llama’s responses contained, in addition to the BibTEX entries, several
introductory and concluding sentences that we had to remove, e.g. "Of course, I can help you
generate a BibTeX file for your references. Here is the output for each reference: [...] This will output
the reference in the standard BibTeX format. [...]".</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Test protocol for CERMINE and Neural ParsCit</title>
        <p>CERMINE (Content ExtRactor and MINEr) [12] extracts metadata and content from scientific
articles in PDF format. Its output includes the metadata, the structured content of the article,
and the parsed bibliographic references in an NLM XML record. For our experiment, we used
the online version available from http://cermine.ceon.pl/.</p>
        <p>As CERMINE relies on the structure of the paper to identify the bibliography section, we
have provided it with full PDF papers generated using the llncs LATEX template for articles.
Each article contains a title, authors and affiliations, an abstract and keywords. The body of the
text follows the IMRaD structure, with several paragraphs per section and references to all 100
citations in our dataset. The last section is the References section. We generated 12 such articles,
one for each citation style. The articles are identical except for the citation style that is used.</p>
        <p>Neural ParsCit [20] uses a deep learning model, Long Short Term Memory (LSTM), to perform
sequence-to-sequence labeling. It parses reference strings into their component tags such as
Author, Journal, Location, Date, etc. The output is a string in which each token is followed by
its label. For our experiment we used the implementation of Neural ParsCit which is part of the
Scientific Document Processing Toolkit (SciWing) and uses Bi-LSTM-CRF + GloVe + Elmo +
Char-LSTM7.</p>
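          <p>Since Neural ParsCit emits a flat sequence of labelled tokens, a small grouping step is needed before comparison against BibTEX fields. A minimal sketch, assuming the tagged output has already been read into (token, label) pairs:</p>

```python
from collections import defaultdict

def group_tokens(tagged_tokens):
    """Collect Neural ParsCit-style (token, LABEL) pairs into one string per
    label, preserving token order. The pair representation is an assumption
    about how the tool's token-followed-by-label output is pre-parsed."""
    fields = defaultdict(list)
    for token, label in tagged_tokens:
        fields[label.lower()].append(token)
    return {label: " ".join(tokens) for label, tokens in fields.items()}
```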
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation of the output</title>
        <p>The evaluation of each parser’s output was based on a predefined list of fields and labels. This
was necessary due to the varying formats and labels produced by the different parsers, despite
their intended retrieval of the same level of detail and number of fields. Table 2 displays the
specific lists that we used for each parser.</p>
        <p>The input for our processing is a BibTEX database, but only the two LLMs provide output in
the BibTEX format. CERMINE and Neural ParsCit use their own annotation labels to render the
structure of the citation strings. Table 3 displays the correspondence between the labels in the
three types of outputs: the sub-tags of the NLM XML ref element that are used in CERMINE,
the labels produced by Neural ParsCit, and the BibTeX fields. Each parser was evaluated solely
on the fields it was intended to provide, considering this correspondence. The fields evaluated
for each parser were:
ChatGPT &amp; Llama: "ENTRYTYPE", "author", "title", "journal", "year", "volume", "number", "pages", "series", "booktitle", "doi";
CERMINE: "author", "title", "journal", "year", "pages", "volume", "number";
Neural ParsCit: "author", "title", "journal", "year", "pages", "volume", "booktitle".
The BibTEX format that we use for our input inherently allows for certain variations in the data
that should be taken into account when comparing the original data with the output
of the parsers. To do this, we normalised all white spaces and converted all titles to titlecase.
Some of the punctuation had to be normalised, e.g. the different types of hyphens (-) that can
appear in the pages field. Non-Unicode characters were removed, and punctuation signs
were stripped from titles, which eliminates the trailing commas and points that are present
in Neural ParsCit's output.</p>
        <sec id="sec-3-4-1">
          <title>7https://sciwing.io/, https://pypi.org/project/sciwing/.</title>
          <p>The label correspondence (Table 3), given as CERMINE NLM XML sub-tags / Neural ParsCit label / BibTEX field, is:
&lt;string-name&gt;, &lt;given-name&gt;, &lt;surname&gt; / AUTHOR / author;
&lt;article-title&gt; / TITLE / title;
&lt;source&gt; / JOURNAL / journal;
&lt;source&gt; / BOOKTITLE / booktitle;
&lt;volume&gt; / VOLUME / volume;
&lt;issue&gt; / VOLUME / number;
&lt;fpage&gt;, &lt;lpage&gt; / PAGES / pages;
&lt;year&gt; / DATE / year.</p>
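          <p>The normalisation steps described above can be sketched as a single function (a sketch of the stated rules; the exact regular expressions are our assumptions):</p>

```python
import re
import unicodedata

def normalise_field(value, is_title=False):
    """Normalise a field value before comparing parser output with the
    original BibTeX entry: collapse white space, unify hyphen variants,
    drop non-ASCII leftovers, and (for titles) strip punctuation and
    convert to titlecase."""
    # Collapse all runs of white space to a single space.
    value = re.sub(r"\s+", " ", value).strip()
    # Unify the different hyphen/dash characters that appear in pages fields.
    value = re.sub(r"[\u2010\u2011\u2012\u2013\u2014]", "-", value)
    # Decompose accents, then drop characters that do not survive as ASCII.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    if is_title:
        # Strip punctuation (removes trailing commas/points) and titlecase.
        value = re.sub(r"[^\w\s-]", "", value)
        value = value.title()
    return value
```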
          <p>Author names in BibTEX require some specific processing. Figure 2 shows an example of a
BibTEX entry and its citation strings in two diferent citation styles, with the output produced
by the four citation parsers. Author names can be presented in a BibTEX field with one of
the following syntaxes: "First-name Surname" or "Surname, First-name" or "Surname,
F.". The generated citation string can follow one of these syntaxes depending on the citation
style. In addition, long author lists are often abbreviated in the citation strings and replaced by
the expression "et al".</p>
          <p>In the example in figure 2, the author lists produced by ChatGPT, CERMINE and Neural
ParsCit are correct for both ieee and science styles, although they do not contain all the
author names of the original entry. In fact, the parsers rely only on the citation strings, which
contain partial information for the authors, to produce the correct output. On the other hand,
Llama missed several authors for the ieee style, and hallucinated several other authors for
the science style. Furthermore, its output for the ieee style is not syntactically correct as a
BibTEX entry, in which case we consider all the fields to be wrong.</p>
          <p>Following these considerations, we applied the following algorithmic solution to correctly
compare the output of the parsers for the author names:</p>
        </sec>
        <sec id="sec-3-4-2">
          <p>1. Convert all author names to the "First-name Surname" syntax.
2. If the citation string contains "et al", then keep only the first author.
3. If the citation string contains only initials for the first names of authors, then assume that
only initials are present in the original BibTEX entry for this style.
4. Remove all points after the initials, and convert all names to lowercase, so that the capitalisation
of names such as "McKein" does not lead to incorrect comparisons.</p>
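          <p>These four steps can be sketched as follows (a sketch under assumptions: names arrive as a list of "Surname, First-name" or "First-name Surname" strings, and the et-al and initials-only conditions are detected beforehand):</p>

```python
def normalise_authors(names, string_has_et_al=False, initials_only=False):
    """Apply the four author-name comparison steps described above."""
    # Step 1: convert "Surname, First-name" to "First-name Surname".
    out = []
    for name in names:
        if "," in name:
            surname, first = [p.strip() for p in name.split(",", 1)]
            name = first + " " + surname
        out.append(name)
    # Step 2: abbreviated lists ("et al") are compared on the first author only.
    if string_has_et_al:
        out = out[:1]
    # Step 3: if the citation style keeps only initials, reduce first names.
    if initials_only:
        reduced = []
        for name in out:
            words = name.split()
            reduced.append(" ".join([w[0] + "." for w in words[:-1]] + words[-1:]))
        out = reduced
    # Step 4: drop points after initials and lowercase everything.
    return [name.replace(".", "").lower() for name in out]
```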
          <p>The BibTEX database we use contains two types of entries, @article and @inproceedings,
which differ in that the @article entries have a journal field and the @inproceedings
entries have a booktitle field. As CERMINE and Neural ParsCit do not distinguish between
these types of entries, and CERMINE does not provide a booktitle label, we considered that
for these two parsers the journal label is equivalent to booktitle in cases where the original
BibTEX entry contains a booktitle field.</p>
          <p>When evaluating the output for optional fields, such as doi, we need to take into account
that this output is only expected for those citation styles where the information is present
in the citation string. For example, in figure 2, the reference in the science style does not
contain any information about doi although the doi is present in the original BibTEX entry.
ChatGPT correctly returned an entry without doi, as did CERMINE and Neural ParsCit. Llama
hallucinated a doi and a url.</p>
          <p>The values of precision, recall and F-measure were calculated taking into account all the
fields/labels that were produced by the parsers according to Table 2. Only fields for which
the parsers produced values identical to those of the original BibTEX entries were considered
correct. Fields for which the values differed from those of the original BibTEX entries, after
applying all of the above considerations, were considered incorrect. Other types of error include
fields added by the parser that were not present in the original BibTEX record, or fields missing
from the parser's output.</p>
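          <p>This evaluation scheme can be sketched per entry (a sketch, assuming both the gold BibTEX entry and the parser output have been reduced to dictionaries of already-normalised field values; added fields then count against precision and missing fields against recall, matching the error types listed above):</p>

```python
def field_prf(gold, predicted):
    """Field-level precision, recall and F-measure for one parsed entry.

    gold, predicted: dicts mapping field names to normalised values.
    A field is correct only when its value is identical to the gold value.
    """
    correct = sum(1 for field, value in predicted.items()
                  if gold.get(field) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```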
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Limitations</title>
      <p>The present study is designed to conduct a comparative analysis between existing NLP tools
and LLMs with regard to their performance on the task of citation parsing. It is important to
note that we took steps to mitigate hallucinations, notably clearing conversation histories regularly. Despite this, Llama
frequently hallucinated, while ChatGPT showed better performance. However, systematic
hallucination control is essential before these models can effectively be used in real-case
scenarios. Additionally, the use of LLMs involves other considerations, such as prompt-specific responses
and unnecessary text additions, observed with ChatGPT and Llama respectively. These issues
highlight the need for improved prompts and post-processing in future LLM applications.
8See also the github repository https://github.com/iana-atanassova/citation-parsers-bir2024.git</p>
      <p>While LLMs have been trained on huge amounts of data, this is not the case for classical
models. Consequently, the generalisation power of LLMs and classical models across diferent
citation styles and datasets varies significantly. Direct comparisons between these two categories
of models should be approached with caution, and any results derived from such comparisons
must be interpreted within the context of these foundational diferences.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>We proposed an experiment to measure the ability of LLMs to analyse citation strings in different
citation styles and compare them to two out-of-the-box citation parsers, CERMINE and Neural
ParsCit. We used a synthetic dataset of citation strings that allowed us to cover 12 different
citation styles. The results indicate that the LLMs tend to outperform the citation parsers for all
citation styles and labels, with ChatGPT 3.5 producing the best results.</p>
      <p>Our next step is to develop an approach for testing more LLMs using Crossref data. Crossref
is a DOI registration agency9 that supports various metadata content types, making it possible
to generate synthetic reference strings in both BibTEX and JSON formats. We also need to test
other traditional tools, such as Grobid. Additionally, we must compare the performance of larger
Open Source LLMs, such as the upcoming versions of Llama [22] and Mistral [23]. Prompt
engineering may be a viable strategy for improving results. Another way to improve the output
of LLMs is to address the problem of hallucinations by establishing a framework to reduce this
phenomenon.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by French ANR grant number ANR-20-CE38-0003-01 and grant
number ANR-21-CE38-0003-01.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration of Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar and
spelling checking. After using this service, the authors reviewed and edited the content as needed
and take full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13-16, 2014. Proceedings 36, Springer, 2014, pp. 311–322.</p>
      <p>[9] E. Cortez, A. S. da Silva, M. A. Gonçalves, F. Mesquita, E. S. de Moura, FLUX-CIM: flexible unsupervised extraction of citation metadata, in: Proceedings JCDL 2007, 2007, pp. 215–224.</p>
      <p>[10] T. Groza, A. Grimnes, S. Handschuh, Reference information extraction and processing using conditional random fields, Information Technology and Libraries 31 (2012) 6–20.</p>
      <p>[11] D. Tkaczyk, P. Szostek, L. Bolikowski, GROTOAP2: the methodology of creating a large ground truth dataset of scientific articles, D-Lib Magazine 20 (2014).</p>
      <p>[12] D. Tkaczyk, P. Szostek, M. Fedoryszak, P. J. Dendek, L. Bolikowski, CERMINE: automatic extraction of structured metadata from scientific literature, International Journal on Document Analysis and Recognition (IJDAR) 18 (2015) 317–335. doi:10.1007/s10032-015-0249-8.</p>
      <p>[13] P. Lopez, GROBID: Combining automatic bibliographic data recognition and term extraction for scholarship publications, in: Proceedings ECDL 2009, Springer Berlin Heidelberg, 2009, pp. 473–474. doi:10.1007/978-3-642-04346-8_62.</p>
      <p>[14] M. Grennan, M. Schibel, A. Collins, J. Beel, GIANT: The 1-billion annotated synthetic bibliographic-reference-string dataset for deep citation parsing, in: 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, 2019, pp. 101–112.</p>
      <p>[15] M. Grennan, M. Schibel, A. Collins, J. Beel, GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing [Data] (2019). URL: https://doi.org/10.7910/DVN/LXQXAO. doi:10.7910/DVN/LXQXAO.</p>
      <p>[16] K. Lo, L. L. Wang, M. Neumann, R. Kinney, D. Weld, S2ORC: The semantic scholar open research corpus, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 4969–4983. URL: https://aclanthology.org/2020.acl-main.447. doi:10.18653/v1/2020.acl-main.447.</p>
      <p>[17] P. Cuxac, A. Collignon, Istex, un projet national d'archives documentaires: au-delà de l'accès au texte intégral, l'enrichissement des données par méthodes de fouille de textes, in: Analyser la science: les bibliothèques numériques comme objet de recherche, 85ème Congrès ACFAS, 2017.</p>
      <p>[18] P. Cuxac, N. Thouvenin, Archives numériques et fouille de textes: le projet istex, Atelier TextMine, EGC 2017 (Extraction et Gestion des Connaissances), Grenoble, France, January 24-27, 2017.</p>
      <p>[19] M. Grennan, J. Beel, Synthetic vs. real reference strings for citation parsing, and the importance of re-training and out-of-sample data for meaningful evaluations: experiments with GROBID, GIANT and Cora, arXiv preprint arXiv:2004.10410 (2020).</p>
      <p>[20] A. Prasad, M. Kaur, M.-Y. Kan, Neural ParsCit: A deep learning based reference string parser, International Journal on Digital Libraries 19 (2018) 323–337. doi:10.1007/s00799-018-0242-1.</p>
      <p>[21] M. Bertin, I. Atanassova, Synthetic dataset of citation strings in 12 styles, 2024. URL: https://doi.org/10.5281/zenodo.10839503. doi:10.5281/zenodo.10839503.</p>
      <p>[22] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al., Llama 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv:2307.09288 (2023).</p>
      <p>[23] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, et al., Mistral 7B, arXiv preprint arXiv:2310.06825 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chandrasekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mooney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Murdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sheehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Wade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. X. R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Raymond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          ,
          <article-title>CORD-19: The COVID-19 open research dataset</article-title>
          ,
          <source>in: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          . URL: https://www.aclweb.org/anthology/2020.nlpcovid19-acl.1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Saier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <article-title>unarXive: a large scholarly data set with publications' full-text, annotated in-text citations, and links to metadata</article-title>
          ,
          <source>Scientometrics</source>
          <volume>125</volume>
          (
          <year>2020</year>
          )
          <fpage>3085</fpage>
          -
          <lpage>3108</lpage>
          . doi:10.1007/s11192-020-03382-z.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Boukhers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ambhore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <article-title>An end-to-end approach for extracting and segmenting high-variance references from pdf documents</article-title>
          ,
          <source>in: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>186</fpage>
          -
          <lpage>195</lpage>
          . doi:10.1109/JCDL.2019.00035.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <article-title>Evaluation and comparison of open source bibliographic reference parsers: a business use case</article-title>
          ,
          <source>arXiv preprint arXiv:1802.01168</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baliyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Machine learning approaches for entity extraction from citation strings</article-title>
          ,
          <source>in: International Conference on Information Technology</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <article-title>Machine learning vs. rules and out-of-the-box vs. retrained: An evaluation of open-source bibliographic reference and citation parsers</article-title>
          ,
          <source>in: Proceedings of the 18th ACM/IEEE on joint conference on digital libraries</source>
          ,
          <source>JCDL '18</source>
          ,
          ACM
          ,
          <year>2018</year>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>108</lpage>
          . doi:10.1145/3197026.3197048.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Seymore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosenfeld</surname>
          </string-name>
          ,
          <article-title>Learning hidden Markov model structure for information extraction</article-title>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ciobanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fernández-Ramírez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>CiteSeerX: A scholarly big dataset</article-title>
          ,
          <source>in: Advances in Information Retrieval: 36th</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>