<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dominika Tkaczyk</string-name>
          <email>d.tkaczyk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rohit Gupta</string-name>
          <email>rohit@iconictranslation.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Cinti</string-name>
          <email>riccardo@iconictranslation.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joeran Beel</string-name>
          <email>beelj@tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Iconic Translation Machines Ltd.</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Bibliographic reference parsers extract machine-readable metadata such as author names, title, journal, and year from bibliographic reference strings. To extract the metadata, the parsers apply heuristics or machine learning. However, no reference parser, and no algorithm, consistently gives the best results in every scenario. For instance, one tool may be best at extracting titles in the ACM citation style, but only third best when APA is used. Another tool may be best at extracting English author names, while yet another is best for noisy data (i.e. inconsistent citation styles). In this paper, which is an extended version of [1], we address the problem of reference parsing from a recommender-systems and meta-learning perspective. We propose ParsRec, a meta-learning-based recommender system that recommends the potentially most effective parser for a given reference string. ParsRec recommends one out of 10 open-source parsers: Anystyle-Parser, Biblio, CERMINE, Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger, and Science Parse. We evaluate ParsRec on 105k references from chemistry. We propose two approaches to meta-learning recommendations. The first approach learns the best parser for an entire reference string. The second approach learns the best parser for each metadata type in a reference string. The second approach achieved a 2.6% increase in F1 (0.909 vs. 0.886) over the best single parser (GROBID), reducing the false positive rate by 20.2% (0.075 vs. 0.094) and the false negative rate by 18.9% (0.107 vs. 0.132).</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>meta-learning</kwd>
        <kwd>citation parsing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Bibliographic reference parsing is useful for identifying cited documents, also
known as citation matching [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Citation matching is required for assessing the impact
of researchers [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], journals [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] and research institutions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and for calculating
document similarity [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], in the context of academic search engines [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] and
recommender systems [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <p>
        There exist many ready-to-use open-source reference parsers. Recently we compared
the performance of ten open source parsers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]: Anystyle-Parser, Biblio, CERMINE,
Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger and
Science Parse. The overall parsing results varied greatly, with F1 ranging from 0.27
for Citation-Parser to 0.89 for GROBID. Our results also showed that different tools
have different strengths and weaknesses. For example, ParsCit is ranked 3rd in the
overall ranking but is best for extracting author names. Science Parse, ranked 4th
overall, is best in extracting the year. These results suggest that there is no single best
parser. Instead, different parsers might give the best results for different metadata
types and different reference strings. Consequently, we hypothesize that if we were
able to accurately choose the best parser for a given scenario, the overall quality of
the results should increase. This can be seen as a typical recommendation problem: a
user (e.g. a software developer or a researcher) needs the item (reference parser) that
satisfies the user's needs best (high quality of metadata fields extracted from
reference strings).
      </p>
      <p>In this paper we propose ParsRec, a novel meta-learning recommender system for
bibliographic reference parsers. ParsRec takes as input a reference string, identifies
the potentially best reference parser(s), applies the chosen parser(s), and outputs the
metadata fields. ParsRec is built upon ten open-source parsers mentioned before.
ParsRec uses supervised machine learning to recommend the best parser(s) for the
input reference string. The novel aspects of ParsRec are: 1) treating reference
parsing as a recommendation problem, and 2) using a meta-learning-based hybrid
approach for reference parsing.</p>
      <p>
        This paper is an extended version of a poster published at the 12th ACM
Conference on Recommender Systems 2018 (RecSys) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Reference parsers often use regular expressions, hand-crafted rules, and template
matching (Biblio [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Citation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], Citation-Parser [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], PDFSSA4MET [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and
BibPro [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). Typically, the most effective approach to reference parsing is
supervised machine learning, such as Conditional Random Fields (ParsCit [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], GROBID
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], CERMINE [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], Anystyle-Parser [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], Reference Tagger [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and Science Parse
[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]), or Recurrent Neural Networks combined with Conditional Random Fields
(Neural ParsCit [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]). To the best of our knowledge, all open-source reference parsers
are based on a single technique, none of them uses any ensemble, hybrid or
metalearning techniques.
      </p>
      <p>
        Some reference parsers are parts of larger systems for information extraction from
scientific papers. These systems automatically extract machine-readable information,
such as metadata, bibliography, logical structure, or full text, from unstructured
documents. Examples include PDFX [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], ParsCit [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], GROBID [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], CERMINE [
        <xref ref-type="bibr" rid="ref21 ref28">21,
28</xref>
        ], Icecite [
        <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
        ] and Team-Beam [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>
        Meta-learning is a technique often applied to the problem of algorithm selection
[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Meta-learning for algorithm selection allows the training of a model able to
automatically select the best algorithm for a given scenario. Meta-learning for
algorithm selection has been successfully applied to several areas in natural language
processing, for example, to grammatical error correction [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], sentiment classification
[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], and part-of-speech tagging [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. To the best of our knowledge, meta-learning
has not been applied to reference parsing.
      </p>
      <p>
        A very effective family of recommender approaches is hybrid approaches,
which leverage the strengths of many different recommendation algorithms [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. A
weighted hybrid combines the output of many recommenders into one final result
[
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. A switching hybrid chooses a single recommender best suited for a given
situation [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. ParsRec can be seen as a switching hybrid of reference parsers, where the
switching is controlled by machine learning.
      </p>
    </sec>
    <sec id="sec-3">
      <title>ParsRec Approach</title>
      <p>A meta-learning recommender for reference parsers recommends the best parser for a
given scenario. There are multiple ways to define a scenario. One aspect to consider is
the granularity of the entity, for which we choose a parser. We can recommend the
best parser for:
• a corpus,
• a document, i.e. its bibliography consisting of a list of reference strings,
• a single reference string,
• a metadata type in a reference string, such as title, journal name, or year.
These four parsing levels can also be combined. For example, a recommender system
might recommend a parser for a combination of corpus and metadata type. In this
case, one parser would be used to extract the year from all reference strings in corpus
A, and another parser would be used to extract the names of the authors from all
reference strings in corpus B.</p>
      <p>
        In this paper, we examine two types of meta-learning recommender, inspired
by [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]: ParsRecRef recommends a single parser for an entire reference string,
and ParsRecField recommends a single parser for each pair of reference string and
metadata type. The dataset we used for the experiments does not allow for the other
types of recommender.
      </p>
      <p>ParsRecRef chooses one parser for a given reference string. This chosen parser is
then responsible for the extraction of all metadata. ParsRecRef works in a few steps
(Fig. 3). First, for each of the ten parsers, ParsRecRef predicts the performance of the
parser on the given reference string. Second, ParsRecRef ranks the parsers by their
predicted performance. Finally, ParsRecRef chooses the parser that was ranked highest
and applies it to the input reference string.</p>
      <p>In ParsRecRef the prediction of the performance of a parser is done by a linear
regression model. We train a separate regression model for every parser. Such a model
takes as input the vector of features extracted from the reference string and predicts
the F1 that the parser will achieve on this reference string. Table 1 visualizes the
supervised regression problem in ParsRecRef.</p>
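The selection step in ParsRecRef can be sketched as follows. This is a minimal illustration, not the actual implementation: the toy_models dict and the single-feature featurize function are hypothetical stand-ins for the trained per-parser linear regression models and the full feature vector described below.

```python
def recommend_parser(reference, models, featurize):
    """ParsRecRef-style selection: predict each parser's F1 on the
    reference string, rank the parsers, and return the top-ranked one."""
    x = featurize(reference)
    predicted_f1 = {name: predict(x) for name, predict in models.items()}
    return max(predicted_f1, key=predicted_f1.get)

# Illustrative stand-ins for the trained regression models: each maps
# a feature vector to a predicted F1 score (values are made up).
toy_models = {
    "GROBID":        lambda x: 0.85 + 0.05 * x[0],
    "ParsCit":       lambda x: 0.80 + 0.10 * x[0],
    "Science Parse": lambda x: 0.75,
}

# Hypothetical one-feature featurizer: fraction of commas in the string.
def featurize(ref):
    return [ref.count(",") / max(len(ref), 1)]
```

The chosen parser is then applied to the reference string and its output is returned as the final metadata.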
      <p>For the sake of the machine learning models, the reference strings have to be
represented by vectors of features. The features were engineered to capture the citation
style and other information that potentially affects the extraction results. We use two
types of features: basic heuristics and n-grams.</p>
      <p>
        The heuristics-based features include:
• reference length (1 feature),
• number and fraction of commas (2 features),
• number and fraction of dots (2 features),
• number and fraction of semicolons (2 features),
• whether the reference starts with square bracket enumeration (e.g. “[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]”)
(1 feature),
• whether the reference starts with dot enumeration (e.g. “14.”) (1 feature).
N-gram features are binary features corresponding to 3- and 4-grams extracted from
the reference string. The terms in n-grams are classes of words, such as number,
capitalized word, comma, etc. These features capture style-characteristic sequences of
token classes. Example features include: number-comma-number (matching e.g. “3,
12”), capitalized word-comma-uppercase letter-dot (matching e.g. “Springsteen, B.”),
number-left parenthesis-number-right parenthesis (matching e.g. “5 (28)”). In
practice, thousands of distinct n-gram features are generated from the training set, and it is
important to select the ones most helpful for the prediction. In our system, we
automatically select 150 n-gram features using feature importance, calculated as part of a
random forest algorithm trained on the training set [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ].
      </p>
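The two feature families can be sketched as follows. This is an illustrative simplification: the exact token classes and regular expressions used in ParsRec are not specified here, so the definitions below are assumptions that only mirror the examples given in the text.

```python
import re

def heuristic_features(ref):
    """The heuristic features described above (9 features in total)."""
    n = max(len(ref), 1)
    return [
        len(ref),                              # reference length
        ref.count(","), ref.count(",") / n,    # commas: count, fraction
        ref.count("."), ref.count(".") / n,    # dots: count, fraction
        ref.count(";"), ref.count(";") / n,    # semicolons: count, fraction
        1 if re.match(r"\[\d+\]", ref) else 0, # "[2]"-style enumeration
        1 if re.match(r"\d+\.", ref) else 0,   # "14."-style enumeration
    ]

def word_classes(ref):
    """Map tokens to coarse classes (number, capitalized word, etc.);
    a simplified assumption about the paper's token classes."""
    classes = []
    for tok in re.findall(r"\w+|[^\w\s]", ref):
        if tok.isdigit():
            classes.append("number")
        elif len(tok) == 1 and tok.isupper():
            classes.append("uppercase-letter")
        elif tok[0].isupper():
            classes.append("capitalized-word")
        elif tok.isalpha():
            classes.append("lowercase-word")
        else:
            classes.append(tok)  # punctuation stands for itself
    return classes

def ngram_features(ref, n=3):
    """Binary n-gram features over word classes,
    e.g. ('number', ',', 'number') for a string containing "3, 12"."""
    cls = word_classes(ref)
    return {tuple(cls[i:i + n]) for i in range(len(cls) - n + 1)}
```

In the real system, the n-gram vocabulary is built over the training set and pruned to the 150 most important features via random forest feature importance.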
      <p>The response variable in the regression model in ParsRecRef is the F1 metric. F1
measures how well the metadata fields were extracted from the reference string. F1 is
the harmonic mean of precision and recall, calculated by comparing the set of
extracted metadata fields to the set of ground truth metadata fields. An extracted field is
correct if both type and value are equal to one of the ground truth fields. Precision is
the number of correct fields divided by the total number of extracted fields. Recall is
the number of correct fields divided by the total number of ground truth fields.</p>
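The field-level metrics defined above follow directly from set comparison of (type, value) pairs; a minimal sketch:

```python
def field_f1(extracted, ground_truth):
    """Precision, recall, and F1 over metadata fields, where each field
    is a (type, value) pair and a field counts as correct only if both
    its type and its value match a ground-truth field."""
    correct = len(set(extracted) & set(ground_truth))
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```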
      <p>ParsRecField chooses the potentially best single parser separately for each metadata
type in the input reference string. All chosen parsers are then applied to the input
reference string. From each parser, the system takes only those metadata fields for
which the parser was chosen. For example, for a specific reference string, ParsRecField
might choose the following parsers: GROBID for extracting authors, title, and journal,
Science Parse for extracting the year, and CERMINE for volume, issue, and pages. In
this case, the final metadata will contain the title field from GROBID, the year from
Science Parse, etc.</p>
      <p>ParsRecField works in several steps (Fig. 4). First, ParsRecField iterates over all pairs
(parser, metadata type), and for each pair ParsRecField predicts whether the parser will
correctly extract the metadata type from the input reference string. Second, for each
metadata type, ParsRecField ranks the parsers based on the predicted probability of
being correct and chooses the parser ranked most highly. All chosen parsers are
applied to the input reference string and the fields are chosen according to the
previous choice of the parser.</p>
      <p>In ParsRecField the prediction of correctness is done by a binary classifier based
on logistic regression. We train a separate classification model for each pair (parser,
metadata type). Such a model takes as input the vector of features extracted from the
reference string. The features are identical to those used in ParsRecRef. The model
then predicts whether the parser will extract the given metadata field correctly. Apart
from a binary classification decision, the logistic regression model outputs the
probability of correctness, which is used for ranking. Table 2 visualizes the classification
problem in ParsRecField.</p>
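The per-field selection can be sketched as follows; the toy_classifiers dict is a hypothetical stand-in for the per-(parser, metadata type) logistic regression models, and the constant probabilities are purely illustrative:

```python
def recommend_per_field(reference, classifiers, metadata_types, featurize):
    """ParsRecField-style selection: for every metadata type, rank the
    parsers by their predicted probability of a correct extraction and
    keep the top-ranked parser for that type.

    classifiers maps (parser, metadata_type) to a function returning
    P(correct extraction) for a feature vector.
    """
    x = featurize(reference)
    choice = {}
    for mtype in metadata_types:
        scores = {parser: clf(x)
                  for (parser, t), clf in classifiers.items() if t == mtype}
        choice[mtype] = max(scores, key=scores.get)
    return choice

# Illustrative stand-in classifiers (constant probabilities).
toy_classifiers = {
    ("GROBID", "author"):      lambda x: 0.90,
    ("ParsCit", "author"):     lambda x: 0.95,
    ("GROBID", "year"):        lambda x: 0.85,
    ("Science Parse", "year"): lambda x: 0.97,
}
```

Each chosen parser is then run once on the reference string, and only the fields for which it was chosen are kept in the final output.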
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>
        For the experiments we used a closed dataset that comes from a commercial project
described in more detail in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The dataset is composed of 371,656 reference strings
and the corresponding parsed references, extracted from 9,491 documents from
the chemical domain. The parsed references were manually curated and contain 1.9 million
metadata fields.
      </p>
      <p>The dataset contains 6 metadata types: author (the name of the first author), source
(the source of the referenced document, this can be the name of the journal or the
conference, URL or identifier such as arXiv id or DOI), year, volume, issue, and page
(the first page of the page range). Unlike in the typical reference parsing task, the title
of the referenced document was not required by the client of the commercial project and
is not annotated in the data.</p>
      <p>The data was randomly split in the following way: 40% of the documents for the
training of individual parsers (the training set), 30% of the documents for the training
of the parser recommender (the meta-learning set), and 30% of the documents for
testing (the test set). Since the split was random, it is possible that there were some
rare cases of the same reference string used for both training and testing (if it was
contained in two different documents).</p>
      <p>The training set will be used in the future for training the individual parsers, to
improve their performance (this is outside the scope of this paper). The meta-learning set was
used for training of the meta-learning recommenders. All parsers were applied to the
meta-learning set and evaluated. As a result of the evaluation, we obtained
information about which individual metadata fields extracted by the parsers were correct,
as well as the overall F1 of each parser on each reference string. This corresponds
directly to the data needed for the training of the recommenders (Table 1 and Table
2). Finally, the test set was used for testing and comparisons.</p>
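The document-level 40/30/30 split described above can be sketched as follows (the seed parameter is an illustrative addition for reproducibility, not something specified in the paper):

```python
import random

def split_documents(doc_ids, seed=0):
    """Random document-level split into training (40%), meta-learning
    (30%), and test (30%) sets, so that all reference strings of one
    document land in the same partition."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    a, b = int(0.4 * n), int(0.7 * n)
    return ids[:a], ids[a:b], ids[b:]
```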
      <p>We compare the proposed approach against three baselines. The first baseline is
the best single parser (GROBID). The second baseline, called a hybrid baseline, uses
the best parser for each metadata type (i.e. ParsCit for author, Science Parse for year,
GROBID for other metadata types). The third baseline is a voting ensemble, in which
the final result contains only those metadata fields that appear in the output of at least
three different parsers. We evaluate ParsRec in both versions, ParsRecRef and
ParsRecField. We report the results using precision, recall and F1 calculated for the
metadata fields.
</p>
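The voting-ensemble baseline can be sketched as a simple count over the parsers' outputs:

```python
from collections import Counter

def voting_ensemble(parser_outputs, min_votes=3):
    """Voting-ensemble baseline: keep only the metadata fields, given
    as (type, value) pairs, that at least `min_votes` of the parsers
    agree on."""
    votes = Counter(field for output in parser_outputs
                    for field in set(output))
    return {field for field, count in votes.items() if count >= min_votes}
```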
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>The overall results are presented in Fig. 5. In general, ParsRecField achieved the best
results, outperforming ParsRecRef by 2% (F1 0.909 vs. 0.891). This is most likely
caused by ParsRecField being more granular, i.e. it applies parsers separately for
different metadata fields, while ParsRecRef treats reference parsing as a single task.
Both variations of ParsRec outperform the best single parser (GROBID). ParsRecRef
achieved a 0.6% increase in F1 (0.891 vs. 0.886), reducing the false positive rate by
3.2% (0.091 vs. 0.095), and the false negative rate by 3.8% (0.127 vs. 0.132).
ParsRecField achieved a 2.6% increase in F1 (0.909 vs. 0.886), reducing the false
positive rate by 20.2% (0.075 vs. 0.094), and the false negative rate by 18.9% (0.107 vs.
0.132). We also used Student’s t-test to statistically compare the mean F1s over the
documents in the test set. Both versions of ParsRec achieved statistically significant
increase in mean F1 over GROBID (p = 0.0027 for ParsRecRef and p &lt; 0.001 for
ParsRecField). These improvements show that the recommender indeed learns useful
patterns from the data and is able to recommend parsers well.</p>
      <p>Both versions of ParsRec also outperform the voting ensemble. While ParsRecRef is
only marginally better (F1 0.890 vs. 0.891), ParsRecField achieved a 2.1% increase in
F1 (0.909 vs. 0.890). In the case of ParsRecRef, the increase in the mean F1 is not
statistically significant. In the case of ParsRecField the increase is significant (p &lt; 0.001).</p>
      <p>Only ParsRecField outperforms the hybrid baseline with a 1.6% increase in F1
(0.909 vs. 0.895). In this case, the increase in the mean F1 is significant (p &lt; 0.001).
ParsRecRef is slightly worse than the hybrid baseline. The reason is most likely the
fact that the hybrid baseline is more granular than ParsRecRef.</p>
      <p>Fig. 6 shows how often each parser is chosen in each type of ParsRec. In the case
of ParsRecRef, the distribution is more skewed. For example, one of the two most often
chosen parsers (GROBID or CERMINE) is chosen in 88% of cases in ParsRecRef
and in 65% of cases in ParsRecField. Also, Science Parse, which is almost never chosen
in ParsRecRef, is chosen in 8% of cases in ParsRecField. These results show that
choosing a parser for different metadata types individually allows for the more effective use
of parsers specializing in certain fields, and gives better results.</p>
      <p>The promising results of our evaluation clearly show the potential of the proposed
recommender system for reference parsers. Both proposed approaches outperform the
best single parser and the voting ensemble, which indicates that the recommender
indeed makes useful recommendations. One of the proposed approaches (ParsRecField)
also outperforms the hybrid baseline.</p>
      <p>In most cases, the increases in F1 are not large. We suspect the reason for this is
insufficient diversity, both in the data and among the parsers. The data comes
exclusively from chemical papers, which might not include many different reference
styles and languages. Six out of the ten parsers use Conditional Random Fields.</p>
      <p>Our plans for the future include training individual parsers, adding more features
(related to the language or source of the reference), diversifying the dataset and
adding more diverse reference parsers.
</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was conducted in collaboration with and part-funded by Iconic
Translation Machines Ltd. with additional financial support from Science Foundation Ireland
(SFI) under Grant Number 13/RC/2106. The project has also received funding from
the European Union’s Horizon 2020 research and innovation programme under the
Marie Sklodowska-Curie grant agreement No 713567.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <article-title>"ParsRec: Meta-Learning Recommendations for Bibliographic Reference Parsing,"</article-title>
          <source>in Proceedings of the Late-Breaking Results track part of the Twelfth ACM Conference on Recommender Systems (RecSys '18)</source>
          , Vancouver, BC, Canada,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Fedoryszak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bolikowski</surname>
          </string-name>
          ,
          <article-title>"Large Scale Citation Matching Using Apache Hadoop,"</article-title>
          <source>in International Conference on Theory and Practice of Digital Libraries (TPDL)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Hirsch</surname>
          </string-name>
          ,
          <article-title>"An index to quantify an individual's scientific research output that takes into account the effect of multiple coauthorship,"</article-title>
          <source>Scientometrics</source>
          , vol.
          <volume>85</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>741</fpage>
          -
          <lpage>754</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Glänzel</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <article-title>"A Hirsch-type index for journals,"</article-title>
          <source>Scientometrics</source>
          , vol.
          <volume>69</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>B.</given-names>
            <surname>González-Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. P.</given-names>
            <surname>Guerrero Bote</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>de Moya Anegón</surname>
          </string-name>
          ,
          <article-title>"A new approach to the metric of journals' scientific prestige: The SJR indicator,"</article-title>
          <source>J. Informetrics</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>379</fpage>
          -
          <lpage>391</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Torres-Salinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Moreno-Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>López-Cózar</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>"A methodology for Institution-Field ranking based on a bidimensional analysis: the IFQ2A index,"</article-title>
          <source>Scientometrics</source>
          , vol.
          <volume>88</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>771</fpage>
          -
          <lpage>786</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P.</given-names>
            <surname>Ahlgren</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Colliander</surname>
          </string-name>
          ,
          <article-title>"Document-document similarity approaches and science mapping: Experimental comparison of five approaches,"</article-title>
          <source>J. Informetrics</source>
          , vol.
          <volume>3</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>63</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <article-title>"Virtual Citation Proximity (VCP): Calculating Co-Citation-Proximity-Based Document Relatedness for Uncited Documents with Machine Learning [Proposal],"</article-title>
          ,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khabsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tuarob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ororbia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>"CiteSeerX: AI in a Digital Library Search Engine,"</article-title>
          <source>AI Magazine</source>
          , vol.
          <volume>36</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Power</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>"Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding,"</article-title>
          <source>in WWW</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aizawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <article-title>"Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia,"</article-title>
          <source>in JCDL</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Langer</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          ,
          <article-title>"Research-paper recommender systems: a literature survey,"</article-title>
          <source>Int. J. on Digital Libraries</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>305</fpage>
          -
          <lpage>338</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <article-title>"Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers,"</article-title>
          <source>in Proceedings of ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <article-title>"Biblio,"</article-title>
          [Online]. Available: http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <article-title>"Citation,"</article-title>
          [Online]. Available: https://github.com/nishimuuu/citation.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <article-title>"Citation-Parser,"</article-title>
          [Online]. Available: https://github.com/manishbisht/Citation-Parser.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <article-title>"PDFSSA4MET,"</article-title>
          [Online]. Available: https://github.com/eliask/pdfssa4met.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-L.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>"BibPro: A Citation Parser Based on Sequence Alignment,"</article-title>
          <source>IEEE Trans. Knowl. Data Eng.</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>250</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>I.</given-names>
            <surname>Councill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Giles</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <article-title>"ParsCit: an open-source CRF reference string parsing package,"</article-title>
          <source>in International Conference on Language Resources and Evaluation</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>P.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <article-title>"GROBID: combining automatic bibliographic data recognition and term extraction for scholarship publications,"</article-title>
          <source>Research and Advanced Technology for Digital Libraries</source>
          , pp.
          <fpage>473</fpage>
          -
          <lpage>474</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szostek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fedoryszak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dendek</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bolikowski</surname>
          </string-name>
          ,
          <article-title>"CERMINE: automatic extraction of structured metadata from scientific literature,"</article-title>
          <source>International Journal on Document Analysis and Recognition</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>317</fpage>
          -
          <lpage>335</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <article-title>"Anystyle-Parser,"</article-title>
          [Online]. Available: https://github.com/inukshuk/anystyle-parser.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <article-title>"Reference Tagger,"</article-title>
          [Online]. Available: https://github.com/rmcgibbo/reftagger.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <article-title>"Science Parse,"</article-title>
          [Online]. Available: https://github.com/allenai/science-parse.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>A.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaur</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <article-title>"Neural ParsCit: a deep learning-based reference string parser,"</article-title>
          <source>International Journal on Digital Libraries</source>
          , vol.
          <volume>19</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>337</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>A.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pettifer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Voronkov</surname>
          </string-name>
          ,
          <article-title>"PDFX: fully-automated pdf-to-xml conversion of scientific literature,"</article-title>
          <source>ACM Symposium on Document Engineering</source>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>180</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Cuong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>"Scholarly Document Information Extraction using Extensible Features for Efficient Higher Order Semi-CRFs,"</article-title>
          <source>in JCDL</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bolikowski</surname>
          </string-name>
          ,
          <article-title>"Extracting Contextual Information from Scientific Literature Using CERMINE System,"</article-title>
          <source>in Semantic Web Evaluation Challenges - Second SemWebEval Challenge at ESWC</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Korzen</surname>
          </string-name>
          ,
          <article-title>"A Benchmark and Evaluation for Text Extraction from PDF,"</article-title>
          <source>in JCDL</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Korzen</surname>
          </string-name>
          ,
          <article-title>"The Icecite Research Paper Management System,"</article-title>
          <source>in Web Information Systems Engineering</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hristakeva</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Granitzer</surname>
          </string-name>
          ,
          <article-title>"Teambeam - meta-data extraction from scientific literature,"</article-title>
          <source>DLib Magazine</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>7</issue>
          /8,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lemke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Budka</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gabrys</surname>
          </string-name>
          ,
          <article-title>"Metalearning: a survey of trends and technologies,"</article-title>
          <source>Artificial Intelligence Review</source>
          , vol.
          <volume>44</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>H.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kang</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>"A Meta Learning Approach to Grammatical Error Correction,"</article-title>
          <source>in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2</source>
          , Jeju Island, Korea,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>"Ensemble Learning for Sentiment Classification,"</article-title>
          <source>in Chinese Lexical Semantics</source>
          , Berlin,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekbal</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <article-title>"Simulated annealing based classifier ensemble techniques: Application to part of speech tagging,"</article-title>
          <source>Information Fusion</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>288</fpage>
          -
          <lpage>300</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <article-title>"Hybrid Web Recommender Systems,"</article-title>
          <source>in The Adaptive Web, Methods and Strategies of Web Personalization</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>F.</given-names>
            <surname>Vahedian</surname>
          </string-name>
          ,
          <article-title>"Weighted hybrid recommendation for heterogeneous networks,"</article-title>
          <source>in RecSys</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>M.</given-names>
            <surname>Braunhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Codina</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <article-title>"Switching hybrid for cold-starting context-aware recommender systems,"</article-title>
          <source>in RecSys</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>A.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tkaczyk</surname>
          </string-name>
          ,
          <article-title>"One-at-a-time: A Meta-Learning Recommender-System for Recommendation-Algorithm Selection on Micro Level,"</article-title>
          <source>CoRR</source>
          , vol. abs/1805.12118.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>"Random Forests,"</article-title>
          <source>Machine Learning</source>
          , vol.
          <volume>45</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>