<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can AI-estimated article quality be used to rank scholarly documents?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mike Thelwall</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Wolverhampton</institution>
          ,
          <addr-line>Wolverhampton WV1 1LY</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>10</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This paper discusses the potential for machine learning to predict the quality of scholarly documents in order to help rank them in information retrieval systems. Quality-based rankings may help users without the time or expertise to assess the value of the publications suggested by a system. It is argued that systems that learn to estimate document quality with a reasonable degree of accuracy may become possible because of the increasing availability of peer reviews and review scores online.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        A key feature of scholarly information retrieval systems is their ranking algorithms. Users
may focus on the first documents that they see, unless they are attempting a comprehensive
review. The use of citation information to rank scholarly search results is arguably appropriate
for academics because citations are an obvious, but partial, indicator of scholarly uptake or
utility. A document that has been cited a lot is very likely to have been read by many publishing
researchers and found useful enough to cite. In contrast, end users may be more interested in
applied research. For this goal, citations may be less helpful, especially if they tend to point
to basic or methodological papers rather than practical applications. Their value may also be
undermined by attempts to manipulate them (e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). For end users, it may therefore be better
to rank papers by quality rather than by citation impact. Ranking-by-quality may also help in
the era of predatory publishing, by pointing end users and junior academics to high quality work
that is relevant to their needs. Both user groups may lack the time or experience to perform
efective quality control on search results. Whilst this issue may be resolved by collaborative
filtering approaches (e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) it would be useful to rank documents before they have been seen.
      </p>
      <p>
        The main reason why academic articles are not ranked by quality in any mainstream scholarly
database may be that such quality scores are not available for most articles. Both journals and
conferences usually make binary publishing decisions (accept/reject) after reviewing and do
not publish a quality assessment or reviewers’ quality scores. Since there are increasingly many
exceptions (e.g., some open peer review conferences, F1000 post-publication ratings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) and
there may be a future increase in post-publication peer review scores for articles, there may
soon be enough public peer review scoring data for systems to harness. The
score data may be supplemented by algorithms to classify reviews or post-publication comments
(e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) for sentiment, or to detect problematic content in articles (e.g., [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]). An alternative
method of generating article quality scores would be to apply machine learning on a sample of
articles with peer reviews, perhaps from the aforementioned sources, and then use the trained
algorithms to estimate the quality scores for the remainder.
      </p>
      <p>
        Following on from the above, is it possible and desirable to use machine learning to estimate
the quality of an academic article to support ranking in academic information retrieval systems?
One study has predicted proxy-quality scores for articles with machine learning, using journal
impact (split into thirds) as a proxy for article quality. Using this heuristic, it is possible to
generate proxy quality predictions that are substantially above the baseline in some fields, but
not others [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This suggests that automatically detecting quality will be much more difficult in
some fields than in others. Intuitively, quality would be easier to check in more hierarchical
fields with standardised methods, given that deviations from best practice could theoretically
be detected. In contrast, in a humanities field, it might take substantial or broad field knowledge
to judge the quality of outputs.
      </p>
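      <p>
        The proxy-labelling heuristic above can be sketched as follows. This is a minimal illustration, not the code of the cited study; the input format, function names, and impact values are assumptions. Journals are ranked by an impact indicator and split into thirds, and each article then inherits the tertile of its journal as a machine learning target:
      </p>
      <preformat>
```python
def journal_impact_tertiles(journal_impacts):
    """Split journals into impact thirds: 0 = low, 1 = medium, 2 = high.

    journal_impacts: dict mapping journal name to an impact indicator
    (hypothetical data; any field-normalised journal average would do).
    """
    ranked = sorted(journal_impacts, key=journal_impacts.get)
    n = len(ranked)
    # Integer arithmetic assigns roughly equal-sized thirds.
    return {journal: min(2, (3 * i) // n) for i, journal in enumerate(ranked)}


def proxy_quality_labels(articles, tertile):
    """Give each (article_id, journal) pair the tertile of its journal as
    a proxy-quality target for a supervised learner. This proxy is only
    reasonable in fields where journal impact reflects article quality."""
    return {article_id: tertile[journal] for article_id, journal in articles}


# Toy example: six journals, so two per tertile.
impacts = {"J1": 0.2, "J2": 0.5, "J3": 0.9, "J4": 1.4, "J5": 2.3, "J6": 5.1}
tertile = journal_impact_tertiles(impacts)
labels = proxy_quality_labels([("a1", "J1"), ("a2", "J4"), ("a3", "J6")], tertile)
# labels == {"a1": 0, "a2": 1, "a3": 2}
```
      </preformat>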
      <p>Preliminary unpublished experiments to predict article quality with machine learning applied
to tens of thousands of human quality scores (high, medium, low) for articles in 27 Scopus broad
fields suggest that the highest accuracy is possible for the following biomedical and physical
science Scopus broad fields: Multidisciplinary; Biochemistry, Genetics and Molecular Biology;
Physics and Astronomy; Chemistry. In contrast, this task is most difficult or impossible in the
following Scopus broad fields: Engineering; Agricultural and Biological Sciences; Psychology;
Social Sciences; Environmental Science; Energy; Arts and Humanities; Dentistry; Nursing; and
Pharmacology &amp; Toxicology. Thus, the potential for harnessing machine learning for article
quality prediction may be restricted to the biomedical and physical sciences.</p>
      <p>
        Based on previous studies on predicting citation counts [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ], the following
recommendations are made for a ranking system to reflect article quality in fields where it is
possible.
      </p>
      <p>• Journal impact thirds, quartiles or other groupings can be used as the target of a machine
learning system in fields in which journal impact is a reasonable indicator of article quality
(medicine, health, physical sciences, economics, psychology) but not in areas where
citations have little value (engineering, other social sciences, arts and humanities). This
could be replaced by post-publication or peer review scores when they become available
in sufficient numbers. If this replacement is made, then a journal impact indicator could
become an input.
• Machine learning should be applied to data segmented into narrow coherent fields to
give the algorithms the chance to learn field-specific quality patterns.
• Inputs should be field and year normalised (e.g., not citation counts but normalised
variants such as the Mean Normalised Citation Score (MNCS) or the Mean Normalised
Log-transformed Citation Score (MNLCS)) so that related fields and years can be combined
to gain sufficient training data.
• Valuable types of inputs include all those shown to associate with citation rates, including:
(normalised) article citations, number of authors, number of institutional affiliations,
article length, number of country affiliations, career publishing statistics of the authors,
and abstract readability.
• Text inputs, such as words and phrases used in the article title and abstract, may reveal
important topics, although topics are more relevant to citations than to quality. They may also point to
high quality methods (e.g., randomised controlled trials) and identify more subtle indicators
of high-quality work, such as appropriate hedging or shared data/code. If full text can be
analysed, then factors like the number of figures and tables in a paper may be useful in
judging the amount of evidence supporting the article in some fields.</p>
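      <p>
        The field and year normalisation recommended above can be sketched as follows. This is a minimal illustration under an assumed input format, in the spirit of the MNLCS indicator rather than a definitive implementation: each citation count is log-transformed and divided by the mean log-transformed count of its field-year group, so that articles from fields and years with different citation levels can be pooled as training data.
      </p>
      <preformat>
```python
import math
from collections import defaultdict


def normalised_log_citation_scores(articles):
    """Normalise citation counts within field-year groups.

    articles: list of (article_id, field, year, citation_count) tuples
    (a hypothetical input format). Each article's score is ln(1 + c)
    divided by the mean ln(1 + c) of its field-year group.
    """
    groups = defaultdict(list)
    for _, field, year, c in articles:
        groups[(field, year)].append(math.log(1 + c))
    group_mean = {key: sum(logs) / len(logs) for key, logs in groups.items()}
    scores = {}
    for article_id, field, year, c in articles:
        mean = group_mean[(field, year)]
        scores[article_id] = math.log(1 + c) / mean if mean > 0 else 0.0
    return scores


# Toy example: two fields with very different citation cultures.
arts = [
    ("a1", "chemistry", 2020, 10),
    ("a2", "chemistry", 2020, 10),
    ("a3", "history", 2020, 1),
    ("a4", "history", 2020, 1),
]
scores = normalised_log_citation_scores(arts)
# Every article here sits exactly at its field-year average, so all
# scores are 1.0 despite the tenfold difference in raw citation counts.
```
      </preformat>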
      <p>The above approach is clearly quite citation-dependant but at least moves one step away
from a pure reliance on citations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Oriensubulitermes</surname>
            <given-names>inanis</given-names>
          </string-name>
          [pseudonym], PubPeer comment, https://pubpeer.com/publications/940C291607CF03969C6A936F8BA5B9#
          <fpage>2</fpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kershaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pettit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hristakeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jack</surname>
          </string-name>
          ,
          <article-title>Learning to rank research articles: A case study of collaborative filtering and learning to rank in ScienceDirect</article-title>
          ,
          <source>in: Proceedings BIR</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>88</lpage>
          . URL: http://ceur-ws.org/Vol-2591/paper-08.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thelwall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Papas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Nyakoojo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Weigert</surname>
          </string-name>
          ,
          <article-title>Automatically detecting open academic review praise and criticism</article-title>
          ,
          <source>Online Information Review</source>
          <volume>44</volume>
          (
          <year>2020</year>
          ). doi:https://doi.org/10.1108/OIR-11-2019-0347.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <article-title>Classification and analysis of PubPeer comments: How a web journal club is used</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          (
          <year>2021</year>
          ). doi:https: //doi.org/10.1002/asi.24568.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cabanac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <article-title>Prevalence of nonsensical algorithmically generated papers in the scientific literature</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>72</volume>
          (
          <year>2021</year>
          )
          <fpage>1461</fpage>
          -
          <lpage>1476</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cabanac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Magazinov</surname>
          </string-name>
          ,
          <article-title>Tortured phrases: A dubious writing style emerging in science</article-title>
          .
          <source>Evidence of critical issues affecting established journals</source>
          ,
          <year>2021</year>
          . arXiv preprint arXiv:2107.06751.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thelwall</surname>
          </string-name>
          ,
          <article-title>Can the quality of published academic journal articles be assessed with machine learning?</article-title>
          ,
          <source>Quantitative Science Studies</source>
          (
          <year>2022</year>
          ). doi:https://doi.org/10.1162/qss_a_00185.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abrishami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aliakbary</surname>
          </string-name>
          ,
          <article-title>Predicting citation counts based on deep neural network learning techniques</article-title>
          ,
          <source>Journal of Informetrics</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>485</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y. H.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <article-title>Identification of highly-cited papers using topic-model-based and bibliometric features</article-title>
          ,
          <source>Journal of Informetrics</source>
          <volume>14</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <article-title>Early prediction of scientific impact based on multibibliographic features and convolutional neural network</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2019</year>
          )
          <fpage>92248</fpage>
          -
          <lpage>92258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenniger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schomaker</surname>
          </string-name>
          ,
          <article-title>SChuBERT: Scholarly document chunks with bert-encoding boost citation count prediction</article-title>
          ,
          <year>2020</year>
          . arXiv preprint arXiv:2012.11740.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>