<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Alignment of Post-Publication Reviews &amp; Bibliometric and Altmetric Impact</article-title>
        <subtitle>A Case Study on Expert Statements from the Science Media Center</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dirk Tunger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Schaer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>A Case Study on Expert Statements from the Science Media Center</institution>
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Research Center Juelich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TH Köln - University of Applied Sciences</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>11</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>In the context of academic publishing and peer review, this study investigates the relationship between post-publication expert evaluations, their agreement levels, and the subsequent scientific and public recognition of the reviewed research. Using expert statements from the Science Media Center Germany as a dataset, we analyze Research in Context reviews to examine the alignment between qualitative post-publication assessments and bibliometric as well as altmetric indicators. We employ a Large Language Model to translate unstructured expert reviews into a structured rating scheme. Furthermore, we correlate these evaluations with citation counts from the Web of Science and alternative impact metrics such as the Altmetric Attention Score, news mentions, and Mendeley readership statistics from the Altmetric Explorer. We investigate the alignment of positive or critical post-publication reviews and high or low citation or altmetric counts.</p>
      </abstract>
      <kwd-group>
        <kwd>Science Journalism</kwd>
        <kwd>Reviews</kwd>
        <kwd>LLM-based rating</kwd>
        <kwd>Bibliometrics</kwd>
        <kwd>Altmetrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Academic peer review is a fundamental element of the scientific publication and credibility system [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
The rigorous feedback from peers who are both independent and qualified to judge the merits of
scientific studies and publications advances science and helps to guarantee a high level of research
quality. Although the current peer-review system is not without critique and alternatives such as open
peer review have been discussed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is one of the cornerstones of modern science.
      </p>
      <p>
        Peer reviewers are expected to examine the rationale behind research questions and to evaluate
originality, strengths, and weaknesses. Reviewers act as gatekeepers, as only publications
that meet a discipline’s (unofficial and sometimes contradictory and arbitrary) quality standards are
published. Reviewers decide what is relevant enough to be published. Relevance is a central aspect of
peer reviewing in this case, as topicality, field-specific requirements, and research standards
are among the most influential factors to be considered. We argue that the decisions that reviewers [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
have to make are similar to the relevance decisions researchers have to make when searching and
filtering for relevant information in their field of research, as both are multi-faceted and multi-layered [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>
        However, researchers are not the only target group for scientific information. The general public and
science journalists, who act as intermediaries between science and the public, are important actors in
scientific communication. In this role, science journalists must scan the vast scientific output to find
groundbreaking studies that are potentially relevant to the public, a challenge Herbert Simon described
as “a wealth of information creates a poverty of attention” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To mitigate this information overload
for journalists and the general public, specialized research information providers such as the Science
Media Center (SMC) Germany have created services around scientific information. One of these services is the
“Research in Context” (RIC) review, where exclusive embargo agreements with publishers worldwide
are used to provide the public with additional post-publication reviews, an inside view that is typically
hidden by the blind peer review process. As the reviews are not meant to guide a decision about the
acceptance and publication of a manuscript, the reviewers are freer in their judgment: their
praise or critique no longer directly impacts the publication process, as this decision has already
been made. We call them post-publication reviews to underline this special case.
      </p>
      <p>While RIC and the other SMC services aim to provide early access to recent science, the scientific
system relies on credibility and recognition mechanisms that mostly use citations but also mentions in
social media and the like, known as altmetrics. Previous research has shown how these two forms of
scientific attribution go hand in hand, but very little research has been conducted that connects the outcome
and content of peer reviews with the later scientific and public attribution, measured by citations and
altmetrics, respectively.</p>
      <p>With the help of an exclusive data set of RIC post-publication reviews on scientific studies, we
address the following research questions:
RQ1 How can a Large Language Model transfer unstructured qualitative post-publication reviews
of experts into a quantitative rating scheme? How good is the inter-annotator agreement for
different configurations of this transfer process?
RQ2 Is there an alignment of positive or critical post-publication reviews and high or low citation
rates? Do these alignments also correspond with altmetrics?</p>
    </sec>
    <sec id="sec-2">
      <title>2. Datasets and Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Research in Context Post-publication Reviews</title>
        <p>The SMC Germany is a non-profit organization funded by the Klaus Tschira Trust. Its mission is
to support journalists by providing access to exclusive pre-publication versions of scientific studies,
so-called fact sheets on complex topics curated by specialists, and specific services around recent research
and peer reviewing. Academic publishers like Springer, Nature, etc., have established a system that
provides journalists with information about upcoming studies in advance under a press embargo
(typically between two and five days). The studies are not yet publicly available, but journalists can get
early access after registration and verification. In addition to access to the studies themselves, the SMC curates
the “Research in Context” (RIC) service (now called “Statements”)1, where experts in the field help to
provide insights into cutting-edge research results and give journalists orientation before
reports on recent studies are written and published. Experts have first-hand access to the full texts of
the studies. They must give an informed review that highlights strengths and weaknesses and provides an
overall evaluation of the underlying research. This allows journalists to judge the significance of a new
scientific finding more quickly, relying on the perspective of independent researchers, before the
embargo period expires. After the embargo period, the RIC reviews are published to the general
public on the SMC website, where the reviewers’ names are revealed.</p>
        <p>RIC reviews are a special case of peer reviews on scientific literature. While they are independent of
classic pre-publication peer reviews, their reviewers do not have to weigh the pros and cons to come
up with a clear suggestion for or against publication; their reviews are publicly visible and are not
blinded. RIC reviewers are thus both freer to judge and to point out weaknesses openly, but they
might also be more restrained due to the public nature of their reviews. However, we see these RIC
reviews as an interesting research data set, as we gain access to recent, real-world reviews on scientific
literature without any domain-specific filtering. The articles cover various topics, such as climate
research, energy, digital sciences, and medicine. The disciplines and topics addressed by the Science
Media Center (SMC) are primarily those with direct social relevance and public interest, such as
medical topics and diseases like cancer, as well as environmental issues like climate change.
1https://www.sciencemediacenter.de/angebote?story_type=Statements (last accessed: 26 March 2025)
</p>
        <p>[Figure 1: A scientific journal publication; post-publication reviews organized by the SMC; extracted DOI to original paper; matched with WoS citation data; matched with Altmetric Explorer data; extracted statements from domain experts; map statements into a score using gpt4o-mini; match into one data set for this study via DOI.]</p>
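        <p>To make the final merge step in Figure 1 concrete, the following minimal Python sketch shows how per-article review scores, Web of Science citation counts, and an Altmetric Explorer export could be joined via the DOI. The file and column names are hypothetical illustrations, not the actual files used in this study.</p>
        <preformat># Minimal sketch of the DOI-based merge (file and column names are hypothetical).
import pandas as pd

# Per-article review scores derived from the RIC statements (averaged K0-K5).
reviews = pd.read_csv("ric_review_scores.csv")      # columns: doi, k0 ... k5
# Citation counts exported from the local Web of Science installation.
wos = pd.read_csv("wos_citations.csv")              # columns: doi, citations
# Altmetric Explorer export: attention score, news mentions, Mendeley readers.
alt = pd.read_csv("altmetric_explorer.csv")         # columns: doi, aas, news, mendeley

# Normalize the DOIs so the join key matches across all three sources.
for df in (reviews, wos, alt):
    df["doi"] = df["doi"].str.lower().str.strip()

# Left-join on the primary article's DOI to build one data set for the study.
merged = reviews.merge(wos, on="doi", how="left").merge(alt, on="doi", how="left")
merged.to_csv("ric_merged_dataset.csv", index=False)</preformat>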
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Acquisition and Inter-Rater Agreement</title>
        <p>We crawled the website of SMC Germany in May 2024 and extracted a total of 521 RIC articles. These
articles contain different reviews (called statements) by domain experts on a specific primary article.
We extracted the DOI or the reference to the primary article. For all 521 RIC articles, we extracted a
total of 1943 statements (≈ 3.72 review statements per article). We ignored all editorial content
around the reviewers’ statements, such as highlighted quotes or subtitles, focusing only on the review
statements themselves (see Figure 1).</p>
        <p>The extracted review statements are unstructured text without any score-based rating. To map the
text-based reviews into a comparable score, we utilized OpenAI’s gpt4o-mini2. The prompt instructed
the model to generate a score between 0 and 1 for each review along the following criteria:
K0 Examination of the research question (e.g. are the aims and rationale clearly formulated?)
K1 Evaluation of originality (contribution, increase in knowledge in the literature or in the subject)
K2 The strengths and weaknesses of the described method are clearly stated
K3 Specific comments on the writing of the manuscript (e.g. spelling, organisation, illustrations, etc.)
K4 Author’s interpretation of the results and conclusions drawn from the results
K5 Comments on the statistics where appropriate (e.g. whether they are robust and fit for purpose
and whether the controls and sampling mechanisms are sufficiently and well described)
Values close to 0 mean that the RIC reviewers expressed weaknesses or problems regarding the respective
evaluation criterion. Values close to 1 mean that the evaluation criterion has been fulfilled to the reviewers’
satisfaction. If there was insufficient data for an evaluation criterion or no clear decision could be made,
the LLM was instructed to return a value of NA. The evaluation criteria K0 to K5 were taken from the official Elsevier
guide on how to conduct a review3.</p>
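        <p>As a minimal illustration of this encoding step, the following Python sketch sends a single review statement to gpt4o-mini and parses the returned K0 to K5 scores. The prompt wording, the JSON response format, and the function name are assumptions for illustration only; the actual code and prompts are available in the repository linked in footnote 2.</p>
        <preformat># Minimal sketch of the LLM-based encoding (prompt wording and JSON format are
# illustrative assumptions, not the exact prompt used in the study).
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CRITERIA = ["K0", "K1", "K2", "K3", "K4", "K5"]

SYSTEM_PROMPT = (
    "You rate a post-publication review statement on six criteria (K0 to K5): "
    "research question, originality, strengths and weaknesses of the method, "
    "writing, interpretation of the results, and statistics. "
    "Return a JSON object mapping K0 to K5 to a number between 0 and 1, "
    "where 0 means serious weaknesses and 1 means fully satisfactory. "
    "Use the string 'NA' if the statement gives insufficient information."
)

def score_statement(statement: str) -> dict:
    """Map one unstructured expert statement onto the K0-K5 rating scheme."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": statement},
        ],
    )
    raw = json.loads(response.choices[0].message.content)
    # Treat 'NA' or missing keys as missing values so they stay NA downstream.
    return {k: (None if raw.get(k) in (None, "NA") else float(raw[k])) for k in CRITERIA}</preformat>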
          <p>
            Next, we computed the inter-rater agreement between the scores of the different reviewers per
paper to measure the agreement level and identify controversial or non-controversial articles. We
utilized Krippendorff’s Alpha to measure the agreement as an alternative to the widely used Fleiss’
Kappa. The advantage of Alpha over Kappa statistics is that while for Kappa all assessors
have to rate the same number of subjects and use the same scale, the Alpha coefficient can
handle more variation and computes reliabilities that are comparable across any number of assessors
and values, different metrics, and unequal sample sizes [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Krippendorff [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] argues for the use of Alpha
in comparison to other measures because of its independence from the number of assessors and its
robustness against imperfect data. No fixed or recommended values for Alpha are given, but a value of
more than 0.8 is considered a (near) perfect agreement.
2The code, extracted review texts, and R scripts to compute the inter-rater agreement are available here: https://github.com/
irgroup/SCOLIA2025-Research-in-Context (last accessed: 26 March 2025)
3https://www.elsevier.com/reviewer/how-to-review (last accessed: 26 March 2025)
          </p>
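          <p>For the agreement computation itself, a minimal sketch using the Python krippendorff package (instead of the R scripts mentioned in footnote 2) could look as follows; the input layout, one row per reviewer and one column per criterion with NaN for missing ratings, is an assumption for illustration.</p>
          <preformat># Minimal sketch: Krippendorff's Alpha per RIC article (interval level).
import numpy as np
import krippendorff  # pip install krippendorff

def article_agreement(score_matrix):
    """Alpha across the reviewers of one article.

    score_matrix: rows = reviewers, columns = criteria K0-K5,
    np.nan for criteria the LLM encoded as NA.
    """
    return krippendorff.alpha(
        reliability_data=np.asarray(score_matrix, dtype=float),
        level_of_measurement="interval",
    )

# Example: three reviewers of one article with partly missing ratings.
example = [
    [0.8, 0.9, 0.7, np.nan, 0.8, 0.6],
    [0.7, 0.8, 0.6, 0.5, 0.9, np.nan],
    [0.9, 0.9, 0.8, 0.5, 0.7, 0.6],
]
print(article_agreement(example))</preformat>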
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Citations and Altmetrics</title>
        <p>With the Web of Science, which goes back to Eugene Garfield’s Science Citation Index, it became possible to search for
literature and to see how often other scientists have cited an individual publication. Although the reasons
why scientists cite each other are varied, the number of citations that a publication has received indicates
how relevant other scientists consider this publication.</p>
        <p>
          Altmetrics complement traditional bibliometrics with citation-like statistics from social media and other
online media. Thus, altmetrics can be compared to the introduction of the Science Citation Index, which
enabled scientists to track for the first time where they have been cited. The only difference is that
these “citations” are called news items, blog posts, likes, reads, shares, or Mendeley readerships; the
latter indicate how many users have saved a specific publication to their personal libraries,
reflecting interest and potential future use. Mendeley readership “calculates impact indicators for authors based on the
number of users who stored their articles in the reference management system Mendeley” [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This
provides early signals of scholarly engagement, often preceding and complementing traditional citation
metrics.
        </p>
        <p>All kinds of altmetrics make scientific impact visible more quickly than traditional bibliometrics
because they evolve faster and more dynamically.</p>
        <p>Together, altmetrics and bibliometrics form the basis for a multidimensional view of scientific impact
from different angles. Bibliometrics is the part of classical scientific communication that takes much
more time to make changes visible, because every citation requires a new publication that has to go through the
complete cycle of a paper, especially a time-consuming peer review process.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. RQ1: LLM-based encoding of reviews and inter-rater agreement</title>
        <p>The encoding of unstructured reviews onto values between 0 and 1 for six different evaluation criteria
was conducted using OpenAI’s gpt4o-mini. We ran the experiment three times to see the range of
values and to evaluate the inter-rater agreement between the different LLM runs. In the set of 521
crawled review articles, the oldest dates back to April 2016 and the newest is from March 2024.</p>
        <p>In a pretest based on gpt4o with only the first 50 RIC articles, we ran the annotation process three
times in a row to compare the variance produced by the LLM. The inter-rater agreement for the three
runs was 0.24, 0.2, and 0.28.</p>
        <p>In Table 1 (left), we see the distribution of Alpha values in steps of 0.2 for all 521
RIC articles with gpt4o-mini. 114 articles had contradicting review scores, indicated by a negative
Alpha value. In contrast, for 49 articles all reviews were perfectly aligned. For 48 articles,
we could not compute an inter-rater agreement due to missing data or parsing errors during the web
crawls. The agreement averages an Alpha value of 0.27, a value comparable to the pretest that was
based on the larger and more expensive gpt4o model4.</p>
        <p>In Table 1 (right), we see the results of the LLM encoding for the six criteria K0 to K5. Out of a total
of 1943 possible single reviews, the criterion missing most often is K5, with 1390 missing values. So, in 71% of the reviews,
the LLM could not find any mention of the underlying statistics. While this might be uncommon for a
typical pre-publication peer review, the special setting of RIC seems to encourage reviewers to leave
out such comments, perhaps because the papers under (post-)review are already accepted and these
fundamental issues have therefore already been settled.</p>
        <p>We can note two outcomes for RQ1: First, while there are differences in the LLM annotation outcomes,
they do not matter much with respect to the inter-rater agreement achieved in the end. Second, the cost
difference between gpt4o-mini and gpt4o does not translate into a higher inter-rater agreement.
For this kind of task, cheaper and smaller models seem good enough. Nevertheless, we have to note that
the general agreement between the virtual raters is not very high. Given the overall controversial nature
of the underlying data, that might not be surprising, and we have to consider that we applied a rating
            scheme that was not mandatory for the original domain experts when writing their RIC statements.
However, in a large meta-study [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] with a total of 19,443 manuscripts, an even lower agreement with a
Cohen’s Kappa of 0.17 was reported, putting the numbers into perspective.
4At the time of writing, the cost of gpt4o-mini was only 6% of that of gpt4o.
          </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. RQ2: Alignment of review scores with citations and altmetrics</title>
        <p>The citation rate in the highest class, 0.8 - 1, is higher than the average value, but even in the lowest class, 0 - 0.19, the citation rate is not
different from the average. Therefore, from our point of view, the picture here is inconclusive.</p>
        <p>
          Altmetrics are different, as we see in Table 3: While bibliometrics capture the perception of science
within the scientific community, altmetrics focus on the attention paid to science in online media. In
practice, this means that receiving a citation in bibliometrics requires a completely new scientific paper that
has successfully completed the long peer review process and has been published in a scientific journal.
This process is very time-consuming [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Altmetrics are entirely different. The perception
measured here relates to news sites on the Internet and sources such as Facebook, all online sources
that have not been reviewed by peers. It is, therefore, much quicker to gain attention for a scientific
publication via the sources measured with the help of altmetrics.
        </p>
        <p>If we look at the results, we can clearly see that both the Altmetric Attention Score per paper and
the News Mentions per paper increase the higher the averaged K0 to K5 values get. In other words, the
clearer and more positive the RIC reviews by the peers, the stronger the measured perception per paper.
This is particularly evident in the two highest classes, but the overall trend across all classes is also very
clear here. This confirms both the selection of papers and topics by the SMC and the assessments of the
peers surveyed. It is not really surprising that this effect is measurable in the online media in particular,
as this is precisely the field of application for which the SMC’s assessments are obtained. The
described effect does not apply to Mendeley readerships (bookmarks on a scientific paper), because this
parameter, although one of the altmetrics, is more firmly anchored in science in terms of content and
therefore reacts accordingly here.</p>
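        <p>As an illustration of this alignment analysis, the following minimal sketch, assuming the merged data set and the hypothetical column names from the sketches above, groups the papers into classes of their averaged K0 to K5 score in steps of 0.2 and reports the mean citation and altmetric counts per class.</p>
        <preformat># Minimal sketch of the alignment analysis (column names are hypothetical).
import pandas as pd

merged = pd.read_csv("ric_merged_dataset.csv")

# Average the K0-K5 scores per paper, ignoring criteria encoded as NA.
k_cols = ["k0", "k1", "k2", "k3", "k4", "k5"]
merged["mean_k"] = merged[k_cols].mean(axis=1, skipna=True)

# Group papers into score classes of width 0.2 (0-0.19, ..., 0.8-1).
bins = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
labels = ["0-0.19", "0.2-0.39", "0.4-0.59", "0.6-0.79", "0.8-1"]
merged["score_class"] = pd.cut(merged["mean_k"], bins=bins,
                               labels=labels, include_lowest=True)

# Mean citations, Altmetric Attention Score, news mentions, and Mendeley
# readers per score class.
summary = merged.groupby("score_class", observed=True)[
    ["citations", "aas", "news", "mendeley"]
].mean()
print(summary)</preformat>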
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
      <p>In summary, based on this small case study, we can say that there is a correlation between the evaluation
of a paper by peers and its later impact: a weaker one if we look at bibliometric indicators and a stronger one if we look at altmetrics.
We argue that the results of this case study might help to understand which different criteria may
underlie decisions on the relevance of scientific articles. A clear limitation of our approach
is that the sample of Research in Context reviews is only a proxy for these decisions and that we
needed to rely on LLM encodings to analyze the unstructured reviews. Although we see reasonable
LLM-based annotations when checking random results, we could not re-assess the whole document set.
Nevertheless, our proposed pipeline can describe which articles are controversial, where reviewers agree
or disagree, and how these measured criteria relate to the later reception in academia and social media:
two different user groups that obviously give different kinds of recognition.</p>
      <p>
        In a broader sense, we think that this work can lead to a better understanding of relevance decisions in
information access, as we presented in our previous work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Based on the observation that relevance decisions
encoded in information retrieval test collections overlap with high citation rates and altmetric scores, we
implemented this as a beneficial service in the form of a rank fusion approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>In future work, we would like to investigate the underlying dynamics further. What is the specific
effect of each of the six reviewed criteria? Which of them caused the most disagreement? Is there a different
effect for different scientific disciplines? What is the influence of newer LLMs, such as DeepSeek-V3 or
OpenAI’s o3?</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>The bibliometric data used in this paper are based on the local installation of the Web of Science from the
Competence Network for Bibliometrics located at DZHW in Berlin. The altmetric data used originate
from the Altmetric Explorer of Altmetric.com. We want to thank Meik Bittkowski for providing access
and inside information on RIC and Nils Grote for crawling the SMC websites and conducting the initial
LLM-based prestudy.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly to draft content, check
grammar and spelling, paraphrase, and reword. After using these tools, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Spier</surname>
          </string-name>
          ,
          <article-title>The history of the peer-review process</article-title>
          ,
          <source>Trends in Biotechnology</source>
          <volume>20</volume>
          (
          <year>2002</year>
          )
          <fpage>357</fpage>
          -
          <lpage>358</lpage>
          . doi:
          <volume>10</volume>
          .1016/S0167-
          <volume>7799</volume>
          (
          <issue>02</issue>
          )
          <fpage>01985</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ross-Hellauer</surname>
          </string-name>
          ,
          <article-title>What is open peer review? A systematic review</article-title>
          ,
          <source>F1000Research</source>
          <volume>6</volume>
          (
          <year>2017</year>
          )
          <article-title>588</article-title>
          . doi:
          <volume>10</volume>
          .12688/f1000research.11369.2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jeferson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Alderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Davidof</surname>
          </string-name>
          ,
          <article-title>Efects of Editorial Peer Review: A Systematic Review</article-title>
          , JAMA
          <volume>287</volume>
          (
          <year>2002</year>
          )
          <article-title>2784</article-title>
          . doi:
          <volume>10</volume>
          .1001/jama.287.21.2784.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tunger</surname>
          </string-name>
          ,
          <article-title>Relations between relevance assessments, bibliometrics and altmetrics</article-title>
          , in: G. Cabanac,
          <string-name>
            <surname>I. Frommholz</surname>
          </string-name>
          , P. Mayr (Eds.),
          <source>Proceedings of the 10th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 42nd European Conference on Information Retrieval, BIR@ECIR</source>
          <year>2020</year>
          , Lisbon, Portugal, April 14th,
          <year>2020</year>
          [online only], volume
          <volume>2591</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>112</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2591</volume>
          /paper-10.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tunger</surname>
          </string-name>
          ,
          <article-title>Relevance assessments, bibliometrics, and altmetrics: a quantitative study on pubmed and arxiv</article-title>
          ,
          <source>Scientometrics</source>
          <volume>127</volume>
          (
          <year>2022</year>
          )
          <fpage>2455</fpage>
          -
          <lpage>2478</lpage>
          . URL: https://doi.org/10.1007/ s11192-022-04319-4. doi:
          <volume>10</volume>
          .1007/S11192-022-04319-4.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <article-title>Designing organizations for an information rich world</article-title>
          , in: M.
          <string-name>
            <surname>Greenberger</surname>
          </string-name>
          (Ed.), Computers, communications, and
          <article-title>the public interest</article-title>
          ,
          <source>Baltimore</source>
          ,
          <year>1971</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <article-title>Better than their reputation? on the reliability of relevance assessments with students</article-title>
          , in: T. Catarci,
          <string-name>
            <given-names>P.</given-names>
            <surname>Forner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          , G. Santucci (Eds.),
          <source>Information Access Evaluation</source>
          . Multilinguality, Multimodality, and
          <string-name>
            <surname>Visual</surname>
          </string-name>
          Analytics - Third
          <source>International Conference of the CLEF Initiative, CLEF</source>
          <year>2012</year>
          , Rome, Italy,
          <source>September 17-20</source>
          ,
          <year>2012</year>
          . Proceedings, volume
          <volume>7488</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2012</year>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>135</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -33247-0_
          <fpage>14</fpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -33247-0\_
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorf</surname>
          </string-name>
          ,
          <article-title>Reliability in content analysis</article-title>
          ,
          <source>Human Communication Research</source>
          <volume>30</volume>
          (
          <year>2004</year>
          )
          <fpage>411</fpage>
          -
          <lpage>433</lpage>
          . URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-
          <fpage>2958</fpage>
          .
          <year>2004</year>
          .tb00738.x. doi:https: //doi.org/10.1111/j.1468-
          <fpage>2958</fpage>
          .
          <year>2004</year>
          .tb00738.x.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haustein</surname>
          </string-name>
          ,
          <source>Multidimensional Journal Evaluation: Analyzing Scientific Periodicals beyond the Impact Factor, Knowledge &amp; Information</source>
          , De Gruyter/Saur, Berlin ; Boston,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bornmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mutz</surname>
          </string-name>
          , H.
          <string-name>
            <surname>-D. Daniel</surname>
          </string-name>
          ,
          <article-title>A reliability-generalization study of journal peer reviews: A multilevel meta-analysis of inter-rater reliability and its determinants</article-title>
          ,
          <source>PLOS ONE 5</source>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . doi:
          <volume>10</volume>
          .1371/journal.pone.
          <volume>0014331</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Clermont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krolak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tunger</surname>
          </string-name>
          ,
          <article-title>Does the citation period have any efect on the informative value of selected citation indicators in research evaluations?</article-title>
          ,
          <source>Scientometrics</source>
          <volume>126</volume>
          (
          <year>2021</year>
          )
          <fpage>1019</fpage>
          -
          <lpage>1047</lpage>
          . doi:
          <volume>10</volume>
          .1007/s11192-020-03782-1.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Kreutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tunger</surname>
          </string-name>
          ,
          <article-title>Bibliometric Data Fusion for Biomedical Information Retrieval</article-title>
          ,
          <source>in: 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , IEEE,
          <string-name>
            <surname>Santa</surname>
            <given-names>Fe</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NM</surname>
          </string-name>
          , USA,
          <year>2023</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          . doi:
          <volume>10</volume>
          .1109/JCDL57899.
          <year>2023</year>
          .
          <volume>00026</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>