<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modes of Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Autumn Toney-Wails</string-name>
          <email>autumn.toney@georgetown.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kornraphop Kawintiranon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Singh</string-name>
          <email>lisa.singh@georgetown.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul-Emmanuel Courtines</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haofei Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgetown University</institution>
          ,
          <addr-line>Washington, D.C.</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The AAAI-23 Workshop on Scientific Document Understanding</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an initial analysis of scientific misinformation from three areas of research: Computer Science, Environmental Science, and Medicine. We investigate keywords in publication titles and abstracts from retracted scientific publications, which we view as a proxy for misinformation publications. Using the Altmetric Attention Score as a signal of publication popularity, we group articles into low-popularity and high-popularity subsets. We apply three modes of learning (unsupervised, semi-supervised, and supervised) to identify main themes from scientific research publications and compare the results between publication popularity sets. We find that while there is overlap among the terms identified by different methods, they are not the same. However, general topic coverage using different words is similar, highlighting the difficulty in identifying keyword “markers” for popular, poor-quality scientific information.</p>
      </abstract>
      <kwd-group>
        <kwd>scientific documents</kwd>
        <kwd>misinformation</kwd>
        <kwd>altmetrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>CEUR Workshop Proceedings (CEUR-WS.org). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>There are many controversial scientific research areas that have discrepancies surrounding their scientific validity, particularly in politically-charged environments [<xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>]. Recent studies have shown a rise in public skepticism of scientists and scientific research, with 35% of Americans believing that the scientific method may be used to produce “any result a researcher wants” and less than 20% of Americans believing that scientists are transparent in their work and hold themselves accountable for mistakes in their publications [<xref ref-type="bibr" rid="ref3 ref4 ref5">4, 3, 5</xref>]. This scientific distrust and controversy is a leading factor in research focusing on scientific misinformation, as it undermines the public’s ability to consume and trust scientific information [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>While there is no universal set of steps that leads to scientific discovery, there are particular characteristics of research across all disciplines of science that distinguish it from general inquiry and make it rigorous and reliable. Generally, the scientific method involves 1) developing a theory or hypothesis, 2) conducting qualitative and/or quantitative experiments to measure observations and collect results, and 3) deriving conclusions from experimentation [<xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>]. Thus, scientific research is considered to be principled, as it relies on reproducible experiments and evidence-based conclusions. However, with the increase in information sharing partnered with the “publish or perish” reality, the challenge of preserving the rigour and reliability of scientific research is magnified [<xref ref-type="bibr" rid="ref5 ref8">8, 5</xref>].</p>
      <p>Scientific misinformation is difficult to characterize, and as a result, difficult to identify [<xref ref-type="bibr" rid="ref10 ref3 ref9">9, 10, 3</xref>]. We adopt the following scientific misinformation definition from Southwell et al. [<xref ref-type="bibr" rid="ref3">3</xref>]: “publicly available information that is misleading or deceptive relative to the best available scientific evidence and that runs contrary to statements by actors or institutions who adhere to scientific principles.” The majority of research on misinformation focuses on news articles and social media in the context of fake news and propaganda campaigns and analyzes how these stories disseminate through social networks. A critical limitation of this avenue of work is that scientific misinformation is not yet well-researched and there are no available ground-truth datasets.</p>
      <p>In this paper, we link scientific misinformation content to popularity. We are interested in understanding if it is possible to tease out themes of those pieces of scientific thought that are poor quality and popular from those that are not. Here, we use retracted publications as a proxy for identifying publications with a high potential for misinformation and the Altmetric (www.altmetric.com) Attention Score as a proxy for publication popularity. For this exploratory analysis, we compare text analysis techniques that employ different modes of learning: unsupervised, semi-supervised, and supervised. Each text analysis technique is performed on retracted scientific publications with low popularity and high popularity in major research domains: Computer Science, Environmental Science, and Medicine. We find that all three methods produce complementary, non-overlapping, but not contradictory results, highlighting the complexity of identifying “markers” for popular, poor-quality scientific information.</p>
      <p>To summarize, the main contributions of this paper are as follows: 1) analyzing scientific misinformation across different domains of research, 2) measuring the prevalence of scientific misinformation, and 3) comparing modes of learning for text analysis techniques applied to scientific research publications.</p>
    </sec>
    <sec id="sec-1-1">
      <title>2. Experimental Design</title>
      <p>We apply three modes of learning for text analysis on our data. First, we use unsupervised learning methods for traditional keyword extraction. Next, we employ a semi-supervised, generative topic model that uses expert-identified seed terms to guide the topic discovery process. Lastly, we run an interpretable, supervised machine learning model that predicts popularity and identify keyword features that are used to separate the classes. Figure 1 shows the overview process. Each method uses text from the titles and abstracts of scientific publications. We normalize the text by setting all tokens to lowercase and removing urls, digits, symbols, and the word retracted. This normalized text is the input to all of our models.</p>
      <p>Keyword Extraction Methods (unsupervised): We use the three keyword extraction methods shown in Table 1: 1) term frequency-inverse document frequency (TF-IDF), 2) YAKE [<xref ref-type="bibr" rid="ref11">11</xref>], and 3) KeyBERT [<xref ref-type="bibr" rid="ref12">12</xref>]. Each method provides a different approach to keyword extraction (term frequency, unsupervised feature extraction, and contextualized word embeddings), enabling us to compare results across extraction methods. The last two columns of the table show the Python package used and the non-default parameters in cases where the default parameters were not used.</p>
      <p>Generative Modeling (semi-supervised): Because we have some domain knowledge, we test a semi-supervised topic model, Guided Topic-Noise Model (GTM) [<xref ref-type="bibr" rid="ref14">14</xref>]. In addition to text, GTM takes a set of seed words for topics as input and implements the Generalized Polya Urn (GPU) sampling method to help keep seed words within a single topic together during the generation process. We selected GTM because we have short, noisy text, and GTM generates both a topic and noise distribution, removing words that are domain-specific but appear across a large number of topics. It also identifies other topics that domain experts may have missed. In our implementation, we use the default parameters for GTM.</p>
      <p>Predictive Modeling (supervised): We train a Decision Tree on our datasets to test if we can identify important n-gram features (key terms) in predicting if a research publication is in the top or bottom 10% of Altmetric Attention Scores. We use sklearn’s tree implementation [<xref ref-type="bibr" rid="ref13">13</xref>] and its default parameters.</p>
    </sec>
    <sec id="sec-1-2">
      <title>3. Datasets</title>
      <p>For our analysis, we use retracted publications as a proxy for scientific research that could be scientific misinformation. By using these scientific publications in our study we are not definitively labeling them as scientific misinformation. An example of a peer-reviewed, retracted (due to misinformation) publication is Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis [<xref ref-type="bibr" rid="ref15">15</xref>]. This publication is in the top 5% of all research outputs, from any year, scored by Altmetric [<xref ref-type="bibr" rid="ref16">16</xref>]. Figure 2 displays the overview of attention found on Altmetric for this publication, which received an Altmetric Attention Score of 22,503. We are interested in the comprehensive Altmetric Attention Score (displayed in the colorful circle), which represents a combination of all the attention a publication receives (displayed in the category counts on the far left).</p>
      <p>Figure 2: Altmetric.com overview of attention (3,244 tweeters).</p>
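The preprocessing and the simplest of the extraction methods can be sketched in plain Python. This is a minimal, self-contained illustration of the normalization rules described above (lowercase; strip urls, digits, symbols, and the word retracted) followed by a from-scratch TF-IDF scorer; it is not the packaged TF-IDF/YAKE/KeyBERT implementations the paper actually uses, and the example documents are made up.

```python
import math
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, then strip urls, digits, symbols, and the word 'retracted'."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove urls
    text = re.sub(r"[^a-z\s]", " ", text)               # remove digits and symbols
    return [t for t in text.split() if t != "retracted"]

def tfidf_top_terms(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest TF-IDF terms for each document."""
    tokenized = [normalize(d) for d in docs]
    n = len(tokenized)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        results.append(sorted(scores, key=scores.get, reverse=True)[:k])
    return results

docs = [
    "RETRACTED: Hydroxychloroquine trial results 2020, see https://example.org",
    "A climate policy study of renewable energy",
    "A clinical trial of vaccine outcomes",
]
print(tfidf_top_terms(docs))
```

Terms that occur in every document receive an idf of zero, which is why corpus-wide filler words never surface as keywords.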
      <p>
        Retraction Watch Database: We used the publicly
available, manually curated Retraction Watch Database
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Retraction Watch contains 22,614 articles with a
DOI, enabling us to link the articles to Dimensions, a
large scientific literature database, and obtain their titles
and abstracts for analysis. Because Retraction Watch is
manually curated, each retracted paper is labeled with
at least one reason for retraction; there are 105 unique
reasons, such as Investigation by Journal/Publisher,
Concerns/Issues About Data, and Unreliable Results. Table
2 provides the top five retraction reasons by number of
publications for the research areas that we analyze. There
is minimal overlap in the top five reasons across research
areas, but at least three of the five reasons are concerned
with scientific integrity related to data and methods.
      </p>
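Since each retracted paper carries one or more reason labels, per-area tallies like those in Table 2 reduce to a frequency count. A minimal sketch with hypothetical records (the real Retraction Watch fields and counts differ):

```python
from collections import Counter

# Hypothetical (research_area, [retraction reasons]) records for illustration.
records = [
    ("Medicine", ["Concerns/Issues About Data", "Unreliable Results"]),
    ("Medicine", ["Investigation by Journal/Publisher"]),
    ("Medicine", ["Concerns/Issues About Data"]),
    ("Computer Science", ["Unreliable Results"]),
]

def top_reasons(records, area, k=5):
    """Count retraction reasons within one research area.

    A paper labeled with several reasons contributes to each of them.
    """
    counts = Counter()
    for rec_area, reasons in records:
        if rec_area == area:
            counts.update(reasons)
    return counts.most_common(k)

print(top_reasons(records, "Medicine"))
```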
      <p>Dimensions: Our dataset of paper titles and abstracts is sourced from Dimensions, an inter-linked research information system provided by Digital Science [<xref ref-type="bibr" rid="ref18">18</xref>]. We have three sets of scientific research articles that we select from Dimensions: Computer Science, Environmental Science, and Medicine. Each publication in Dimensions is labeled with a broad area of research, which we use to create our subsets of publications. Using the DOIs from these three publication sets, we query the Altmetric API to identify publications with Altmetric attention scores [<xref ref-type="bibr" rid="ref16">16</xref>]. The Altmetric Attention Score is a weighted count of the online attention a research publication receives from various groups, such as scientists, policy-makers, news sources, and the general public. The Altmetric Attention Score is not an indicator of scientific impact.</p>
      <p>For each of the three subsets of research publications (Computer Science, Environmental Science, and Medicine) with Altmetric scores, we generate two subcategories, low-popularity and high-popularity. We select the publications with a bottom 10% Altmetric Attention Score as low-popularity and the publications with a top 10% Altmetric Attention Score as high-popularity. Table 3 displays the number of retracted publications in each of the six categories we analyze. Medicine has significantly more publications with Altmetric data compared to Computer and Environmental Science.</p>
    </sec>
    <sec id="sec-4">
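The decile split described above can be sketched in a few lines; the exact cutoff convention at the 10% boundary (strict vs. inclusive, tie handling) is an assumption, since the paper does not specify it, and the DOIs and scores below are toy values.

```python
def popularity_subsets(scores: dict[str, float], frac: float = 0.10):
    """Partition DOIs into low/high popularity by Altmetric Attention Score.

    Bottom `frac` of scores -> low popularity; top `frac` -> high popularity.
    """
    ranked = sorted(scores, key=scores.get)   # DOIs in ascending score order
    k = max(1, int(len(ranked) * frac))       # size of each 10% tail
    low = ranked[:k]                          # bottom decile
    high = ranked[-k:]                        # top decile
    return low, high

# Toy example: 20 hypothetical DOIs with scores 0..19.
scores = {f"10.0000/paper{i}": float(i) for i in range(20)}
low, high = popularity_subsets(scores)
print(low, high)
```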
      <sec id="sec-4-1">
        <title>4. Empirical Evaluation</title>
        <p>We perform our text analysis on the low-popularity and high-popularity sets of scientific research publications from our three domains. For all methods except GTM, the only input required is the input text; GTM also requires a seed set of words organized by topics. We implemented noiseless Latent Dirichlet Allocation on all sets of publications to find candidate seed words that could be organized into coherent topics and then manually selected the final list of seed words. Table 4 displays the seed words selected for the GTM experiments.</p>
        <p>Figure 3: General themes and sample keywords by research area and popularity. Environmental Science, low popularity: Paleogeology (cretaceous, paleoenvironment, paleolatitudes, stratigraphy), Geology (sediments, shale, soil); high popularity: Renewable Energy (clean, climate, policy, renewable), Marine (fisheries, marine, seafood), Natural Disaster (earthquake, rupturing, seismic, tsunami, volcanic), Climate Change (change, climate, deforestation, global). Medicine: Cancer in both popularity sets (low: cancer, cells, ovarian, metastasis, tumor; high: cancer, cells, chemotherapy, oncology, tumor), COVID-19 (covid, exposure, facemasks, ivermectin), Clinical Trials (clinical, objective, outcomes, pcr, study, trial, vaccine), a general clinical group (analysis, clinical, compared, control, effects, results, study), and Osteoporosis (bone, knee, joint, osteoporosis). Computer Science keywords include autoimmune, biomass, cells, gut, nutrient, gene, physiology, photothermal, radiation, therapeutic.</p>
        <p>We first compared our results across all five methods for each subset of research area and publication popularity and found that no terms appeared in all five methods for any subset of publications that we analyzed. However, we did find that different words related to the same theme appeared across all five methods; for example, the high-popularity Medicine results have facemasks (TF-IDF), adult exposure (KeyBERT), ivermectin (YAKE), pcr (decision tree), and covid (GTM).</p>
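This comparison amounts to set operations over each method's keyword list. The sketch below uses exactly the high-popularity Medicine terms quoted above; the theme lexicon mapping each term to COVID-19 is a hypothetical stand-in for the manual theme grouping.

```python
# One representative keyword per method, from the high-popularity Medicine example.
keywords = {
    "TF-IDF":        {"facemasks"},
    "KeyBERT":       {"adult exposure"},
    "YAKE":          {"ivermectin"},
    "decision tree": {"pcr"},
    "GTM":           {"covid"},
}

# No single surface term survives all five methods.
shared_terms = set.intersection(*keywords.values())

# Hypothetical theme lexicon (illustrative only): different surface terms
# roll up to the same COVID-19 theme, so the theme IS shared by all methods.
theme_lexicon = {
    "facemasks": "COVID-19", "adult exposure": "COVID-19",
    "ivermectin": "COVID-19", "pcr": "COVID-19", "covid": "COVID-19",
}
themes = {m: {theme_lexicon[t] for t in ts} for m, ts in keywords.items()}
shared_themes = set.intersection(*themes.values())

print(shared_terms, shared_themes)
```

Empty term-level overlap alongside full theme-level overlap is precisely the pattern reported above: agreement emerges only after terms are mapped to themes.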
        <p>While the keyword results across all five methods
varied, we find general themes for each research area and
popularity (see Figure 3). Under each theme we provide
a sample of keywords that appeared from at least one
of the methods. Computer Science and Medicine have
overlapping themes between the low popularity and high
popularity publications, whereas Environmental Science
does not. Additionally, Computer Science has a theme
relating to biology and medicine applications in both low
and high popularity subsets, which resulted in words
that are not directly related to computer science, such as
biomass and radiation.</p>
        <p>We find that the Medicine subset of research
publications produced the most coherent results, perhaps
indicating that these methods perform best on larger sets of
documents.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5. Conclusions</title>
          <p>In this work, we investigate scientific research misinformation. As an initial analysis, we select publications from three broad areas of research (Computer Science, Environmental Science, and Medicine) and attempt to identify keyword differences between low-popularity and high-popularity scientific misinformation using unsupervised, semi-supervised, and supervised modes of learning on scientific research publication text. We find that across all experimental results, we are able to identify themes of research topics in each research area using different learning approaches, but some themes overlap across popularity levels, highlighting the complexity of using keywords as indicators for this task. Future work will consider using network metrics to identify popular, poor-quality scientific information.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Acknowledgments</title>
        <p>This work was supported in part by the Massive Data
Institute (MDI), the Fritz Family Fellows Program, and
the Center for Security and Emerging Technology (CSET)
at Georgetown University.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Scheufele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Krause</surname>
          </string-name>
          , Science audiences, misinformation, and fake news,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>116</volume>
          (
          <year>2019</year>
          )
          <fpage>7662</fpage>
          -
          <lpage>7669</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McConnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brulle</surname>
          </string-name>
          ,
          <article-title>Evidence-based strategies to combat scientific misinformation</article-title>
          ,
          <source>Nature climate change 9</source>
          (
          <year>2019</year>
          )
          <fpage>191</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Southwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S. B.</given-names>
            <surname>Brennen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Paquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Boudewyns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          , Defining and measuring scientific misinformation,
          <source>American Academy of Political and Social Science</source>
          <volume>700</volume>
          (
          <year>2022</year>
          )
          <fpage>98</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Kabat</surname>
          </string-name>
          ,
          <article-title>Taking distrust of science seriously: To overcome public distrust in science, scientists need to stop pretending that there is a scientific consensus on controversial issues when there is not</article-title>
          ,
          <source>EMBO reports 18</source>
          (
          <year>2017</year>
          )
          <fpage>1052</fpage>
          -
          <lpage>1055</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <article-title>Key findings about americans' confidence in science and their views on scientists' role in society</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Gauch</surname>
          </string-name>
          , Scientific method in practice, Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>National</given-names>
            <surname>Academies of Sciences</surname>
          </string-name>
          , Engineering, and Medicine,
          <source>Reproducibility and Replicability in Science, Technical Report</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sarewitz</surname>
          </string-name>
          ,
          <article-title>The pressure to publish pushes down quality</article-title>
          ,
          <source>Nature</source>
          <volume>533</volume>
          (
          <year>2016</year>
          )
          <fpage>147</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E. K.</given-names>
            <surname>Vraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bode</surname>
          </string-name>
          ,
          <article-title>Defining misinformation and understanding its bounded nature: Using expertise and evidence for describing misinformation</article-title>
          ,
          <source>Political Communication</source>
          <volume>37</volume>
          (
          <year>2020</year>
          )
          <fpage>136</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Druckman</surname>
          </string-name>
          , Threats to science: Politicization, misinformation, and inequalities,
          <source>The ANNALS of the American Academy of Political and Social Science</source>
          <volume>700</volume>
          (
          <year>2022</year>
          )
          <fpage>8</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mangaravite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pasquali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jorge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <article-title>Yake! keyword extraction from single documents using multiple local features</article-title>
          ,
          <source>Information Sciences</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grootendorst</surname>
          </string-name>
          ,
          <article-title>Keybert: Minimal keyword extraction with bert</article-title>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.5281/zenodo.4461265. doi:10.5281/zenodo.4461265.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          ,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Churchill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>A guided topic-noise model for short texts</article-title>
          , in: International World Wide Web Conference,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Mehra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruschitzka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Patel</surname>
          </string-name>
          , Retracted:
          <article-title>Hydroxychloroquine or chloroquine with or without a macrolide for treatment of covid19: a multinational registry analysis</article-title>
          ,
          <source>Lancet (London, England)</source>
          (
          <year>2020</year>
          )
          <fpage>S0140</fpage>
          -
          <lpage>6736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Altmetric</surname>
          </string-name>
          , Altmetric.com, www.altmetric.com/,
          <year>2012</year>
          . Accessed: 2022-01-25.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <article-title>The Center For Scientific Integrity, The Retraction Watch Database</article-title>
          , http://retractiondatabase.org/,
          <year>2018</year>
          . ISSN: 2692-465X. Accessed: 2022-01-25.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Hook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Porter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Herzog</surname>
          </string-name>
          ,
          <article-title>Dimensions: building context for search and evaluation</article-title>
          ,
          <source>Frontiers in Research Metrics and Analytics</source>
          <volume>3</volume>
          (
          <year>2018</year>
          )
          <fpage>23</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>