<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ROBOKOP database, Bioinformatics (Oxford, England)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>TETYS: Towards the Next-Generation Open-Source Web Topic Explorer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Bernasconi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Invernici</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Ceri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronics, Information and Bioengineering - Politecnico di Milano</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Project Information Project Name: Topics Evolution That You See (TETYS); Funding Agency: innovation programme under the grant agreement 101069364 and is framed under Next Generation Internet Initiative); URL: https://annabernasconi.faculty.polimi.it/project-tetys/;</institution>
          <addr-line>Running Period: 01.09.2023 - 31.08.2024; Team: Anna Bernasconi (Team Leader</addr-line>
          ,
          <institution>Politecnico di Milano); Francesco Invernici (Politecnico di Milano); Stefano Ceri</institution>
          ,
          <addr-line>Politecnico di Milano</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>37</volume>
      <issue>2021</issue>
      <fpage>586</fpage>
      <lpage>587</lpage>
      <abstract>
        <p>Users who search the web for specialized content typically lack knowledge of the precise topology of the dataset upon which the search is performed. Funded by the European Union, TETYS is a beneficiary of the Next Generation Internet (NGI) Search Initiative; it proposes to build the next-generation open-source Web topic explorer. Our architecture inspects big textual corpora; it is composed of 1) a pipeline for ingesting huge data corpora and extracting highly relevant topics, clustered along orthogonal dimensions; and 2) an interactive dashboard, supporting topic visualization as word clouds and exploration of temporal series, with easy-to-drive statistical testing. The first prototype, CORToViz, explores the CORD-19 dataset (COVID-19 / SARS-CoV-2 virus research abstracts). Many different domains will be explored using TETYS (e.g., climate change and controversial debates on social media).</p>
      </abstract>
      <kwd-group>
        <kwd>Big Data Analytics</kwd>
        <kwd>Scientific Literature</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Topic Modeling</kwd>
        <kwd>Time Series</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>interactive dashboard, activated after a data preparation stage, supporting topic visualization as
word clouds and their exploration through user-friendly interaction.</p>
      <p>
        The TETYS concept is already fully demonstrated in the CORToViz prototype, which explores
the CORD-19 dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] collected during the pandemic, focused on COVID-19 and the
SARS-CoV-2 virus (see Section 2). An unbounded number of interesting domains can be explored
using the TETYS approach, including climate change and controversial debates on social media.
      </p>
      <p>TETYS is a beneficiary of the 2nd Open Call of NGI Search, a cascade funding project
designed to help applicants (academic researchers, hi-tech startups, and SMEs) adopt and
develop open-source digital innovation in the domain of searching and discovering data or other
resources on the internet. We started our path within the program in September 2023 at a
Technology Readiness Level (TRL) of 3; NGI Search is supporting us in developing the testbed into
a solid architecture, reaching TRL 5.</p>
      <sec id="sec-1-1">
        <title>1.1. Scope</title>
        <p>As a first testbed, we focus on datasets of research documents, considering all the text reviewed
and published in edited proceedings and journals. By consolidating our first demonstrator,
we aim to make the pipeline applicable to any corpus of Web textual documents and then
to validate the dashboard experience with solid and broad user studies. Subsequent TETYS
dashboards will be used in the context of user search on widely adopted platforms whose
data can be freely accessed and processed for purposes of visualization. We build completely
stateless applications, without saving any client session data or search details on back-end
databases. All results will be shared under open licenses; TETYS will contribute to fostering the
diffusion of knowledge through the Open Science paradigm.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Work Plan (spanning 12 months)</title>
        <p>WP1.1 Prototype refinement; verification of the main technology employed [M1-3]</p>
        <p>WP1.2 Deployment of cross-platform technology and test on different platforms/browsers [M4-6]</p>
        <p>WP2 User-centred validation using the testbed implementation (A/B testing, Demand Validation
Tests, Moderated Usability Study, etc.) [M1-6]</p>
        <p>WP3.1 Technology consolidation based on feedback; deployment of the advanced prototype [M7-9]</p>
        <p>WP3.2 Selection of other test cases and experimentation, porting the prototype to other domains [M10-12]</p>
        <p>WP4 Business plan preparation [M7-12]</p>
        <p>Milestone 1 (end of M9): definition of user validation results (detailed report) and consolidation
of the initial testbed (manual and documentation)</p>
        <p>Milestone 2 (end of M12): production plan (drafting of documents) and multi-domain
demonstrator, showing that TETYS can be quickly reapplied to unexplored domains (demo pilot)</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. First prototype</title>
      <p>
        CORD-19 has enabled many text mining approaches [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], leading to remarkable results [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
building for instance knowledge graphs for research acceleration [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] and drug repurposing [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
resource annotation services [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], claim verification systems [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and purpose-specific language
models [9]. Since May 5th, 2023, the pandemic has no longer been considered a public health emergency
by the World Health Organization [10]; we may therefore finally consider it a concluded
phenomenon and analyze its history as a whole.
      </p>
      <p>In this direction, the first TETYS prototype, named CORToViz (CORD-19 Topic Visualizer,
online at http://gmql.eu/cortoviz), aims to show how the large literature corpus CORD-19 can
be exploited to gather a comprehensive overview of the pandemic, tracing the
trends that have characterized its scientific literature. Our first results are described
in a journal publication [11], which presents the software architecture shown in Figure 1.
The architecture comprises modules for data preparation, hyper-parameter optimization, and fitting of the topic
model, followed by a final data transformation that enables time series visualization and keyword search.
CORToViz offers insight into the various topics that have defined the global
COVID-19 crisis, their interactions, and their temporal dynamics. Furthermore, we demonstrate the
advantages of employing a statistical approach for dynamic topic modeling based on results
from deep learning-based language models. This prototype uses the most comprehensive
dataset available to date about research on COVID-19 and SARS-CoV-2.</p>
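      <p>The final data transformation step, which turns per-document topic assignments into time series, can be illustrated with a minimal sketch. It is not the CORToViz implementation: the function name <monospace>monthly_topic_series</monospace>, the input shape (pairs of publication date and topic identifier), and the monthly granularity are all assumptions made for illustration.</p>
      <preformat>
```python
from collections import Counter, defaultdict
from datetime import date


def monthly_topic_series(assignments):
    """Count documents per topic for each calendar month.

    `assignments` is an iterable of (publication_date, topic_id) pairs,
    a hypothetical stand-in for the output of the topic-model stage.
    Returns {topic_id: [(month_label, count), ...]} sorted by month.
    """
    counts = defaultdict(Counter)
    for published, topic_id in assignments:
        month = published.strftime("%Y-%m")  # e.g. "2020-03"
        counts[topic_id][month] += 1
    return {topic: sorted(by_month.items()) for topic, by_month in counts.items()}


# Toy input (invented for the example): three abstracts on topic 0, one on topic 1.
docs = [
    (date(2020, 3, 5), 0),
    (date(2020, 3, 20), 0),
    (date(2020, 4, 1), 0),
    (date(2020, 4, 2), 1),
]
series = monthly_topic_series(docs)
print(series[0])  # prints [('2020-03', 2), ('2020-04', 1)]
```
      </preformat>
      <p>Each per-topic list can then be plotted directly as a line in a dashboard; a coarser or finer time bin would only require changing the format string.</p>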
      <p>The prototype architecture of CORToViz is available as a Docker image
(https://hub.docker.com/r/frinve/cortoviz/), and the code is open source on GitHub
(https://github.com/FrInve/TETYS/) under the BSD 3-Clause license, which permits distribution, changes, and
commercial/private use. CORToViz addressed the community of COVID-19 researchers; TETYS will be
applicable to any corpus of Web documents, reaching the wider community of Web users who
explore textual content.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Excellence</title>
      <p>TETYS addresses the needs of users who search the web for specialized
content. They typically know a few buzzwords and start searching with just one of them, then exploit
preliminary results to fuel subsequent search iterations, thereby enriching the query set. This
process is made necessary by their lack of knowledge of the precise topology of the dataset upon
which the search is performed. More importantly, users generally know nothing
about the temporal context in which specific words have been used or which topics
were trending at what time. Can a data-driven strategy based on statistics and visualization be
introduced to make Web users 1) more aware of the semantic content of the analyzed dataset
and 2) able to understand the temporal evolution of the topics in the domain they are exploring? Word
clouds and topic modeling have long been used to represent documents of big
corpora, but no full-stack approaches have thus far been proposed for working on large-scale
evolving domains.</p>
      <p>TETYS has the ambition to make research information more accessible to common Web users
who access very technical and specialized contributions, primarily textual data with semantics.
At present, topic modeling and, more generally, techniques of machine learning-based
natural language processing are proposed in heavily customized cases and are applied
successfully in specific and circumscribed contexts. With TETYS, we propose a completely
general full-stack process that is applicable to virtually any corpus of medium-sized textual
documents, using any topic model of choice and a time-series visualizer. TETYS aims to address
the two problems expressed above, providing a tangible way for Web users to 1) explore the semantic
contents of big document corpora and 2) appreciate the temporal evolution of topics. More importantly,
it allows building dashboards for achieving (1) and (2) very quickly thanks to a lightweight
technology stack applicable to any Web textual document dataset.</p>
      <p>Regarding competitors, several tools and platforms employ topic modeling on top of existing
datasets. These have already been used in business-oriented settings for their ease of use
and effective outputs. For instance, MonkeyLearn implements several algorithms that can
be customized without coding; MarketMuse helps with content optimization and automates
content audits; DataScienceCentral offers tutorials and resources on various aspects of data
science, including topic modeling; and Gensim and spaCy are examples of popular NLP libraries
for topic modeling.</p>
      <p>BERTopic [12] is a recent Python library for creating dense and interpretable topics; it is used
within TETYS as a core component for topic extraction. The novelty of TETYS lies in
bringing a one-click solution to apply this data science
technique to virtually any document corpus, allowing lightweight analytics at the service
of Web searchers. BERTopic, or any other topic model, can be embedded within TETYS.</p>
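      <p>One way to keep the topic model interchangeable is to have the pipeline depend on a small interface rather than on a specific library. The sketch below is only an illustration of that design idea: the <monospace>TopicModel</monospace> protocol, the toy <monospace>KeywordTopicModel</monospace>, and the <monospace>assign_topics</monospace> step are hypothetical names, not part of TETYS or BERTopic.</p>
      <preformat>
```python
from typing import Protocol, Sequence


class TopicModel(Protocol):
    """Hypothetical interface; not part of TETYS or BERTopic."""

    def fit_transform(self, documents: Sequence[str]) -> list[int]:
        ...


class KeywordTopicModel:
    """Toy model: assigns each document the first topic whose keyword it contains."""

    def __init__(self, keywords: dict[int, str]) -> None:
        self.keywords = keywords

    def fit_transform(self, documents: Sequence[str]) -> list[int]:
        labels = []
        for text in documents:
            lowered = text.lower()
            label = -1  # -1 marks documents that match no topic
            for topic_id, keyword in self.keywords.items():
                if keyword in lowered:
                    label = topic_id
                    break
            labels.append(label)
        return labels


def assign_topics(model: TopicModel, documents: Sequence[str]) -> list[int]:
    """Pipeline step that depends only on the interface, not on one library."""
    return model.fit_transform(documents)


model = KeywordTopicModel({0: "vaccine", 1: "mask"})
labels = assign_topics(model, ["Vaccine trial results", "Mask mandates", "Other"])
print(labels)  # prints [0, 1, -1]
```
      </preformat>
      <p>BERTopic's own <monospace>fit_transform</monospace> returns both topic labels and probabilities, so plugging it into an interface like this one would need a thin adapter that keeps only the labels.</p>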
      <p>Specifically within COVID-19-related matters, other works have previously focused on topic
analysis: some analyzed the early stages of the pandemic [13, 14]; others analyzed the broader
field of coronaviruses [15], or focused on topic distribution by country [16] or on the delineation
and scientometric impact of the early CORD-19 [17]. The approach taken in the
CORToViz prototype is broader, as it applies to the entire pandemic history without choosing a
specific field of investigation a priori.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Impact</title>
      <p>In terms of scientific impact, TETYS will contribute to fostering the diffusion of knowledge
through the Open Science paradigm. TETYS will enable improved access to summarized and
digested content both on domain-specific Web content (e.g., reviews on water-scoping machines)
and on more general content (e.g., climate change reports). Similarly, it will be applied to both
highly technical texts (e.g., scientific research abstracts) and general texts (e.g., book reviews).
The flexibility of the approach makes it applicable to needs in very diverse domains
and markets. TETYS will be able to address the very diverse needs of stakeholders requiring a
one-click stack that provides immediate high-level analytics on the topics of the text that are
relevant to their field. This can potentially be exploited to quickly grasp trends in, for
instance, e-commerce reviews, event feedback tweets, public engagement threads, or any other business
in which observing temporal trends is crucial.</p>
      <p>TETYS will bring an immediate positive impact on the communication strategy in multiple
fields. We are building a second prototype on climate change-related research abstracts (and
social media text, in parallel), so as to study the evolution in the interest in the topic from
the research community. Studies of this kind can immediately stimulate public debate and
awareness of environmental changes. TETYS will also help improve private/public services
to meet relevant environmental policies or goals; indeed, environmental policy or planning
decisions can be informed by the evidence collected through TETYS. For instance, this principle
can be applied in public engagement in public works or urban planning (e.g., to quickly scan
emails from citizens and grasp the trend in their proposals and interests). We will contribute to
increasing levels of engagement of members of the public with research, and corresponding
levels of confidence in public science dialogue.</p>
      <p>TETYS can be configured as an add-on to several search engines or social media platforms.
It will not save any information on the user side but will only employ data from the platform
that the user, possibly with a personal login, already has access to. Quick elaboration
of the topics contained in the text documents on the platform, resulting in powerful immediate
visualization, will be an added value to platforms, increasing the engagement of users and
bringing more interest toward them, and thus economic value. In the future, we
envision a TETYS browser extension that improves the search of known engines (e.g., Google
Chrome has ChatGPT, Google Scholar, InvisibleHand, or Google Similar Pages add-ons). Notably,
no extensions that analyze topics and their evolution currently exist for common browsers,
making the approach of TETYS highly promising and impactful. TETYS will only work on
selected domains that require a pre-processing of the Web content to allow for a quick reaction
to user inputs.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Project status</title>
      <p>The CORToViz prototype has been described in [11] and shared at http://gmql.eu/cortoviz/; we
are expanding the scope of the document dataset to encompass data from major research publication
providers (namely, Scopus and Springer). We are preparing the first dashboards on
research macro areas (e.g., climate change-related topics or inequalities/inclusion in
research).</p>
      <p>In parallel, we are designing a completely novel front-end experience targeted to specific
personas with research and communication backgrounds, while conducting market landscaping
to understand key players in our (niche) industry.</p>
      <p>The project has been presented at the SFSCON conference (South Tyrol Free Software
Conference in Bozen, 2023), gathering the interest of a wide audience of open-source software
practitioners; we have also been interviewed in the context of many community events with
NGI Search partners.</p>
      <p>The one-year project duration is a short period; it has been dedicated to guaranteeing that
the idea stands on solid technological and business bases, while the development of the core
contribution will continue longer in our team, leveraging the obtained funding.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Relevance to CAiSE</title>
      <p>The TETYS project connects with the ‘Artificial Intelligence and Machine Learning’ CAiSE area
of interest. Specifically, our approach relies on a careful selection of modern technologies,
including large language models. This leads to an architecture tailored for clustering
articles across orthogonal dimensions and employing extraction techniques for temporal topic
mining. In this sense, we also connect with the ‘Big Data architectures’ CAiSE
topic. Our focus is not on proposing a new model per se but on engineering a pipeline for
identifying topics in a big corpus of text documents and exploring their temporal trends with
an interactive interface. The components of the pipeline are crafted and extended
from those of an existing topic modeling framework, BERTopic. Additionally, we benefit
from a statistical approach for dynamic topic modeling built on the results of deep
learning-based language models; this feature contributes to the area of ‘Big Data, Data Science
and Analytics’, as it can handle large volumes of text in a short time. The
whole pipeline (ingesting 1 million abstracts, filtering, preparation, learning topic
representations, classifying abstracts, building time series, and visualizing the results in the final
dashboard) requires less than 20 hours.</p>
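      <p>The text does not specify which statistical tests the dashboard offers on topic time series. As one possible illustration only, the sketch below compares the share of a topic in two time windows with a two-sided two-proportion z-test; the choice of test, the function name, and the toy counts are assumptions, not the TETYS method.</p>
      <preformat>
```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    hits_* are the documents assigned to the topic in each window,
    total_* are all documents published in that window.
    Returns (z statistic, two-sided p-value).
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Toy counts (invented): the topic covers 120 of 1000 abstracts in one window
# and 80 of 1000 in another.
z, p_value = two_proportion_z_test(120, 1000, 80, 1000)
print(round(z, 2), round(p_value, 4))
```
      </preformat>
      <p>A small p-value here would suggest that the change in the topic's share between the two windows is unlikely to be due to chance alone, under the usual large-sample assumptions of the test.</p>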
    </sec>
    <sec id="sec-7">
      <title>7. Outlook</title>
      <p>CORToViz is the first research demonstrator showcasing the TETYS approach. TETYS will
consolidate this demonstrator, making the pipeline applicable to any corpus of Web textual
documents, and validating the dashboard experience with solid and broad user studies. TETYS
brings Web topic modeling from its current restricted research scale to the next level, making it
applicable on a pilot scale on several other domains of interest for Web users.</p>
      <p>TETYS research is a beneficiary of the NGI Search 2nd Open Call. Funded
by the European Union. Views and opinions expressed are however those
of the author(s) only and do not necessarily reflect those of the European
Union or European Commission. Neither the European Union nor the
granting authority can be held responsible for them. Funded within the
framework of the NGI Search project under grant agreement No 101069364.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chandrasekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Kinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mooney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Murdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sheehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Wade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. X. R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Raymond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          ,
          <article-title>CORD-19: The COVID-19 open research dataset</article-title>
          , in: K. Verspoor,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          , J. May,
          <string-name>
            <given-names>R.</given-names>
            <surname>Munro</surname>
          </string-name>
          , C. Paris, B. Wallace (Eds.),
          <source>Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <article-title>Text mining approaches for dealing with the rapidly expanding literature on COVID-19</article-title>
          ,
          <source>Briefings in Bioinformatics</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>781</fpage>
          -
          <lpage>799</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parulian</surname>
          </string-name>
          , G. Han, J. Ma, J. Tu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liem</surname>
          </string-name>
          , A. Elsayed, M. Palmer,
          <string-name>
            <given-names>C.</given-names>
            <surname>Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Onyshkevych</surname>
          </string-name>
          ,
          <article-title>COVID-19 literature knowledge graph construction and drug repurposing report generation</article-title>
          , in:
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Logette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lorin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Favreau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Oshurko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Coggan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Casalegno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Monney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bertschy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Delattre</surname>
          </string-name>
          , et al.,
          <article-title>A machine-generated view of the role of blood glucose levels in the severity of COVID-19</article-title>
          ,
          <source>Frontiers in Public Health</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>695139</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          , G. Karypis,
          <string-name>
            <given-names>G.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <article-title>COVID-19 knowledge graph: Accelerating information retrieval and discovery for scientific literature</article-title>
          , in:
          <source>Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.-H. K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-K. C.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>CODA-19: Using a non-expert crowd to annotate research aspects on 10,000+ abstracts in the COVID-19 open research dataset</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Verspoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>May</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Munro</surname>
          </string-name>
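          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Paris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wallace</surname>
          </string-name>
          (Eds.),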
          <source>Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <publisher-loc>Online</publisher-loc>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Serna García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al Khalaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Invernici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernasconi</surname>
          </string-name>
          ,
          <article-title>CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning</article-title>
          ,
          <source>GigaScience</source>
          <volume>12</volume>
          (
          <year>2023</year>
          )
          <elocation-id>giad036</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wadden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Zuylen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Fact or Fiction: Verifying Scientific Claims</article-title>
          , in:
          <source>Proceedings of the 2020 Conference on Empirical Methods</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>