<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>W. J. Wiersinga, A. Rhodes, A. C. Cheng, S. J. Pea- for Biology and Medicine</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Open-Publishing Response to the COVID-19 Infodemic</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Halie M. Rando</string-name>
          <email>halie.rando@cuanschutz.edu</email>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simina M. Boca</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucy D'Agostino McGowan</string-name>
          <email>lucydagostino@gmail.com</email>
          <xref ref-type="aff" rid="aff9">9</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel S. Himmelstein</string-name>
          <email>daniel.himmelstein@gmail.com</email>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael P. Robson</string-name>
          <email>michael.robson@villanova.edu</email>
          <xref ref-type="aff" rid="aff8">8</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Rubinetti</string-name>
          <email>vince.rubinetti@gmail.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryan Velazquez</string-name>
          <email>rnhvelazquez@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Casey S. Greene</string-name>
          <email>greenescientist@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Gitter</string-name>
          <email>gitter@biostat.wisc.edu</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alex's Lemonade Stand Foundation, Childhood Cancer Data Lab</institution>
          ,
          <addr-line>Philadelphia, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Azimuth1</institution>
          ,
          <addr-line>McLean, VA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Georgetown University Medical Center, Innovation Center for Biomedical Informatics</institution>
          ,
          <addr-line>Washington, DC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Morgridge Institute for Research</institution>
          ,
          <addr-line>Madison, WI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Colorado School of Medicine, Center for Health AI</institution>
          ,
          <addr-line>Aurora, CO</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics</institution>
          ,
          <addr-line>Aurora, CO</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics</institution>
          ,
          <addr-line>Philadelphia, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics</institution>
          ,
          <addr-line>Madison, WI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff8">
          <label>8</label>
          <institution>Villanova University, Department of Computing Sciences</institution>
          ,
          <addr-line>Villanova, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff9">
          <label>9</label>
          <institution>Wake Forest University, Department of Mathematics and Statistics</institution>
          ,
          <addr-line>Winston-Salem, NC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>8</volume>
      <issue>2013</issue>
      <fpage>367</fpage>
      <lpage>368</lpage>
      <abstract>
        <p>The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis, combined with the need for social distancing measures, presents unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.</p>
      </abstract>
      <kwd-group>
        <kwd>COVID-19</kwd>
        <kwd>living document</kwd>
        <kwd>open publishing</kwd>
        <kwd>open source</kwd>
        <kwd>data integration</kwd>
        <kwd>Manubot</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <sec id="sec-2-1">
        <p>Coronavirus Disease 2019 (COVID-19) caused a worldwide public health crisis that has reshaped many aspects of society. The scientific community has, in turn, devoted significant attention and resources towards COVID-19 and the associated virus, SARS-CoV-2, resulting in the release of data and publications at a rate and scale never previously seen for a single topic. Over 20,000 articles about COVID-19 were released in the first four months of the pandemic [1], causing an “infodemic” [1, 2]. The COVID-19 Open Research Dataset (CORD-19) [3], which was developed in part with the goal of training machine learning algorithms on COVID-19-related text, illustrates the growth of related scholarly literature (Figure 1). This resource was developed by querying several sources for terms related to SARS-CoV-2 and COVID-19, as well as the coronaviruses SARS-CoV-1 and MERS-CoV and their associated diseases [3]. CORD-19 contained 768,929 manuscripts as of September 6, 2021. Additional curation by CoronaCentral [4] has produced, at present, a set of over 180,000 publications particularly relevant to COVID-19 and closely related viruses. Despite many advances in understanding the virus and the disease, there are also downsides to the availability of so much information. “Excessive publication” has been recognized as a concern for over forty years [5] and has been discussed with respect to the COVID-19 literature [6]. Any effort to synthesize, summarize, and contextualize COVID-19 research will face a vast corpus of potentially relevant material.</p>
        <p>Figure 1: Growth of the CORD-19 dataset. The number of articles has proliferated, with both traditional and preprint manuscripts in the corpus. The first release (March 16, 2020) contained 28,000 documents [3]. As of September 6, 2021, this had increased to 768,929 articles. Of these, 30,726 are preprints from arXiv, medRxiv, and bioRxiv.</p>
        <p>Information was released rapidly by both traditional publishers and preprint servers, and many papers faced subsequent scrutiny. The number of COVID-19 papers retracted may be higher, and potentially much higher, than is typical, although a thorough investigation of this question requires more time to elapse [7, 8]. Many preprints and papers are also associated with corrections or expressions of concern<sup>1</sup> [8]. Preprints are released prior to peer review, but some traditional publishing venues have fast-tracked COVID-19 papers through peer review, leading to questions about whether they are held to typical standards [9]. Therefore, evaluating the COVID-19 literature requires not only digesting available information but also monitoring subsequent changes.</p>
        <p>Because of the fast-moving nature of the topic, many efforts to summarize and synthesize the COVID-19 literature have been undertaken. These efforts include newsletters<sup>2</sup> [10], web portals<sup>3</sup> [11] or the now-defunct http://covidpreprints.com<sup>4</sup>, comments on preprint servers<sup>5</sup> [12], and even a journal<sup>6</sup>. However, the explosive rate of publication presents challenges for such efforts, many of which are no longer active. Similarly, many literature reviews have been written on the available COVID-19 literature [13, 14, 15, 16, 17], but static reviews quickly become outdated as new research is released or existing research is retracted or superseded. One example is a review of topics in COVID-19 research including vaccine development [17]. This review was published on July 10, 2020, four days before Moderna released the surprisingly promising results of their phase 1 trial [18] that changed expectations surrounding vaccines. The COVID-19 publishing climate thus presented a challenge in which it was desirable for a diverse group of experts to curate the literature in a format that could respond quickly to high-volume, high-velocity information.</p>
        <p>We therefore sought to develop a platform for scientific discussion and collaboration around COVID-19 by adapting open publishing infrastructure to accommodate the scale of COVID-19 publishing. Recent advances in open publishing have created an infrastructure that facilitates distributed, version-controlled collaboration on manuscripts [19]. Manubot [19] is a collaborative framework developed to adapt open-source software development techniques and version control for manuscript writing. With Manubot, manuscripts are managed and maintained using GitHub, a popular, online version control interface. We selected Manubot because it offers several advantages over comparable collaborative writing platforms such as Authorea, Overleaf, Google Docs, Word Online, or wikis [19]. Citation-by-identifier ensures consistent reference metadata standards that would be difficult to maintain manually in a manuscript with dozens of authors and over 1,500 citations. Manubot’s pull request-based contribution model balances the goals of making the project open to everyone and maintaining scientific accuracy. All contributions are reviewed, discussed, and formally approved on GitHub before text updates appear in the public-facing manuscript<sup>7</sup>. Continuous integration (CI) seamlessly combines author-produced text and figures with automatically generated and updated statistics and figures derived from external data sources and the manuscript’s own content. In addition, the authors who initially launched this project included Manubot developers who had prior success using Manubot for both massively open and traditional manuscripts, including large-scale collaborative efforts such as a review of developments in deep learning [20] and a re-evaluation of the role of authorship in modern collaborations [21].</p>
        <p>Collaboration via massively open online papers has been identified as a strategy for promoting inclusion and interdisciplinary thought [22]. However, the Manubot workflow can be intimidating to contributors who are not well-versed in git [22]. The synthesis and discussion of the emerging literature by biomedical scientists and clinicians is imperative to a robust interpretation of COVID-19 research. Such efforts in biology often rely on What You See Is What You Get tools such as Google Docs, despite the significant limitations of these platforms in the face of excessive publication. We recognized that the problem of synthesizing the COVID-19 literature lent itself well to the Manubot platform, but that the potential technical expertise required to work with Manubot presented a barrier to domain experts.</p>
        <p>Here, we describe the adaptation of Manubot to facilitate collaboration in the extreme case of the COVID-19 infodemic, with the objective of developing a centralized platform for summarizing and synthesizing a massive number of preprints, news stories, journal publications, and data. Unlike prior collaborations built on Manubot, most contributors to the COVID-19 collaborative literature review came from biology or medicine. The members of the COVID-19 Review Consortium consolidated information about the virus in the context of related viruses and synthesized rapidly emerging literature. Manubot provided the infrastructure to manage contributions from the community and create a living, scholarly document integrating data from multiple sources. Its back-end allowed biomedical scientists to sort and distill informative content out of the overwhelming flood of information [23] in order to provide a resource that would be useful to the broader scientific community. This case study demonstrates the value of open collaborative writing tools such as Manubot for addressing emerging challenges. Because it is open source software, we were able to adapt and customize Manubot to flexibly meet the needs of the COVID-19 review. Recording the evolution of information over time and assembling a resource that auto-updated in response to the evolving crisis revealed the particular value that Manubot holds for managing rapid changes in scientific thought.</p>
        <p><sup>1</sup> https://asapbio.org/preprints-and-covid-19 as well as https://retractionwatch.com/retracted-coronavirus-covid-19-papers</p>
        <p><sup>2</sup> https://depts.washington.edu/pandemicalliance/covid-19-literature-report/latest-reports</p>
        <p><sup>3</sup> https://outbreaksci.prereview.org</p>
        <p><sup>4</sup> https://asapbio.org/preprints-and-covid-19</p>
        <p><sup>5</sup> https://disqus.com/by/sinaiimmunologyreviewproject</p>
        <p><sup>6</sup> https://rapidreviewscovid19.mitpress.mit.edu</p>
        <p><sup>7</sup> https://greenelab.github.io/covid19-review</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. METHODS</title>
      <sec id="sec-3-1">
        <title>2.1. Contributor Recruitment and Roles</title>
        <sec id="sec-3-1-1">
          <p>First, it was necessary to establish Manubot as a platform accessible to researchers with limited experience working with version control, given that this is not typically emphasized in biology and medicine [24, 25, 26]. Contributors were recruited primarily by word of mouth and on Twitter, and we also collaborated with existing efforts to train early-career researchers. We invited potential collaborators to contribute a short introduction on a GitHub issue in order to collect information about participants and provide an introduction to working with GitHub issues. Interested participants were encouraged to contribute in several ways. One option was to catalog articles of interest as issues. We developed a standardized set of questions for contributors to consider when evaluating an article following a framework often used for assessing medical literature. This approach emphasizes examining the methods used, assignment (whether the study was observational or randomized), assessment, results, interpretation, and how well the study extrapolates [27]. Contributors were also invited to contribute or edit text using GitHub’s pull request system. These contributions were not strictly defined and could range from minor corrections to punctuation and grammar to large-scale additions of text. Finally, a small number of contributors (the authors of this paper) contributed technical expertise, either through the development of standardized approaches to the evaluation of papers based on the MAARIE Framework [28], the writing of code to generate manuscript figures, or the addition of features to Manubot. All of these additions were also submitted as pull requests, either to the COVID-19 review repository or to an external repository, as appropriate.</p>
          <p>Each pull request was reviewed and approved by at least one other contributor before being merged into the main branch. We tagged potential reviewers based on the introductions they had contributed in order to encourage participation. Authorship was determined based on the Contributor Roles Taxonomy<sup>8</sup>. Due to the permeability of ideas among different sections, contributors to a specific manuscript were recognized with masthead authorship, while all contributors to the project were recognized with consortium authorship on all papers. The emphasis on issues and pull requests was designed to encourage authors with and without git experience to discuss papers and provide feedback (both formal and informal) on proposed text additions or changes. We also used the Gitter chat platform<sup>9</sup> to promote informal questions and sharing of information among collaborators.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Utilization and Expansion of Manubot</title>
        <p>Applying Manubot’s existing capabilities allowed us to confront several challenges common in large-scale collaborations, such as maintaining a record of contributions that allowed us to allocate credit appropriately or to contact the original author if questions arose. Additionally, an up-to-date version of the content was available at all times online in HTML<sup>10</sup> or PDF format<sup>11</sup>. This approach also allowed us to minimize the demand on authors to curate and sync bibliographic resources. Manubot provides the functionality to create a bibliography using digital object identifiers (DOIs), website URLs, or other identifiers such as PubMed identifiers and arXiv IDs. The author can insert a citation in-line using a format such as [@doi:10.1371/journal.pcbi.1007128]. Manubot then obtains reference metadata, exports the citations as Citation Style Language JSON Data Items, and renders the bibliographic information needed to generate the references section [19]. This approach allows multiple authors to work on a piece of text without needing to make manual adjustments to the reference lists.</p>
        <p>Due to the needs of this project, several new features were implemented in Manubot. Because of the ever-evolving nature of the COVID-19 crisis, figures and statistics in the text quickly became outdated. To address this concern, Manubot and GitHub’s CI features were used to create figures that integrated online data sources and to dynamically update information, such as the current number of active COVID-19 clinical trials [29], within the text of the manuscripts (Figure 2). GitHub Actions runs a nightly workflow to update these external data and regenerate the statistics and figures for the manuscript. The workflow uses the GitHub API to detect and save the latest commit of the external data sources that are GitHub repositories<sup>12</sup>. It then downloads versioned data from that snapshot of the external repositories and runs bash and Python scripts to calculate the desired statistics and produce the summary figures using Matplotlib [30]. The statistics are stored in JSON files that are accessed by Manubot to populate the values of placeholder template variables dynamically every time the manuscript is built. For instance, the template variable {{ebm_trials_results}} in the manuscript is replaced by the actual number of clinical trials with results, 98. The template variables also include versioned URLs to the dynamically updated figures. The JSON files and figures are stored in the external-resources branch of the GitHub repository, providing versioned storage. The GitHub Actions workflow automatically adds and commits the new JSON files and figures to the external-resources branch every time it runs, and Manubot uses the latest version of these resources when it builds the manuscript. The GitHub Actions workflow file is available online<sup>13</sup>, as are the scripts<sup>14</sup>. The Python package versions are also available<sup>15</sup>.</p>
        <p>[Figure: workflow diagram. Recoverable labels: data sources (EBM Data Lab, CORD-19, CSSE, Our World in Data); reference metadata services (arXiv, ClinicalTrials.gov, Zotero, DataCite, Scite); a GitHub Actions workflow; the GitHub repository greenelab/covid19-review with its master, external-resources, output, and gh-pages branches; and the manuscript published on GitHub Pages at greenelab.github.io/covid19-review.]</p>
        <p>Another issue identified was the need for standardized citation to clinical trials. Other researchers identified the same need<sup>16</sup>. Trials that are registered with clinicaltrials.gov receive a unique clinical trial identifier, or “NCT ID.” Because clinical trials are registered long before results are published, referencing clinical trial identifiers was a priority. Manubot uses the Zotero translation server<sup>17</sup> to extract citation metadata for some types of citations. However, Zotero did not support clinical trial identifiers and could not extract relevant metadata from their URLs. To pull the associated clinical trial metadata into Manubot, we added Zotero support for these identifiers. To achieve this, we query clinicaltrials.gov to retrieve XML metadata associated with each identifier using JavaScript<sup>18</sup>. This extension enables citing a trial as @clinicaltrials:NCT04280705 instead of the URL. Then, when Manubot requests clinical trial metadata from the Zotero translation server, the response includes the trial sponsors, responsible investigators, title, and summary. Manubot now supports directly citing hundreds of registered Compact Uniform Resource Identifiers<sup>19</sup>, beyond just the clinicaltrials identifier.</p>
        <p>Because of the large number of citations used in this manuscript and the fast-moving nature of COVID-19 research, keeping track of retractions, corrections, and notices of concern also became a challenge. We implemented a new Manubot plugin to support “smart citations” in the HTML build of manuscripts. The plugin uses the scite [31] service to display a badge below any citation with a DOI. The badge contains a set of icons and numbers that indicate how many times that source has been mentioned, supported, or disputed and whether there have been any important editorial notices. We were thus able to identify references that needed to be reevaluated by an expert. This addition was invaluable given the nature of the project, where we were disseminating rapidly evolving information of great consequence from over 1,500 different sources. The badges also allow readers to ascertain a rough approximation of the reliability of cited sources at a glance.</p>
        <p>Because most collaborators were writing and editing text through the GitHub website rather than in a local text editor, we also needed to add spell-checking functionalities to Manubot. We integrated an existing Pandoc<sup>20</sup> spell-check extension with AppVeyor CI to automatically post spelling errors as comments in a GitHub pull request. The comment reported both unique misspelled tokens and all locations where the token was detected. Project maintainers managed a custom dictionary to allow over 1,500 scientific and technical terms that were not common English words. Spell-checking also helped standardize the writing style across dozens of authors by detecting features such as British versus American English spellings. The actual spell-checking was implemented using GNU Aspell<sup>21</sup> and the Pandoc spellcheck filter<sup>22</sup>. The filter enables checking only the manuscript text, ignoring URLs and formatting.</p>
        <p>Manubot can render a manuscript in several formats that serve different purposes. Prior to this project, Manubot could use Pandoc to convert the markdown-formatted manuscript to HTML, PDF, and DOCX formats. We expanded this functionality to export individual sections of the manuscript as separate DOCX files while still rendering the complete manuscript in HTML and PDF formats. This development was necessary because the manuscript grew so large that it needed to be split into seven separate papers for journal submission while still maintaining shared GitHub discussion across topics. When exporting an individual section, Manubot customizes the manuscript title, authors, and author contributions to pertain to that specific section. In addition, we expanded the export formats to include partial LaTeX support via Pandoc. Pandoc converts the markdown content for an individual section to TeX and the Citation Style Language JSON, which contains reference metadata generated by Manubot, to BibTeX. We customized a LaTeX template and reformatted the Manubot metadata, such as authors and their affiliations, for the LaTeX template. The exported TeX file requires manual refinement but contains all manuscript content and most of the formatting. Because LaTeX is required for manuscript submission in many fields, automating most of the process of converting markdown to a submission-friendly format expands Manubot’s potential user base. Manubot users can write in the simple markdown format, render the manuscript in continuously-updated PDF or interactive HTML formats, and export the manuscript in DOCX or TeX and BibTeX for submission to traditional publishers, taking full advantage of Pandoc’s powerful document conversion capabilities and Manubot’s automation.</p>
        <p><sup>8</sup> https://casrai.org/credit</p>
        <p><sup>9</sup> https://www.gitter.im</p>
        <p><sup>10</sup> https://greenelab.github.io/covid19-review</p>
        <p><sup>11</sup> https://greenelab.github.io/covid19-review/manuscript.pdf</p>
        <p><sup>12</sup> Vaccines: https://github.com/owid/covid-19-data; Clinical Trials: https://github.com/ebmdatalab/covid_trials_tracker-covid; Cases and Deaths: https://github.com/CSSEGISandData/COVID-19</p>
        <p><sup>13</sup> https://github.com/greenelab/covid19-review/blob/master/.github/workflows/update-external-resources.yaml</p>
        <p><sup>14</sup> https://github.com/greenelab/covid19-review/tree/external-resources</p>
        <p><sup>15</sup> https://github.com/greenelab/covid19-review/blob/external-resources/environment.yml</p>
        <p><sup>16</sup> https://forums.zotero.org/discussion/74933/import-from-clinical-trials-registry and https://forums.zotero.org/discussion/77721/add-reference-from-clinical-trials-org</p>
        <p><sup>17</sup> https://www.zotero.org and https://github.com/zotero/translation-server</p>
        <p><sup>18</sup> https://github.com/zotero/translators/pull/2153</p>
        <p><sup>19</sup> https://identifiers.org</p>
        <p><sup>20</sup> https://pandoc.org</p>
        <p><sup>21</sup> http://aspell.net</p>
        <p><sup>22</sup> https://github.com/pandoc/lua-filters/tree/master/spellcheck</p>
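        <p>The citation-by-identifier syntax used in this section, as in @doi:10.1371/journal.pcbi.1007128 and @clinicaltrials:NCT04280705, pairs a prefix naming the identifier type with the identifier itself. The following Python sketch shows one way such keys could be pulled out of a passage of text. It is only an illustration: the regular expression and the helper name find_citekeys are assumptions made for this example and are not taken from Manubot’s source code.</p>
        <preformat>
```python
import re

# Pattern for citation keys of the form @prefix:identifier, as in the
# examples quoted in the text. The character classes are an assumption
# chosen for this sketch, not Manubot's actual grammar. The identifier
# must end in a letter, digit, or slash so that trailing sentence
# punctuation is not captured.
CITEKEY = re.compile(r"@([a-z][a-z0-9]*):([A-Za-z0-9./_-]*[A-Za-z0-9/])")


def find_citekeys(text):
    """Return (prefix, identifier) pairs for every citation key in `text`."""
    return [(m.group(1), m.group(2)) for m in CITEKEY.finditer(text)]


paragraph = (
    "Manubot is described elsewhere [@doi:10.1371/journal.pcbi.1007128], "
    "and one trial is registered as @clinicaltrials:NCT04280705."
)
for prefix, identifier in find_citekeys(paragraph):
    print(prefix, identifier)
```
        </preformat>
        <p>Keeping the prefix separate from the identifier is what would let a tool send each key to a suitable metadata source, for example treating doi keys differently from clinicaltrials keys.</p>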
        <p>[Figure: plot with one axis listing individual contributors by name.]</p>
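        <p>The template-variable mechanism described in this section, in which a placeholder such as {{ebm_trials_results}} is replaced at build time by a value read from a JSON file, can be illustrated with a short Python sketch. This is a simplified stand-in based on a regular expression: the function name fill_template, the inline JSON string, and the example sentence are hypothetical, and Manubot’s own templating code is not reproduced here.</p>
        <preformat>
```python
import json
import re

# Hypothetical statistics file content. In the project, values like this
# are regenerated nightly and stored as JSON on the external-resources
# branch; here the JSON is inlined so the sketch is self-contained.
STATS_JSON = '{"ebm_trials_results": 98}'

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(text, variables):
    """Replace each {{name}} placeholder with its value from `variables`.

    Unknown names are left untouched so that a missing statistic stays
    visible in the built text instead of silently disappearing.
    """
    def substitute(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)

    return PLACEHOLDER.sub(substitute, text)


variables = json.loads(STATS_JSON)
sentence = "Results are available for {{ebm_trials_results}} clinical trials."
print(fill_template(sentence, variables))
```
        </preformat>
        <p>Because the values live in a separate JSON file, the manuscript text itself does not need to change when the nightly workflow produces new numbers; only the file that supplies the variables is updated.</p>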
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. RESULTS</title>
      <sec id="sec-4-1">
        <title>3.1. Recruitment and Manuscript</title>
      </sec>
      <sec id="sec-4-2">
        <title>Development</title>
        <sec id="sec-4-2-1">
          <p>Coverage by Nature Toolbox [32] and an associated tweet<sup>23</sup> about the project on April 1, 2020 attracted the interest of the scientific community (Figure 3).
Because GitHub issues are similar to other common web commenting systems, authors learned these tools quickly.
The Gitter chat also presented a low barrier to entry.
The manuscript continued to grow throughout the first year and a half of the project in both word count and the number of references (Figure 3).
Though only a fraction of potential contributors contributed to the text included in the manuscripts (Figure 3), many contributors remained engaged over the long term (Figure 4).
Additionally, new contributors continued to join even into the second year of the project.</p>
          <p>In order to make the project more accessible, we developed resources explaining how to use GitHub’s web interface to develop and edit text for Manubot assuming no prior experience with version control.
These tutorials explained how to open an issue, open a pull request, and review a pull request<sup>24</sup>.
Additionally, the framework for evaluating literature was converted into issue templates to simplify the review of new articles.
Articles were classified as diagnostic, therapeutic, or other, with an associated template developed to guide the review of papers and preprints in each category.
A total of 285 new paper issues had been opened as of September 13, 2021.</p>
          <p>The manuscripts produced by the consortium (excluding this one) will be submitted to mSystems as part of a special issue that provides support for continuous updates as more information becomes available.
One has been published and two are available as preprints.
This approach allows for a version of record to be maintained alongside the most recent version, which is always available through GitHub.
These manuscripts cover a wide range of topics including the fundamental biology of SARS-CoV-2 (pathogenesis [33] and evolution), biomedical advances in responding to the virus and COVID-19 (pharmaceuticals [29], nutraceuticals [34], vaccines, and diagnostic technologies), and biological and social factors influencing disease transmission and outcomes.
To date, 50 authors are associated with the consortium (Figure 3).</p>
          <p>More formal recruitment efforts to integrate with existing projects providing support for undergraduate students during COVID-19 were also successful.
We incorporated summaries written by the students, post-docs, and faculty of the Immunology Institute at the Mount Sinai School of Medicine<sup>25</sup> [12].
Additionally, two of the consortium authors were undergraduate students recruited through the American Physician Scientist Association’s Virtual Summer Research Program.
Thus, the consortium was successful in providing a venue for researchers across all career stages to continue investigating and publishing at a time when many biomedical researchers were unable to access their laboratory facilities.</p>
          <p><sup>23</sup> https://twitter.com/j_perkel/status/1245454628235309057
<sup>24</sup> CONTRIBUTING.md and INSTRUCTIONS.md within the repository
<sup>25</sup> https://github.com/ismms-himc/covid-19_sinai_reviews</p>
          <sec>
            <title>3.2. Integrating Data</title>
            <p>We integrated data into the manuscripts from several sources (Figure 2).
Worldwide cases and deaths were tracked by the COVID-19 Data Repository by the Center for Systems Science and Engineering at Johns Hopkins University<sup>26</sup>.
The clinical trials statistics and figure were generated based on data from the University of Oxford Evidence-Based Medicine Data Lab’s COVID-19 TrialsTracker [35].
Information about vaccine distribution was extracted from Our World In Data<sup>27</sup> [36].
Figure 1 integrates data from the CORD-19 dataset [3].</p>
            <p>Manubot’s bibliographic management capabilities were critical because the amount of relevant literature published far outstripped what we had anticipated at the beginning of the project.
As of September 10, 2021, there were 1,676 references (Figure 3).
The scite plugin provided a way to visually inspect the reference list to identify possible references of concern.
This and the other new features required for the COVID-19 project are now included in Manubot’s rootstock, which is the template GitHub repository for creating a new manuscript.
Using CI, Manubot now checks that the manuscript was built correctly, runs spell-checking, and cross-references the manuscripts cited in this review.
In addition, Manubot now supports citing clinical trial identifiers such as clinicaltrials:NCT04292899 [37] (Table 1).
Open publishing thus allowed us to harness the domain expertise of a large group of non-technical users to respond to the flood of COVID-19 publications.</p>
            <p>Several existing and new features in Manubot aid in responding to the challenges posed by the infodemic.
Manuscripts are written in markdown and can be rendered in several formats providing different advantages to users.
For example, beyond building just a PDF, Manubot also renders the manuscript in HTML, DOCX, and now, LaTeX (in a more limited capacity).
The interactive HTML manuscript format offers several advantages over a static PDF to harmonize available resources and address specific problems related to COVID-19.
The integration of scite into the HTML build makes references more manageable by visually indicating whether their results are contested or whether they have been corrected or retracted.
Cross-referencing different pieces of the manuscript, such as cited preprints with reviews stored in an appendix, is another interactive option presented by HTML.
The DOCX format was preferred by most non-technical users for reviewing the final version of the manuscript and was useful for creating submissions to a biological journal.
Additionally, because of the heavy emphasis on Word processing in biology, Manubot’s ability to generate DOCX outputs was expanded to allow users to generate DOCX files containing only a section of the manuscript.
In our case, where the full project is nearly 150,000 words, this allows individual pieces to be shared more easily.
Finally, the preliminary addition of LaTeX output is useful for researchers from computational fields who submit papers in TeX format and removes the step of reformatting markdown prior to submission.</p>
          </sec>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. DISCUSSION</title>
      <p>The current project was based in the GitHub repository
greenelab/covid19-review using Manubot [19] to
continuously generate the manuscript. The Manubot
framework facilitated a massive collaborative review on
an urgent topic. We demonstrated the utility of Manubot
to a project where many contributors lacked expertise
or even experience working with version control. This
effort has produced not only seven literature reviews
on topics relevant to the COVID-19 pandemic, but has
also generated cyberinfrastructure for training novice
users in GitHub. We also extended the functionalities
of Manubot to provide more of the benefits of What
You See Is What You Get platforms such as Google Docs.</p>
      <p><sup>26</sup> https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
<sup>27</sup> https://github.com/owid/covid-19-data</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Type</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>CI</td><td>Regularly download external data sources, generate new figures and statistics, and read them when Manubot builds the latest manuscript</td></tr>
            <tr><td>CI</td><td>Post spell-checking reports as pull request comments</td></tr>
            <tr><td>Citations</td><td>Zotero extension to report more relevant clinical trial metadata from https://clinicaltrials.gov</td></tr>
            <tr><td>Citations</td><td>Cite any Compact Uniform Resource Identifier, such as clinicaltrials or ncbigene</td></tr>
            <tr><td>Citations</td><td>scite badges to track retractions, corrections, and notices of concern</td></tr>
            <tr><td>Outputs</td><td>Improved support for Pandoc’s LaTeX output</td></tr>
            <tr><td>Outputs</td><td>Build complete manuscript alongside individual sections as standalone documents</td></tr>
          </tbody>
        </table>
      </table-wrap>
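      <p>The compact identifiers named in Table 1 take the form prefix:accession, as in clinicaltrials:NCT04292899.
As a minimal illustration of how such identifiers could be split and grouped by resource, the Python sketch below may be useful.
It is not Manubot's implementation: the helper names split_curie and group_by_prefix, the simplified pattern, and the example ncbigene accession are all assumptions made for this example.</p>
      <preformat>
```python
import re

# Hypothetical helpers for illustration only; they are NOT part of Manubot.
# Assumption: a compact identifier looks like "prefix:accession", where the
# prefix is made of letters, digits, "_" or "." and the accession has no spaces.
CURIE_PATTERN = re.compile(r"^([A-Za-z0-9_.]+):(\S+)$")


def split_curie(citekey):
    """Split a compact identifier into a (lowercased prefix, accession) pair."""
    match = CURIE_PATTERN.match(citekey)
    if match is None:
        raise ValueError("not a compact identifier: " + repr(citekey))
    prefix, accession = match.groups()
    return prefix.lower(), accession


def group_by_prefix(citekeys):
    """Group accessions by prefix, e.g. all clinical trials cited together."""
    grouped = {}
    for key in citekeys:
        prefix, accession = split_curie(key)
        grouped.setdefault(prefix, []).append(accession)
    return grouped


if __name__ == "__main__":
    # The second key uses an example accession chosen only for illustration.
    print(group_by_prefix(["clinicaltrials:NCT04292899", "ncbigene:59272"]))
```
      </preformat>
      <p>Grouping by prefix in this way is one possible starting point for listing every clinical trial or gene cited in a manuscript; resolving each accession to metadata would require a registry lookup that is not shown here.</p>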
      <p>The COVID-19 Review Consortium provided a platform for researchers to engage in scientific investigation early in the pandemic when many biological scientists were unable to access their research spaces.
In turn, by seeking to adapt Manubot to allow for broader participation, we made a number of improvements that are expected to increase its appeal to researchers from all backgrounds.
Manubot provided a way for contributors from a variety of backgrounds, including early-career researchers, to join a massive collaborative project while demonstrating their individual contributions to the larger work and gaining experience with version control.
The licensing and infrastructure also provide the basis for individuals to adapt from this project to create their own snapshots of the COVID-19 literature that derive from, but are not wholly identical to, the primary versions of these reviews.
This project suggests that massive online open publishing efforts can indeed advance scholarship through inclusion [22], including during the extreme challenges presented by the COVID-19 pandemic.</p>
      <p>Some challenges did arise in efforts to include an academically diverse set of authors.
The barriers to entry posed by git and GitHub likely still reduced participation from individuals who might have otherwise been interested.
Using pull requests as a tool for writing text is also unfamiliar to many or most scientists, and the review process can be slow, which might cause interested contributors to lose interest.
Additionally, the pull request model may limit people from providing general feedback on the manuscript or a section of the manuscript.
As a result, some feedback came through email or comments on the DOCX outputs that were then translated into issues or pull requests by the project managers.
Given that our approach hinged on these version control tools, it is likely that our group of contributors was biased towards those who were interested in or experienced with computational tools.
The trajectory of the pandemic itself also likely influenced participation: engagement waned over the course of the pandemic as labs opened back up and researchers were able to return to their work, and we recruited very few senior clinicians to the project, which is unsurprising given the load on medical professionals during this time.
Engagement that waxes and wanes is, however, typical when writing massively open online papers [22].
Adding features such as spell-check did improve usability, and additional features such as automatically checking the formatting of citations could further improve the usability of this tool.
In the future, a formal study of participation could allow for quantification of these biases and improved efforts to foster inclusion.</p>
      <p>Additional limitations are challenges associated with massively open online papers in general.
With such a large amount of text, it is not possible to keep all sections of the manuscript up to date at all times.
Readers are not able to distinguish when each section was updated.
Even GitHub’s blame functionality does not distinguish minor changes from substantive updates to the text.
While much of the data and statistics update automatically, the text itself required updating by human experts.
This asynchronicity could potentially introduce incompatibility between the figures and the surrounding text.
Similarly, in line with the collaboration-related challenges of the project, some authors returned to update their text, while others did not.
As a result, the lead authors of each paper often spent several weeks prior to journal submission updating the text to reflect new developments in each area.
In the future, it may be possible to streamline this process through integration with a tool such as CoronaCentral [4] to automatically identify relevant, high-impact papers that need to be included, although expertise would still be required to incorporate them.
Another challenge involves tracking preprints as they are reviewed or critiqued, revised, and potentially published.
While updating the content of the manuscript would likely fall to human contributors, automatic detection of published versions of preprints [38] could be integrated in the future.
These challenges are exacerbated by the scale of the infodemic, but developing solutions would benefit future projects tracking more typical trends in publication.
Similarly, outputting machine readable summaries of key information in the COVID-19 review manuscripts could reduce their contribution to the infodemic.
As it stands, the integration of Compact Uniform Resource Identifier does make a step in this direction.
Formal identifiers could be used to extract relationships among clinical trials, genes, publications, and other entities.
Thus, the experience of using Manubot for a massive project has laid the foundation for future additions to enhance user experience and inclusivity.</p>
      <sec>
        <title>5. CONCLUSION</title>
        <p>With the worldwide scientific community uniting during 2020 and 2021 to investigate COVID-19 from a wide range of perspectives, findings from many disciplines are relevant on a rapid timescale to a broad scientific audience.
As many other efforts have described, the publishing rate of formal manuscripts and preprints about COVID-19 has been unprecedented [1], and efforts to review the body of COVID-19 literature are faced with an ever-expanding corpus to evaluate.
In the case of the seven manuscripts produced by the COVID-19 Review Consortium, Manubot allows for continuous updating of the manuscripts as the pandemic enters its second year and the landscape shifts with the emergence of promising therapeutics and vaccines [29].
These manuscripts pull data from external sources and update information and visualizations daily using CI.
By off-loading some updates to computational pipelines, domain experts can focus on the broader implications of new information as it emerges.
Centralizing, summarizing, and critiquing data and literature broadly relevant to COVID-19 can expedite the interdisciplinary scientific process that is currently happening at an advanced pace.
As of September 13, 2021, 2,886 commits have been made to the manuscript across 575 merged pull requests.
The efforts of the COVID-19 Review Consortium illustrate the value of including open source tools, including those focused on open publishing, in these efforts.
By facilitating the versioning of text, such platforms also allow for documentation of the evolution of thought in an evolving area and formal analysis of a collaborative project.
This application of version control holds the potential to improve scientific publishing in a range of disciplines, including those outside of traditional computational fields.
While Manubot is a technologically complex tool, this project demonstrates that it can be applied to a variety of projects.
Future work can address remaining limitations and continue to advance Manubot as an inclusive tool for open publishing projects.</p>
      </sec>
      <sec>
        <title>Acknowledgements</title>
        <p>This work would not be possible without support from the COVID-19 Review Consortium<sup>28</sup>.
We are also grateful to Nick DeVito for assistance with the Evidence-Based Medicine Data Lab COVID-19 TrialsTracker data, Josh Nicholson and Milo Mordaunt for scite support, David Nicholson for spell-check assistance, and Milton Pividori as well as consortium members Alex Lee and Christian Brueffer for feedback.
Research was supported by the Gordon and Betty Moore Foundation award GBMF 4552 (HMR, DSH, CSG), the National Institutes of Health award R01HG010067 (HMR, CSG), and the John W. and Jeanne M. Rowe Center for Research in Virology (AG)<sup>29</sup>.</p>
        <p><sup>28</sup> COVID-19 Review Consortium: Vikas Bansal, John P. Barton, Simina M. Boca, Joel D Boerckel, Christian Brueffer, James Brian Byrd, Stephen Capone, Shikta Das, Anna Ada Dattoli, John J. Dziak, Jeffrey M. Field, Soumita Ghosh, Anthony Gitter, Rishi Raj Goel, Casey S. Greene, Marouen Ben Guebila, Daniel S. Himmelstein, Fengling Hu, Nafisa M. Jadavji, Jeremy P. Kamil, Sergey Knyazev, Likhitha Kolla, Alexandra J. Lee, Ronan Lordan, Tiago Lubiana, Temitayo Lukan, Adam L. MacLean, David Mai, Serghei Mangul, David Manheim, Lucy D’Agostino McGowan, Amruta Naik, YoSon Park, Dimitri Perrin, Yanjun Qi, Diane N. Rafizadeh, Bharath Ramsundar, Halie M. Rando, Sandipan Ray, Michael P. Robson, Vincent Rubinetti, Elizabeth Sell, Lamonica Shinholster, Ashwin N. Skelly, Yuchen Sun, Yusha Sun, Gregory L Szeto, Ryan Velazquez, Jinhui Wang, Nils Wellhausen</p>
        <p><sup>29</sup> Conflicts of interest. SMB: Now employed by AstraZeneca (Gaithersburg, MD). May own stock or stock options. Work conducted at previous position. LDM: Received consulting fees from Acelity and Sanofi. AG: Patent application filed with the Wisconsin Alumni Research Foundation related to classifying activated T cells.</p>
      </sec>
      <sec>
        <title>References</title>
        <p>[1] C. Vlasschaert, J. M. Topf, S. Hiremath, Proliferation of Papers and Preprints During the Coronavirus Disease 2019 Pandemic: Progress or Problems With Peer Review?, Advances in Chronic Kidney Disease 27 (2020) 418–426.</p>
        <p>[2] J. Zarocostas, How to fight an infodemic, The Lancet 395 (2020) 676.</p>
        <p>[3] L. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Burdick, D. Eide, K. Funk, Y. Katsis, R. Kinney, Y. Li, Z. Liu, W. Merrill, P. Mooney, D. Murdick, D. Rishi, J. Sheehan, Z. Shen, B. Stilson, A. Wade, K. Wang, N. X. R. Wang, C. Wilhelm, B. Xie, D. Raymond, D. S. Weld, O. Etzioni, S. Kohlmeier, CORD-19: The COVID-19 Open Research Dataset, arXiv (2020) 2004.10706.</p>
        <p>[4] J. Lever, R. B. Altman, Analyzing the vast coronavirus literature with CoronaCentral, Proceedings of the National Academy of Sciences 118 (2021) e2100766118.</p>
        <p>[5] G. Eysenbach, The impact of preprint servers and electronic publishing on biomedical research, Current Opinion in Immunology 12 (2000) 499–503.</p>
        <p>[6] D. Lowe, Too Many Papers, 2021. URL: https://www.science.org/content/blog-post/too-many-papers.</p>
        <p>[7] N. S. L. Yeo-Teh, B. L. Tang, An alarming retraction rate for scientific publications on Coronavirus Disease 2019 (COVID-19), Accountability in Research 28 (2020) 47–53.</p>
        <p>[8] A. Abritis, A. Marcus, I. Oransky, An “alarming” and “exceptionally high” rate of COVID-19 retractions?, Accountability in Research 28 (2020) 58–59.</p>
        <p>[9] G. Agoramoorthy, M. J. Hsu, P. Shieh, Queries on the COVID-19 quick publishing ethics, Bioethics 34 (2020) 633–634.</p>
        <p>[10] C. Boodman, S. Lee, J. Bullard, Idle medical students review emerging COVID-19 research, Medical Education Online 25 (2020) 1770562.</p>
        <p>[11] J. Brainard, Scientists are drowning in COVID-19 papers. Can new tools keep them afloat?, Science (2020).</p>
        <p>[12] N. Vabret, R. Samstein, N. Fernandez, M. Merad, T. S. I. R. Project, Trainees, Faculty, Advancing scientific knowledge in times of pandemics, Nature Reviews Immunology 20 (2020) 338–338.</p>
        <p>[13] J. Sun, W.-T. He, L. Wang, A. Lai, X. Ji, X. Zhai, G. Li, M. A. Suchard, J. Tian, J. Zhou, M. Veit, S. Su, Covid-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives, Trends in Molecular Medicine 26 (2020) 483–495.</p>
        <p>[14] R. Weissleder, H. Lee, J. Ko, M. J. Pittet, COVID-19 diagnostics in context, Science Translational Medicine 12 (2020) eabc1931.</p>
        <p>[15] J. M. Sanders, M. L. Monogue, T. Z. Jodlowski, J. B. Cutrell, Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19), JAMA (2020).</p>
        <p>[16] T. Carvalho, COVID-19 Research in Brief: December, 2019 to June, 2020, Nature Medicine 26 (2020) 1152–1153.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>