
for Biology and Medicine

An Open-Publishing Response to the COVID-19 Infodemic

Halie M. Rando

halie.rando@cuanschutz.edu 4 5 6

Simina M. Boca

Lucy D'Agostino McGowan

lucydagostino@gmail.com 9

Daniel S. Himmelstein

daniel.himmelstein@gmail.com 6

Michael P. Robson

michael.robson@villanova.edu 8

Vincent Rubinetti

vince.rubinetti@gmail.com 4 6

Ryan Velazquez

rnhvelazquez@gmail.com 1

Casey S. Greene

greenescientist@gmail.com 0 4 5 6

Anthony Gitter

gitter@biostat.wisc.edu 3 7

0 Alex's Lemonade Stand Foundation, Childhood Cancer Data Lab, Philadelphia, PA, USA
1 Azimuth1, McLean, VA, USA
2 Georgetown University Medical Center, Innovation Center for Biomedical Informatics, Washington, DC, USA
3 Morgridge Institute for Research, Madison, WI, USA
4 University of Colorado School of Medicine, Center for Health AI, Aurora, CO, USA
5 University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics, Aurora, CO, USA
6 University of Pennsylvania, Perelman School of Medicine, Department of Systems Pharmacology and Translational Therapeutics, Philadelphia, PA, USA
7 University of Wisconsin-Madison, Department of Biostatistics and Medical Informatics, Madison, WI, USA
8 Villanova University, Department of Computing Sciences, Villanova, PA, USA
9 Wake Forest University, Department of Mathematics and Statistics, Winston-Salem, NC, USA


The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis, combined with the need for social distancing measures, presents unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.

Keywords: COVID-19, living document, open publishing, open source, data integration, Manubot

6 Related Sciences

1. INTRODUCTION

Coronavirus Disease 2019 (COVID-19) caused a worldwide public health crisis that has reshaped many aspects of society. The scientific community has, in turn, devoted significant attention and resources towards COVID-19 and the associated virus, SARS-CoV-2, resulting in the release of data and publications at a rate and scale never previously seen for a single topic. Over 20,000 articles about COVID-19 were released in the first four months of the pandemic [1], causing an “infodemic” [1, 2]. The COVID-19 Open Research Dataset (CORD-19) [3], which was developed in part with the goal of training machine learning algorithms on COVID-19-related text, illustrates the growth of related scholarly literature (Figure 1). This resource was developed by querying several sources for terms related to SARS-CoV-2 and COVID-19, as well as the coronaviruses SARS-CoV-1 and MERS-CoV and their associated diseases [3]. CORD-19 contained 768,929 manuscripts as of September 6, 2021. Additional curation by CoronaCentral [4] has so far produced a set of over 180,000 publications particularly relevant to COVID-19 and closely related viruses. Despite many advances in understanding the virus and the disease, there are also downsides to the availability of so much information. "Excessive publication" has been recognized as a concern for over forty years [5] and has been discussed with respect to the COVID-19 literature [6]. Any effort to synthesize, summarize, and contextualize COVID-19 research will face a vast corpus of potentially relevant material.

Information was released rapidly by both traditional publishers and preprint servers, and many papers faced subsequent scrutiny. The retraction rate for COVID-19 papers may be higher, and potentially much higher, than is typical, although a thorough investigation of this question requires more time to elapse [7, 8]. Many preprints and papers are also associated with corrections or expressions of concern¹ [8]. Preprints are released prior to peer review, but some traditional publishing venues have fast-tracked COVID-19 papers through peer review, leading to questions about whether they are held to typical standards [9]. Therefore, evaluating the COVID-19 literature requires not only digesting available information but also monitoring subsequent changes.

Because of the fast-moving nature of the topic, many efforts to summarize and synthesize the COVID-19 literature have been undertaken. These efforts include newsletters² [10], web portals³ [11], the now-defunct http://covidpreprints.com⁴, comments on preprint servers⁵ [12], and even a journal⁶. However, the explosive rate of publication presents challenges for such efforts, many of which are no longer active. Similarly, many literature reviews have been written on the available COVID-19 literature [13, 14, 15, 16, 17], but static reviews quickly become outdated as new research is released or existing research is retracted or superseded. One example is a review of topics in COVID-19 research including vaccine development [17]. This review was published on July 10, 2020, four days before Moderna released the surprisingly promising results of its phase 1 trial [18], which changed expectations surrounding vaccines.

The COVID-19 publishing climate therefore called for curation of the literature by a diverse group of experts, in a format that could respond quickly to high-volume, high-velocity information.

We sought to develop a platform for scientific discussion and collaboration around COVID-19 by adapting open publishing infrastructure to accommodate the scale of COVID-19 publishing. Recent advances in open publishing have created an infrastructure that facilitates distributed, version-controlled collaboration on manuscripts [19]. Manubot [19] is a collaborative framework developed to adapt open-source software development techniques and version control for manuscript writing. With Manubot, manuscripts are managed and maintained using GitHub, a popular online version control interface. We selected Manubot because it offers several advantages over comparable collaborative writing platforms such as Authorea, Overleaf, Google Docs, Word Online, or wikis [19]. Citation-by-identifier ensures consistent reference metadata standards that would be difficult to maintain manually in a manuscript with dozens of authors and over 1,500 citations. Manubot's pull request-based contribution model balances the goals of making the project open to everyone and maintaining scientific accuracy. All contributions are reviewed, discussed, and formally approved on GitHub before text updates appear in the public-facing manuscript⁷. Continuous integration (CI) seamlessly combines author-produced text and figures with automatically generated and updated statistics and figures derived from external data sources and the manuscript's own content. In addition, the authors who initially launched this project included Manubot developers who had prior successes using Manubot for both massively open and traditional manuscripts, including large-scale collaborative efforts such as a review of developments in deep learning [20] and a re-evaluation of the role of authorship in modern collaborations [21].

Figure 1: Growth of the CORD-19 dataset. The number of articles has proliferated, with both traditional and preprint manuscripts in the corpus. The first release (March 16, 2020) contained 28,000 documents [3]. As of September 6, 2021, this had increased to 768,929 articles. Of these, 30,726 are preprints from arXiv, medRxiv, and bioRxiv.

Collaboration via massively open online papers has been identified as a strategy for promoting inclusion and interdisciplinary thought [22]. However, the Manubot workflow can be intimidating to contributors who are not well-versed in git [22]. The synthesis and discussion of the emerging literature by biomedical scientists and clinicians is imperative to a robust interpretation of COVID-19 research. Such efforts in biology often rely on What You See Is What You Get tools such as Google Docs, despite the significant limitations of these platforms in the face of excessive publication. We recognized that the problem of synthesizing the COVID-19 literature lent itself well to the Manubot platform, but that the technical expertise potentially required to work with Manubot presented a barrier to domain experts.

Here, we describe the adaptation of Manubot to facilitate collaboration in the extreme case of the COVID-19 infodemic, with the objective of developing a centralized platform for summarizing and synthesizing a massive number of preprints, news stories, journal publications, and data. Unlike prior collaborations built on Manubot, most contributors to the COVID-19 collaborative literature review came from biology or medicine. The members of the COVID-19 Review Consortium consolidated information about the virus in the context of related viruses and synthesized rapidly emerging literature. Manubot provided the infrastructure to manage contributions from the community and create a living, scholarly document integrating data from multiple sources. Its back-end allowed biomedical scientists to sort and distill informative content out of the overwhelming flood of information [23] in order to provide a resource that would be useful to the broader scientific community. This case study demonstrates the value of open collaborative writing tools such as Manubot for responding to emerging challenges. Because Manubot is open source software, we were able to adapt and customize it to flexibly meet the needs of the COVID-19 review. Recording the evolution of information over time and assembling a resource that auto-updated in response to the evolving crisis revealed the particular value that Manubot holds for managing rapid changes in scientific thought.

1 https://asapbio.org/preprints-and-covid-19 as well as https://retractionwatch.com/retracted-coronavirus-covid-19-papers
2 https://depts.washington.edu/pandemicalliance/covid-19-literature-report/latest-reports
3 https://outbreaksci.prereview.org
4 https://asapbio.org/preprints-and-covid-19
5 https://disqus.com/by/sinaiimmunologyreviewproject
6 https://rapidreviewscovid19.mitpress.mit.edu
7 https://greenelab.github.io/covid19-review

2. METHODS

2.1. Contributor Recruitment and Roles

First, it was necessary to establish Manubot as a platform accessible to researchers with limited experience working with version control, given that version control is not typically emphasized in biology and medicine [24, 25, 26]. Contributors were recruited primarily by word of mouth and on Twitter, and we also collaborated with existing efforts to train early-career researchers. We invited potential collaborators to contribute a short introduction on a GitHub issue in order to collect information about participants and provide an introduction to working with GitHub issues.

Interested participants were encouraged to contribute in several ways. One option was to catalog articles of interest as issues. We developed a standardized set of questions for contributors to consider when evaluating an article, following a framework often used for assessing medical literature. This approach emphasizes examining the methods used, assignment (whether the study was observational or randomized), assessment, results, interpretation, and how well the study extrapolates [27]. Contributors were also invited to contribute or edit text using GitHub's pull request system. These contributions were not strictly defined and could range from minor corrections to punctuation and grammar to large-scale additions of text. Finally, a small number of contributors (the authors of this paper) contributed technical expertise, either through the development of standardized approaches to the evaluation of papers based on the MAARIE Framework [28], the writing of code to generate manuscript figures, or the addition of features to Manubot. All of these additions were also submitted as pull requests, either to the COVID-19 review repository or to an external repository, as appropriate.

Each pull request was reviewed and approved by at least one other contributor before being merged into the main branch. We tagged potential reviewers based on the introductions they had contributed in order to encourage participation. Authorship was determined based on the Contributor Roles Taxonomy⁸. Due to the permeability of ideas among different sections, contributors to a specific manuscript were recognized with masthead authorship, while all contributors to the project were recognized with consortium authorship on all papers. Emphasizing the use of issues and pull requests was designed to encourage authors with and without git experience to discuss papers and provide feedback (both formal and informal) on proposed text additions or changes. We also used the Gitter chat platform⁹ to promote informal questions and sharing of information among collaborators.
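The standardized evaluation questions described above map naturally onto a GitHub issue template. The following is a minimal, hypothetical Python sketch that renders such an issue body from a list of criteria; the function name, the prompt wording, and the layout are our own illustrations and not the consortium's actual templates. The citation key reuses the example DOI from Section 2.2 in Manubot's citation-by-identifier syntax.

```python
# Hypothetical sketch: render a GitHub issue body that prompts reviewers to
# address each evaluation criterion described in Section 2.1. The prompt
# wording is illustrative, not the consortium's actual template text.
CRITERIA = [
    ("Methods", "Which methods did the study use?"),
    ("Assignment", "Was the study observational or randomized?"),
    ("Assessment", "How were outcomes assessed?"),
    ("Results", "What were the main findings?"),
    ("Interpretation", "Do the conclusions follow from the results?"),
    ("Extrapolation", "How well do the findings generalize beyond the study?"),
]


def issue_body(title: str, citation_key: str) -> str:
    """Return a markdown issue body with one heading per criterion."""
    lines = [f"## {title}", "", f"Citation: [@{citation_key}]", ""]
    for heading, prompt in CRITERIA:
        # HTML comments keep the prompt visible while editing the issue
        # but hide it in the rendered view.
        lines += [f"### {heading}", f"<!-- {prompt} -->", ""]
    return "\n".join(lines)


print(issue_body("Example article", "doi:10.1371/journal.pcbi.1007128"))
```

Keeping the criteria in one list means a change to the evaluation framework only has to be made in one place.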

2.2. Utilization and Expansion of Manubot

Applying Manubot's existing capabilities allowed us to confront several challenges common in large-scale collaborations, such as maintaining a record of contributions that allowed us to allocate credit appropriately or to contact the original author if questions arose. Additionally, an up-to-date version of the content was available at all times online in HTML¹⁰ or PDF format¹¹. This approach also allowed us to minimize the demand on authors to curate and sync bibliographic resources. Manubot provides the functionality to create a bibliography using digital object identifiers (DOIs), website URLs, or other identifiers such as PubMed identifiers and arXiv IDs. The author can insert a citation in-line using a format such as [@doi:10.1371/journal.pcbi.1007128]. Manubot then obtains reference metadata, exports the citations as Citation Style Language JSON Data Items, and renders the bibliographic information needed to generate the references section [19]. This approach allows multiple authors to work on a piece of text without needing to make manual adjustments to the reference lists.

8 https://casrai.org/credit
9 https://www.gitter.im
10 https://greenelab.github.io/covid19-review
11 https://greenelab.github.io/covid19-review/manuscript.pdf

Due to the needs of this project, several new features were implemented in Manubot. Because of the ever-evolving nature of the COVID-19 crisis, figures and statistics in the text quickly became outdated. To address this concern, Manubot and GitHub's CI features were used to create figures that integrated online data sources and to dynamically update information, such as the current number of active COVID-19 clinical trials [29], within the text of the manuscripts (Figure 2). GitHub Actions runs a nightly workflow to update these external data and regenerate the statistics and figures for the manuscript. The workflow uses the GitHub API to detect and save the latest commit of the external data sources that are GitHub repositories¹². It then downloads versioned data from that snapshot of the external repositories and runs bash and Python scripts to calculate the desired statistics and produce the summary figures using Matplotlib [30].

The statistics are stored in JSON files that are accessed by Manubot to populate the values of placeholder template variables dynamically every time the manuscript is built. For instance, the template variable {{ebm_trials_results}} in the manuscript is replaced by the actual number of clinical trials with results (98 at the time of writing). The template variables also include versioned URLs to the dynamically updated figures. The JSON files and figures are stored in the external-resources branch of the GitHub repository, providing versioned storage. The GitHub Actions workflow automatically adds and commits the new JSON files and figures to the external-resources branch every time it runs, and Manubot uses the latest version of these resources when it builds the manuscript. The GitHub Actions workflow file is available online¹³, as are the scripts¹⁴. The Python package versions are also available¹⁵.

Another issue identified was the need for standardized citation of clinical trials. Other researchers identified the same need¹⁶. Trials that are registered with clinicaltrials.gov receive a unique clinical trial identifier, or “NCT ID.” Because clinical trials are registered long before results are published, referencing clinical trial identifiers was a priority. Manubot uses the Zotero translation server¹⁷ to extract citation metadata for some types of citations. However, Zotero did not support clinical trial identifiers and could not extract relevant metadata from their URLs. In order to pull clinical trial metadata into Manubot, we added Zotero support for these identifiers. To achieve this, we query clinicaltrials.gov to retrieve XML metadata associated with each identifier using JavaScript¹⁸. This extension enables citing a trial as @clinicaltrials:NCT04280705 instead of the URL. Then, when Manubot requests clinical trial metadata from the Zotero translation server, the response includes the trial sponsors, responsible investigators, title, and summary. Manubot now supports directly citing Compact Uniform Resource Identifiers (CURIEs) from hundreds of registered namespaces¹⁹, beyond just the clinicaltrials prefix.

Because of the large number of citations used in this manuscript and the fast-moving nature of COVID-19 research, keeping track of retractions, corrections, and notices of concern also became a challenge. We implemented a new Manubot plugin to support “smart citations” in the HTML build of manuscripts. The plugin uses the scite [31] service to display a badge below any citation with a DOI. The badge contains a set of icons and numbers that indicate how many times that source has been mentioned, supported, or disputed and whether there have been any important editorial notices. We were thus able to identify references that needed to be reevaluated by an expert. This addition was invaluable given the nature of the project, where we were disseminating rapidly evolving information of great consequence from over 1,500 different sources. The badges also allow readers to ascertain a rough approximation of the reliability of cited sources at a glance.

Because most collaborators were writing and editing text through the GitHub website rather than in a local text editor, we also needed to add spell-checking functionality to Manubot. We integrated an existing Pandoc²⁰ spell-check extension with AppVeyor CI to automatically post spelling errors as comments on a GitHub pull request. The comment reported both the unique misspelled tokens and all locations where each token was detected.
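The pull request comment described above, listing each unique misspelled token together with every location where it occurs, can be sketched in a few lines of Python. This is a simplified stand-in for the Aspell and Pandoc filter pipeline, not the project's code: it tokenizes raw text with a regular expression and therefore, unlike the Pandoc filter, would also check words inside URLs and markup. The word lists, the custom-dictionary handling, and the comment layout are assumptions for illustration.

```python
import re
from collections import defaultdict

# Runs of letters, optionally joined by one apostrophe ("don't"); digits and
# hyphens split tokens, so "SARS-CoV-2" yields "SARS" and "CoV".
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def spelling_report(text, known_words, custom_dictionary):
    """Map each unrecognized token to every (line, column) where it occurs.

    A token counts as misspelled if, case-insensitively, it appears in neither
    the base word list nor the project's custom dictionary.
    """
    allowed = {w.lower() for w in known_words} | {w.lower() for w in custom_dictionary}
    report = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in WORD.finditer(line):
            if match.group().lower() not in allowed:
                report[match.group()].append((line_no, match.start() + 1))
    return dict(report)


def format_comment(report):
    """Render the report as a pull request comment body."""
    lines = ["Unique misspelled tokens: " + ", ".join(sorted(report))]
    for token in sorted(report):
        places = "; ".join(f"line {ln}, col {col}" for ln, col in report[token])
        lines.append(f"- {token}: {places}")
    return "\n".join(lines)


text = "Manubot renders teh manuscript.\nWe fixed teh colour."
report = spelling_report(text, {"renders", "the", "manuscript", "we", "fixed"}, {"Manubot"})
print(format_comment(report))
```

In this toy example, "colour" is flagged simply because it is absent from the small word list; with an American English dictionary, the same mechanism surfaces British spellings.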

Project maintainers managed a custom dictionary that allowed over 1,500 scientific and technical terms that were not common English words. Spell-checking also helped standardize the writing style across dozens of authors by detecting features such as British versus American English spellings. The spell-checking itself was implemented using GNU Aspell²¹ and the Pandoc spellcheck filter²². The filter checks only the manuscript text, ignoring URLs and formatting.

12 Vaccines: https://github.com/owid/covid-19-data; Clinical Trials: https://github.com/ebmdatalab/covid_trials_tracker-covid; Cases and Deaths: https://github.com/CSSEGISandData/COVID-19
13 https://github.com/greenelab/covid19-review/blob/master/.github/workflows/update-external-resources.yaml
14 https://github.com/greenelab/covid19-review/tree/external-resources
15 https://github.com/greenelab/covid19-review/blob/external-resources/environment.yml
16 https://forums.zotero.org/discussion/74933/import-from-clinical-trials-registry and https://forums.zotero.org/discussion/77721/add-reference-from-clinical-trials-org
17 https://www.zotero.org and https://github.com/zotero/translation-server
18 https://github.com/zotero/translators/pull/2153
19 https://identifiers.org
20 https://pandoc.org
21 http://aspell.net
22 https://github.com/pandoc/lua-filters/tree/master/spellcheck

[Figure 2: Overview of the project infrastructure. Labeled components: data sources (EBM Data Lab, CORD-19, CSSE, Our World in Data); a GitHub Actions workflow (download data, update figures and statistics with Python and bash scripts); the GitHub repository greenelab/covid19-review with its master branch (manuscript text (.md), static figures, author metadata, pull requests, issues, comments/feedback), external-resources branch (dynamic figures and statistics), output branch (HTML and PDF outputs, individual LaTeX and DOCX outputs, reference metadata, statistics), and gh-pages branch (HTML and PDF manuscript, images, prior manuscript versions); reference metadata services (arXiv, ClinicalTrials.gov, Zotero, DataCite, scite); and the manuscript published on GitHub Pages at greenelab.github.io/covid19-review.]

[Figure 4: Contributor engagement over time. Contributors labeled in the figure: Yusha Sun, Yoson Park, Yael Marshall, Vincent Rubinetti, Vikas Bansal, Tiago Lubiana, Temitayo Lukan, Soumita Ghosh, Simina Boca, Sergey Knyazev, Sandipan Ray, Ryan Velazquez, Ronnie Russell, Ronan Lordan, Nils Wellhausen, Michael Robson, Marouen Ben Guebila, Lucy D'Agostino McGowan, Likhitha Kolla, Lamonica Shinholster, John J. Dziak, Jinhui Wang, Jeff Field, J. Brian Byrd, Halie Rando, Greg Szeto, Fengling Hu, Elizabeth Sell, Dimitri Perrin, Diane Rafizadeh, David Manheim, David Mai, Daniel Himmelstein, Christian Brueffer, Casey Greene, Ashwin Skelly, Anthony Gitter, Anna Ada Dattoli, Amruta Naik, Alexandra Lee, Adam MacLean.]

Manubot can render a manuscript in several formats that serve different purposes. Prior to this project, Manubot could use Pandoc to convert the markdown-formatted manuscript to HTML, PDF, and DOCX formats. We expanded this functionality to export individual sections of the manuscript as separate DOCX files while still rendering the complete manuscript in HTML and PDF formats. This development was necessary because the manuscript grew so large that it needed to be split into seven separate papers for journal submission while still maintaining shared GitHub discussion across topics. When exporting an individual section, Manubot customizes the manuscript title, authors, and author contributions to pertain to that specific section. In addition, we expanded the export formats to include partial LaTeX support via Pandoc. Pandoc converts the markdown content for an individual section to TeX and converts the Citation Style Language JSON, which contains reference metadata generated by Manubot, to BibTeX. We customized a LaTeX template and reformatted the Manubot metadata, such as authors and their affiliations, for the LaTeX template. The exported TeX file requires manual refinement but contains all manuscript content and most of the formatting. Because LaTeX is required for manuscript submission in many fields, automating most of the process of converting markdown to a submission-friendly format expands Manubot's potential user base. Manubot users can write in the simple markdown format, render the manuscript in continuously updated PDF or interactive HTML formats, and export the manuscript in DOCX or TeX and BibTeX for submission to traditional publishers, taking full advantage of Pandoc's powerful document conversion capabilities and Manubot's automation.
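In the workflow described above, Pandoc converts the Citation Style Language JSON produced by Manubot into BibTeX. The Python sketch below illustrates what that mapping involves; it is a simplified stand-in, not Pandoc's converter, handles only a handful of fields, and its type mapping (TYPE_MAP) is our own assumption. The sample item describes the Manubot paper cited in Section 2.2 as [@doi:10.1371/journal.pcbi.1007128], with its author list trimmed to the first author for brevity.

```python
import json

# Hypothetical subset of CSL item types mapped to BibTeX entry types;
# anything unmapped falls back to @misc.
TYPE_MAP = {"article-journal": "article", "book": "book", "report": "techreport"}


def csl_to_bibtex(item: dict) -> str:
    """Convert one CSL JSON item into a BibTeX entry (a few fields only)."""
    fields = {}
    if "title" in item:
        # Extra braces ask BibTeX to keep the title's capitalization.
        fields["title"] = "{" + item["title"] + "}"
    names = [
        f"{a['family']}, {a['given']}" if "given" in a else a["family"]
        for a in item.get("author", [])
        if "family" in a
    ]
    if names:
        fields["author"] = " and ".join(names)
    if "container-title" in item:
        fields["journal"] = item["container-title"]
    date_parts = item.get("issued", {}).get("date-parts", [[]])
    if date_parts and date_parts[0]:
        fields["year"] = str(date_parts[0][0])
    if "DOI" in item:
        fields["doi"] = item["DOI"]
    entry_type = TYPE_MAP.get(item.get("type", ""), "misc")
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@{entry_type}{{{item['id']},\n{body}\n}}"


def csl_json_to_bibtex(text: str) -> str:
    """Convert a CSL JSON array into the contents of a .bib file."""
    return "\n\n".join(csl_to_bibtex(entry) for entry in json.loads(text))


item = {
    "id": "manubot",
    "type": "article-journal",
    "title": "Open collaborative writing with Manubot",
    "author": [{"family": "Himmelstein", "given": "Daniel S."}],
    "container-title": "PLOS Computational Biology",
    "issued": {"date-parts": [[2019]]},
    "DOI": "10.1371/journal.pcbi.1007128",
}
print(csl_json_to_bibtex(json.dumps([item])))
```

A real converter must also handle many more item types, name particles, and LaTeX escaping, which is why the project delegated this step to Pandoc.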

3. RESULTS

3.1. Recruitment and Manuscript Development

Coverage by Nature Toolbox [32] and an associated tweet²³ about the project on April 1, 2020 attracted the interest of the scientific community (Figure 3). Because GitHub issues are similar to other common web commenting systems, authors learned these tools quickly. The Gitter chat also presented a low barrier to entry. The manuscript continued to grow throughout the first year and a half of the project in both word count and the number of references (Figure 3). Although only a fraction of potential contributors added text to the manuscripts (Figure 3), many contributors remained engaged over the long term (Figure 4). Additionally, new contributors continued to join even into the second year of the project.

In order to make the project more accessible, we developed resources explaining how to use GitHub's web interface to develop and edit text for Manubot, assuming no prior experience with version control. These tutorials explained how to open an issue, open a pull request, and review a pull request²⁴. Additionally, the framework for evaluating literature was converted into issue templates to simplify the review of new articles. Articles were classified as diagnostic, therapeutic, or other, with an associated template developed to guide the review of papers and preprints in each category. A total of 285 new paper issues had been opened as of September 13, 2021.

The manuscripts produced by the consortium (excluding this one) will be submitted to mSystems as part of a special issue that provides support for continuous updates as more information becomes available. One has been published and two are available as preprints. This approach allows for a version of record to be maintained alongside the most recent version, which is always available through GitHub. These manuscripts cover a wide range of topics, including the fundamental biology of SARS-CoV-2 (pathogenesis [33] and evolution), biomedical advances in responding to the virus and COVID-19 (pharmaceuticals [29], nutraceuticals [34], vaccines, and diagnostic technologies), and biological and social factors influencing disease transmission and outcomes. To date, 50 authors are associated with the consortium (Figure 3).

More formal recruitment efforts to integrate with existing projects providing support for undergraduate students during COVID-19 were also successful. We incorporated summaries written by the students, post-docs, and faculty of the Immunology Institute at the Mount Sinai School of Medicine²⁵ [12]. Additionally, two of the consortium authors were undergraduate students recruited through the American Physician Scientist Association's Virtual Summer Research Program. Thus, the consortium was successful in providing a venue for researchers across all career stages to continue investigating and publishing at a time when many biomedical researchers were unable to access their laboratory facilities.

23 https://twitter.com/j_perkel/status/1245454628235309057
24 CONTRIBUTING.md and INSTRUCTIONS.md within the repository
25 https://github.com/ismms-himc/covid-19_sinai_reviews

3.2. Integrating Data

We integrated data into the manuscripts from several sources (Figure 2). Worldwide cases and deaths were tracked by the COVID-19 Data Repository by the Center for Systems Science and Engineering at Johns Hopkins University²⁶. The clinical trials statistics and figure were regenerated based on data from the University of Oxford Evidence-Based Medicine Data Lab's COVID-19 TrialsTracker [35]. Information about vaccine distribution was extracted from Our World In Data²⁷ [36]. Figure 1 integrates data from the CORD-19 dataset [3].

Manubot's bibliographic management capabilities were critical because the amount of relevant literature published far outstripped what we had anticipated at the beginning of the project. As of September 10, 2021, there were 1,676 references (Figure 3). The scite plugin provided a way to visually inspect the reference list to identify possible references of concern. The scite plugin and the other new features required for the COVID-19 project are now included in Manubot's rootstock, which is the template GitHub repository for creating a new manuscript. Using CI, Manubot now checks that the manuscript was built correctly, runs spell-checking, and cross-references the manuscripts cited in this review. In addition, Manubot now supports citing clinical trial identifiers such as clinicaltrials:NCT04292899 [37].

4. DISCUSSION

The current project was based in the GitHub repository greenelab/covid19-review and used Manubot [19] to continuously generate the manuscript. The Manubot framework facilitated a massive collaborative review on an urgent topic. We demonstrated the utility of Manubot for a project where many contributors lacked expertise or even experience working with version control. This effort has produced not only seven literature reviews on topics relevant to the COVID-19 pandemic but also cyberinfrastructure for training novice users in GitHub. We also extended the functionalities of Manubot to provide more of the benefits of What You See Is What You Get platforms such as Google Docs (Table 1). Open publishing thus allowed us to harness the domain expertise of a large group of non-technical users to respond to the flood of COVID-19 publications.

Several existing and new features in Manubot aid in responding to the challenges posed by the infodemic. Manuscripts are written in markdown and can be rendered in several formats that offer different advantages to users. For example, beyond building just a PDF, Manubot also renders the manuscript in HTML, DOCX, and now, LaTeX (in a more limited capacity). The interactive HTML manuscript format offers several advantages over a static PDF for harmonizing available resources and addressing specific problems related to COVID-19. The integration of scite into the HTML build makes references more manageable by visually indicating whether their results are contested or whether they have been corrected or retracted. Cross-referencing different pieces of the manuscript, such as cited preprints with reviews stored in an appendix, is another interactive option offered by HTML. The DOCX format was preferred by most non-technical users for reviewing the final version of the manuscript and was useful for creating submissions to a biological journal. Additionally, because of the heavy reliance on Microsoft Word in biology, Manubot's ability to generate DOCX outputs was expanded to allow users to generate DOCX files containing only a section of the manuscript. In our case, where the full project is nearly 150,000 words, this allows individual pieces to be shared more easily. Finally, the preliminary addition of LaTeX output is useful for researchers from computational fields who submit papers in TeX format, because it removes the step of reformatting markdown prior to submission.

26 https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
27 https://github.com/owid/covid-19-data
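Section 2.2 describes how the nightly workflow uses the GitHub API to record the latest commit of GitHub-hosted data sources, such as the repositories in footnotes 26 and 27, and then downloads data from that snapshot. Below is a minimal Python sketch of that URL handling, assuming GitHub's REST endpoint for a single commit (GET /repos/{owner}/{repo}/commits/{ref}) and raw.githubusercontent.com for file contents. The helper names and the example path are hypothetical; the project's actual scripts are the ones linked in footnote 14.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://api.github.com"
RAW_ROOT = "https://raw.githubusercontent.com"


def latest_commit_endpoint(owner: str, repo: str, branch: str = "master") -> str:
    """REST endpoint whose JSON response describes the newest commit on `branch`."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits/{branch}"


def pinned_raw_url(owner: str, repo: str, sha: str, path: str) -> str:
    """URL of `path` exactly as it existed at commit `sha` (a versioned snapshot)."""
    return f"{RAW_ROOT}/{owner}/{repo}/{sha}/{path.lstrip('/')}"


def fetch_latest_sha(owner: str, repo: str, branch: str = "master") -> str:
    """Ask GitHub for the branch's newest commit SHA.

    Requires network access; unauthenticated requests are rate-limited.
    """
    with urlopen(latest_commit_endpoint(owner, repo, branch)) as response:
        return json.load(response)["sha"]


# Example (network call, so left commented out):
# sha = fetch_latest_sha("CSSEGISandData", "COVID-19")
# url = pinned_raw_url("CSSEGISandData", "COVID-19", sha, "csse_covid_19_data/...")
```

Recording the SHA alongside the derived statistics makes each nightly figure traceable to an exact snapshot of its source data, even after the upstream repository changes.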

Type | Description
CI | Regularly download external data sources, generate new figures and statistics, and read them when Manubot builds the latest manuscript
CI | Post spell-checking reports as pull request comments
Citations | Zotero extension to report more relevant clinical trial metadata from https://clinicaltrials.gov
Citations | Cite any Compact Uniform Resource Identifier, such as clinicaltrials or ncbigene
Citations | scite badges to track retractions, corrections, and notices of concern
Outputs | Improved support for Pandoc’s LaTeX output
Outputs | Build complete manuscript alongside individual sections as standalone documents
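The first CI row of the table can be illustrated with a minimal sketch. It assumes a downloaded file in the wide layout of the CSSE time-series repository listed in footnote 26: four metadata columns (Province/State, Country/Region, Lat, Long) followed by one cumulative-count column per date in month/day/two-digit-year form. The Python code computes a few summary statistics for the most recent date and writes them to a JSON file. The function names, the output keys, and the idea that the manuscript reads these values through `{{...}}` template placeholders are assumptions for illustration, not the project’s actual pipeline.

```python
import csv
import io
import json
from datetime import datetime

# Assumed layout: Province/State, Country/Region, Lat, Long, then one column per date.
N_METADATA_COLUMNS = 4


def summarize_time_series(csv_text: str) -> dict:
    """Summarize the most recent date of a wide, cumulative-count time series."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = [datetime.strptime(col, "%m/%d/%y").date()
             for col in header[N_METADATA_COLUMNS:]]
    # Pick the latest date explicitly rather than trusting column order.
    latest_offset = max(range(len(dates)), key=lambda i: dates[i])
    latest_col = N_METADATA_COLUMNS + latest_offset

    total = 0
    countries = set()
    for row in reader:
        if not row:
            continue  # skip blank lines
        countries.add(row[1])  # Country/Region column
        value = row[latest_col].strip() if latest_col < len(row) else ""
        total += int(float(value)) if value else 0  # treat missing values as 0

    return {
        "latest_date": dates[latest_offset].isoformat(),
        "latest_total": total,
        "latest_total_formatted": f"{total:,}",
        "n_countries": len(countries),
    }


def write_template_variables(variables: dict, path: str) -> None:
    """Write the statistics as JSON so a templating step could substitute them."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(variables, handle, indent=2, sort_keys=True)
```

For a two-row sample whose latest column holds 1 and 5, `summarize_time_series` returns a `latest_total` of 6. Selecting the latest date by parsing the headers, rather than taking the last column, guards against reordered columns; a real CI job would also need to handle download failures and upstream schema changes, which this sketch ignores.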

The COVID-19 Review Consortium provided a platform for researchers to engage in scientific investigation early in the pandemic, when many biological scientists were unable to access their research spaces. In turn, by seeking to adapt Manubot to allow for broader participation, we made a number of improvements that we expect will increase its appeal to researchers from all backgrounds. Manubot provided a way for contributors from a variety of backgrounds, including early-career researchers, to join a massive collaborative project while demonstrating their individual contributions to the larger work and gaining experience with version control. The licensing and infrastructure also provide the basis for individuals to build on this project to create their own snapshots of the COVID-19 literature that derive from, but are not wholly identical to, the primary versions of these reviews. This project suggests that massive online open publishing efforts can indeed advance scholarship through inclusion [22], even under the extreme challenges presented by the COVID-19 pandemic.

Some challenges did arise in efforts to include an academically diverse set of authors. The barriers to entry posed by git and GitHub likely still reduced participation from individuals who might otherwise have been interested. Using pull requests as a tool for writing text is also unfamiliar to many or most scientists, and the review process can be slow, which might cause prospective contributors to lose interest. Additionally, the pull request model may discourage people from providing general feedback on the manuscript or on a section of it. As a result, some feedback came through email or through comments on the DOCX outputs, which the project managers then translated into issues or pull requests. Given that our approach hinged on these version control tools, our group of contributors was likely biased towards those who were interested in or experienced with computational tools. The trajectory of the pandemic itself also likely influenced participation: engagement waned over the course of the pandemic as labs opened back up and researchers were able to return to their work. We also recruited very few senior clinicians to the project, which is unsurprising given the load on medical professionals during this time. Engagement that waxes and wanes is, however, typical when writing massively open online papers [22]. Adding features such as spell-checking improved usability, and additional features, such as automatically checking the formatting of citations, could improve it further. In the future, a formal study of participation could quantify these biases and inform efforts to foster inclusion.

Additional limitations are challenges associated with massively open online papers in general. With such a large amount of text, it is not possible to keep all sections of the manuscript up to date at all times, and readers are not able to tell when each section was last updated. Even GitHub’s blame functionality does not distinguish minor changes from substantive updates to the text. While much of the data and statistics update automatically, the text itself required updating by human experts. This asynchronicity could introduce inconsistencies between the automatically updated figures and the surrounding text. Similarly, in line with the collaboration-related challenges of the project, some authors returned to update their text, while others did not. As a result, the lead authors of each paper often spent several weeks prior to journal submission updating the text to reflect new developments in each area. In the future, it may be possible to streamline this process through integration with a tool such as CoronaCentral [4] to automatically identify relevant, high-impact papers that need to be included, although expertise would still be required to incorporate them. Another challenge involves tracking preprints as they are reviewed or critiqued, revised, and potentially published. While updating the content of the manuscript would likely fall to human contributors, automatic detection of published versions of preprints [38] could be integrated in the future. These challenges are exacerbated by the scale of the infodemic, but developing solutions would also benefit future projects tracking more typical trends in publication. Likewise, outputting machine-readable summaries of key information in the COVID-19 review manuscripts could reduce their contribution to the infodemic. The integration of Compact Uniform Resource Identifiers is already a step in this direction: formal identifiers could be used to extract relationships among clinical trials, genes, publications, and other entities. Thus, the experience of using Manubot for a massive project has laid the foundation for future additions to enhance user experience and inclusivity.

5. CONCLUSION

With the worldwide scientific community uniting during 2020 and 2021 to investigate COVID-19 from a wide range of perspectives, findings from many disciplines are relevant on a rapid timescale to a broad scientific audience. As many other efforts have described, the publishing rate of formal manuscripts and preprints about COVID-19 has been unprecedented [1], and efforts to review the body of COVID-19 literature face an ever-expanding corpus to evaluate. In the case of the seven manuscripts produced by the COVID-19 Review Consortium, Manubot allows for continuous updating of the manuscripts as the pandemic enters its second year and the landscape shifts with the emergence of promising therapeutics and vaccines [29]. These manuscripts pull data from external sources and update information and visualizations daily using CI. By off-loading some updates to computational pipelines, domain experts can focus on the broader implications of new information as it emerges. Centralizing, summarizing, and critiquing data and literature broadly relevant to COVID-19 can expedite an interdisciplinary scientific process that is currently moving at an accelerated pace. As of September 13, 2021, 2,886 commits have been made to the manuscript across 575 merged pull requests. The work of the COVID-19 Review Consortium illustrates the value of incorporating open source tools, particularly those focused on open publishing, into such efforts. By facilitating the versioning of text, such platforms also allow for documentation of the evolution of thought in a rapidly changing area and for formal analysis of a collaborative project. This application of version control holds the potential to improve scientific publishing in a range of disciplines, including those outside of traditional computational fields. While Manubot is a technologically complex tool, this project demonstrates that it can be applied to a variety of projects. Future work can address the remaining limitations and continue to advance Manubot as an inclusive tool for open publishing projects.

Acknowledgements

This work would not have been possible without support from the COVID-19 Review Consortium²⁸. We are also grateful to Nick DeVito for assistance with the Evidence-Based Medicine Data Lab COVID-19 TrialsTracker data, Josh Nicholson and Milo Mordaunt for scite support, David Nicholson for spell-check assistance, and Milton Pividori as well as consortium members Alex Lee and Christian Bruefer for feedback. Research was supported by the Gordon and Betty Moore Foundation award GBMF 4552 (HMR, DSH, CSG), the National Institutes of Health award R01HG010067 (HMR, CSG), and the John W. and Jeanne M. Rowe Center for Research in Virology (AG)²⁹.

28 COVID-19 Review Consortium: Vikas Bansal, John P. Barton, Simina M. Boca, Joel D Boerckel, Christian Bruefer, James Brian Byrd, Stephen Capone, Shikta Das, Anna Ada Dattoli, John J. Dziak, Jeffrey M. Field, Soumita Ghosh, Anthony Gitter, Rishi Raj Goel, Casey S. Greene, Marouen Ben Guebila, Daniel S. Himmelstein, Fengling Hu, Nafisa M. Jadavji, Jeremy P. Kamil, Sergey Knyazev, Likhitha Kolla, Alexandra J. Lee, Ronan Lordan, Tiago Lubiana, Temitayo Lukan, Adam L. MacLean, David Mai, Serghei Mangul, David Manheim, Lucy D’Agostino McGowan, Amruta Naik, YoSon Park, Dimitri Perrin, Yanjun Qi, Diane N. Rafizadeh, Bharath Ramsundar, Halie M. Rando, Sandipan Ray, Michael P. Robson, Vincent Rubinetti, Elizabeth Sell, Lamonica Shinholster, Ashwin N. Skelly, Yuchen Sun, Yusha Sun, Gregory L Szeto, Ryan Velazquez, Jinhui Wang, Nils Wellhausen
29 Conflicts of interest. SMB: Now employed by AstraZeneca (Gaithersburg, MD). May own stock or stock options. Work conducted at previous position. LDM: Received consulting fees from Acelity and Sanofi. AG: Patent application filed with the Wisconsin Alumni Research Foundation related to classifying activated T cells.

References

[1] C. Vlasschaert, J. M. Topf, S. Hiremath, Proliferation of Papers and Preprints During the Coronavirus Disease 2019 Pandemic: Progress or Problems With Peer Review?, Advances in Chronic Kidney Disease 27 (2020) 418–426.
[2] J. Zarocostas, How to fight an infodemic, The Lancet 395 (2020) 676.
[3] L. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Burdick, D. Eide, K. Funk, Y. Katsis, R. Kinney, Y. Li, Z. Liu, W. Merrill, P. Mooney, D. Murdick, D. Rishi, J. Sheehan, Z. Shen, B. Stilson, A. Wade, K. Wang, N. X. R. Wang, C. Wilhelm, B. Xie, D. Raymond, D. S. Weld, O. Etzioni, S. Kohlmeier, CORD-19: The COVID-19 Open Research Dataset, arXiv (2020) 2004.10706.
[4] J. Lever, R. B. Altman, Analyzing the vast coronavirus literature with CoronaCentral, Proceedings of the National Academy of Sciences 118 (2021) e2100766118.
[5] G. Eysenbach, The impact of preprint servers and electronic publishing on biomedical research, Current Opinion in Immunology 12 (2000) 499–503.
[6] D. Lowe, Too Many Papers, 2021. URL: https://www.science.org/content/blog-post/too-many-papers.
[7] N. S. L. Yeo-Teh, B. L. Tang, An alarming retraction rate for scientific publications on Coronavirus Disease 2019 (COVID-19), Accountability in Research 28 (2020) 47–53.
[8] A. Abritis, A. Marcus, I. Oransky, An “alarming” and “exceptionally high” rate of COVID-19 retractions?, Accountability in Research 28 (2020) 58–59.
[9] G. Agoramoorthy, M. J. Hsu, P. Shieh, Queries on the COVID-19 quick publishing ethics, Bioethics 34 (2020) 633–634.
[10] C. Boodman, S. Lee, J. Bullard, Idle medical students review emerging COVID-19 research, Medical Education Online 25 (2020) 1770562.
[11] J. Brainard, Scientists are drowning in COVID-19 papers. Can new tools keep them afloat?, Science (2020).
[12] N. Vabret, R. Samstein, N. Fernandez, M. Merad, T. S. I. R. Project, Trainees, Faculty, Advancing scientific knowledge in times of pandemics, Nature Reviews Immunology 20 (2020) 338–338.
[13] J. Sun, W.-T. He, L. Wang, A. Lai, X. Ji, X. Zhai, G. Li, M. A. Suchard, J. Tian, J. Zhou, M. Veit, S. Su, Covid-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives, Trends in Molecular Medicine 26 (2020) 483–495.
[14] R. Weissleder, H. Lee, J. Ko, M. J. Pittet, COVID-19 diagnostics in context, Science Translational Medicine 12 (2020) eabc1931.
[15] J. M. Sanders, M. L. Monogue, T. Z. Jodlowski, J. B. Cutrell, Pharmacologic Treatments for Coronavirus Disease 2019 (COVID-19), JAMA (2020).
[16] T. Carvalho, COVID-19 Research in Brief: December, 2019 to June, 2020, Nature Medicine 26 (2020) 1152–1153.