=Paper= {{Paper |id=Vol-3415/paper-8 |storemode=property |title=FAIR Functional Enrichment: Assessing and Modelling Provenance in Omics Results |pdfUrl=https://ceur-ws.org/Vol-3415/paper-8.pdf |volume=Vol-3415 |dblpUrl=https://dblp.org/rec/conf/swat4ls/ChenVW23 }} ==FAIR Functional Enrichment: Assessing and Modelling Provenance in Omics Results== https://ceur-ws.org/Vol-3415/paper-8.pdf
FAIR Functional Enrichment: Assessing and Modelling
Provenance in Omics Results
Yi Chen1, Fons J. Verbeek1 and Katherine J. Wolstencroft1,*
1 Leiden Institute of Advanced Computer Science, Leiden 2333CA, NL


Abstract
Functional enrichment analysis is an essential downstream process in high throughput omics studies,
such as transcriptomics and proteomics. By using the Gene Ontology (GO) and its annotations (GOA),
underlying functional patterns of over-representation can be identified, leading to better interpretation
of the omics data and new biological insights. However, GO reflects the current understanding of
gene product function and evolves with our changing biological knowledge. When performing such
analyses, it is therefore crucial to record GO version provenance, together with related parameters, such
as statistical cut-offs and annotation sources. Surveying the literature on functional enrichment results
reveals provenance information is rarely available, reducing the reproducibility and interpretation of
results and preventing objective comparisons between related studies. In this work, we propose minimal
metadata requirements for functional enrichment reproducibility. Our model complies with the FAIR
principles and is based on the provenance ontology (PROV-O). We demonstrate the scale of the problem
and the utility of our solution with data from SARS-CoV-2.

Keywords
enrichment analysis, reproducibility, provenance, Gene Ontology, FAIR, PROV-O




1. Introduction
Functional enrichment analysis has been widely used in biomedical research, to interpret high
throughput data[1][2] or to discover underlying mechanisms of diseases[3]. These analyses
are dependent on biological knowledge-bases, such as the Gene Ontology (GO)[4], or Kyoto
Encyclopedia of Genes and Genomes (KEGG)[5], that capture and structure our biological
understanding. However, knowledge-bases are not static. Instead, they are frequently updated
to reflect the latest biological knowledge of the scientific community[6]. The Gene Ontology, for
example, is updated monthly. Updates may include changes to the hierarchical structure and
the conceptualization of our knowledge about gene functions, and changes to Gene Ontology
annotation, which describes the associations between genes and GO terms, including the
evidence for associations. The KEGG pathway knowledge-base is updated quarterly. Differences
between knowledge-base versions can strongly affect the outcome of functional enrichment
analyses. Tomczak et al.[7] showed the extent to which the consistency, significance scores and
the interpretation of enrichment analysis results change when different versions of
GO and GOA are used. Another study[8] showed that 74% of enriched terms changed between
2010 and 2016. If versioning information is not recorded, results from different studies are less
comparable, making previously published studies less re-usable. To compound this problem
further, a range of software tools[9][10][11] is available for functional enrichment. Each tool
has its own update schedule, which may not follow the update schedules of the underlying
knowledge-bases. Using the latest version of a functional enrichment tool does not guarantee
that the latest versions of the underlying knowledge-bases are available[8]. In addition to problems
with versioning, different functional enrichment applications implement different statistical
tests, different background gene sets, and different default values for statistical significance and
multiple testing corrections.

SWAT4HCLS 2023: The 14th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences
* Corresponding author.
Email: y.chen@liacs.leidenuniv.nl (Y. Chen); f.j.verbeek@liacs.leidenuniv.nl (Fons J. Verbeek); k.j.wolstencroft@liacs.leidenuniv.nl (Katherine J. Wolstencroft)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   Wijesooriya et al. (2022)[12] showed the extent of the differences in significant pathways and
ontology terms for different methods in functional enrichment and multiple-testing correction.
These findings together demonstrate the importance of capturing provenance to enable the
interpretation of enrichment analysis results, but this information is seldom found in publications.
Here, we demonstrate the current state of enrichment analysis reproducibility by studying
data from SARS-CoV-2. The international response to the virus resulted in the generation
and publication of large amounts of data and fast evolution of our collective knowledge. By
surveying this data, we show that research conclusions were frequently based on outdated
versions of available knowledge-bases, potentially missing the inclusion of new insights as they
were discovered and shared. In this work, we randomly selected and manually inspected the
metadata and provenance provided in research output from PubMed Central[13], identifying
common reporting practices and the most common tools[10][14][11][15][16][17][9][18] and
methods used for analysis. In addition, we compared the versions of tools and underlying
knowledge-bases between 2020 and 2022, revealing a lack of consistency in knowledge used to
interpret experimental results. To address these problems, we propose minimal metadata and
provenance requirements to improve comparison between functional enrichment experiments.
   Our model complies with the Findability, Accessibility, Interoperability, and Reusability (FAIR)
principles[19] and builds on established methods and standards. We adopt the PROV Ontology
(PROV-O)[20] to describe a set of classes, properties, and restrictions to capture experimental
parameters sufficiently and demonstrate the utility of the model using data from the SARS-CoV-2
literature.


2. Methods
To determine the variation of tools and knowledge-base versions utilized by functional enrich-
ment studies in SARS-CoV-2, we counted the frequency of tools used from a random sample of
PubMed Central (PMC)[13] publications, and examined the additional information provided on
functional enrichment. For the most commonly used tools, we investigated the update schedule
for underlying knowledge-bases and present the results for the Gene Ontology. From these
results, we propose a PROV-O based model to capture functional enrichment provenance.
2.1. Functional Enrichment Literature Survey
2.1.1. Data collection
SARS-CoV-2 was selected for this investigation due to the large number of publications that
were produced on this topic over a short period of time. Understanding the virus required data
and knowledge sharing on a large scale, and having access to the most recent insights was
essential for comparing studies. To investigate the functional enrichment provenance typically
provided, we first identified SARS-CoV-2 publications in PMC. The search terms can be found
in supplementary file 1.
    A total of 3206 publication identifiers were retrieved (the full list of PMC identifiers can be
found in supplementary file 2) and full papers were retrieved using the PubMed Central Open
Access API in BioC format (PMC BioC API, accessed on 23 November 2022).
    Based on the occurrence of the search term ’enrichment analysis’ in the methods section of
each paper, we selected the top 100 papers and manually inspected them. Articles were excluded
if they described enrichment analysis methods, instead of presenting analysis results. In total,
92 articles were retained, containing 135 enrichment analyses.
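The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: it assumes each retrieved BioC document is a dictionary with a `passages` list, where every passage has an `infons` dictionary carrying a `section_type` label and a `text` string. The function names and the toy identifiers are hypothetical.

```python
from typing import Dict, List

SEARCH_TERM = "enrichment analysis"


def count_term_in_methods(document: dict, term: str = SEARCH_TERM) -> int:
    """Count case-insensitive occurrences of `term` in METHODS passages."""
    needle = term.lower()
    total = 0
    for passage in document.get("passages", []):
        # Assumed layout: each passage carries an "infons" dict with a
        # "section_type" label and a "text" string.
        section = passage.get("infons", {}).get("section_type", "")
        if section.upper() == "METHODS":
            total += passage.get("text", "").lower().count(needle)
    return total


def top_papers(documents: Dict[str, dict], n: int = 100) -> List[str]:
    """Return up to `n` paper IDs ranked by term count, highest first."""
    scores = {pid: count_term_in_methods(doc) for pid, doc in documents.items()}
    # Sort by descending count, then by ID so that ties are deterministic.
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return [pid for pid in ranked if scores[pid] > 0][:n]


# Toy documents with hypothetical identifiers, for illustration only.
docs = {
    "PMC_A": {"passages": [
        {"infons": {"section_type": "METHODS"},
         "text": "GO enrichment analysis was run. Enrichment analysis used KEGG."},
    ]},
    "PMC_B": {"passages": [
        {"infons": {"section_type": "INTRO"},
         "text": "Enrichment analysis is widely used."},
        {"infons": {"section_type": "METHODS"},
         "text": "Pathway enrichment analysis was performed."},
    ]},
    "PMC_C": {"passages": [
        {"infons": {"section_type": "RESULTS"}, "text": "No methods text here."},
    ]},
}
print(top_papers(docs, n=2))
```

Papers that never mention the term in a methods passage are dropped before the cut-off is applied, so the ranked list may be shorter than `n`.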
    For each enrichment analysis described in the cohort, we identified the following information:
1) the name of the tool and method used for enrichment, 2) the version of the tool, 3) the name(s)
of the knowledge-bases used for enrichment, and 4) the version of the knowledge-bases used.
    Data were collected and visualized using the Python package Matplotlib (matplotlib.pyplot, v3.5.2)[21].

2.1.2. Survey of knowledge-base consistency
Tomczak et al.[7] demonstrated that variation in the version of GO and GO annotation (GOA)
affects the interpretation, p-value and the consistency of enrichment analysis. For the tools
identified with the highest frequency of use in the literature survey, we investigated their update
schedule and the update schedule for the underlying knowledge-bases. This investigation was
to determine if researchers were using the most recent knowledge and if there was consistency
of underlying knowledge across studies published at similar times.
   For the top 8 most frequently identified tools, we manually examined the metadata available.
The version and the release date information for associated knowledge-bases were recorded.
If there were multiple releases of a tool between 2020 and 2022, every release was inspected. In
addition, some tools used knowledge-bases derived from primary sources, but further processed
and integrated them into other systems. The Molecular Signatures Database (MSigDB)[14][22],
for example, integrates multiple knowledge-bases, but without clear versioning information.
By manually inspecting the release notes, we examined the availability of knowledge-base
metadata. If the metadata was not provided, we recorded the metadata of the knowledge-base
version closest to the release date of the database.
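The fallback rule for missing metadata, taking the knowledge-base version closest to a tool release, can be illustrated with a short sketch. It assumes that "closest" means the smallest absolute difference in days and that ties go to the earlier release; a stricter variant could consider only releases on or before the tool release date. The function name and all dates are hypothetical.

```python
from datetime import date
from typing import Iterable


def closest_release(tool_release: date, kb_releases: Iterable[date]) -> date:
    """Return the knowledge-base release date nearest to `tool_release`.

    Ties are broken in favour of the earlier release.
    """
    releases = list(kb_releases)
    if not releases:
        raise ValueError("no knowledge-base releases supplied")
    return min(releases, key=lambda d: (abs((d - tool_release).days), d))


# Hypothetical dates, for illustration only (not actual release dates).
kb_releases = [date(2021, 1, 1), date(2021, 2, 1), date(2021, 3, 1)]
print(closest_release(date(2021, 2, 20), kb_releases))
```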

2.2. Metadata and Provenance Requirements
The minimal metadata and provenance requirements we propose are based on the recommendations
from Jauer for minimal provenance[23], the FAIR principles, and key factors described by
Wijesooriya[12] that affect the reproducibility of functional enrichment. We propose metadata
for four aspects: input data, knowledge-base data, enrichment analysis methods conducted,
and output data. Key factors identified by Wijesooriya[12] included multiple-testing correction
methods, statistical cut-offs and background gene sets. Knowledge-bases such as GO should
be reported with their version and the source(s) of data for annotation. Following the FAIR
principles[19], persistent identifiers (PIDs) and standard gene identifiers should be used, as
should timestamps and references to individuals or institutions who hold responsibility for the
experiments (Table 1).

2.3. Representing Provenance in PROV-O
The PROV-O ontology encodes the PROV data model in OWL (Web Ontology Language)[24].
The core elements of PROV-O are: 1) Entities, which can be any real or conceptual objects, 2)
Activities, which occur over a given time period, and 3) Agents, which hold the responsibility
for activities and the existence of entities. Seven properties (e.g. ’wasGeneratedBy’)
describe the relationships between these core elements. Here, we propose a PROV-O model to
describe the entities and activities involved in a functional enrichment analysis, showing how
our proposed metadata elements could be used to represent the provenance of the experiment,
to improve comparability and reproducibility.


3. Results
Here, we present the results of the literature survey on enrichment analysis metadata, followed
by an investigation into the consistency of versions of knowledge-bases in the most frequently
used tools. Finally, we propose the minimum metadata required for functional enrichment and
a provenance model to address the comparability problems identified by the literature survey.
During the survey period, 30 versions of the Gene Ontology were released, showing an overall
reduction from 44,700 terms and 92,230 edges in 2020 to 43,272 terms and 85,618 edges in 2022.
In total, 935 new terms were added, 524 were merged, and 1,417 terms were made obsolete.

3.1. Survey of Enrichment Analysis Results Metadata
Figure 1 shows the results of surveying 135 enrichment analyses from 92 publications, published
between 2020 and 2022. Through manual inspection, 25 different tools and 28 knowledge-bases
and databases were identified. The largest proportion of analyses was conducted using R,
with ’ClusterProfiler’[15], ’FGSEA’[9] and ’GSVA’[16] accounting for 47 of 135 analyses. The
GSEA platform was the second largest, in which 21 analyses were conducted. Web-based tools
like ’Metascape’[11] were also frequently used, with more than 20 analyses in our survey. In
23 analyses, authors did not report which tool(s) were used. GO and KEGG were the most
frequently used knowledge-bases, with 46 and 35 analyses respectively. Other knowledge-bases
like Reactome[25][26] and WikiPathways[27] were less common in the collection. 37 analyses
did not provide any information on which knowledge-base was used. Our findings showed
large variations in the tools and knowledge-bases in functional enrichment analysis. The use
of different tools should not prevent the comparison of results, but experimental parameters,
source data and version information are required to interpret those differences. Our survey
showed that metadata relating to the parameters or tool versions were not provided in 96 out of
135 analyses. Versioning information for knowledge-bases was omitted in 110 out of 135
analyses. Taken together, these data show a lack of metadata relating to functional enrichment
analyses in SARS-CoV-2 studies.

Figure 1: Functional enrichment survey results. a) Software used in functional enrichment. b) Frequency
of analyses reporting software metadata. c) Knowledge-bases used and their frequencies. d) Frequency
of analyses reporting knowledge-base metadata.
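As a quick arithmetic check, the survey counts can be turned into proportions. The sketch below uses only the counts stated in the text (96 and 110 out of 135); the variable and function names are illustrative.

```python
TOTAL_ANALYSES = 135

# Counts reported in the survey.
missing_tool_metadata = 96   # no parameters or tool version reported
missing_kb_version = 110     # no knowledge-base version reported


def share(count: int, total: int = TOTAL_ANALYSES) -> float:
    """Return `count` as a percentage of `total`, rounded to one decimal."""
    return round(100 * count / total, 1)


# The complements give the analyses that did report each item.
reported_tool_metadata = TOTAL_ANALYSES - missing_tool_metadata
reported_kb_version = TOTAL_ANALYSES - missing_kb_version

print(f"tool metadata missing: {share(missing_tool_metadata)}%")
print(f"tool metadata reported: {share(reported_tool_metadata)}%")
print(f"knowledge-base version missing: {share(missing_kb_version)}%")
print(f"knowledge-base version reported: {share(reported_kb_version)}%")
```

Roughly seven in ten analyses lacked tool metadata and about four in five lacked knowledge-base versions, leaving 39 and 25 analyses respectively that reported them.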

3.2. Knowledge-Base Versioning in Enrichment Tools
For the top 8 most frequently used tools from our survey, we identified the versions of the GO
knowledge-base in use, as described by the tool providers. Figure 2 shows the results.
Some tools, such as DAVID 6.8[18], used versions of GO that predated the SARS-CoV-2
pandemic. From December 2021, DAVID began quarterly updates of its software, although
how these updates tracked GO updates is not transparent. For Enrichr[10], a 2018 version of GO
was in use until early 2021. In contrast, ClueGO[17] and the Bioconductor-based tools provided
more frequent updates, but only Metascape updated GO monthly and remained up to date with
the Gene Ontology. For tools such as GSEA[14], the GO knowledge-base version depended
on the GSEA version. These findings show that at any given time in the pandemic, the choice
of enrichment analysis software dictated how up-to-date the underlying biological knowledge
was for analysing enrichment results. Consequently, papers published at similar times were
not necessarily basing analysis conclusions on the same collective understanding of biology.
Re-analysing these studies may therefore yield new insights with our recent accumulation of
knowledge. These results highlight the necessity of recording version information and more
extensive provenance data.

Figure 2: A timeline of GO knowledge-base updates in functional enrichment tools. The timestamp
inside the boxes represents the date of the last GO update. At any given time point, multiple GO versions
are in use across the tools, resulting in difficulties in data comparison.

3.3. Minimum Metadata Requirements
To increase the reproducibility and comparability of enrichment analysis results, in line with
the FAIR principles, we propose minimum metadata requirements for enrichment analysis in
four aspects: input data, knowledge-base and data sources, enrichment analysis execution, and
output data (Table 1). These recommendations are based on previous work to define minimum
provenance and reproducible enrichment analysis, as well as on the results of our literature
survey. Example annotations are provided to show what should be recorded for each metadata
element.
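To illustrate how such requirements could be checked in practice, the sketch below records the execution-related metadata of one analysis and lists what is missing. It is a hypothetical structure, not part of the proposed model: the class, the field names and the mapping of the example annotations from Table 1 onto those fields are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EnrichmentMetadata:
    """Hypothetical record of the metadata for one enrichment analysis."""
    tool: Optional[str] = None
    tool_version: Optional[str] = None
    knowledge_base: Optional[str] = None
    knowledge_base_version: Optional[str] = None
    background_gene_set: Optional[str] = None
    statistical_test: Optional[str] = None
    multiple_testing_correction: Optional[str] = None
    cutoff: Optional[str] = None
    creator: Optional[str] = None
    timestamp: Optional[str] = None


def missing_fields(record: EnrichmentMetadata) -> List[str]:
    """Return the names of all fields that are empty or unset."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Example values taken from the example annotations in Table 1;
# the background gene set is deliberately left out.
record = EnrichmentMetadata(
    tool="enrichr",
    tool_version="release 2021-03-29",
    knowledge_base="GO and GOA",
    knowledge_base_version="GO(v2021) and GOAv2021/release 108",
    statistical_test="fisher exact test",
    multiple_testing_correction="Benjamini-Hochberg procedure",
    cutoff="0.05; 0.01",
    creator="maayanlab",
    timestamp="2021-03-01",
)
print(missing_fields(record))
```

A check of this kind could run before results are published, so that gaps are reported explicitly instead of being discovered by later readers.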

3.4. Proposed Provenance Model
The proposed minimum metadata requirements from the previous section form the core components
of an enrichment analysis provenance model. Figure 3 shows the relationships between
these metadata elements, formally modelling the provenance of an enrichment analysis and
expressed using PROV-O. The example instances represented in the model are the same as
the example annotations from Table 1, showing how each element is necessary for capturing
sufficient information for comparison and interpretation. Where provenance information is
incomplete, anonymous nodes can be used to highlight what is unknown in an experiment.
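A provenance description of this kind can be approximated as a small set of triples. The sketch below is not the model shown in Figure 3; it is a simplified illustration that uses standard PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:SoftwareAgent`, `prov:used`, `prov:wasGeneratedBy` and others) together with a hypothetical `ex:` namespace and hypothetical `ex:` properties for the version and statistical settings. The triples are written out as plain Turtle-style text without any RDF library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "ex": "http://example.org/enrichment/",  # hypothetical namespace
}

# All ex: names are hypothetical; the prov: terms come from PROV-O.
TRIPLES: List[Triple] = [
    # Entities: input genes, the knowledge-base, and the enriched terms.
    ("ex:inputGenes", "a", "prov:Collection"),
    ("ex:geneOntology", "a", "prov:Entity"),
    ("ex:geneOntology", "ex:version", '"GO(v2021)"'),
    ("ex:enrichedTerms", "a", "prov:Collection"),
    # Activity: one enrichment run with its statistical settings.
    ("ex:enrichmentRun", "a", "prov:Activity"),
    ("ex:enrichmentRun", "prov:used", "ex:inputGenes"),
    ("ex:enrichmentRun", "prov:used", "ex:geneOntology"),
    ("ex:enrichmentRun", "ex:statisticalTest", '"fisher exact test"'),
    ("ex:enrichmentRun", "ex:correction", '"Benjamini-Hochberg procedure"'),
    # Agents: the tool and the responsible researcher.
    ("ex:enrichr", "a", "prov:SoftwareAgent"),
    ("ex:researcher", "a", "prov:Person"),
    ("ex:enrichmentRun", "prov:wasAssociatedWith", "ex:enrichr"),
    # Output provenance.
    ("ex:enrichedTerms", "prov:wasGeneratedBy", "ex:enrichmentRun"),
    ("ex:enrichedTerms", "prov:wasDerivedFrom", "ex:inputGenes"),
    ("ex:enrichedTerms", "prov:wasAttributedTo", "ex:researcher"),
]


def to_turtle(triples: List[Triple]) -> str:
    """Serialise the triples as Turtle-style text with prefix declarations."""
    header = [f"@prefix {name}: <{iri}> ." for name, iri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


print(to_turtle(TRIPLES))
```

Keeping the triples as plain tuples makes the structure easy to inspect; in practice an RDF library would be the more robust choice for serialisation and validation.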

Table 1
Minimal metadata requirements for functional enrichment analyses. The relation to specific
FAIR principles is shown by F (Findable), A (Accessible), I (Interoperable) and R (Reusable).

| Criteria | Recommendations | Input data | Knowledge-base/Data sources | Enrichment Execution | Output data | FAIR |
| Persistent ID (PID) | A persistent identifier should be assigned | prov:Collection; PID | GO URI / gene sets URI | tool URI | prov:Collection; PID | F, A |
| Standard Identifiers | Gene products and knowledge-base terms should be described using persistent identifiers from a recognised source | ENSG0000012584 | GO:0006915 | enrichr | GO:0071375 | F, I |
| Creator | An institution or person bears the responsibility | researcher orcidID | maayanlab | maayanlab | researcher orcidID | A, R |
| Timestamp | The time when data was generated/enrichment analysis was conducted | | 2021-03-01 | | | R |
| Origin | A description of the source of the original data | Differentially Expressed RNA-seq | GO and GOA | Ensembl | | F, I, R |
| Extraction Method | How the data was obtained from the source of the original data | significant up-regulated | biological process terms | reviewed human genes | | R |
| Versioning | The version of tools/knowledge-bases used in enrichment analysis | | GO(v2021) and GOAv2021/release 108 | release 2021-03-29 | | A, I, R |
| Background Gene sets | Gene sets used as backgrounds in enrichment analysis | | | prov:Collection; PID | | I, R |
| Statistical test | Statistical test used in enrichment analysis | | | fisher exact test | | R |
| Multi-test correction method | The methods used for multi-test correction | | | Benjamini-Hochberg procedure | | R |
| Cut-off | The cut-off used in enrichment analysis (p and q) | | | 0.05; 0.01 | | R |

Figure 3: PROV-O representation for an enrichment analysis experiment on differentially expressed
RNA-Seq data, analysed with Enrichr and the 2021 version of GO. Rectangles represent activities, ellipses
represent entities and rhombuses represent agents. Details can be seen at FAIRDOMHub.


4. Discussion
This study highlights problems of reproducibility and comparability in functional enrichment
analyses. We showed there was little consistency in the information reported about such
experiments and revealed a large proportion of the studies we surveyed were not being conducted
using the latest versions of biological knowledge-bases. Structured knowledge resources, such
as GO, allow us to identify patterns in complex, high-throughput omics data, enabling new
insights from our collective knowledge. However, as our knowledge changes, these supporting
knowledge resources also change. This should be an advantage, allowing scientists to benefit
from the work of others. However, our survey showed that a large range of enrichment analysis tools
is in common use (Figure 1), and that each has its own update schedule for underlying knowledge
(Figure 2). The result is that different studies, conducted at similar times, use different versions of
knowledge-bases, and therefore different underlying knowledge. If we know where the differences
lie, comparison is still possible, but 110/135 enrichment analyses did not provide information
on knowledge-base versioning, and only 39/135 reported the version of the enrichment analysis
tool that was used. From the literature survey, and previous studies on minimal provenance
[23] and reproducible enrichment analyses[12], we propose a minimum set of metadata to
combat the problems described above. In addition, we present a PROV-O based model for
expressing enrichment analysis results, with an example of an enrichment analysis experiment
run using Enrichr[10]. Minimum metadata guidelines for upstream analyses, describing the
generation and statistical analysis of high throughput omics data have long been established[28].
By implementing similar paradigms for downstream analyses, we can improve the FAIRness of
studies overall and enable FAIRer comparison and reuse of important data sets.
5. Appendices
Supplementary files can be found at https://fairdomhub.org/investigations/583


Acknowledgments
We thank the China Scholarship Council for its support through Leiden University.


References
 [1] A. Conesa, P. Madrigal, S. Tarazona, D. Gomez-Cabrero, A. Cervera, A. McPherson, M. W.
     Szcześniak, D. J. Gaffney, L. L. Elo, X. Zhang, et al., A survey of best practices for rna-seq
     data analysis, Genome biology 17 (2016) 1–19.
 [2] P. Krishnamoorthy, A. S. Raj, S. Roy, N. S. Kumar, H. Kumar, Comparative transcriptome
     analysis of sars-cov, mers-cov, and sars-cov-2 to identify potential pathways for drug
     repurposing, Computers in biology and medicine 128 (2021) 104123.
 [3] P. Gollapalli, S. B. S, H. Rimac, P. Patil, S. K. Nalilu, S. Kandagalla, P. Shetty, Pathway
     enrichment analysis of virus-host interactome and prioritization of novel compounds
     targeting the spike glycoprotein receptor binding domain–human angiotensin-converting
     enzyme 2 interface to combat sars-cov-2, Journal of Biomolecular Structure and Dynamics
     40 (2022) 2701–2714.
 [4] The gene ontology resource: enriching a gold mine, Nucleic acids research 49 (2021)
     D325–D334.
 [5] M. Kanehisa, S. Goto, Kegg: kyoto encyclopedia of genes and genomes, Nucleic acids
     research 28 (2000) 27–30.
 [6] G. O. Consortium, Expansion of the gene ontology knowledgebase and resources, Nucleic
     acids research 45 (2017) D331–D338.
 [7] A. Tomczak, J. M. Mortensen, R. Winnenburg, C. Liu, D. T. Alessi, V. Swamy, F. Vallania,
     S. Lofgren, W. Haynes, N. H. Shah, et al., Interpretation of biological experiments changes
     with evolution of the gene ontology and its annotations, Scientific reports 8 (2018) 1–10.
 [8] L. Wadi, M. Meyer, J. Weiser, L. D. Stein, J. Reimand, Impact of outdated gene annotations
     on pathway enrichment analysis, Nature methods 13 (2016) 705–706.
 [9] G. Korotkevich, V. Sukhov, N. Budin, B. Shpak, M. N. Artyomov, A. Sergushichev, Fast
     gene set enrichment analysis, BioRxiv (2021) 060012.
[10] M. V. Kuleshov, M. R. Jones, A. D. Rouillard, N. F. Fernandez, Q. Duan, Z. Wang, S. Koplev,
     S. L. Jenkins, K. M. Jagodnik, A. Lachmann, et al., Enrichr: a comprehensive gene set
     enrichment analysis web server 2016 update, Nucleic acids research 44 (2016) W90–W97.
[11] Y. Zhou, B. Zhou, L. Pache, M. Chang, A. H. Khodabakhshi, O. Tanaseichuk, C. Benner, S. K.
     Chanda, Metascape provides a biologist-oriented resource for the analysis of systems-level
     datasets, Nature communications 10 (2019) 1–10.
[12] K. Wijesooriya, S. A. Jadaan, K. L. Perera, T. Kaur, M. Ziemann, Urgent need for consis-
     tent standards in functional enrichment analysis, PLoS computational biology 18 (2022)
     e1009935.
[13] NCBI, Pubmed central, 1999. URL: https://www.ncbi.nlm.nih.gov/pmc/.
[14] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette,
     A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, et al., Gene set enrichment analysis: a
     knowledge-based approach for interpreting genome-wide expression profiles, Proceedings
     of the National Academy of Sciences 102 (2005) 15545–15550.
[15] G. Yu, L.-G. Wang, Y. Han, Q.-Y. He, clusterprofiler: an r package for comparing biological
     themes among gene clusters, Omics: a journal of integrative biology 16 (2012) 284–287.
[16] S. Hänzelmann, R. Castelo, J. Guinney, Gsva: gene set variation analysis for microarray
     and rna-seq data, BMC bioinformatics 14 (2013) 1–15.
[17] G. Bindea, B. Mlecnik, H. Hackl, P. Charoentong, M. Tosolini, A. Kirilovsky, W.-H. Fridman,
     F. Pagès, Z. Trajanoski, J. Galon, Cluego: a cytoscape plug-in to decipher functionally
     grouped gene ontology and pathway annotation networks, Bioinformatics 25 (2009)
     1091–1093.
[18] G. Dennis, B. T. Sherman, D. A. Hosack, J. Yang, W. Gao, H. C. Lane, R. A. Lempicki, David:
     database for annotation, visualization, and integrated discovery, Genome biology 4 (2003)
     1–11.
[19] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak,
     N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, et al., The fair guiding
     principles for scientific data management and stewardship, Scientific data 3 (2016) 1–9.
[20] K. Belhajjame, J. Cheney, D. Corsar, D. Garijo, S. Soiland-Reyes, S. Zednik, J. Zhao, PROV-O:
     The PROV Ontology, Technical Report, 2012. URL: http://www.w3.org/TR/prov-o/.
[21] J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in science & engineering
     9 (2007) 90–95.
[22] A. Liberzon, C. Birger, H. Thorvaldsdóttir, M. Ghandi, J. P. Mesirov, P. Tamayo, The
     molecular signatures database hallmark gene set collection, Cell systems 1 (2015) 417–425.
[23] M.-L. Jauer, T. M. Deserno, Data provenance standards and recommendations for fair data,
     Digital Personalized Health and Medicine (2020) 1237–1238.
[24] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. McGuinness, P. Patel-Schneider,
     L. A. Stein, OWL Web Ontology Language Reference, Recommendation, World Wide Web
     Consortium (W3C), 2004. See http://www.w3.org/TR/owl-ref/.
[25] J. Griss, G. Viteri, K. Sidiropoulos, V. Nguyen, A. Fabregat, H. Hermjakob, Reactomegsa-
     efficient multi-omics comparative pathway analysis, Molecular & Cellular Proteomics 19
     (2020) 2115–2125.
[26] M. Gillespie, B. Jassal, R. Stephan, M. Milacic, K. Rothfels, A. Senff-Ribeiro, J. Griss,
     C. Sevilla, L. Matthews, C. Gong, et al., The reactome pathway knowledgebase 2022,
     Nucleic acids research 50 (2022) D687–D692.
[27] M. Martens, A. Ammar, A. Riutta, A. Waagmeester, D. N. Slenter, K. Hanspers, R. A. Miller,
     D. Digles, E. N. Lopes, F. Ehrhart, et al., Wikipathways: connecting communities, Nucleic
     acids research 49 (2021) D613–D621.
[28] C. F. Taylor, D. Field, S.-A. Sansone, J. Aerts, R. Apweiler, M. Ashburner, C. A. Ball, P.-A.
     Binz, M. Bogue, T. Booth, et al., Promoting coherent minimum reporting guidelines for
     biological and biomedical investigations: the mibbi project, Nature biotechnology 26 (2008)
     889–896.