Features of a FAIR Vocabulary

                   Fuqi Xu1,∗ [0000-0002-5923-3859], Nick Juty2,∗ [0000-0002-2036-8350],
                   Carole Goble2 [0000-0003-1219-2137], Simon Jupp3 [0000-0002-0643-3144],
                   Helen Parkinson1 [0000-0003-3035-4195], and
                   Mélanie Courtot1,† [0000-0002-9551-6370]

                   1 European Molecular Biology Laboratory, European Bioinformatics Institute,
                     Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
                     mcourtot@gmail.com
                   2 University of Manchester, Manchester M13 9PL, United Kingdom
                   3 SciBite, BioData Innovation Centre, Wellcome Genome Campus, Hinxton,
                     Cambridge CB10 1DR, United Kingdom


                             Abstract. The FAIR Principles explicitly require the use of FAIR vo-
                             cabularies, but what precisely constitutes a FAIR vocabulary remains
                             unclear. Here we provide definitions for FAIR vocabularies, examine
                             the application of the FAIR Principles to vocabularies, align their re-
                             quirements with the Open Biomedical Ontologies (OBO) Principles, and
                             propose FAIR Vocabulary Features (FVFs). We also design assessment
                             approaches for FAIR vocabularies by mapping the FVFs with existing
                             FAIR assessment indicators. Finally, we demonstrate how FVFs can be
                             used for evaluating and improving vocabularies using exemplar biomed-
                             ical vocabularies.


                   Keywords: FAIR principles · Vocabulary · Ontology · Assessment


                   1       Introduction

                   The Findable, Accessible, Interoperable and Reusable (FAIR) Principles [40]
                   have gained traction in the biomedical community since their publication in
                   2016, with many groups attempting to improve their data quality, develop FAIR
                   capable data resources, and design generic FAIR assessment tools for biomedical
                   data[6] [16] [39]. Due to the heterogeneous nature and broad scope of biomedical
                   data, from molecules to human studies via interdisciplinary analysis, stringent
                   requirements for its FAIRness need to be met to ensure its usefulness toward
                   benefiting human health. While assessing the FAIR level of datasets and data
                   resources [11], we noted a futile cycle with respect to the ‘Interoperable’ FAIR
                   Principle, “I2 - (Meta)data use vocabularies that follow FAIR principles”. To
                   comply with that principle, datasets need to use FAIR vocabularies, which them-
                   selves need to be FAIR. FAIR vocabularies promote the exchange of biomedical
                       ∗ These authors contributed equally to this paper.
                       † To whom correspondence should be addressed.


Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2      F.Xu, N.Juty et al.

data, which are usually generated, annotated and used by different groups of re-
searchers. FAIR vocabularies promote biomedical data FAIRness throughout the
data life cycle, during the data generation, curation, and distribution processes,
and support data exchange and integration across data resources.
    Multiple efforts have been made to develop standards for FAIR vocabular-
ies. The FAIRsFAIR recommendations provide guidance[24] on FAIR seman-
tic artefacts, as well as supporting vocabulary search engines and repositories.
Garijo and Poveda-Villalon[22] discussed detailed requirements of ontology URI
and versioning strategies, as well as the formatting of the ontologies. Ten sim-
ple rules[15] for converting print-based or other forms of legacy vocabularies to
FAIR vocabularies have also been proposed.
    Researchers have also developed approaches to assess the FAIR level of digital
objects, both manually and automatically. FAIRsharing hosts FAIR indicators
for automated tests in their FAIR Maturity Evaluation Service[39]; other examples
are the FAIR metrics in the F-UJI Automated FAIR Data Assessment Tool[4] and the
Research Data Alliance (RDA) Data Maturity Model Specification and Guidelines[5],
also known as the RDA indicators. The RDA indicators are a set of representative
and descriptive indicators for evaluating the FAIR level of data, and have been
used in many projects and with many types of data. Some automated tools, such as
FOOPS![21], have been developed to measure the FAIR level of public,
machine-readable vocabularies. To the best of our knowledge, however, no
quantifiable FAIR assessment approach has yet been developed to measure the FAIR
level of vocabularies in different formats objectively.
    Therefore, in this paper, we distinguish the concepts of FAIR data, FAIR data
resources, and FAIR vocabularies, and propose a set of general FAIR Vocabulary
Features (FVFs) as a set of satisfiable features for vocabularies. We also adapted
the RDA indicators to measure the FAIR level of vocabularies. Further, we
provide example assessments based on selected ontologies available from the
EMBL-EBI Ontology Lookup Service (OLS)[25] and other vocabulary resources.


2   FAIR Data and FAIR Vocabulary

In the execution of this work, we note the distinction between FAIR data and
FAIR capable data resources. In our analysis, data can be FAIR, to a greater or
lesser extent, and data resources and data vocabularies are capable of supporting
FAIRness (FAIR capable) at different levels. Data vocabularies are designed to
support FAIR data, and they can also be considered as FAIR data resources.
The orthogonality of these concepts is an important context for this work when
determining the features of a FAIR vocabulary. We must also determine whether
a vocabulary itself is 1) FAIR in terms of its application to FAIR data, 2) FAIR
in the context of FAIR capable resources, and 3) FAIR in the context of other
vocabularies. A FAIR vocabulary has a set of FAIR features and a list of associated
FAIR indicators. It is usable for annotation, analysis and presentation of data,
and is deployable in the context of a FAIR capable data resource or tool. It also
serves ‘aggregation’ use cases where data originate from different domains, and
enables data interoperability where different vocabularies are used.
    Vocabularies come in different forms, such as lists, thesauri, taxonomies,
and ontologies, each with a different level of semantic maturity and different
FAIR requirements. The International Classification of Diseases (ICD-11)[31] is a
large taxonomy of diseases and is the global standard for diagnostic information,
disease definitions and synonyms. The Gene Ontology (GO)[12] is a well-established,
highly regarded and widely used biomedical resource. It contains over 43,000 terms
and has been cross-referenced in other classification systems, such as UniProt[13],
HAMAP[32], and InterPro[9]. GO is also a reference OBO Foundry ontology[34]
and has been reused in many other resources. The Experimental Factor Ontology
(EFO)[27], on the other hand, is an application ontology built for communities
such as Open Targets[29] to describe experimental variables.

3       Existing Vocabulary Standards
In determining features of FAIR vocabularies, we considered previous standardisation
work by the Open Biomedical Ontologies (OBO) community to determine
whether the OBO Principles[18] addressed elements of vocabulary FAIRness.
The OBO Principles aim to coordinate the development of biomedical ontologies.
They focus specifically on ontologies, covering both the development process
and the ontologies themselves. Although the OBO Principles predate the
FAIR Principles, a comparison of the two aided us in defining FVFs. Table 1
summarises the key points of the OBO Foundry principles and assesses their
suitability as FVFs. We also noted that not all the OBO Principles possess the
same level of maturity or granularity, and therefore some were unmappable and
excluded from the comparison. As a result, this analysis did not include OBO
Principles 1, 4, 6, 9-12 and 20. The rationale for suitability as FVFs is discussed
in detail below and further in Supplementary Table 1.

4       FAIR Vocabulary Features
Based on the analysis of OBO Foundry practices and our previous experience
working with and developing ontologies, we propose eleven features for FAIR
vocabularies in Table 2, covering requirements for identifiers, access protocols,
knowledge representation, etc. Supplementary Table 2 shows the relationship
among FAIR Vocabulary Features, the FAIR principles and requirements for
FAIR vocabularies.
    Table 2 also provides examples for each FAIR feature, but does not exhaustively
cover all current practices across the various vocabularies; each feature is
represented in different formats and at varying FAIRness levels amongst those
vocabularies. For example, for FVF-6: versioning and persistent vocabularies, of
all ontologies indexed and updated in OLS, 59.3%§ of vocabularies use a date format
of “yyyy-mm-dd” in the “versionIRI”, such as “http://purl.obolibrary.org/
    § See details in Supplementary Material 3

Table 1. An analysis of the OBO Foundry principles as putative FAIR Vocabulary Features

| ID | OBO Principle Summary | Suitable as FAIR Vocabulary Feature? |
|---|---|---|
| Principle 1: Open | The ontology MUST be openly available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed in altered form under the original name or with the same identifiers. | No |
| Principle 2: Common Format | The ontology is made available in a common formal language in an accepted concrete syntax. | Yes |
| Principle 3: URI/Identifier Space | Each class and relation (property) in the ontology must have a unique URI identifier. | Yes |
| Principle 4: Versioning | The ontology provider has documented procedures for versioning the ontology, and different versions of ontology are marked, stored, and officially released. | No |
| Principle 5: Scope | The scope of an ontology is the extent of the domain or subject matter it intends to cover. The ontology must have a clearly specified scope and content that adheres to that scope. | Yes |
| Principle 6: Textual Definitions | The ontology has textual definitions for the majority of its classes and for top level terms in particular. | No |
| Principle 7: Relations | Relations should be reused from the Relations Ontology (RO). | Yes |
| Principle 8: Documentation | The owners of the ontology should strive to provide as much documentation as possible. The documentation should detail the different processes specific to an ontology life cycle and target various audiences (users or developers). | Yes |
| Principle 9: Documented Plurality of Users | The ontology developers should document that the ontology is used by multiple independent people or organizations. | No |
| Principle 10: Commitment to Collaboration | OBO Foundry ontology development, in common with many other standards-oriented scientific activities, should be carried out in a collaborative fashion. | Yes |
| Principle 11: Locus of Authority | There should be a person who is responsible for communications between the community and the ontology developers, for communicating with the Foundry on all Foundry-related matters, for mediating discussions involving maintenance in the light of scientific advance, and for ensuring that all user feedback is addressed. | No |
| Principle 12: Naming Conventions | Naming conventions are used. | No |
| Principle 16: Maintenance | The ontology needs to reflect changes in scientific consensus to remain accurate over time. | Yes |
| Principle 20: Responsiveness | Ontology developers MUST offer channels for community participation and SHOULD be responsive to requests. | No |

Table 2. FAIR Vocabulary Feature details

| ID | Features | Description | Examples |
|---|---|---|---|
| FVF-1 | Vocabulary and constituent terms are assigned globally unique and persistent identifiers. | Vocabulary itself and its constituent terms should have identifiers that are globally unique and persistent to ensure that each item can be identified unambiguously over time. | Examples of globally unique and persistent identifiers are PURL[19], identifiers.org[41], and w3id.org[37]. The OBO Foundry provides an identifier policy[2] for biomedical ontologies and requires using PURLs with standard prefixes, such as http://purl.obolibrary.org/obo/GO_0000022. |
| FVF-2 | Vocabulary and constituent terms have rich metadata. | Vocabulary itself and its constituent terms should have sufficient metadata to support discovery by both humans and machines. | Metadata of the vocabulary should provide information about the creation date, creator and editor, version, licence, target domain and short descriptions. Metadata of its terms should describe term editing history, definition source, and other metadata. |
| FVF-3 | Vocabulary and constituent terms can be accessed using the identifiers, preferably by both humans and machines. | The URIs of vocabulary itself and its constituent terms can be dereferenced by both humans and machines. | http://www.ebi.ac.uk/efo/EFO_0000311 resolves to term “Cancer” in the Experimental Factor Ontology, which can be accessed both by humans using ontology browsers and by machines through the OLS API. |
| FVF-4 | Vocabulary and constituent terms are registered or indexed in a searchable engine or a resource. | Vocabulary itself and its constituent terms are registered in vocabulary archives or other vocabulary management systems and indexed by local and/or global search engines. | EMBL-EBI Ontology Lookup Service and NCBO BioPortal[38] are two popular public vocabulary archives. Property X-Robots-Tag:index in vocabularies allows them to be indexed by search engines. |
| FVF-5 | Vocabulary and constituent terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, which allows for authentication and authorisation, where necessary. | Vocabulary itself and its constituent terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, such as HTTPS, HTTP, FTP. The protocol should also allow identifying the accessor and granting access based on the accessor privilege, when necessary. | Most public ontologies can be accessed using the HTTP or HTTPS protocol. For example, EFO uses the HTTP protocol. The Unified Medical Language System[10] uses the HTTPS protocol and only allows access by authenticated users. |
| FVF-6 | Vocabulary and constituent terms are persistent over time and are appropriately versioned. | Changes in the vocabulary are reflected in different versions. Vocabularies and their terms are versioned, and each unaltered version of the vocabulary can be identified and retrieved in perpetuity. Vocabulary metadata is available even when the vocabulary is no longer available. | Changes in EFO are included in each release and identified with a versioned IRI, such as http://www.ebi.ac.uk/efo/releases/v3.31.0/efo.owl, which resolves to the versioned vocabulary. The OBO Foundry also provides guidelines[20] for ontology versioning and how different versions of the vocabularies should be labelled, stored and published. |
| FVF-7 | Vocabulary and constituent terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. | Vocabulary itself and its constituent terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. | OWL-based vocabularies can be serialised using RDF/XML; relational databases, e.g. ChEBI[23], can be converted into OWL[7]. |
| FVF-8 | Vocabulary and constituent terms use qualified references to other vocabularies. | Vocabularies reuse terms from other vocabularies when applicable, provide adequate metadata about external terms, and follow vocabulary cross-reference standards. | EFO reuses human anatomy terms such as “liver” from UBERON[28] (UBERON_0002107) and links them to the original UBERON term. Property Xref indicates a cross-reference relationship between two vocabulary terms. MIREOT[14] defines a methodology and minimum information requirements for importing external terms into an extant ontology. |
| FVF-9 | Vocabulary and constituent terms are described with a plurality of accurate and relevant attributes. | Vocabulary terms include sufficient attributes, such as labels, synonyms, definitions, examples of usage, and cross-references, to support the interpretation and reuse of the vocabulary terms. | The OBO flat-file format specification[17] provides a list of recommended mandatory and optional attributes. Each vocabulary term must have an ID and a name. The recommended attributes include definition, synonym, Xref, relationship, etc. |
| FVF-10 | Vocabularies are released with a standard data usage licence, preferably a machine-readable licence. | The vocabulary includes information about how the vocabulary can be reused. | Common public data usage licences are CC-BY[1] and MIT[3]. For example, Gene Ontology uses Creative Commons Attribution 4.0 Unported License. SNOMED™[35] uses a self-defined SNOMED CT™ affiliate license agreement. |
| FVF-11 | Vocabularies meet domain-relevant community standards. | Vocabularies cover essential terms for the specific domain, reflect knowledge of this domain and can be used in existing data standards and data models. | Community standards, such as minimum information requirements and data models, can be found in FAIRsharing[33]. The Plant Phenotyping Experiment Ontology (PPEO)[8] implements the Minimum Information about Plant Phenotyping Experiment (MIAPPE)[26] standards and covers essential attributes to describe a MIAPPE-compliant phenotype dataset. |

obo/scdo/releases/2021-04-15/scdo.owl”. 2.51% of vocabularies use semantic
versioning (x.x.x), such as “http://www.ebi.ac.uk/efo/releases/v3.34.0/efo.owl”,
or other forms of numeric versioning, such as http://www.orpha.net/version3.2.
31.66% of vocabularies do not provide valid machine-readable versioned IRIs. For
FVF-1: identifiers, 74% of vocabularies use OBO-format PURLs, identifiers.org and
w3id.org identifiers, as well as other domain-specific identifiers. For FVF-5:
accessible using standard protocols, of all 199 selected ontologies, only one
uses the HTTPS protocol; the rest use HTTP.
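
The versioning categories reported above can be illustrated with a small classifier. This is a minimal sketch only: the regular expressions, the function names and the three labels are assumptions made for illustration, not the matching rules used to produce the percentages in Supplementary Material 3.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed patterns: a "yyyy-mm-dd" date, or dot-separated numbers such as 3.34.0 or 3.2.
DATE_VERSION = re.compile(r"\d{4}-\d{2}-\d{2}")
NUMERIC_VERSION = re.compile(r"\d+(?:\.\d+)+")


def classify_version_iri(version_iri):
    """Label a versionIRI as 'date', 'numeric' or 'none' (hypothetical labels)."""
    if not version_iri:
        return "none"
    # Inspect only the path so that dots in the host name are ignored.
    path = urlparse(version_iri).path
    if DATE_VERSION.search(path):
        return "date"
    if NUMERIC_VERSION.search(path):
        return "numeric"
    return "none"


def versioning_shares(version_iris):
    """Return the percentage of IRIs in each category, rounded to two decimals."""
    counts = Counter(classify_version_iri(iri) for iri in version_iris)
    total = sum(counts.values())
    if total == 0:
        return {"date": 0.0, "numeric": 0.0, "none": 0.0}
    return {style: round(100 * counts[style] / total, 2)
            for style in ("date", "numeric", "none")}


# The three IRIs quoted in the text, plus one vocabulary without a versionIRI.
sample = [
    "http://purl.obolibrary.org/obo/scdo/releases/2021-04-15/scdo.owl",
    "http://www.ebi.ac.uk/efo/releases/v3.34.0/efo.owl",
    "http://www.orpha.net/version3.2",
    None,
]
shares = versioning_shares(sample)
```

For this four-item sample the sketch yields 25% date-based, 50% numeric and 25% without a usable versioned IRI. A real survey would also need to check that each IRI actually resolves, which this sketch does not attempt.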


5   FAIR Vocabulary Feature Indicators

FAIR Vocabulary Features outline general characteristics of a FAIR vocabulary.
However, those features need to be objectively quantified to be useful in
vocabulary selection, development and assessment. Hence, we propose aligning FVFs
with FAIR indicators to enable computation of a discrete FAIR score, with the
aim of offering an objective quantitative evaluation of vocabularies and guiding
subsequent improvements.
    We mapped the RDA indicators to FAIR Vocabulary Features, filtered out
indicators that do not apply to vocabularies (see details in Supplementary Table
4), specified the digital object to which each indicator refers, and identified within
each indicator the relevant standards used in corresponding domains. It is worth
noting that when mapping the RDA indicators on datasets, metadata refers to
the metadata to which the vocabulary can be applied, while in the context of
vocabularies, metadata and data refer to the description of the vocabulary and
the vocabulary information. Therefore, we combined the indicators evaluating
data and metadata in the mapping, wherever possible. The FVFs, associated
with selected indicators, can be used as indicators for FAIR Vocabulary as shown
in Table 3.
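
One way to picture the combination of data and metadata indicators is to group RDA indicator IDs that differ only in their trailing M (metadata) or D (data) suffix. The sketch below assumes that this suffix convention is all that distinguishes a pair; the dictionary layout and function name are hypothetical, and only three rows of Table 3 are copied in.

```python
from collections import defaultdict

# A few FVF-to-indicator rows taken from Table 3; the structure is illustrative.
FVF_TO_RDA = {
    "FVF-1": ["RDA-F1-01M", "RDA-F1-01D", "RDA-F1-02M", "RDA-F1-02D"],
    "FVF-2": ["RDA-F2-01M"],
    "FVF-8": ["RDA-I3-02D", "RDA-I3-03M"],
}


def combine_data_and_metadata(indicator_ids):
    """Group indicator IDs by their base ID, recording which suffixes (M, D) occur."""
    groups = defaultdict(list)
    for indicator_id in indicator_ids:
        base, suffix = indicator_id[:-1], indicator_id[-1]
        if suffix not in ("M", "D"):
            raise ValueError(f"unexpected indicator ID: {indicator_id}")
        groups[base].append(suffix)
    return {base: sorted(suffixes) for base, suffixes in groups.items()}


combined_fvf1 = combine_data_and_metadata(FVF_TO_RDA["FVF-1"])
combined_fvf8 = combine_data_and_metadata(FVF_TO_RDA["FVF-8"])
```

Under this assumption the four FVF-1 indicators collapse into two combined checks, while the FVF-8 indicators stay separate because their base IDs differ.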


6   Assessment against Indicators for FAIR Vocabulary
    Features

We tested the FAIR Vocabulary Features and corresponding indicators on three
representative vocabularies, GO, EFO and ICD-11, as shown in Table 4. For
each FVF, one of three compliance levels is assigned: if a vocabulary meets the
requirements of all indicators, full compliance is achieved; otherwise, depending
on the scoring within each FVF, partial compliance or no compliance is
assigned. The percentages of features with full, partial and no compliance
are also calculated. Supplementary Tables 5-7 provide the assessment
details.
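
The three compliance levels and their percentages can be sketched as follows. The text only fixes the rule for full compliance, so the rule used here for the other two levels (partial when some but not all indicators pass, none when no indicator passes) is an assumption, as are the function names and the made-up example results.

```python
def compliance_level(indicator_results):
    """Map {indicator ID: passed?} for one FVF to 'full', 'partial' or 'none'."""
    if not indicator_results:
        raise ValueError("an FVF needs at least one indicator")
    passed = sum(1 for ok in indicator_results.values() if ok)
    if passed == len(indicator_results):
        return "full"
    # Assumed rule: at least one pass counts as partial compliance.
    return "partial" if passed > 0 else "none"


def summarise_assessment(assessment):
    """Return per-FVF levels and the percentage of FVFs at each level."""
    levels = {fvf: compliance_level(results) for fvf, results in assessment.items()}
    total = len(levels)
    shares = {
        level: 100 * sum(1 for value in levels.values() if value == level) / total
        for level in ("full", "partial", "none")
    }
    return levels, shares


# Made-up pass/fail results for a hypothetical vocabulary, not data from Table 4.
example_assessment = {
    "FVF-2": {"RDA-F2-01M": True},
    "FVF-6": {"RDA-A2-01M": False, "RDA-R1.2-01M": True, "RDA-R1.2-02M": True},
    "FVF-8": {"RDA-I3-02D": False, "RDA-I3-03M": False},
    "FVF-9": {"RDA-R1-01M": True},
}
levels, shares = summarise_assessment(example_assessment)
```

With these invented inputs, two of four features are fully compliant (50%), one is partial (25%) and one is non-compliant (25%).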
    From the assessment results, both the Gene Ontology and the Experimental
Factor Ontology are vocabularies of high FAIR level, with over 80% of FVFs fulfilled.
The Gene Ontology only partially complies with ‘FVF-6: Vocabularies and their
terms are persistent over time and are appropriately versioned’, with a Fail in

Table 3. Indicators for FAIR Vocabulary Features. Alignment between the FAIR Vocabulary Features and RDA Data Maturity level indicators

| FAIR Vocabulary Feature | RDA indicator ID | Indicator |
|---|---|---|
| FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers. | RDA-F1-01M | Metadata is identified by a persistent identifier |
| | RDA-F1-01D | Data is identified by a persistent identifier |
| | RDA-F1-02M | Metadata is identified by a globally unique identifier |
| | RDA-F1-02D | Data is identified by a globally unique identifier |
| FVF-2: Vocabularies and their terms have rich metadata. | RDA-F2-01M | Rich metadata is provided to allow discovery |
| FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine. | RDA-A1-01M | Metadata contains information to enable the user to get access to the data |
| | RDA-A1-02M | Metadata can be accessed manually (i.e. with human intervention) |
| | RDA-A1-02D | Data can be accessed manually (i.e. with human intervention) |
| | RDA-A1-03M | Metadata identifier resolves to a metadata record |
| | RDA-A1-03D | Data identifier resolves to a digital object |
| | RDA-A1-05D | Data can be accessed automatically (i.e. by a computer program) |
| FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource. | RDA-F4-01M | Metadata is offered in such a way that it can be harvested and indexed |
| FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary. | RDA-A1-04M | Metadata is accessed through standardised protocol |
| | RDA-A1-04D | Data is accessible through standardised protocol |
| | RDA-A1.1-01M | Metadata is accessible through a free access protocol |
| | RDA-A1.1-01D | Data is accessible through a free access protocol |
| | RDA-A1.2-01D | Data is accessible through an access protocol that supports authentication and authorisation |
| FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned. | RDA-A2-01M | Metadata is guaranteed to remain available after data is no longer available |
| | RDA-R1.2-01M | Metadata includes provenance information according to community-specific standards |
| | RDA-R1.2-02M | Metadata includes provenance information according to a cross-community language |
| FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. | RDA-I1-01M | Metadata uses knowledge representation expressed in standardised format |
| | RDA-I1-01D | Data uses knowledge representation expressed in standardised format |
| | RDA-I1-02M | Metadata uses machine-understandable knowledge representation |
| | RDA-I1-02D | Data uses machine-understandable knowledge representation |
| FVF-8: Vocabularies and terms use qualified references to other vocabularies. | RDA-I3-02D | Data includes qualified references to other data |
| | RDA-I3-03M | Metadata includes qualified references to other metadata |
| FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes. | RDA-R1-01M | Plurality of accurate and relevant attributes are provided to allow reuse |
| FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence. | RDA-R1.1-01M | Metadata includes information about the licence under which the data can be reused |
| | RDA-R1.1-02M | Metadata refers to a standard reuse licence |
| | RDA-R1.1-03M | Metadata refers to a machine-understandable reuse licence |
| FVF-11: Vocabularies meet domain relevant community standards. | RDA-R1.3-01M | Metadata complies with a community standard |
| | RDA-R1.3-01D | Data complies with a community standard |
| | RDA-R1.3-02M | Metadata is expressed in compliance with a machine-understandable community standard |
| | RDA-R1.3-02D | Data is expressed in compliance with a machine-understandable community standard |


Table 4. FAIR Vocabulary Features applied: assessment results for the Gene Ontology, Experimental Factor Ontology and ICD-11

| FAIR Vocabulary Feature | Gene Ontology | Experimental Factor Ontology | ICD-11 |
|---|---|---|---|
| FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers. | Full Compliance | Full Compliance | Partial Compliance |
| FVF-2: Vocabularies and their terms have rich metadata. | Full Compliance | No Compliance | Full Compliance |
| FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine. | Full Compliance | Full Compliance | Partial Compliance |
| FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource. | Full Compliance | Full Compliance | No Compliance |
| FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary. | Full Compliance | Full Compliance | Full Compliance |
| FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned. | Partial Compliance | Partial Compliance | Partial Compliance |
| FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. | Full Compliance | Full Compliance | No Compliance |
| FVF-8: Vocabularies and terms use qualified references to other vocabularies. | Full Compliance | Full Compliance | Partial Compliance |
| FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes. | Full Compliance | Full Compliance | No Compliance |
| FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence. | Full Compliance | Full Compliance | Full Compliance |
FVF-11: Vocabularies meet domain relevant community stan- Full Compliance         Full Compliance No Compliance
dards.

FAIR Vocabulary Feature summary

FVF, full compliance                                          90.91%              81.82%         27.27%

FVF, partial compliance                                       9.09%               9.09%          36.36%

FVF, no compliance                                            0.00%               9.09%          36.36%

‘Indicator RDA-R1.2-02M: Metadata includes provenance information according
to a cross-community language’. ‘FVF-2: Vocabularies and their terms have rich
metadata’ was not complied with since no general description of the ontology is
provided in the released artefact. Compared with the ontologies, the taxonomy,
ICD-11, fully complies with 27.27% of the FVFs and partially complies with 36.36%
of them (Table 4). This is because ICD-11 neither refers to other vocabularies nor
adheres to other community standards, such as vocabulary formats. ICD-11 was
selected for this evaluation as it already offers significant FAIR improvements
over ICD-10 [30], such as providing a standard licence.
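
    The summary percentages in Table 4 follow directly from the per-feature
results. As a minimal sketch, assuming each FVF carries equal weight, the ICD-11
column can be tallied as follows (the function and variable names are
illustrative and not part of any assessment tooling):

```python
from collections import Counter

# Per-FVF compliance levels for ICD-11, transcribed from Table 4 (FVF-1 to FVF-11).
ICD11_LEVELS = ["partial", "full", "partial", "no", "full", "partial",
                "no", "partial", "no", "full", "no"]

def compliance_summary(levels):
    """Return the percentage of features at each compliance level (2 decimals)."""
    counts = Counter(levels)
    total = len(levels)
    return {level: round(100 * counts[level] / total, 2)
            for level in ("full", "partial", "no")}

print(compliance_summary(ICD11_LEVELS))
# {'full': 27.27, 'partial': 36.36, 'no': 36.36}
```

Equal weighting is the simplest choice; a vocabulary owner with different priorities could substitute their own weights.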


7   Discussion

The FAIR Vocabulary Features we propose integrate multiple FAIR vocabulary
requirements and can be used as FAIR vocabulary standards to guide the
development and maintenance of vocabularies. Each FVF is associated with
indicators that enable its quantifiable, objective assessment. Those indicators
can be connected to related standards and extended to existing or emerging
standards in other domains. For example, FVF-8, which covers cross-referencing
other vocabularies, can be linked to MIREOT, a standard for referencing external
ontology terms. We focused on how FVFs can be applied to ontologies, and
demonstrated the potential for using them across other forms of vocabularies;
for example, the ICD-11 assessment could inform its authors of ways to enrich
their resource. Because of our expertise and requirements, this manuscript
focuses on the biomedical domain; however, we anticipate that this framework
could be reused in other domains.
    Integrating the FAIR Vocabulary Features with FAIR indicators makes it
possible to assess the FAIR level of vocabularies, identify progressive ontology
development use cases, and improve those vocabularies. We selected the RDA
indicators because they have proven useful for many datasets and have been
referenced by other assessment approaches listed on FAIRassist.org; however,
FVFs could alternatively be aligned to other FAIR-principle-based indicators
that similarly reflect the guiding principles proposed by Wilkinson et al.
Beyond manual assessment, quantifiable formal indicators are also amenable to
becoming machine-actionable. Some efforts already exist, and reusing shared
indicators will make it possible to perform automated FAIR vocabulary
assessments.
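
    One way such automation could work is to roll indicator-level results up to
a feature-level verdict. The sketch below is an assumption rather than the
procedure used in this work: it treats a feature as fully compliant when every
applicable indicator passes, as non-compliant when none pass, and as partially
compliant otherwise, and it ignores indicators scored 'NA'. The function name
and the 1/0/'NA' encoding are illustrative.

```python
def fvf_level(indicator_scores):
    """Aggregate indicator results (1 = pass, 0 = fail, "NA" = not applicable).

    Assumed rule, not taken from the paper: all applicable indicators pass ->
    full compliance, none pass -> no compliance, anything else -> partial.
    """
    applicable = [s for s in indicator_scores.values() if s != "NA"]
    if not applicable:
        return "Not assessed"
    passed = sum(applicable)
    if passed == len(applicable):
        return "Full Compliance"
    if passed == 0:
        return "No Compliance"
    return "Partial Compliance"

# Indicator scores for FVF-5 of the Gene Ontology, as listed in Supplementary Table 5.
fvf5 = {"RDA-A1-04M": 1, "RDA-A1-04D": 1, "RDA-A1.1-01M": 1,
        "RDA-A1.1-01D": 1, "RDA-A1.2-01D": "NA"}
print(fvf_level(fvf5))  # Full Compliance
```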
    The indicators were proposed to objectively measure the FAIR level of
ontologies, yet the resulting score does not reflect an absolute FAIR level for
the vocabulary. Indeed, depending on the purpose and requirements of the
vocabulary, some FVFs can be more or less important than others. For example,
for internal vocabularies that are used and shared only within an institution,
having global identifiers (FVF-1) is not a mandatory requirement. Instead of
comparing the FAIR scores of different vocabularies to find the ‘FAIRer’ one, we
propose that the FAIR score be used to measure and guide the evolution of FAIR
vocabularies by successively comparing the FAIR levels of iteratively developed
versions. For example, compared to ICD-10, its successor ICD-11 has incorporated
many features that make it FAIRer, such as providing APIs for easier access and
having a machine-readable licence.
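
    Comparing successive versions can be as simple as reporting which features
changed level between two assessments. The following sketch uses made-up
results for two releases of a hypothetical vocabulary; the data, the ranking of
levels and the helper names are illustrative assumptions only.

```python
# Ordering of compliance levels, used to decide whether a change is an improvement.
LEVEL_RANK = {"No Compliance": 0, "Partial Compliance": 1, "Full Compliance": 2}

def compare_versions(old, new):
    """Return {fvf: (old_level, new_level)} for every feature whose level changed."""
    return {fvf: (old[fvf], new[fvf])
            for fvf in old
            if fvf in new and old[fvf] != new[fvf]}

def improved(old_level, new_level):
    """True if the new level ranks strictly higher than the old one."""
    return LEVEL_RANK[new_level] > LEVEL_RANK[old_level]

# Hypothetical assessments of two releases of the same vocabulary (not real results).
v1 = {"FVF-2": "No Compliance", "FVF-6": "Partial Compliance", "FVF-10": "Full Compliance"}
v2 = {"FVF-2": "Full Compliance", "FVF-6": "Partial Compliance", "FVF-10": "Full Compliance"}

for fvf, (before, after) in compare_versions(v1, v2).items():
    trend = "improved" if improved(before, after) else "regressed"
    print(f"{fvf}: {before} -> {after} ({trend})")
# FVF-2: No Compliance -> Full Compliance (improved)
```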
    The assessment results for the two ontologies and ICD-11 suggest that
ontology-based vocabularies, which follow stricter semantics, fare better in the
scoring of FAIR features. For example, many ontology-related standards have been
established, including formats such as OWL, guidelines such as the OBO
principles, minimum information standards such as MIBBI [36], and mechanisms for
cross-referencing or incorporating external ontologies, such as MIREOT. This is
naturally reflected in a high score for compliance with community standards,
which is a core part of the FAIR Vocabulary Features and which improves the
interoperability and reusability of a vocabulary.
    The FAIR Vocabulary Features and assessments provide insights into how to
improve vocabularies. For example, the EFO assessment shows that the FAIR level
of EFO could easily be improved by adding a description of its aim and function
to the released artefact, so that different vocabulary management services can
harvest the information.

Acknowledgements
This work is funded by the IMI-FAIRplus project (Grant number 802750) and
the European Molecular Biology Laboratory - European Bioinformatics Institute
core funds.


8    References
 [1] Creative commons — attribution 4.0 international — CC BY 4.0,
     https://creativecommons.org/licenses/by/4.0/
 [2] ID policy, http://www.obofoundry.org/id-policy
 [3] The MIT license | open source initiative,
     https://opensource.org/licenses/MIT
 [4] F-UJI automated FAIR data assessment tool (2020),
     https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool
 [5] FAIR data maturity model: specification and guidelines - draft (2020),
     https://www.rd-alliance.org/group/fair-data-maturity-model-wg/outcomes/fair-data-maturity-model-specification-and-guidelines
 [6] FAIRassist.org (2021), https://fairassist.org/#!/
 [7] Antoniou, G., van Harmelen, F.: Web ontology language: OWL. pp. 67–92.
     International Handbooks on Information Systems, Springer (2004).
     https://doi.org/10.1007/978-3-540-24750-0_4
 [8] Arnaud, E., Cooper, L., Shrestha, R., et al.: Towards a reference plant trait
     ontology for modeling knowledge of plant traits and phenotypes (2012),
     http://wrap.warwick.ac.uk/59831/
 [9] Blum, M., Chang, H.Y., Chuguransky, S., et al.: The InterPro protein
     families and domains database: 20 years on 49, D344–D354 (2021).
     https://doi.org/10.1093/nar/gkaa977
[10] Bodenreider, O.: The unified medical language system (UMLS):
     integrating biomedical terminology 32, D267–270 (2004).
     https://doi.org/10.1093/nar/gkh061
[11] Burdett, T., Xu, F., Courtot, M., et al.: FAIRplus: D3.2 IMI FAIR metrics
     publication. https://doi.org/10.5281/zenodo.4428633
[12] Consortium, T.G.O.: The gene ontology resource: 20 years and still GOing
     strong 47, D330–D338 (2019). https://doi.org/10.1093/nar/gky1055
[13] Consortium, T.U.: UniProt: the universal protein knowledgebase in 2021
     49, D480–D489 (2021). https://doi.org/10.1093/nar/gkaa1100
[14] Courtot, M., Gibson, F., Lister, A., et al.: MIREOT: the minimum
     information to reference an external ontology term pp. 1–1 (2009).
     https://doi.org/10.1038/npre.2009.3576.1
[15] Cox, S.J.D., Gonzalez-Beltran, A.N., Magagna, B., et al.: Ten simple
     rules for making a vocabulary FAIR 17(6), e1009041 (2021).
     https://doi.org/10.1371/journal.pcbi.1009041
[16] Drysdale, R., Cook, C.E., Petryszak, R., et al.: The ELIXIR core data
     resources: fundamental infrastructure for the life sciences 36(8), 2636–2642
     (2020). https://doi.org/10.1093/bioinformatics/btz959
[17] Foundry, T.O.: The OBO flat file format specification, version 1.2,
     https://owlcollab.github.io/oboformat/doc/GO.format.obo-1%5F2.html
[18] Foundry, T.O.: OBO foundry principles, overview,
     http://www.obofoundry.org/principles/fp-000-summary.html
[19] Foundry, T.O.: PURL administration, https://purl.prod.archive.org/
[20] Foundry, T.O.: Versioning (principle 4),
     http://www.obofoundry.org/principles/fp-004-versioning.html
[21] Garijo, D., Corcho, O., Poveda-Villalon, M.: FOOPS!: An ontology pitfall
     scanner for the FAIR principles p. 4 (2021),
     http://ceur-ws.org/Vol-2980/paper321.pdf
[22] Garijo, D., Poveda-Villalón, M.: Best practices for implementing FAIR
     vocabularies and ontologies on the web (2020),
     http://arxiv.org/abs/2003.13084
[23] Hastings, J., Owen, G., Dekker, A., et al.: ChEBI in 2016: Improved
     services and an expanding collection of metabolites 44, D1214–1219 (2016).
     https://doi.org/10.1093/nar/gkv1031
[24] Hugo, W., Le Franc, Y., Coen, G., et al.: D2.5 FAIR semantics
     recommendations second iteration (2020).
     https://doi.org/10.5281/zenodo.4314321
[25] Jupp, S., Burdett, T., Malone, J., et al.: A new ontology lookup service at
     EMBL-EBI p. 2 (2015), http://ceur-ws.org/Vol-1546/paper%5F29.pdf
[26] Krajewski, P., Chen, D., Ćwiek, H., et al.: Towards recommendations
     for metadata and data handling in plant phenotyping 66(18), 5417–5427
     (2015). https://doi.org/10.1093/jxb/erv271
[27] Malone, J., Holloway, E., Adamusiak, T., et al.: Modeling sample
     variables with an experimental factor ontology 26(8), 1112–1118 (2010).
     https://doi.org/10.1093/bioinformatics/btq099
[28] Mungall, C.J., Torniai, C., Gkoutos, G.V., et al.: Uberon, an
     integrative multi-species anatomy ontology 13(1), R5 (2012).
     https://doi.org/10.1186/gb-2012-13-1-r5
[29] Ochoa, D., Hercules, A., Carmona, M., et al.: Open targets platform:
     supporting systematic drug–target identification and prioritisation 49,
     D1302–D1310 (2021). https://doi.org/10.1093/nar/gkaa1027
[30] Organization, W.H.: International classification of diseases for mortality
     and morbidity statistics (10th revision) (2010),
     https://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf
[31] Organization, W.H.: International classification of diseases for mortality
     and morbidity statistics (11th revision) (2021),
     https://icd.who.int/browse11/l-m/en
[32] Pedruzzi, I., Rivoire, C., Auchincloss, A.H., et al.: HAMAP in 2015: updates
     to the protein family classification and annotation system 43, D1064–D1070
     (2015). https://doi.org/10.1093/nar/gku1002
[33] Sansone, S.A., McQuilton, P., Rocca-Serra, P., et al.: FAIRsharing as a
     community approach to standards, repositories and policies 37(4), 358–367
     (2019). https://doi.org/10.1038/s41587-019-0080-8
[34] Smith, B., Ashburner, M., Rosse, C., et al.: The OBO foundry: coordinated
     evolution of ontologies to support biomedical data integration 25(11),
     1251–1255 (2007). https://doi.org/10.1038/nbt1346
[35] Snomed: SNOMED home page, https://www.snomed.org/
[36] Taylor, C.F., Field, D., Sansone, S.A., et al.: Promoting coherent
     minimum reporting guidelines for biological and biomedical
     investigations: the MIBBI project 26(8), 889–896 (2008).
     https://doi.org/10.1038/nbt.1411
[37] W3id: w3id.org - permanent identifiers for the web, https://w3id.org/
[38] Whetzel, P.L., Noy, N.F., Shah, N.H., et al.: BioPortal: enhanced
     functionality via new web services from the national center for biomedical
     ontology to access and use ontologies in software applications 39,
     W541–W545 (2011). https://doi.org/10.1093/nar/gkr469
[39] Wilkinson, M.: The FAIR maturity evaluation service (2021),
     https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/
[40] Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., et al.: The FAIR
     guiding principles for scientific data management and stewardship 3(1),
     160018 (2016). https://doi.org/10.1038/sdata.2016.18
[41] Wimalaratne, S.M., Juty, N., Kunze, J., et al.: Uniform resolution
     of compact identifiers for biomedical data 5, 180029 (2018).
     https://doi.org/10.1038/sdata.2018.29
Supplementary Table 1: The suitability of OBO principles used as FAIR
Vocabulary Features

OBO-1: The ontology MUST be openly available to be used by all without any
constraint other than (a) its origin must be acknowledged and (b) it is not to
be altered and subsequently redistributed in altered form under the original
name or with the same identifiers.
Suitable as FAIR Vocabulary Feature? No
While it is highly desirable to have open source semantic artefacts, as it is to
have open source code and ultimately open research, this is not a prerequisite
for vocabularies to be FAIR, nor is it well aligned with the FAIR principles,
which require licensing information to be provided but do not mandate that it be
open source.

OBO-2: The ontology is made available in a common formal language in an
accepted concrete syntax.
Suitable as FAIR Vocabulary Feature? Yes
Common formats define minimum standards for accessing an ontology, support using
ontologies in a FAIR capable resource, and are the foundation of interoperable
vocabularies. Many common formal languages have been proposed during the
development history of ontologies. The OWL format is currently a W3C standard.

OBO-3: Each class and relation (property) in the ontology must have a unique
URI identifier.
Suitable as FAIR Vocabulary Feature? Yes
Identifiability is a core FAIR principle, and in order to be fulfilled, all
elements of the ontology, such as classes and relationships, should be clearly
and uniquely identified.

OBO-4: The ontology provider has documented procedures for versioning the
ontology, and different versions of ontology are marked, stored, and officially
released.
Suitable as FAIR Vocabulary Feature? No
Proper versioning improves the findability and reusability of the vocabulary,
and we proposed a corresponding FVF to cover this aspect. Yet this principle
evaluates the development of the ontology, especially its documentation process,
rather than the FAIRness of the vocabulary itself.

OBO-5: The scope of an ontology is the extent of the domain or subject matter it
intends to cover. The ontology must have a clearly specified scope and content
that adheres to that scope.
Suitable as FAIR Vocabulary Feature? Yes
Users must be able to determine which ontologies meet their needs in order to
implement these in FAIR capable resources and to be interoperable with other
vocabularies. A clear specification of this aids in FAIR implementation;
vocabularies do not exist in isolation.

OBO-6: The ontology has textual definitions for the majority of its classes and
for top level terms in particular.
Suitable as FAIR Vocabulary Feature? No
While textual definitions provide human-readable content and are generally
desirable, they are not essential as an FVF because ontology content is also
defined by logical axioms (e.g., position in the hierarchy) and term labels.

OBO-7: Relations should be reused from the Relations Ontology (RO).
Suitable as FAIR Vocabulary Feature? Yes
Relation standards promote interoperability across different ontologies. This
principle focuses on the relationships within and across ontologies. It can be
adapted and used in a broader range of FAIR vocabularies.

OBO-8: The owners of the ontology should strive to provide as much documentation
as possible. The documentation should detail the different processes specific to
an ontology life cycle and target various audiences (users or developers).
Suitable as FAIR Vocabulary Feature? Yes
Rich metadata of the vocabulary, such as the purpose and status of the ontology,
promotes the reuse of the ontology.

OBO-9: The ontology developers should document that the ontology is used by
multiple independent people or organizations.
Suitable as FAIR Vocabulary Feature? No
Usage of vocabularies depends on the content, and there are strategies for
interoperating ontologies in what is anyway a crowded semantic space. Evidence
that an ontology is highly used, when measurable, assumes a level of maturity
that is unlikely for some starting communities, and would not reflect the
FAIRness of the resource.

OBO-10: OBO Foundry ontology development, in common with many other
standards-oriented scientific activities, should be carried out in a
collaborative fashion.
Suitable as FAIR Vocabulary Feature? Yes
Vocabularies should be implementable in FAIR data resources and should reflect
community needs. Responsiveness to community needs comes via collaboration, and
development should therefore be collaborative.

OBO-11: There should be a person who is responsible for communications between
the community and the ontology developers, for communicating with the Foundry on
all Foundry-related matters, for mediating discussions involving maintenance in
the light of scientific advance, and for ensuring that all user feedback is
addressed.
Suitable as FAIR Vocabulary Feature? No
Having a designated contact is important for the community to provide feedback
and ensures that someone has ongoing editorial responsibility for the ontology.
While it does pertain to the sustainability of the resource and its possible
evolution, it doesn’t directly speak to its level of FAIRness.

OBO-12: Naming conventions are used.
Suitable as FAIR Vocabulary Feature? No
Consistent naming is a best practice for ontologies, but its absence does not
detract from deployment in support of the FAIR principles.

OBO-16: The ontology needs to reflect changes in scientific consensus to remain
accurate over time.
Suitable as FAIR Vocabulary Feature? Yes
Vocabularies codify knowledge. To fulfil one of their primary functions, they
must evolve. Dead ontologies fail to support FAIR capable resources to
interoperate.

OBO-20: Ontology developers MUST offer channels for community participation and
SHOULD be responsive to requests.
Suitable as FAIR Vocabulary Feature? No
Having communication channels to collect and respond to community requirements
supports the maintenance and evolution of the ontology. However, it is not
directly linked to the FAIRness of the vocabulary.

Supplementary Table 2: FAIR Vocabulary Features mapped to FAIR
principles and FAIR vocabulary requirements

                                     Findability      Accessibility    Interoperability   Reusability

FAIR in terms of application to                                        FVF-11             FVF-2, FVF-6, FVF-9,
FAIR data.                                                                                FVF-10, FVF-11

FAIR in terms of serving as a        FVF-1, FVF-4,    FVF-3, FVF-5,    FVF-7              FVF-2, FVF-6, FVF-7,
FAIR data resource.                  FVF-6            FVF-10                              FVF-9

FAIR in the context of interacting                                     FVF-8
with other vocabularies.

Supplementary Materials 3: VersionIRI analysis

We fetched the ontologies indexed in the OLS repository and selected those that were
successfully loaded and up to date. OLS contained 266 biomedical ontologies at the time we
accessed the database (https://www.ebi.ac.uk/ols/api/ontologies). We filtered out ontologies
that could not be indexed automatically (those without a valid loaded timestamp), and removed
inactive ontologies based on the date information in the versionIRI. 200 ontologies
were selected based on these criteria.


We recognise the limitations of this ontology selection approach. The filtering relies on the
metadata collected by OLS rather than on the ontology itself, and therefore might not correctly
reflect the ontology status. We filtered out some inactive ontologies based on the loading time
(only ontologies with a loading timestamp after 2019-01-01 were chosen) and on the date
information in the versionIRI (ontologies with a date before 2019-01-01 in the versionIRI were
removed). However, these criteria do not ensure that all selected vocabularies are up to date.
For example, ontologies may use a semantic versioning format in which no date information is
provided in the versionIRI, or update information may be recorded in other metadata fields
such as 'annotation' or 'editor comments'.
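
The selection criteria described above can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not the script used for the analysis: the record keys
`id`, `loaded` and `version_iri` and the helper names are hypothetical simplifications rather
than the OLS response schema, the date is assumed to appear in the versionIRI as YYYY-MM-DD,
and a load on the cutoff day itself is treated as not being after it.

```python
import re
from datetime import date

CUTOFF = date(2019, 1, 1)
DATE_IN_IRI = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def iri_date(version_iri):
    """Return the first YYYY-MM-DD date embedded in a versionIRI, or None."""
    match = DATE_IN_IRI.search(version_iri or "")
    if not match:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:  # digits matched but do not form a real calendar date
        return None

def is_selected(record, cutoff=CUTOFF):
    """Apply the two filters: valid recent load time, no stale versionIRI date."""
    loaded = record.get("loaded")  # hypothetical key holding an ISO timestamp
    if not loaded:
        return False               # never indexed successfully
    try:
        loaded_date = date.fromisoformat(loaded[:10])
    except ValueError:
        return False
    if loaded_date <= cutoff:      # assumption: must be strictly after the cutoff
        return False
    stamped = iri_date(record.get("version_iri"))
    # A versionIRI without a date (e.g. semantic versioning) cannot be judged stale.
    return stamped is None or stamped >= cutoff

def select_active(records):
    """Keep only the records that pass both filters."""
    return [r for r in records if is_selected(r)]

# Made-up records for illustration only.
records = [
    {"id": "a", "loaded": "2021-07-01T00:00:00",
     "version_iri": "http://purl.obolibrary.org/obo/go/releases/2021-07-02/go.owl"},
    {"id": "b", "loaded": None, "version_iri": None},
    {"id": "c", "loaded": "2021-07-01T00:00:00",
     "version_iri": "http://example.org/x/2018-05-01/x.owl"},
    {"id": "d", "loaded": "2021-07-01T00:00:00",
     "version_iri": "http://example.org/x/v3.2.0/x.owl"},
]
print([r["id"] for r in select_active(records)])  # ['a', 'd']
```

Record "d" illustrates the limitation noted above: with no date in its versionIRI, it passes
the filter whether or not it is actually maintained.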

Despite these constraints, the analysis still provides enough information to showcase the
status of current vocabularies. A complete list of the selected ontologies is provided in the
table below.
Supplementary Table 4: RDA data maturity indicators that are not
mapped to FAIR Vocabulary Features
     ID            Indicator
     RDA-F3-01M    Metadata includes the identifier for the data
     RDA-I2-01M    Metadata uses FAIR-compliant vocabularies
     RDA-I2-01D    Data uses FAIR-compliant vocabularies
     RDA-I3-01M    Metadata includes references to other metadata
     RDA-I3-01D    Data includes references to other data
     RDA-I3-02M    Metadata includes references to other data
     RDA-I3-04M    Metadata include qualified references to other data
Supplementary Table 5: FAIR assessment results of Gene Ontology

RDA FAIR indicators version         v0.05
Project name                        FAIR assessment Gene Ontology
Assessment date                     2021-08-02
Dataset version                     Release 2021-07-02
Dataset link                        https://github.com/geneontology/go-ontology and
                                    http://geneontology.org/


FAIR vocabulary feature summary
FVF, full compliance                              90.91%
FVF, partial compliance                             9.09%
FVF, no compliance                                  0.00%


                                                               As
                                                               ses
                                                               sm
                                                               ent
                           RDA
                                                                - Assess
 FAIR vocabulary          indicat                              RD ment -
     Feature               or ID            Indicator           A  FVF         Assessment details
                                                                             Metadata are provided
                                                                             in
                                                                             http://geneontology.org/
                                                                             docs/ontology-documen
                                                                             tation/. It can also be
                                                                             found in the OBO
                                                                             foundry repository
                                                                             https://github.com/OBO
                                                                             Foundry/OBOFoundry.g
                                                                             ithub.io/edit/master/onto
                                                                             logy/go.md. But they
                          RDA-F1- Metadata is identified by                  are not standard
                          01M     a persistent identifier      1             persistent identifiers.
                                                                             Gene ontology uses
                                                                             PURL identifiers
                          RDA-F1- Data is identified by a                    http://purl.obolibrary.org/
                          01D     persistent identifier        1             obo/go.owl
                                                                             http://geneontology.org/
                                  Metadata is identified by                  docs/ontology-documen
                          RDA-F1- a globally unique                          tation/ is globally unique
                          02M     identifier                   1             identifier.
 FVF-1: Vocabulary and
their terms are assigned                                           Full    http://purl.obolibrary.org/
  globally unique and     RDA-F1- Data is identified by a          Complia obo/go.owl is globally
  persistent identifiers. 02D     globally unique identifier   1   nce     unique identifier.
                                                                          Descriptive text is
                                                                          provided in
                                                                          http://geneontology.org.
                                                                          Rich metadata for
                                                                          indexing and reuse is
                                                                          provided in
                                                                          https://github.com/OBO
 FVF-2: Vocabularies             Rich metadata is                 Full    Foundry/OBOFoundry.g
 and their terms have    RDA-F2- provided to allow                Complia ithub.io/edit/master/onto
   rich metadata.        01M     discovery                    1   nce     logy/go.md.
                                                                           The metadata includes
                                 Metadata contains                         data download links
                                 information to enable                     http://geneontology.org/
                         RDA-A1- the user to get access                    docs/download-ontology
                         01M     to the data                  1            /.
                                 Metadata can be
                                 accessed manually (i.e.                   The metadata can be
                         RDA-A1- with human                                accessed from the gene
                         02M     intervention)                1            ontology website.
                                                                           Data can be
                                                                           downloaded from
                                 Data can be accessed                      http://geneontology.org/
                         RDA-A1- manually (i.e. with                       docs/download-ontology
                         02D     human intervention)          1            /
                                                                           http://geneontology.org/
                                 Metadata identifier                       docs/ontology-documen
                         RDA-A1- resolves to a metadata                    tation// is resolvable and
                         03M     record                       1            directs to the metadata.
                                                                           The vocabulary
                                                                           identifier
                                                                           http://purl.obolibrary.org/
                                                                           obo/go.owl resolves to
                                                                           the ontology source
                                                                           files. Identifiers such as
                                                                           http://purl.obolibrary.org/
                                                                           obo/GO_0098743
                           RDA-A1- Data identifier resolves                resolves to ontology
  FVF-3: Vocabularies
                           03D     to a digital object        1            terms.
 and their terms can be
  accessed using the                                                      Data can be
identifiers, preferably by         Data can be accessed           Full    downloaded using
    both human and         RDA-A1- automatically (i.e. by a       Complia command line tools,
         machine.          05D     computer program)          1   nce     such as curl, wget, etc.
                                                                          Gene Ontology has
                                                                          been indexed by EMBL
  FVF-4: Vocabularies                                                     OLS, BioPortal and
   and their terms are         Metadata is offered in                     other semantic
registered or indexed in       such a way that it can             Full    repositories. It is
a searchable engine or RDA-F4- be harvested and                   Complia also indexed by
       a resource.       01M   indexed                        1   nce     Google Search.
                                Metadata is accessed                      The metadata can be
                        RDA-A1- through standardised                      accessed through the
                        04M     protocol                    1             HTTP protocol.
                                Data is accessible
                        RDA-A1- through standardised                      Data can be accessed
                        04D     protocol                    1             through the HTTP protocol.
  FVF-5: Vocabularies             Metadata is accessible
   and their terms are    RDA-A1. through a free access                   HTTP is a free access
   retrievable using a    1-01M   protocol                  1             protocol.
      standardised                 Data is accessible
    communications         RDA-A1. through a free access                   HTTP is a free access
  protocol, preferably     1-01D   protocol                  1             protocol.
     open, free and
        universally                                                    HTTP allows access
     implementable                Data is accessible                   control, but
protocols, and allows for         through an access                    authentication and
   authentication and             protocol that supports       Full    authorisation are not
  authorisation, where RDA-A1. authentication and              Complia required by Gene
        necessary.        2-01D   authorisation             NA nce     Ontology.
                                Metadata is guaranteed                    Metadata and data can
                                to remain available after                 be found in
                        RDA-A2- data is no longer                         version-controlled
                        01M     available                   1             repositories on GitHub.
                                                                          The metadata includes
                                                                          links to access different
                                Metadata includes                         snapshots of the
                                provenance information                    ontology. The
                                according to                              snapshots are in
                        RDA-R1. community-specific                        OWL/OBO format and
                        2-01M   standards                   1             have PURL identifiers.
 FVF-6: Vocabularies             Metadata includes
  and their terms are            provenance information
persistent over time and         according to a                 Partial
   are appropriately     RDA-R1. cross-community                Complia
       versioned.        2-02M   language                   0   nce
                                                                          The metadata is
                                                                          provided in a standard
                                Metadata uses                             format and can be
                                knowledge                                 harvested by major
                                representation                            vocabulary services,
                        RDA-I1- expressed in                              such as OLS and
 FVF-7: Vocabularies    01M     standardised format         1             BioPortal.
 and their terms use a            Data uses knowledge
formal, accessible and            representation
broadly applicable, and RDA-I1-   expressed in                            The data uses OWL
       preferably       01D       standardised format       1             and OBO standards.
machine-understandable            Metadata uses                 Full
    language for                  machine-understandable        Complia
      knowledge         RDA-I1-   knowledge                     nce     Basic metadata is
    representation.     02M       representation            1           provided in OWL.
                                                                         The GO data uses OWL
                                Data uses                                and OBO formats,
                                machine-understandable                   which are
                        RDA-I1- knowledge                                machine-readable
                        02D     representation             1             community formats.
                                                                         The Gene Ontology
                                                                         cross-reference policy is
                                                                         here:
                                                                         http://geneontology.org/docs/download-mappings/
                                                                         Data from other
                        RDA-I3- Data includes qualified                  vocabularies are
                        02D     references to other data   1             provided as 'xref'.
                                                                       The Gene Ontology
 FVF-8: Vocabularies                                                   cross-reference policy is
and terms use qualified         Metadata includes              Full    provided here:
  references to other   RDA-I3- qualified references to        Complia http://geneontology.org/docs/download-mappings/
     vocabularies.      03M     other metadata             1   nce
FVF-9: Vocabularies
and terms are
                                                                       Gene Ontology includes
described with a                                                       sufficient term
plurality of accurate          Plurality of accurate and       Full    attributes.
and relevant            RDA-R1 relevant attributes are         Complia http://geneontology.org/
attributes.             -01M   provided to allow reuse     1   nce     docs/GO-term-elements
                                                                         Gene Ontology
                                                                         Consortium data and
                                                                         data products are
                                Metadata includes                        licensed under the
                                information about the                    Creative Commons
                        RDA-R1. licence under which the                  Attribution 4.0 Unported
                        1-01M   data can be reused         1             License.
 FVF-10: Vocabularies   RDA-R1. Metadata refers to a
  are released with a   1-02M   standard reuse licence     1
 standard data usage
  licence, preferably           Metadata refers to a           Full
  machine-readable      RDA-R1. machine-understandable         Complia
        licence.        1-03M   reuse licence              1   nce
                        RDA-R1. Metadata complies with
                        3-01M   a community standard       1
                        RDA-R1. Data complies with a
                        3-01D   community standard         1
                                Metadata is expressed
                                in compliance with a
                        RDA-R1. machine-understandable
                        3-02M   community standard         1
                                Data is expressed in
 FVF-11: Vocabularies           compliance with a              Full
 meet domain relevant   RDA-R1. machine-understandable         Complia
 community standards.   3-02D   community standard         1   nce
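
The resolvability indicators above (RDA-A1-03M, RDA-A1-03D and RDA-A1-05D) can also be checked from a program rather than with curl or wget. The following Python sketch is one possible check, assuming that a 2xx response after redirects counts as "resolves". The function name, the Accept header and the injectable opener parameter are illustrative choices and are not part of the assessment.

```python
import urllib.request


def resolves(identifier, opener=urllib.request.urlopen, timeout=10):
    """Return True if an HTTP GET on the identifier ends in a 2xx response.

    `opener` defaults to urllib's urlopen, which follows redirects (as PURLs
    require); it can be swapped for a stub when no network is available.
    """
    request = urllib.request.Request(
        identifier,
        # Assumption: ask for RDF/XML, a common serialisation for OWL files.
        headers={"Accept": "application/rdf+xml"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError (4xx/5xx responses) are OSError subclasses.
        return False
```

Calling resolves("http://purl.obolibrary.org/obo/GO_0098743") issues a real HTTP request, so the result depends on network access and on the state of the PURL service at the time of the call.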
Supplementary Table 6: FAIR assessment results of Experimental Factor Ontology

RDA FAIR indicators version               v0.05
Project name                              EFO assessment
Assessment date                           2021-08-02
Dataset version                           3.32.0

Dataset link                              http://www.ebi.ac.uk/efo/releases/v3.32.0/efo.owl


FAIR vocabulary feature summary
FVF, full compliance                      81.82%
FVF, partial compliance                   9.09%
FVF, no compliance                        9.09%
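
The summary percentages are consistent with each of the eleven FVFs receiving exactly one compliance level and the shares being taken over all eleven. The Python sketch below reproduces the figures under that assumption. The function name and the level labels are illustrative, and the levels are transcribed from the Assessment - FVF column of this table.

```python
from collections import Counter

LEVELS = ("full", "partial", "no")


def summarise(levels_by_fvf):
    """Return the percentage of FVFs at each compliance level (2 decimals)."""
    counts = Counter(levels_by_fvf.values())
    total = len(levels_by_fvf)
    return {level: round(100 * counts[level] / total, 2) for level in LEVELS}


# FVF-level results for EFO: all full, except FVF-2 (no) and FVF-6 (partial).
efo = {f"FVF-{i}": "full" for i in range(1, 12)}
efo["FVF-2"] = "no"
efo["FVF-6"] = "partial"

print(summarise(efo))  # {'full': 81.82, 'partial': 9.09, 'no': 9.09}
```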


FAIR vocabulary           RDA                               Assessment  Assessment
Feature                   indicator ID  Indicator           - RDA       - FVF*      Assessment details
                                                                           Both the data and
                                                                           metadata use the identifier
                          RDA-F1- Metadata is identified by                http://www.ebi.ac.uk/efo/efo.owl
                          01M     a persistent identifier   1
                          RDA-F1- Data is identified by a
                          01D     persistent identifier     1
                                  Metadata is identified by
                          RDA-F1- a globally unique
FVF-1: Vocabulary and     02M     identifier                1
their terms are assigned                                         Full
globally unique and      RDA-F1- Data is identified by a         Complia
persistent identifiers.  02D     globally unique identifier 1    nce
                                                                         Descriptions of EFO are
                                                                         provided in
                                                                         ontology browsers, such
                                                                         as OLS and BioPortal.
                                                                         However, the
FVF-2: Vocabularies               Rich metadata is               No      descriptions are not
and their terms have      RDA-F2- provided to allow              Complia included in the EFO
rich metadata.            01M     discovery                 0    nce     source file.
FVF-3: Vocabularies               Metadata contains              Full    EFO and its terms have
and their terms can be            information to enable          Complia unique identifiers and
accessed using the        RDA-A1- the user to get access         nce     can be accessed.
identifiers, preferably by 01M    to the data               1
both human and                    Metadata can be
machine.                          accessed manually (i.e.
                          RDA-A1- with human
                          02M     intervention)             1
                                     Data can be accessed
                             RDA-A1- manually (i.e. with
                             02D     human intervention)        1
                                     Metadata identifier
                             RDA-A1- resolves to a metadata
                             03M     record                     1
                             RDA-A1- Data identifier resolves
                             03D     to a digital object        1
                                     Data can be accessed
                             RDA-A1- automatically (i.e. by a
                             05D     computer program)          1
FVF-4: Vocabularies                                                         The metadata is
and their terms are            Metadata is offered in                       provided in OWL format
registered or indexed in       such a way that it can               Full    and has been harvested
a searchable engine or RDA-F4- be harvested and                     Complia by both OLS and
a resource.              01M   indexed                          1   nce     BioPortal.
                                                                              EFO can be accessed
                                     Metadata is accessed                     using the HTTP protocol,
                             RDA-A1- through standardised                     and it is an open-access
                             04M     protocol                   1             ontology.
                                     Data is accessible
                             RDA-A1- through standardised
FVF-5: Vocabularies          04D     protocol                   1
and their terms are               Metadata is accessible
retrievable using a       RDA-A1. through a free access
standardised              1-01M   protocol                      1
communications                    Data is accessible
protocol, preferably      RDA-A1. through a free access
open, free and            1-01D   protocol                      1
universally
implementable                     Data is accessible
protocols, and allows for         through an access
authentication and                protocol that supports           Full
authorisation, where      RDA-A1. authentication and               Complia
necessary.                2-01D   authorisation                 NA nce
                                                                            EFO follows vocabulary
                                                                            release guidelines, and
                                                                            its versioned copies can
FVF-6: Vocabularies                                                         be found on GitHub, but
and their terms are              Metadata is guaranteed                     it does not strictly follow
persistent over time and         to remain available after          Partial cross-community
are appropriately        RDA-A2- data is no longer                  Complia language standards,
versioned.               01M     available                 1        nce     such as the RDFS and XMLS
                                                                            standards.
                                  Metadata includes
                                  provenance information
                                  according to
                          RDA-R1. community-specific
                          2-01M   standards              1
                                  Metadata includes
                                  provenance information
                                  according to a
                          RDA-R1. cross-community
                          2-02M   language               0
                                                                         EFO can be
                                  Metadata uses                          downloaded in OWL
                                  knowledge                              and OBO, which are
                                  representation                         standardised and
                          RDA-I1- expressed in                           machine-understandable
                          01M     standardised format     1              formats.
                                  Data uses knowledge
                                  representation
                          RDA-I1- expressed in
                          01D     standardised format     1
FVF-7: Vocabularies               Metadata uses
and their terms use a             machine-understandable
formal, accessible and    RDA-I1- knowledge
broadly applicable, and   02M     representation        1
preferably
machine-understandable            Data uses
language for                      machine-understandable       Full
knowledge                 RDA-I1- knowledge                    Complia
representation.           02D     representation        1      nce
                                                                         EFO reuses terms from
                                                                         other vocabularies and
                                                                         provides sufficient
                                                                         references, such as the
                          RDA-I3- Data includes qualified                source of the external
                          02D     references to other data 1             term.
FVF-8: Vocabularies
and terms use qualified         Metadata includes              Full
references to other     RDA-I3- qualified references to        Complia
vocabularies.           03M     other metadata            1    nce
FVF-9: Vocabularies
and terms are described
with a plurality of            Plurality of accurate and       Full
accurate and relevant   RDA-R1 relevant attributes are         Complia
attributes.             -01M   provided to allow reuse 1       nce
FVF-10: Vocabularies
are released with a               Metadata includes
standard data usage               information about the        Full
licence, preferably       RDA-R1. licence under which the      Complia
machine-readable          1-01M   data can be reused      1    nce
licence.               RDA-R1. Metadata refers to a
                       1-02M   standard reuse licence   1
                               Metadata refers to a
                       RDA-R1. machine-understandable
                       1-03M   reuse licence         1
                                                                      EFO uses the standard
                                                                      OWL format, complies
                                                                      with OBO principles and
                       RDA-R1. Metadata complies with                 imports terms following
                       3-01M   a community standard 1                 the MIREOT standards.
                       RDA-R1. Data complies with a
                       3-01D   community standard       1
                               Metadata is expressed
                               in compliance with a
                       RDA-R1. machine-understandable
                       3-02M   community standard   1
                               Data is expressed in
FVF-11: Vocabularies           compliance with a            Full
meet domain relevant   RDA-R1. machine-understandable       Complia
community standards.   3-02D   community standard   1       nce
Supplementary Table 7: FAIR assessment results of ICD-11

Project name                              ICD-11 FAIR assessment
Assessment date                           2021-08-02
Dataset version                           05/2021
Dataset link                              ICD-11 browser and ICD-11 print version: https://icd.who.int/en
                                          Print version: https://icd.who.int/browse11/Downloads/Download?fileName=print_en.zip


FAIR vocabulary feature summary
FVF, full compliance                                            27.27%
FVF, partial compliance                                         36.36%
FVF, no compliance                                              36.36%


FAIR vocabulary           RDA                               Assessment  Assessment
Feature                   indicator ID  Indicator           - RDA       - FVF       Assessment details
                          RDA-F1- Metadata is identified by
                          01M     a persistent identifier   0              No metadata identifier.
                                                                           Example data identifier:
                                                                           2C25.1 Small cell
                          RDA-F1- Data is identified by a                  carcinoma of bronchus
                          01D     persistent identifier     1              or lung.
                                  Metadata is identified by
                          RDA-F1- a globally unique
                          02M     identifier                0
                                                                         The identifiers in ICD-11
                                                                         have been through
                                                                         several iterations.
                                                                         Currently, ICD-11
                                                                         provides identifiers such
                                                                         as 1C60-1C62.Z; a
                                                                         more persistent
                                                                         identifier system (for example
                                                                         http://id.who.int/icd/entity/911707612)
FVF-1: Vocabulary and                                                    is still under development.
their terms are assigned                                         Partial This assessment is
globally unique and      RDA-F1- Data is identified by a         Complia based on the
persistent identifiers.  02D     globally unique identifier 0    nce     1C60-1C62.Z system.
FVF-2: Vocabularies              Rich metadata is                 Full
and their terms have     RDA-F2- provided to allow                Complia
rich metadata.           01M     discovery                    1   nce
                                 Metadata contains
                                 information to enable
                         RDA-A1- the user to get access
                         01M     to the data                  1
                                 Metadata can be
                                 accessed manually (i.e.
                         RDA-A1- with human
                         02M     intervention)           1
                                 Data can be accessed
                         RDA-A1- manually (i.e. with
                         02D     human intervention)          1
                                 Metadata identifier
                         RDA-A1- resolves to a metadata
                         03M     record                       0
FVF-3: Vocabularies        RDA-A1- Data identifier resolves
and their terms can be     03D     to a digital object        1           ICD-11 provides an
accessed using the                                                        API, a web browser, and
identifiers, preferably by         Data can be accessed           Partial PDF documents for
both human and             RDA-A1- automatically (i.e. by a       Complia human and machine
machine.                   05D     computer program)          1   nce     access.
FVF-4: Vocabularies                                                       ICD-11 metadata is
and their terms are            Metadata is offered in                     provided in Word
registered or indexed in       such a way that it can             No      documents and cannot
a searchable engine or RDA-F4- be harvested and                   Complia be directly indexed by
a resource.              01M   indexed                        0   nce     vocabulary databases.
                                 Metadata is accessed
                         RDA-A1- through standardised
                         04M     protocol                     1
                                                                            ICD-11 uses the HTTPS
                                 Data is accessible                         protocol, e.g.
                         RDA-A1- through standardised                       https://icd.who.int/browse11/l-m/en#http%3a%2f%2fid.who.int%2ficd%2fentity%2f911707612
FVF-5: Vocabularies      04D     protocol                     1
and their terms are               Metadata is accessible
retrievable using a       RDA-A1. through a free access
standardised              1-01M   protocol                    1
communications                    Data is accessible
protocol, preferably      RDA-A1. through a free access
open, free and            1-01D   protocol                    1
universally
implementable                     Data is accessible
protocols, and allows for         through an access
authentication and                protocol that supports         Full
authorisation, where      RDA-A1. authentication and             Complia
necessary.                2-01D   authorisation               NA nce
                                                                          Previous versions of
                                                                          ICD-11 can be accessed at
                                                                          https://icd.who.int/browse11/l-m/en/releases.
                                  Metadata is guaranteed                  However, the versioning
                                  to remain available after               style does not follow
                          RDA-A2- data is no longer                       common community
                          01M     available                 1             standards.
                                  Metadata includes
                                  provenance information
                                  according to
                          RDA-R1. community-specific
                          2-01M   standards              1
FVF-6: Vocabularies              Metadata includes
and their terms are              provenance information
persistent over time and         according to a                 Partial
are appropriately        RDA-R1. cross-community                Complia
versioned.               2-02M   language               0       nce
                                  Metadata uses                           ICD-11 is published
                                  knowledge                               mainly as a PDF
                                  representation                          document and does not
                          RDA-I1- expressed in                            use standard
                          01M     standardised format      0              vocabulary formats.
                                  Data uses knowledge
                                  representation
                          RDA-I1- expressed in
                          01D     standardised format      0
FVF-7: Vocabularies               Metadata uses
and their terms use a             machine-understandable
formal, accessible and    RDA-I1- knowledge
broadly applicable, and   02M     representation        0
preferably
machine-understandable            Data uses
language for                      machine-understandable        No
knowledge                 RDA-I1- knowledge                     Complia
representation.           02D     representation        0       nce
                          RDA-I3- Data includes qualified                 Terms in ICD-11 do not
                          02D     references to other data 0              refer to other terms.
FVF-8: Vocabularies
and terms use qualified         Metadata includes               Partial The ICD-11 description
references to other     RDA-I3- qualified references to         Complia refers to other projects
vocabularies.           03M     other metadata             1    nce     and publications.
FVF-9: Vocabularies
and terms are described
with a plurality of            Plurality of accurate and        No      ICD-11 contains only a
accurate and relevant   RDA-R1 relevant attributes are          Complia minimal description of
attributes.             -01M   provided to allow reuse 0        nce     each disease.
                                                                      ICD-11 provides
                                                                      licensing
                                                                      documentation:
                                                                      https://icd.who.int/en/docs/ICD11-license.pdf
                                                                      https://icd.who.int/browse11.
                                                                      Licensed under the
                               Metadata includes                      Creative Commons
                               information about the                  Attribution-NoDerivatives
                       RDA-R1. licence under which the                3.0 IGO licence (CC
                       1-01M   data can be reused      1              BY-ND 3.0 IGO).
FVF-10: Vocabularies   RDA-R1. Metadata refers to a
are released with a    1-02M   standard reuse licence   1
standard data usage
licence, preferably            Metadata refers to a         Full
machine-readable       RDA-R1. machine-understandable       Complia
licence.               1-03M   reuse licence         1      nce
                       RDA-R1. Metadata complies with
                       3-01M   a community standard 0
                       RDA-R1. Data complies with a
                       3-01D   community standard       0
                               Metadata is expressed
                               in compliance with a
                       RDA-R1. machine-understandable
                       3-02M   community standard   0
                               Data is expressed in
FVF-11: Vocabularies           compliance with a            No
meet domain relevant   RDA-R1. machine-understandable       Complia
community standards.   3-02D   community standard   0       nce
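
In the tables above, the Assessment - FVF level appears to follow from the RDA indicator scores grouped under each feature: full compliance when every applicable indicator scores 1, no compliance when none does, and partial compliance otherwise, with NA indicators ignored. This rule is inferred from the tabulated results and is not stated in the tables themselves. The Python sketch below applies it to four ICD-11 features, with a hypothetical function name and scores transcribed from Supplementary Table 7.

```python
def fvf_level(scores):
    """Aggregate RDA indicator scores (1, 0 or "NA") into an FVF level.

    Assumed rule, inferred from the tables: all applicable indicators met
    gives full compliance, none met gives no compliance, otherwise partial.
    """
    applicable = [s for s in scores if s != "NA"]
    if not applicable:
        raise ValueError("no applicable indicators")
    if all(s == 1 for s in applicable):
        return "Full compliance"
    if not any(s == 1 for s in applicable):
        return "No compliance"
    return "Partial compliance"


# Indicator scores for ICD-11, in the order the indicators are listed.
icd11 = {
    "FVF-1": [0, 1, 0, 0],
    "FVF-3": [1, 1, 1, 0, 1, 1],
    "FVF-5": [1, 1, 1, 1, "NA"],
    "FVF-7": [0, 0, 0, 0],
}

for fvf, scores in icd11.items():
    print(fvf, fvf_level(scores))
```

The printed levels match the Assessment - FVF column for these four features: partial, partial, full and no compliance respectively.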