Features of a FAIR Vocabulary

Fuqi Xu1,∗[0000-0002-5923-3859], Nick Juty2,∗[0000-0002-2036-8350], Carole Goble2[0000-0003-1219-2137], Simon Jupp3[0000-0002-0643-3144], Helen Parkinson1[0000-0003-3035-4195], and Mélanie Courtot1,†[0000-0002-9551-6370]

1 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
mcourtot@gmail.com
2 University of Manchester, Manchester M13 9PL, United Kingdom
3 SciBite, BioData Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, United Kingdom

Abstract. The FAIR Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Here we provide definitions for FAIR vocabularies, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies (OBO) Principles, and propose FAIR Vocabulary Features (FVFs). We also design assessment approaches for FAIR vocabularies by mapping the FVFs with existing FAIR assessment indicators. Finally, we demonstrate how FVFs can be used for evaluating and improving vocabularies using exemplar biomedical vocabularies.

Keywords: FAIR principles · Vocabulary · Ontology · Assessment

1 Introduction

The Findable, Accessible, Interoperable and Reusable (FAIR) Principles[40] have gained traction in the biomedical community since their publication in 2016, with many groups attempting to improve their data quality, develop FAIR capable data resources, and design generic FAIR assessment tools for biomedical data[6][16][39]. Due to the heterogeneous nature and broad scope of biomedical data, from molecules to human studies via interdisciplinary analysis, stringent requirements for its FAIRness need to be met to ensure its usefulness toward benefiting human health.
While assessing the FAIR level of datasets and data resources[11], we noted a futile cycle with respect to the ‘Interoperable’ FAIR Principle “I2 - (Meta)data use vocabularies that follow FAIR principles”. To comply with that principle, datasets need to use FAIR vocabularies, which themselves need to be FAIR. FAIR vocabularies promote the exchange of biomedical data, which are usually generated, annotated and used by different groups of researchers. They also promote biomedical data FAIRness throughout the data life cycle, during the data generation, curation, and distribution processes, and support data exchange and integration across data resources.

∗ These authors contributed equally to this paper.
† To whom correspondence should be addressed.
Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Multiple efforts have been made to develop standards for FAIR vocabularies. The FAIRsFAIR recommendations provide guidance[24] on FAIR semantic artefacts, as well as on supporting vocabulary search engines and repositories. Garijo and Poveda-Villalón[22] discussed detailed requirements for ontology URIs and versioning strategies, as well as the formatting of ontologies. Ten simple rules[15] for converting print-based or other forms of legacy vocabularies to FAIR vocabularies have also been proposed.

Researchers have also developed approaches to assess the FAIR level of digital objects both manually and automatically. FAIRsharing hosts FAIR indicators for automated tests in its FAIR Maturity Evaluation Service[39]; other examples are the FAIR metrics in the F-UJI Automated FAIR Data Assessment Tool[4] and the Research Data Alliance (RDA) Data Maturity Model Specification and Guidelines[5], also known as the RDA indicators.
Among them, the RDA indicators are a set of representative and descriptive indicators to evaluate the FAIR level of data and have been used in many projects and with many types of data. Some automated assessments of vocabulary FAIRness, such as FOOPS![21], have been developed to measure the FAIR level of public, machine-readable vocabularies. To the best of our knowledge, no quantifiable FAIR assessment approach has yet been developed to objectively measure the FAIR level of vocabularies in different formats.

Therefore, in this paper, we distinguish the concepts of FAIR data, FAIR data resources, and FAIR vocabularies, and propose a set of general FAIR Vocabulary Features (FVFs) as a set of satisfiable features for vocabularies. We also adapted the RDA indicators to measure the FAIR level of vocabularies. Further, we provide example assessments based on selected ontologies available from the EMBL-EBI Ontology Lookup Service (OLS)[25] and other vocabulary resources.

2 FAIR Data and FAIR Vocabulary

In the execution of this work, we note the distinction between FAIR data and FAIR capable data resources. In our analysis, data can be FAIR, to a greater or lesser extent, and data resources and data vocabularies are capable of supporting FAIRness (FAIR capable) at different levels. Data vocabularies are designed to support FAIR data, and they can also be considered as FAIR data resources. The orthogonality of these concepts is an important context for this work when determining the features of a FAIR vocabulary. We must also determine whether a vocabulary itself is 1) FAIR in terms of its application to FAIR data, 2) FAIR in the context of FAIR capable resources, and 3) FAIR in the context of other vocabularies.

A FAIR vocabulary has a set of FAIR features and a list of associated FAIR indicators. It is usable for annotation, analysis and presentation of data, and is deployable in the context of a FAIR capable data resource or tools.
It also serves ‘aggregation’ use cases where data originates from different domains, and enables data interoperability where different vocabularies are used.

Vocabularies come in different forms, such as lists, thesauri, taxonomies, and ontologies, each with a different level of semantic maturity and different FAIR requirements. The International Classification of Diseases (ICD-11)[31] is a large taxonomy of disease and is the global standard for diagnostic information, disease definitions and synonyms. The Gene Ontology (GO)[12] is a well-established, highly regarded and widely used biomedical resource. It contains over 43000 terms and has been cross-referenced in other classification systems, such as UniProt[13], HAMAP[32], and InterPro[9]. GO is also a reference OBO Foundry ontology[34] and has been reused in many other resources. The Experimental Factor Ontology (EFO)[27], on the other hand, is an application ontology built for communities such as Open Targets[29] for describing experimental variables.

3 Existing Vocabulary Standards

In determining features of FAIR vocabularies, we considered previous standardisation work by the Open Biomedical Ontology (OBO) community to determine whether the OBO Principles[18] addressed elements of vocabulary FAIRness. The OBO Principles aim to coordinate the development of biomedical ontologies; they focus specifically on ontologies, covering both the development process and the ontologies themselves. Although the OBO Principles predate the FAIR Principles, comparing the two helped us define the FVFs.

Table 1 summarises the key points of the OBO Foundry principles, and assesses their suitability as FVFs. We also noted that not all the OBO principles possess the same level of maturity or granularity, and therefore some were unmappable and excluded from the comparison. As a result, this analysis did not include OBO Principles 1, 4, 6, 9 - 12 and 20.
The rationale for suitability as FVFs is discussed in detail below and further in Supplemental Table 1.

Table 1. An analysis of the OBO Foundry principles as putative FAIR Vocabulary Features

OBO Principle | Summary | Suitable as FAIR Vocabulary Feature?
Principle 1: Open | The ontology MUST be openly available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed in altered form under the original name or with the same identifiers. | No
Principle 2: Common Format | The ontology is made available in a common formal language in an accepted concrete syntax. | Yes
Principle 3: URI/Identifier Space | Each class and relation (property) in the ontology must have a unique URI identifier. | Yes
Principle 4: Versioning | The ontology provider has documented procedures for versioning the ontology, and different versions of ontology are marked, stored, and officially released. | No
Principle 5: Scope | The scope of an ontology is the extent of the domain or subject matter it intends to cover. The ontology must have a clearly specified scope and content that adheres to that scope. | Yes
Principle 6: Textual Definitions | The ontology has textual definitions for the majority of its classes and for top level terms in particular. | No
Principle 7: Relations | Relations should be reused from the Relations Ontology (RO). | Yes
Principle 8: Documentation | The owners of the ontology should strive to provide as much documentation as possible. The documentation should detail the different processes specific to an ontology life cycle and target various audiences (users or developers). | Yes
Principle 9: Documented Plurality of Users | The ontology developers should document that the ontology is used by multiple independent people or organizations. | No
Principle 10: Commitment to Collaboration | OBO Foundry ontology development, in common with many other standards-oriented scientific activities, should be carried out in a collaborative fashion. | Yes
Principle 11: Locus of Authority | There should be a person who is responsible for communications between the community and the ontology developers, for communicating with the Foundry on all Foundry-related matters, for mediating discussions involving maintenance in the light of scientific advance, and for ensuring that all user feedback is addressed. | No
Principle 12: Naming Conventions | Naming conventions are used. | No
Principle 16: Maintenance | The ontology needs to reflect changes in scientific consensus to remain accurate over time. | Yes
Principle 20: Responsiveness | Ontology developers MUST offer channels for community participation and SHOULD be responsive to requests. | No

4 FAIR Vocabulary Features

Based on the analysis of OBO Foundry practices and our previous experience working with and developing ontologies, we propose eleven features for FAIR vocabularies in Table 2, covering requirements for identifiers, access protocols, knowledge representation, etc. Supplementary Table 2 shows the relationship among FAIR Vocabulary Features, the FAIR principles and requirements for FAIR vocabularies.

Table 2 also provides examples for each FAIR feature, but does not exhaustively cover all current practices across the various vocabularies; each feature is represented in different formats and at varying FAIRness levels amongst those vocabularies. For example, for FVF-6: versioning and persistent vocabularies, of all ontologies indexed and updated in OLS, 59.3% of vocabularies (see details in Supplementary Material 3) use a date format of “yyyy-mm-dd” in the “versionIRI”, such as “http://purl.obolibrary.org/obo/scdo/releases/2021-04-15/scdo.owl”. 2.51% of vocabularies use semantic versioning (x.x.x), such as “http://www.ebi.ac.uk/efo/releases/v3.34.0/efo.owl”, or other forms of numeric versioning, such as http://www.orpha.net/version3.2. 31.66% of vocabularies do not provide valid machine-readable versioned IRIs. For FVF-1: identifiers, 74% of vocabularies use OBO-format PURLs, identifiers.org or w3id.org identifiers, as well as other domain-specific identifiers. For FVF-5: accessible using standard protocols, of all 199 selected ontologies, only one ontology used the HTTPS protocol; the rest use the HTTP protocol.

Table 2. FAIR Vocabulary Feature details

FVF-1
  Feature: Vocabulary and constituent terms are assigned globally unique and persistent identifiers.
  Description: Vocabulary itself and its constituent terms should have identifiers that are globally unique and persistent to ensure that each item can be identified unambiguously over time.
  Examples: Examples of globally unique and persistent identifiers are PURL[19], identifiers.org[41], and w3id.org[37]. The OBO Foundry provides an identifier policy[2] for biomedical ontologies and requires using PURLs with standard prefixes, such as http://purl.obolibrary.org/obo/GO_0000022.

FVF-2
  Feature: Vocabulary and constituent terms have rich metadata.
  Description: Vocabulary itself and its constituent terms should have sufficient metadata to support discovery by both humans and machines.
  Examples: Metadata of the vocabulary should provide information about the creation date, creator and editor, version, licence, target domain and short descriptions. Metadata of its terms should describe term editing history, definition source, and other metadata.

FVF-3
  Feature: Vocabulary and constituent terms can be accessed using the identifiers, preferably by both humans and machines.
  Description: The URIs of vocabulary itself and its constituent terms can be dereferenced by both humans and machines.
  Examples: http://www.ebi.ac.uk/efo/EFO_0000311 resolves to term “Cancer” in the Experimental Factor Ontology, which can be accessed both by humans using ontology browsers and by machines through the OLS API.

FVF-4
  Feature: Vocabulary and constituent terms are registered or indexed in a searchable engine or a resource.
  Description: Vocabulary itself and its constituent terms are registered in vocabulary archives or other vocabulary management systems and indexed by local and/or global search engines.
  Examples: EMBL-EBI Ontology Lookup Service and NCBO BioPortal[38] are two popular public vocabulary archives. Property X-Robots-Tag:index in vocabularies allows them to be indexed by search engines.

FVF-5
  Feature: Vocabulary and constituent terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, which allows for authentication and authorisation, where necessary.
  Description: Vocabulary itself and its constituent terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, such as HTTPS, HTTP, FTP. The protocol should also allow identifying the accessor and grant access based on the accessor privilege, when necessary.
  Examples: Most public ontologies can be accessed using the HTTP or HTTPS protocol. For example, EFO uses the HTTP protocol. The Unified Medical Language System[10] uses the HTTPS protocol and only allows access by authenticated users.

FVF-6
  Feature: Vocabulary and constituent terms are persistent over time and are appropriately versioned.
  Description: Changes in the vocabulary are reflected in different versions. Vocabularies and their terms are versioned, and each unaltered version of the vocabulary can be identified and retrieved in perpetuity. Vocabulary metadata is available even when the vocabulary is no longer available.
  Examples: Changes in EFO are included in each release and identified with a versioned IRI, such as http://www.ebi.ac.uk/efo/releases/v3.31.0/efo.owl, which resolves to the versioned vocabulary. The OBO Foundry also provides guidelines[20] for ontology versioning and how different versions of the vocabularies should be labelled, stored and published.

FVF-7
  Feature: Vocabulary and constituent terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation.
  Description: Vocabulary itself and its constituent terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation.
  Examples: OWL-based vocabularies can be serialised using RDF-XML, or relational databases, e.g. ChEBI[23], can be converted into OWL[7].

FVF-8
  Feature: Vocabulary and constituent terms use qualified references to other vocabularies.
  Description: Vocabularies reuse terms from other vocabularies when applicable, provide adequate metadata about external terms, and follow vocabulary cross-reference standards.
  Examples: EFO reuses human anatomy terms such as “liver” from UBERON[28] (UBERON_0002107) and links to the original UBERON term. Property Xref indicates a cross-reference relationship between two vocabulary terms. MIREOT[14] defines a methodology and minimum information requirements for importing external terms into an extant ontology.

FVF-9
  Feature: Vocabulary and constituent terms are described with a plurality of accurate and relevant attributes.
  Description: Vocabulary terms include sufficient attributes, such as labels, synonyms, definitions, examples of usage, and cross-references, to support the interpretation and reuse of the vocabulary terms.
  Examples: The OBO flat-file format specification[17] provides a list of recommended mandatory and optional attributes. Each vocabulary term must have an ID and a name. The recommended attributes include definition, synonym, Xref, relationship, etc.

FVF-10
  Feature: Vocabularies are released with a standard data usage licence, preferably a machine-readable licence.
  Description: The vocabulary includes information about how the vocabulary can be reused.
  Examples: Common public data usage licences are CC-BY[1] and MIT[3]. For example, Gene Ontology uses the Creative Commons Attribution 4.0 International License. SNOMED™[35] uses a self-defined SNOMED CT™ affiliate license agreement.

FVF-11
  Feature: Vocabularies meet domain-relevant community standards.
  Description: Vocabularies cover essential terms for the specific domain, reflect knowledge of this domain and can be used in existing data standards and data models.
  Examples: Community standards, such as minimum information requirements and data models, can be found in FAIRsharing[33]. The Plant Phenotyping Experiment Ontology (PPEO)[8] implements the Minimum Information about Plant Phenotyping Experiment (MIAPPE)[26] standards and covers essential attributes to describe a MIAPPE-compliant phenotype dataset.

5 FAIR Vocabulary Feature Indicators

FAIR Vocabulary Features outline general characteristics of a FAIR vocabulary; however, those features need to be objectively quantified to be useful in vocabulary selection, development and assessment. Hence, we propose aligning FVFs with FAIR indicators to enable computation of a discrete FAIR score, with the aim to offer an objective quantitative evaluation of vocabularies and to guide subsequent improvements.

We mapped the RDA indicators to FAIR Vocabulary Features, filtered out indicators that do not apply to vocabularies (see details in Supplementary Table 4), specified the digital object which the indicator refers to, and identified within each indicator the relevant standards used in corresponding domains. It is worth noting that when the RDA indicators are applied to datasets, metadata refers to the metadata to which the vocabulary can be applied, while in the context of vocabularies, metadata and data refer to the description of the vocabulary and the vocabulary information. Therefore, we combined the indicators evaluating data and metadata in the mapping, wherever possible.
The FVFs, associated with selected indicators, can be used as indicators for FAIR vocabularies, as shown in Table 3.

Table 3. Indicators for FAIR Vocabulary Features. Alignment between the FAIR Vocabulary Features and RDA Data Maturity level indicators

FAIR Vocabulary Feature, followed by its RDA indicators (Indicator ID | Indicator)

FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers.
  RDA-F1-01M | Metadata is identified by a persistent identifier
  RDA-F1-01D | Data is identified by a persistent identifier
  RDA-F1-02M | Metadata is identified by a globally unique identifier
  RDA-F1-02D | Data is identified by a globally unique identifier
FVF-2: Vocabularies and their terms have rich metadata.
  RDA-F2-01M | Rich metadata is provided to allow discovery
FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine.
  RDA-A1-01M | Metadata contains information to enable the user to get access to the data
  RDA-A1-02M | Metadata can be accessed manually (i.e. with human intervention)
  RDA-A1-02D | Data can be accessed manually (i.e. with human intervention)
  RDA-A1-03M | Metadata identifier resolves to a metadata record
  RDA-A1-03D | Data identifier resolves to a digital object
  RDA-A1-05D | Data can be accessed automatically (i.e. by a computer program)
FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource.
  RDA-F4-01M | Metadata is offered in such a way that it can be harvested and indexed
FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary.
  RDA-A1-04M | Metadata is accessed through standardised protocol
  RDA-A1-04D | Data is accessible through standardised protocol
  RDA-A1.1-01M | Metadata is accessible through a free access protocol
  RDA-A1.1-01D | Data is accessible through a free access protocol
  RDA-A1.2-01D | Data is accessible through an access protocol that supports authentication and authorisation
FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned.
  RDA-A2-01M | Metadata is guaranteed to remain available after data is no longer available
  RDA-R1.2-01M | Metadata includes provenance information according to community-specific standards
  RDA-R1.2-02M | Metadata includes provenance information according to a cross-community language
FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation.
  RDA-I1-01M | Metadata uses knowledge representation expressed in standardised format
  RDA-I1-01D | Data uses knowledge representation expressed in standardised format
  RDA-I1-02M | Metadata uses machine-understandable knowledge representation
  RDA-I1-02D | Data uses machine-understandable knowledge representation
FVF-8: Vocabularies and terms use qualified references to other vocabularies.
  RDA-I3-02D | Data includes qualified references to other data
  RDA-I3-03M | Metadata includes qualified references to other metadata
FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes.
  RDA-R1-01M | Plurality of accurate and relevant attributes are provided to allow reuse
FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence.
  RDA-R1.1-01M | Metadata includes information about the licence under which the data can be reused
  RDA-R1.1-02M | Metadata refers to a standard reuse licence
  RDA-R1.1-03M | Metadata refers to a machine-understandable reuse licence
FVF-11: Vocabularies meet domain relevant community standards.
  RDA-R1.3-01M | Metadata complies with a community standard
  RDA-R1.3-01D | Data complies with a community standard
  RDA-R1.3-02M | Metadata is expressed in compliance with a machine-understandable community standard
  RDA-R1.3-02D | Data is expressed in compliance with a machine-understandable community standard

6 Assessment against Indicators for FAIR Vocabulary Features

We tested the FAIR Vocabulary Features and corresponding indicators on three representative vocabularies, GO, EFO and ICD-11, as shown in Table 4. For each FVF, one of three compliance levels is assigned: if a vocabulary meets the requirements of all indicators, full compliance is achieved; otherwise, depending on the scoring within each FVF, partial compliance or no compliance is assigned. The percentages of full compliance, partial compliance and no compliance features are also calculated. Supplementary Tables 5-7 provide the assessment details.

Table 4. FAIR Vocabulary Features applied: assessment results for the Gene Ontology, Experimental Factor Ontology and ICD-11

FAIR Vocabulary Feature | Gene Ontology | Experimental Factor Ontology | ICD-11
FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers. | Full Compliance | Full Compliance | Partial Compliance
FVF-2: Vocabularies and their terms have rich metadata. | Full Compliance | No Compliance | Full Compliance
FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine. | Full Compliance | Full Compliance | Partial Compliance
FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource. | Full Compliance | Full Compliance | No Compliance
FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary. | Full Compliance | Full Compliance | Full Compliance
FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned. | Partial Compliance | Partial Compliance | Partial Compliance
FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. | Full Compliance | Full Compliance | No Compliance
FVF-8: Vocabularies and terms use qualified references to other vocabularies. | Full Compliance | Full Compliance | Partial Compliance
FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes. | Full Compliance | Full Compliance | No Compliance
FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence. | Full Compliance | Full Compliance | Full Compliance
FVF-11: Vocabularies meet domain relevant community standards. | Full Compliance | Full Compliance | No Compliance
FAIR Vocabulary Feature summary
FVF, full compliance | 90.91% | 81.82% | 27.27%
FVF, partial compliance | 9.09% | 9.09% | 36.36%
FVF, no compliance | 0.00% | 9.09% | 36.36%

From the assessment results, both the Gene Ontology and the Experimental Factor Ontology are vocabularies of high FAIR level, with over 80% of FVFs fulfilled. The Gene Ontology only partially complies with ‘FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned’, with a Fail in ‘Indicator RDA-R1.2-02M: Metadata includes provenance information according to a cross-community language’. EFO does not comply with ‘FVF-2: Vocabularies and their terms have rich metadata’, since no general description of the ontology is provided in the released artefact. Compared with the ontologies, the taxonomy, ICD-11, fully complies with 27.27% of FVFs (Table 4), and partially complies with 36.36% of FVFs.
This is because ICD-11 neither refers to other vocabularies, nor adheres to other community standards, such as vocabulary formats. ICD-11 was selected for this evaluation as it already offers significant FAIR improvements over ICD-10[30], such as providing a standard licence.

7 Discussion

The FAIR Vocabulary Features we propose integrate multiple FAIR vocabulary requirements and can be used as FAIR vocabulary standards to guide the development and maintenance of vocabularies. Each FVF is associated with indicators enabling its quantifiable, objective assessment. Those indicators can be connected to related standards and extended to existing or emerging standards in other domains. For example, FVF-8: cross-referencing other vocabularies can be linked to the ontology cross-reference standard, MIREOT. We focused on how FVFs can be applied to ontologies, and demonstrated the potential for using them across other forms of vocabularies; for example, with ICD-11, assessment would inform authors on the means to enrich their resources. Because of our expertise and requirements, this manuscript focuses on the biomedical domain; however, we anticipate this framework could be reused in other domains.

Integrating the FAIR Vocabulary Features with FAIR indicators makes it possible to assess the FAIR level of vocabularies, identify progressive ontology development use cases, and improve those vocabularies. We selected the RDA indicators as they have proven to be useful for many datasets and have been referenced by other assessment approaches in FAIRassist.org; yet, FVFs could alternatively be aligned to other FAIR-principle based indicators which would similarly reflect the guiding principles proposed by Wilkinson et al. Besides supporting manual assessment, quantifiable formal indicators are also amenable to becoming machine actionable. Some efforts already exist, and reusing shared indicators will make it possible to perform automated FAIR vocabulary assessments.
The indicators were proposed to objectively measure the FAIR level of ontologies, yet the resulting score does not reflect an absolute FAIR level for the vocabulary. Indeed, depending on the purpose and requirements of the vocabulary, some FVFs can be more or less important than other features. For example, for internal vocabularies which are used and shared within an institution, having global identifiers (FVF-1) is not a mandatory requirement. Instead of comparing the FAIR score of different vocabularies to find the ‘FAIRer’ one, we propose the FAIR score should be used to measure and guide the evolution of FAIR vocabularies by successively comparing the FAIR levels of iteratively developed versions. For example, compared to ICD-10, its successor, ICD-11, has incorporated many features to make it FAIRer, such as providing APIs for easier access, having a machine-readable license, etc.

From the assessment results of the two ontologies and ICD-11, ontology-based vocabularies follow stricter semantics and therefore fared better in the scoring of FAIR features. For example, many ontology-related standards have been established, including formats, such as OWL, guidelines, such as the OBO principles, minimum information standards, such as MIBBI[36], and mechanisms for cross-references or incorporating external ontologies, such as MIREOT. This is naturally reflected in a high score for compliance with community standards, which is a core part of the FAIR Vocabulary Features and which improves the interoperability and reusability of a vocabulary.

The FAIR Vocabulary Features and assessments provide insights on how to improve vocabularies. For example, based on the EFO assessments, the FAIR level of EFO could easily be improved by adding a description of the aim and function of EFO. This way, different vocabulary management services can harvest the information.
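The compliance scoring described in Section 6 can be illustrated with a minimal Python sketch. It is not the assessment procedure used for the supplementary tables: the names `Compliance`, `fvf_compliance` and `summarise` are hypothetical, and the rule that partial compliance means "some but not all indicators pass" is an assumption, since the text only states that partial or no compliance depends on the scoring within each FVF.

```python
from collections import Counter
from enum import Enum


class Compliance(Enum):
    FULL = "Full Compliance"
    PARTIAL = "Partial Compliance"
    NONE = "No Compliance"


def fvf_compliance(indicator_results):
    """Map the pass/fail results of one FVF's indicators to a compliance level.

    Assumption: all indicators passing gives full compliance, none passing
    gives no compliance, and anything in between gives partial compliance.
    """
    results = list(indicator_results)
    if not results:
        raise ValueError("an FVF needs at least one indicator result")
    if all(results):
        return Compliance.FULL
    if any(results):
        return Compliance.PARTIAL
    return Compliance.NONE


def summarise(levels):
    """Return the percentage of FVFs at each compliance level (2 decimals)."""
    levels = list(levels)
    counts = Counter(levels)
    return {
        level.value: round(100 * counts[level] / len(levels), 2)
        for level in Compliance
    }


# Hypothetical indicator outcomes for one FVF (True = pass, False = fail).
fvf6 = fvf_compliance([True, True, False])
print(fvf6.value)  # Partial Compliance

# Gene Ontology column of Table 4: ten fully and one partially compliant FVF.
go_levels = [Compliance.FULL] * 10 + [Compliance.PARTIAL]
print(summarise(go_levels))
# {'Full Compliance': 90.91, 'Partial Compliance': 9.09, 'No Compliance': 0.0}
```

Keeping the per-FVF level separate from the summary makes it straightforward to compare successive versions of the same vocabulary, which is the use of the FAIR score proposed above.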
Acknowledgements This work is funded by the IMI-FAIRplus project (Grant number 802750) and the European Molecular Biology Laboratory - European Bioinformatics Institute core funds.

8 References
[1] Creative commons — attribution 4.0 international — CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
[2] ID policy, http://www.obofoundry.org/id-policy
[3] The MIT license | open source initiative, https://opensource.org/licenses/MIT
[4] F-UJI automated FAIR data assessment tool (2020), https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool
[5] FAIR data maturity model: specification and guidelines - draft (2020), https://www.rd-alliance.org/group/fair-data-maturity-model-wg/outcomes/fair-data-maturity-model-specification-and-guidelines
[6] FAIRassist.org (2021), https://fairassist.org/#!/
[7] Antoniou, G., van Harmelen, F.: Web ontology language: OWL. pp. 67–92. International Handbooks on Information Systems, Springer (2004). https://doi.org/10.1007/978-3-540-24750-0_4
[8] Arnaud, E., Cooper, L., Shrestha, R., et al.: Towards a reference plant trait ontology for modeling knowledge of plant traits and phenotypes (2012), http://wrap.warwick.ac.uk/59831/
[9] Blum, M., Chang, H.Y., Chuguransky, S., et al.: The InterPro protein families and domains database: 20 years on 49, D344–D354 (2021). https://doi.org/10.1093/nar/gkaa977
[10] Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology 32, D267–270 (2004). https://doi.org/10.1093/nar/gkh061
[11] Burdett, T., Xu, F., Courtot, M., et al.: FAIRplus: D3.2 IMI FAIR metrics publication. https://doi.org/10.5281/zenodo.4428633
[12] Consortium, T.G.O.: The gene ontology resource: 20 years and still GOing strong 47, D330–D338 (2019). https://doi.org/10.1093/nar/gky1055
[13] Consortium, T.U.: UniProt: the universal protein knowledgebase in 2021 49, D480–D489 (2021). https://doi.org/10.1093/nar/gkaa1100
[14] Courtot, M., Gibson, F., Lister, A., et al.: MIREOT: the minimum information to reference an external ontology term pp. 1–1 (2009). https://doi.org/10.1038/npre.2009.3576.1
[15] Cox, S.J.D., Gonzalez-Beltran, A.N., Magagna, B., et al.: Ten simple rules for making a vocabulary FAIR 17(6), e1009041 (2021). https://doi.org/10.1371/journal.pcbi.1009041
[16] Drysdale, R., Cook, C.E., Petryszak, R., et al.: The ELIXIR core data resources: fundamental infrastructure for the life sciences 36(8), 2636–2642 (2020). https://doi.org/10.1093/bioinformatics/btz959
[17] Foundry, T.O.: The OBO flat file format specification, version 1.2, https://owlcollab.github.io/oboformat/doc/GO.format.obo-1%5F2.html
[18] Foundry, T.O.: OBO foundry principles, overview, http://www.obofoundry.org/principles/fp-000-summary.html
[19] Foundry, T.O.: PURL administration, https://purl.prod.archive.org/
[20] Foundry, T.O.: Versioning (principle 4), http://www.obofoundry.org/principles/fp-004-versioning.html
[21] Garijo, D., Corcho, O., Poveda-Villalon, M.: FOOPS!: An ontology pitfall scanner for the FAIR principles p. 4 (2021), http://ceur-ws.org/Vol-2980/paper321.pdf
[22] Garijo, D., Poveda-Villalón, M.: Best practices for implementing FAIR vocabularies and ontologies on the web (2020), http://arxiv.org/abs/2003.13084
[23] Hastings, J., Owen, G., Dekker, A., et al.: ChEBI in 2016: Improved services and an expanding collection of metabolites 44, D1214–1219 (2016). https://doi.org/10.1093/nar/gkv1031
[24] Hugo, W., Le Franc, Y., Coen, G., et al.: D2.5 FAIR semantics recommendations second iteration (2020). https://doi.org/10.5281/zenodo.4314321
[25] Jupp, S., Burdett, T., Malone, J., et al.: A new ontology lookup service at EMBL-EBI p. 2 (2015), http://ceur-ws.org/Vol-1546/paper%5F29.pdf
[26] Krajewski, P., Chen, D., Ćwiek, H., et al.: Towards recommendations for metadata and data handling in plant phenotyping 66(18), 5417–5427 (2015). https://doi.org/10.1093/jxb/erv271
[27] Malone, J., Holloway, E., Adamusiak, T., et al.: Modeling sample variables with an experimental factor ontology 26(8), 1112–1118 (2010). https://doi.org/10.1093/bioinformatics/btq099
[28] Mungall, C.J., Torniai, C., Gkoutos, G.V., et al.: Uberon, an integrative multi-species anatomy ontology 13(1), R5 (2012). https://doi.org/10.1186/gb-2012-13-1-r5
[29] Ochoa, D., Hercules, A., Carmona, M., et al.: Open targets platform: supporting systematic drug–target identification and prioritisation 49, D1302–D1310 (2021). https://doi.org/10.1093/nar/gkaa1027
[30] Organization, W.H.: International classification of diseases for mortality and morbidity statistics (10th revision) (2010), https://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf
[31] Organization, W.H.: International classification of diseases for mortality and morbidity statistics (11th revision) (2021), https://icd.who.int/browse11/l-m/en
[32] Pedruzzi, I., Rivoire, C., Auchincloss, A.H., et al.: HAMAP in 2015: updates to the protein family classification and annotation system 43, D1064–D1070 (2015). https://doi.org/10.1093/nar/gku1002
[33] Sansone, S.A., McQuilton, P., Rocca-Serra, P., et al.: FAIRsharing as a community approach to standards, repositories and policies 37(4), 358–367 (2019). https://doi.org/10.1038/s41587-019-0080-8
[34] Smith, B., Ashburner, M., Rosse, C., et al.: The OBO foundry: coordinated evolution of ontologies to support biomedical data integration 25(11), 1251–1255 (2007). https://doi.org/10.1038/nbt1346
[35] Snomed: SNOMED home page, https://www.snomed.org/
[36] Taylor, C.F., Field, D., Sansone, S.A., et al.: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project 26(8), 889–896 (2008).
https://doi.org/10.1038/nbt.1411
[37] W3id: w3id.org - permanent identifiers for the web, https://w3id.org/
[38] Whetzel, P.L., Noy, N.F., Shah, N.H., et al.: BioPortal: enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications 39, W541–W545 (2011). https://doi.org/10.1093/nar/gkr469
[39] Wilkinson, M.: The FAIR maturity evaluation service (2021), https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/
[40] Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., et al.: The FAIR guiding principles for scientific data management and stewardship 3(1), 160018 (2016). https://doi.org/10.1038/sdata.2016.18
[41] Wimalaratne, S.M., Juty, N., Kunze, J., et al.: Uniform resolution of compact identifiers for biomedical data 5, 180029 (2018). https://doi.org/10.1038/sdata.2018.29

Supplementary Table 1: The suitability of OBO principles used as FAIR Vocabulary Features

OBO-1: The ontology MUST be openly available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed in altered form under the original name or with the same identifiers.
Suitable as FAIR Vocabulary Feature? No
While it is highly desirable to have open source semantic artefacts, as it is to have open source code and ultimately open research, this is not a prerequisite for vocabularies to be FAIR, nor is it well aligned with the FAIR principles, which require licensing information to be provided but do not mandate that it be open source.

OBO-2: The ontology is made available in a common formal language in an accepted concrete syntax.
Suitable as FAIR Vocabulary Feature? Yes
Common formats define minimum standards for accessing an ontology, support using ontologies in a FAIR capable resource, and are the foundation of interoperable vocabulary. Many common formal languages have been proposed during the development history of ontologies. OWL format is currently used as a W3C standard.

OBO-3: Each class and relation (property) in the ontology must have a unique URI identifier.
Suitable as FAIR Vocabulary Feature? Yes
Identifiability is a core FAIR principle, and in order to be fulfilled, all elements of the ontology, such as classes and relationships, should be clearly and uniquely identified.

OBO-4: The ontology provider has documented procedures for versioning the ontology, and different versions of ontology are marked, stored, and officially released.
Suitable as FAIR Vocabulary Feature? No
Proper versioning improves the findability and reusability of the vocabulary. We proposed a corresponding FVF to cover this aspect. Yet, this principle evaluates the development, especially the documentation process, of the ontology rather than the FAIR vocabulary.

OBO-5: The scope of an ontology is the extent of the domain or subject matter it intends to cover. The ontology must have a clearly specified scope and content that adheres to that scope.
Suitable as FAIR Vocabulary Feature? Yes
Users must be able to determine which ontologies meet their needs in order to implement these in FAIR capable resources and to be interoperable with other vocabularies. A clear specification of this aids in FAIR implementation; vocabularies do not exist in isolation.

OBO-6: The ontology has textual definitions for the majority of its classes and for top level terms in particular.
Suitable as FAIR Vocabulary Feature? No
While textual definitions provide human-readable content and are generally desirable, this is not essential as an FVF as ontology content has definitions in terms of logical axioms (e.g., position in the hierarchy) and term labels.

OBO-7: Relations should be reused from the Relations Ontology (RO).
Suitable as FAIR Vocabulary Feature? Yes
Relation standards promote interoperability across different ontologies. This principle focuses on the relationships within and across ontologies. It can be adapted and used in a broader range of FAIR vocabularies.

OBO-8: The owners of the ontology should strive to provide as much documentation as possible. The documentation should detail the different processes specific to an ontology life cycle and target various audiences (users or developers).
Suitable as FAIR Vocabulary Feature? Yes
Rich metadata of the vocabulary, such as the purpose and status of the ontology, promotes the reuse of the ontology.

OBO-9: The ontology developers should document that the ontology is used by multiple independent people or organizations.
Suitable as FAIR Vocabulary Feature? No
Usage of vocabularies depends on the content and there are strategies for interoperating ontologies in what is anyway a crowded semantic space. Evidence that an ontology is highly used, when measurable, assumes a level of maturity that is unlikely for some starting communities, and would not reflect the FAIRness of the resource.

OBO-10: OBO Foundry ontology development, in common with many other standards-oriented scientific activities, should be carried out in a collaborative fashion.
Suitable as FAIR Vocabulary Feature? Yes
Vocabularies should be implementable in FAIR data resources and should reflect community needs. Responsiveness to community needs comes via collaboration and development should therefore be collaborative.

OBO-11: There should be a person who is responsible for communications between the community and the ontology developers, for communicating with the Foundry on all Foundry-related matters, for mediating discussions involving maintenance in the light of scientific advance, and for ensuring that all user feedback is addressed.
Suitable as FAIR Vocabulary Feature? No
Having a designated contact is important for the community to provide feedback and ensures that someone has ongoing editorial responsibility for the ontology.
While it does pertain to the sustainability of the resource and its possible evolution, it does not directly speak to its level of FAIRness.

OBO-12: Naming conventions are used.
Suitable as FAIR Vocabulary Feature? No
Consistency of naming is a best practice feature of ontologies, but its absence does not detract from deployment in support of the FAIR principles.

OBO-16: The ontology needs to reflect changes in scientific consensus to remain accurate over time.
Suitable as FAIR Vocabulary Feature? Yes
Vocabularies codify knowledge. To fulfil one of their primary functions, they must evolve. Dead ontologies fail to support the interoperation of FAIR capable resources.

OBO-20: Ontology developers MUST offer channels for community participation and SHOULD be responsive to requests.
Suitable as FAIR Vocabulary Feature? No
Having communication channels to collect and respond to community requirements supports the maintenance and evolution of the ontology. However, it is not directly linked to the FAIRness of the vocabulary.

Supplementary Table 2: FAIR Vocabulary Features mapped to FAIR principles and FAIR vocabulary requirements
Findability Accessibility Interoperability Reusability FVF-2 FVF-6 FVF-9 FVF-10 FAIR in terms of application to FAIR data. FVF-11 FVF-11 FVF-2 FVF-1 FVF-3 FVF-6 FAIR in terms of serving as a FAIR data FVF-4 FVF-5 FVF-7 resource. FVF-6 FVF-10 FVF-7 FVF-9 FAIR in the context of interacting with other vocabularies. FVF-8

Supplementary Materials 3: VersionIRI analysis
We fetched the ontologies indexed in the OLS repository and selected those that were successfully loaded and up-to-date. OLS contained 266 biomedical ontologies at the time we accessed the database (https://www.ebi.ac.uk/ols/api/ontologies). We filtered out ontologies which could not be indexed automatically (those without a valid loaded timestamp), and removed inactive ontologies based on the date information in the versionIRI section. 200 ontologies were selected based on these criteria.
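The selection criteria described above can be sketched as a small filter. This is a hedged illustration, not the script used for the analysis: the record layout (plain dictionaries with hypothetical `id`, `loaded` and `version_iri` keys) is an assumption and is not claimed to match the field names returned by the OLS API, the sample records are invented, and the 2019-01-01 cutoff is the one given in the limitations discussed next. Dates falling on the cutoff day itself are treated as active, which is also an assumption.

```python
import re
from datetime import date

# Cutoff date taken from the text; everything else here is illustrative.
CUTOFF = date(2019, 1, 1)
DATE_IN_IRI = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def loaded_date(record):
    """Return the date part of the 'loaded' timestamp, or None if missing/invalid."""
    raw = record.get("loaded")
    if not raw:
        return None
    try:
        return date.fromisoformat(raw[:10])
    except ValueError:
        return None


def version_iri_date(record):
    """Return the first YYYY-MM-DD date embedded in the versionIRI, if any."""
    match = DATE_IN_IRI.search(record.get("version_iri") or "")
    if not match:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:
        return None


def is_selected(record):
    """Keep records that loaded on/after the cutoff and carry no older versionIRI date."""
    loaded = loaded_date(record)
    if loaded is None or loaded < CUTOFF:
        return False
    iri_date = version_iri_date(record)
    # No date in the versionIRI (e.g. semantic versioning) cannot be judged, so keep it.
    return iri_date is None or iri_date >= CUTOFF


def select_active(records):
    return [record for record in records if is_selected(record)]


# Invented sample records (hypothetical IRIs under example.org).
SAMPLE = [
    {"id": "a", "loaded": "2021-07-02T10:00:00", "version_iri": "http://example.org/a/releases/2021-06-01/a.owl"},
    {"id": "b", "loaded": None, "version_iri": "http://example.org/b/releases/2021-06-01/b.owl"},
    {"id": "c", "loaded": "2020-01-05T00:00:00", "version_iri": "http://example.org/c/releases/2017-03-01/c.owl"},
    {"id": "d", "loaded": "2020-01-05T00:00:00", "version_iri": "http://example.org/d/v3.32.0/d.owl"},
    {"id": "e", "loaded": "2018-12-31T23:59:59", "version_iri": None},
]

if __name__ == "__main__":
    print([record["id"] for record in select_active(SAMPLE)])
```

Record `d` illustrates the case where a versionIRI carries a semantic version and no date: the filter cannot tell whether such an ontology is current, so the sketch keeps it.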
We recognise the limitations of the ontology selection approach. The filtering relies on the metadata collected by OLS instead of the ontology itself, and therefore might not correctly reflect the ontology status. We filtered out some inactive ontologies based on the loading time (only ontologies with a loading timestamp after 2019-01-01 were chosen) and on the date information in the versionIRI (ontologies with a date before 2019-01-01 in the versionIRI were removed). However, these criteria do not ensure that all selected vocabularies are up-to-date. For example, some ontologies use a semantic versioning format in which no date information is provided in the versionIRI, and some record update information in other metadata fields, such as 'annotation' or 'editor comments'. Despite these constraints, the analysis still provides enough information to showcase the status of current vocabularies. A complete list of the selected ontologies is provided in the table below.

Supplementary Table 4: RDA data maturity indicators that are not mapped to FAIR Vocabulary Features

ID | Indicator
RDA-F3-01M | Metadata includes the identifier for the data
RDA-I2-01M | Metadata uses FAIR-compliant vocabularies
RDA-I2-01D | Data uses FAIR-compliant vocabularies
RDA-I3-01M | Metadata includes references to other metadata
RDA-I3-01D | Data includes references to other data
RDA-I3-02M | Metadata includes references to other data
RDA-I3-04M | Metadata include qualified references to other data

Supplementary Table 5: FAIR assessment results of Gene ontology

RDF FAIR indicators version: v0.05
Project name: FAIR assessment Gene Ontology
Assessment date: 2021-08-02
Dataset version: Release 2021-07-02
Dataset link: https://github.com/geneontology/go-ontology and http://geneontology.org/

FAIR vocabulary feature summary
FVF, full compliance: 90.91%
FVF, partial compliance: 9.09%
FVF, no compliance: 0.00%

Columns: RDA indicator ID | Indicator | Assessment - RDA | Assessment details. The FVF-level assessment is given after each FAIR vocabulary feature.

FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers. (Assessment - FVF: Full Compliance)
RDA-F1-01M | Metadata is identified by a persistent identifier | 1 | Metadata are provided at http://geneontology.org/docs/ontology-documentation/. They can also be found in the OBO Foundry repository https://github.com/OBOFoundry/OBOFoundry.github.io/edit/master/ontology/go.md, but these are not standard persistent identifiers.
RDA-F1-01D | Data is identified by a persistent identifier | 1 | Gene ontology uses PURL identifiers: http://purl.obolibrary.org/obo/go.owl
RDA-F1-02M | Metadata is identified by a globally unique identifier | 1 | http://geneontology.org/docs/ontology-documentation/ is a globally unique identifier.
RDA-F1-02D | Data is identified by a globally unique identifier | 1 | http://purl.obolibrary.org/obo/go.owl is a globally unique identifier.

FVF-2: Vocabularies and their terms have rich metadata. (Assessment - FVF: Full Compliance)
RDA-F2-01M | Rich metadata is provided to allow discovery | 1 | Descriptive text is provided at http://geneontology.org. Rich metadata for indexing and reuse is provided at https://github.com/OBOFoundry/OBOFoundry.github.io/edit/master/ontology/go.md.

FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine. (Assessment - FVF: Full Compliance)
RDA-A1-01M | Metadata contains information to enable the user to get access to the data | 1 | The metadata includes data download links: http://geneontology.org/docs/download-ontology/.
RDA-A1-02M | Metadata can be accessed manually (i.e. with human intervention) | 1 | The metadata can be accessed from the Gene Ontology website.
RDA-A1-02D | Data can be accessed manually (i.e. with human intervention) | 1 | Data can be downloaded from http://geneontology.org/docs/download-ontology/
RDA-A1-03M | Metadata identifier resolves to a metadata record | 1 | http://geneontology.org/docs/ontology-documentation/ is resolvable and directs to the metadata.
RDA-A1-03D | Data identifier resolves to a digital object | 1 | The vocabulary identifier http://purl.obolibrary.org/obo/go.owl resolves to the ontology source files. Identifiers such as http://purl.obolibrary.org/obo/GO_0098743 resolve to ontology terms.
RDA-A1-05D | Data can be accessed automatically (i.e. by a computer program) | 1 | Data can be downloaded using command line tools, such as curl and wget.

FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource. (Assessment - FVF: Full Compliance)
RDA-F4-01M | Metadata is offered in such a way that it can be harvested and indexed | 1 | Gene ontology has been indexed by EMBL OLS, BioPortal and other semantic repositories. It is also indexed in Google search.

FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary. (Assessment - FVF: Full Compliance)
RDA-A1-04M | Metadata is accessed through standardised protocol | 1 | The metadata can be accessed through the HTTP protocol.
RDA-A1-04D | Data is accessible through standardised protocol | 1 | Data can be accessed through the HTTP protocol.
RDA-A1.1-01M | Metadata is accessible through a free access protocol | 1 | HTTP is a free access protocol.
RDA-A1.1-01D | Data is accessible through a free access protocol | 1 | HTTP is a free access protocol.
RDA-A1.2-01D | Data is accessible through an access protocol that supports authentication and authorisation | NA | HTTP allows access control, but authentication and authorisation are not required by Gene Ontology.

FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned. (Assessment - FVF: Partial Compliance)
RDA-A2-01M | Metadata is guaranteed to remain available after data is no longer available | 1 | Metadata and data can be found in version controlled repositories on Github.
RDA-R1.2-01M | Metadata includes provenance information according to community-specific standards | 1 | The metadata includes links to access different snapshots of the ontology. The snapshots are in owl/obo format and have PURL identifiers.
RDA-R1.2-02M | Metadata includes provenance information according to a cross-community language | 0 |

FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. (Assessment - FVF: Full Compliance)
RDA-I1-01M | Metadata uses knowledge representation expressed in standardised format | 1 | The metadata is provided in a standard format and can be harvested by major vocabulary services, such as OLS and BioPortal.
RDA-I1-01D | Data uses knowledge representation expressed in standardised format | 1 | The data uses OWL and OBO standards.
RDA-I1-02M | Metadata uses machine-understandable knowledge representation | 1 | Basic metadata is provided in OWL.
RDA-I1-02D | Data uses machine-understandable knowledge representation | 1 | The GO data uses OWL and OBO formats, which are machine-readable community formats.

FVF-8: Vocabularies and terms use qualified references to other vocabularies. (Assessment - FVF: Full Compliance)
RDA-I3-02D | Data includes qualified references to other data | 1 | The Gene ontology cross reference policy is here: http://geneontology.org/docs/download-mappings/. Data from other vocabularies are provided as 'xref'.
RDA-I3-03M | Metadata includes qualified references to other metadata | 1 | The Gene ontology cross reference policy is provided here: http://geneontology.org/docs/download-mappings/

FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes. (Assessment - FVF: Full Compliance)
RDA-R1-01M | Plurality of accurate and relevant attributes are provided to allow reuse | 1 | Gene Ontology includes sufficient term attributes. http://geneontology.org/docs/GO-term-elements

FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence. (Assessment - FVF: Full Compliance)
RDA-R1.1-01M | Metadata includes information about the licence under which the data can be reused | 1 | Gene Ontology Consortium data and data products are licensed under the Creative Commons Attribution 4.0 Unported License.
RDA-R1.1-02M | Metadata refers to a standard reuse licence | 1 |
RDA-R1.1-03M | Metadata refers to a machine-understandable reuse licence | 1 |

FVF-11: Vocabularies meet domain relevant community standards. (Assessment - FVF: Full Compliance)
RDA-R1.3-01M | Metadata complies with a community standard | 1 |
RDA-R1.3-01D | Data complies with a community standard | 1 |
RDA-R1.3-02M | Metadata is expressed in compliance with a machine-understandable community standard | 1 |
RDA-R1.3-02D | Data is expressed in compliance with a machine-understandable community standard | 1 |

Supplementary Table 6: FAIR assessment results of Experimental Factor Ontology

RDF FAIR indicators version: v0.05
Project name: EFO assessment
Assessment date: 2021-08-02
Dataset version: 3.32.0
Dataset link: http://www.ebi.ac.uk/efo/releases/v3.32.0/efo.owl

FAIR vocabulary feature summary
FVF, full compliance: 81.82%
FVF, partial compliance: 9.09%
FVF, no compliance: 9.09%

Columns: RDA indicator ID | Indicator | Assessment - RDA | Assessment details. The FVF-level assessment (FVF*) is given after each FAIR vocabulary feature.

FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers. (Assessment - FVF: Full Compliance)
RDA-F1-01M | Metadata is identified by a persistent identifier | 1 | Both the data and metadata use the identifier http://www.ebi.ac.uk/efo/efo.owl
RDA-F1-01D | Data is identified by a persistent identifier | 1 |
RDA-F1-02M | Metadata is identified by a globally unique identifier | 1 |
RDA-F1-02D | Data is identified by a globally unique identifier | 1 |

FVF-2: Vocabularies and their terms have rich metadata. (Assessment - FVF: No Compliance)
RDA-F2-01M | Rich metadata is provided to allow discovery | 0 | A description of EFO has been provided in ontology browsers, such as OLS and BioPortal. However, the descriptions are not included in the EFO source file.

FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine. (Assessment - FVF: Full Compliance)
RDA-A1-01M | Metadata contains information to enable the user to get access to the data | 1 | EFO and its terms have unique identifiers and can be accessed.
RDA-A1-02M | Metadata can be accessed manually (i.e. with human intervention) | 1 |
RDA-A1-02D | Data can be accessed manually (i.e. with human intervention) | 1 |
RDA-A1-03M | Metadata identifier resolves to a metadata record | 1 |
RDA-A1-03D | Data identifier resolves to a digital object | 1 |
RDA-A1-05D | Data can be accessed automatically (i.e. by a computer program) | 1 |

FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource. (Assessment - FVF: Full Compliance)
RDA-F4-01M | Metadata is offered in such a way that it can be harvested and indexed | 1 | The metadata is provided in OWL format and has been harvested by both OLS and BioPortal.

FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary. (Assessment - FVF: Full Compliance)
RDA-A1-04M | Metadata is accessed through standardised protocol | 1 | EFO can be accessed using the HTTP protocol, and it is an open-access ontology.
RDA-A1-04D | Data is accessible through standardised protocol | 1 |
RDA-A1.1-01M | Metadata is accessible through a free access protocol | 1 |
RDA-A1.1-01D | Data is accessible through a free access protocol | 1 |
RDA-A1.2-01D | Data is accessible through an access protocol that supports authentication and authorisation | NA |

FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned. (Assessment - FVF: Partial Compliance)
RDA-A2-01M | Metadata is guaranteed to remain available after data is no longer available | 1 | EFO follows vocabulary release guidelines, and its versioned copies can be found on Github. However, it does not strictly follow cross-community language standards, such as rdfs and xmls standards.
RDA-R1.2-01M | Metadata includes provenance information according to community-specific standards | 1 |
RDA-R1.2-02M | Metadata includes provenance information according to a cross-community language | 0 |

FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. (Assessment - FVF: Full Compliance)
RDA-I1-01M | Metadata uses knowledge representation expressed in standardised format | 1 | EFO can be downloaded in OWL and OBO, which are standardised and machine-understandable formats.
RDA-I1-01D | Data uses knowledge representation expressed in standardised format | 1 |
RDA-I1-02M | Metadata uses machine-understandable knowledge representation | 1 |
RDA-I1-02D | Data uses machine-understandable knowledge representation | 1 |

FVF-8: Vocabularies and terms use qualified references to other vocabularies. (Assessment - FVF: Full Compliance)
RDA-I3-02D | Data includes qualified references to other data | 1 | EFO reuses terms from other vocabularies and provides sufficient references, such as the source of the external term.
RDA-I3-03M | Metadata includes qualified references to other metadata | 1 |

FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes. (Assessment - FVF: Full Compliance)
RDA-R1-01M | Plurality of accurate and relevant attributes are provided to allow reuse | 1 |

FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence. (Assessment - FVF: Full Compliance)
RDA-R1.1-01M | Metadata includes information about the licence under which the data can be reused | 1 |
RDA-R1.1-02M | Metadata refers to a standard reuse licence | 1 |
RDA-R1.1-03M | Metadata refers to a machine-understandable reuse licence | 1 |

FVF-11: Vocabularies meet domain relevant community standards. (Assessment - FVF: Full Compliance)
RDA-R1.3-01M | Metadata complies with a community standard | 1 | EFO uses the standard OWL format, complies with OBO principles and imports terms following the MIREOT standards.
RDA-R1.3-01D | Data complies with a community standard | 1 |
RDA-R1.3-02M | Metadata is expressed in compliance with a machine-understandable community standard | 1 |
RDA-R1.3-02D | Data is expressed in compliance with a machine-understandable community standard | 1 |

Supplementary Table 7: FAIR assessment results of ICD-11

Project name: ICD-11 FAIR assessment
Assessment date: 2021-08-02
Dataset version: 05/2021
Dataset link: ICD-11 browser and ICD11 print version: https://icd.who.int/en; print version: https://icd.who.int/browse11/Downloads/Download?fileName=print_en.zip

FAIR vocabulary feature summary
FVF, full compliance: 27.27%
FVF, partial compliance: 36.36%
FVF, no compliance: 36.36%

Columns: RDA indicator ID | Indicator | Assessment - RDA | Assessment details. The FVF-level assessment is given after each FAIR vocabulary feature.

FVF-1: Vocabulary and their terms are assigned globally unique and persistent identifiers. (Assessment - FVF: Partial Compliance)
RDA-F1-01M | Metadata is identified by a persistent identifier | 0 | No metadata identifier.
RDA-F1-01D | Data is identified by a persistent identifier | 1 | Example data identifier: 2C25.1 Small cell carcinoma of bronchus or lung.
RDA-F1-02M | Metadata is identified by a globally unique identifier | 0 |
RDA-F1-02D | Data is identified by a globally unique identifier | 0 | The identifiers in ICD-11 have been through several iterations. Currently, ICD-11 provides identifiers such as (1C60-1C62.Z); a more persistent identifier system, e.g. http://id.who.int/icd/entity/911707612, is still under development. This assessment is based on the 1C60-1C62.Z system.

FVF-2: Vocabularies and their terms have rich metadata. (Assessment - FVF: Full Compliance)
RDA-F2-01M | Rich metadata is provided to allow discovery | 1 |

FVF-3: Vocabularies and their terms can be accessed using the identifiers, preferably by both human and machine. (Assessment - FVF: Partial Compliance)
RDA-A1-01M | Metadata contains information to enable the user to get access to the data | 1 |
RDA-A1-02M | Metadata can be accessed manually (i.e. with human intervention) | 1 |
RDA-A1-02D | Data can be accessed manually (i.e. with human intervention) | 1 |
RDA-A1-03M | Metadata identifier resolves to a metadata record | 0 |
RDA-A1-03D | Data identifier resolves to a digital object | 1 | ICD-11 has provided an API, a web browser, and pdf documents for human and machine access.
RDA-A1-05D | Data can be accessed automatically (i.e. by a computer program) | 1 |

FVF-4: Vocabularies and their terms are registered or indexed in a searchable engine or a resource. (Assessment - FVF: No Compliance)
RDA-F4-01M | Metadata is offered in such a way that it can be harvested and indexed | 0 | ICD-11 metadata is provided in word documents and cannot be directly indexed by vocabulary databases.

FVF-5: Vocabularies and their terms are retrievable using a standardised communications protocol, preferably open, free and universally implementable protocols, and allows for authentication and authorisation, where necessary. (Assessment - FVF: Full Compliance)
RDA-A1-04M | Metadata is accessed through standardised protocol | 1 |
RDA-A1-04D | Data is accessible through standardised protocol | 1 | ICD-11 uses the HTTPS protocol. https://icd.who.int/browse11/l-m/en#http%3a%2f%2fid.who.int%2ficd%2fentity%2f911707612
RDA-A1.1-01M | Metadata is accessible through a free access protocol | 1 |
RDA-A1.1-01D | Data is accessible through a free access protocol | 1 |
RDA-A1.2-01D | Data is accessible through an access protocol that supports authentication and authorisation | NA |

FVF-6: Vocabularies and their terms are persistent over time and are appropriately versioned. (Assessment - FVF: Partial Compliance)
RDA-A2-01M | Metadata is guaranteed to remain available after data is no longer available | 1 | Previous versions of ICD-11 can be accessed at https://icd.who.int/browse11/l-m/en/releases. However, the versioning style does not follow common community standards.
RDA-R1.2-01M | Metadata includes provenance information according to community-specific standards | 1 |
RDA-R1.2-02M | Metadata includes provenance information according to a cross-community language | 0 |

FVF-7: Vocabularies and their terms use a formal, accessible and broadly applicable, and preferably machine-understandable language for knowledge representation. (Assessment - FVF: No Compliance)
RDA-I1-01M | Metadata uses knowledge representation expressed in standardised format | 0 | ICD-11 is published mainly as a pdf document and does not use standard vocabulary formats.
RDA-I1-01D | Data uses knowledge representation expressed in standardised format | 0 |
RDA-I1-02M | Metadata uses machine-understandable knowledge representation | 0 |
RDA-I1-02D | Data uses machine-understandable knowledge representation | 0 |

FVF-8: Vocabularies and terms use qualified references to other vocabularies. (Assessment - FVF: Partial Compliance)
RDA-I3-02D | Data includes qualified references to other data | 0 | Terms in ICD-11 do not refer to other terms.
RDA-I3-03M | Metadata includes qualified references to other metadata | 1 | The ICD-11 description refers to other projects and publications.

FVF-9: Vocabularies and terms are described with a plurality of accurate and relevant attributes. (Assessment - FVF: No Compliance)
RDA-R1-01M | Plurality of accurate and relevant attributes are provided to allow reuse | 0 | ICD-11 contains only a minimum description of each disease.

FVF-10: Vocabularies are released with a standard data usage licence, preferably machine-readable licence. (Assessment - FVF: Full Compliance)
RDA-R1.1-01M | Metadata includes information about the licence under which the data can be reused | 1 | ICD11 provides licensing documentation: https://icd.who.int/en/docs/ICD11-license.pdf and https://icd.who.int/browse11. Licensed under the Creative Commons Attribution-NoDerivatives 3.0 IGO licence (CC BY-ND 3.0 IGO).
RDA-R1.1-02M | Metadata refers to a standard reuse licence | 1 |
RDA-R1.1-03M | Metadata refers to a machine-understandable reuse licence | 1 |

FVF-11: Vocabularies meet domain relevant community standards. (Assessment - FVF: No Compliance)
RDA-R1.3-01M | Metadata complies with a community standard | 0 |
RDA-R1.3-01D | Data complies with a community standard | 0 |
RDA-R1.3-02M | Metadata is expressed in compliance with a machine-understandable community standard | 0 |
RDA-R1.3-02D | Data is expressed in compliance with a machine-understandable community standard | 0 |
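The summary percentages reported in Supplementary Tables 5 to 7 are consistent with counting how many of the 11 FVFs fall into each compliance level. A minimal sketch of that arithmetic follows; the function names `fvf_summary` and `levels` and the dictionary layout are illustrative assumptions rather than part of the published assessment, and the per-FVF levels are transcribed from the three tables.

```python
from collections import Counter

LEVELS = ("full", "partial", "no")


def fvf_summary(levels_by_fvf):
    """Return the percentage of FVFs at each compliance level, rounded to 2 decimals."""
    counts = Counter(levels_by_fvf.values())
    total = len(levels_by_fvf)
    return {level: round(100 * counts[level] / total, 2) for level in LEVELS}


def levels(overrides):
    """Build an 11-FVF mapping that is 'full' except where overridden (helper for brevity)."""
    table = {f"FVF-{i}": "full" for i in range(1, 12)}
    table.update(overrides)
    return table


# FVF-level assessments transcribed from Supplementary Tables 5, 6 and 7.
GENE_ONTOLOGY = levels({"FVF-6": "partial"})
EFO = levels({"FVF-2": "no", "FVF-6": "partial"})
ICD_11 = levels({
    "FVF-1": "partial", "FVF-3": "partial", "FVF-4": "no", "FVF-6": "partial",
    "FVF-7": "no", "FVF-8": "partial", "FVF-9": "no", "FVF-11": "no",
})

if __name__ == "__main__":
    for name, table in (("Gene Ontology", GENE_ONTOLOGY), ("EFO", EFO), ("ICD-11", ICD_11)):
        print(name, fvf_summary(table))
```

With these inputs, Gene Ontology has 10 of 11 features at full compliance (90.91%), EFO has 9 of 11 (81.82%), and ICD-11 has 3 of 11 (27.27%), matching the summaries reported in the tables.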