A Taxonomy of Tools and Approaches for FAIRification

Dario Mangione, Leonardo Candela and Donatella Castelli
Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" - Consiglio Nazionale delle Ricerche, Via G. Moruzzi 1, Pisa, 56121, Italy

Abstract
The FAIR principles have drawn a lot of attention since their publication in 2016. A broad range of stakeholders is confronting the implementation of these guiding principles in diverse contexts. This paper identifies and discusses the tools and approaches emerging from stakeholders' experiences adopting the FAIR principles in practice. In particular, 225 open access grey literature papers (namely, deliverables, milestones and data management plans) on FAIRification have been scrutinised to infer the tools and approaches in use. The wealth of emerging tools (477) has been carefully analysed and organised into a comprehensive map highlighting the significant classes of instruments supporting FAIRification. A critical discussion of this collection of tools and approaches, and of FAIRification itself, completes the paper.

Keywords
FAIR, Survey and overview, Systematic literature review

1. Introduction

FAIR principles [1] have been introduced as an essential guide for data producers and publishers in implementing good data management, supporting the manual and automated deposition, exploration, sharing, and reuse of data. In this context, 'data' also refers to all scholarly digital research objects, e.g. the algorithms, tools, and workflows leading to a research result, and 'metadata' play a key role. The Digital Library community has long-lasting experience with topics related to FAIRness, e.g. metadata quality [2]. The initial FAIR principles have been revamped to better match the peculiarities of software [3, 4] and computational workflows [5]. Applying these principles to a significant part of the outputs of the research process is meant to ensure transparency, reproducibility, and reusability.

The need for FAIR data management is widely recognised, and there is a lot of discussion on identifying the actions required to make it the standard practice in science. Some are already in place, with varying levels of uptake and practice. For example, many funders require their projects to produce data management plans according to these principles. Tools are emerging that measure repositories' ability to comply with these principles, and services publicly describe their level of FAIRness as a measure of quality. These concrete activities are based on their interpretation of the FAIR objectives and principles, implemented to a greater or lesser extent.

Indeed, FAIR principles have been introduced to guide approaches for improving the findability, accessibility, interoperability, and reusability of digital resources [6]. A few years after the publication of these principles, due to the growing number of meanings associated with them, the need has emerged to clarify what FAIR is not.
In the literature and the different presentations on the subject, it is now explained, for example, that "FAIR is not a standard ... FAIR is not equal to RDF, Linked Data, or the Semantic Web ... FAIR is not just about humans being able to find, access, reformat and finally reuse data ... FAIR is not equal to Open ... FAIR is not a Life Science hobby" [7]. All in all, communities are called to implement "their own" solution for responding to the FAIR principles. This degree of freedom has led to a proliferation of approaches and technologies around the FAIRification process, i.e. the process of making digital resources FAIR. Although generic workflows governing it have been defined [8], there are no supporting means to help communities identify suitable technologies and approaches that can be leveraged to respond to community-specific FAIRification needs.

This paper introduces and discusses a taxonomy of tools and approaches exploited in concrete FAIRification activities, derived by analysing a corpus of 225 recently published open access grey literature documents (namely, data management plans, deliverables, and milestones), leading to the identification of 477 tools. The identified tools are diverse and support various phases of the FAIRification process. The produced taxonomy is intended to be a tool itself, supporting communities involved in FAIRification tasks by suggesting common approaches and technologies successfully used by others and by highlighting gaps to be filled by new ones. The paper also critically discusses the current state of the art emerging from such a study, identifying the steps still needed to spread the FAIR approach.

The paper is organised as follows. Section 2 describes the methodology underlying this study. Section 3 classifies and describes the approaches and methods for FAIRification exploited by diverse projects and initiatives. Section 4 critically discusses the findings emerging from the investigation. Finally, Section 5 concludes the paper.

2. Methodology

A systematic mapping study approach [9] has been exploited to achieve the study's goals. In particular, the study is implemented by a structured process where existing literature relevant to the research topic is identified, categorised, and analysed. Like any systematic mapping study, the first step consisted of identifying the literature of interest. In particular, the study focused on grey literature, since FAIRification is a practical process often documented by Technical Reports, Project Deliverables or Data Management Plans. Moreover, it was decided to leverage OpenAIRE contents to identify the literature of interest, because FAIRification is a duty of research projects funded by the European Commission and other funding bodies, and these projects are required to make their deliverables available via OpenAIRE. This decision also allowed the collection of software artefacts related to FAIRification processes.

To identify the literature of interest, we started with the most straightforward query: we searched for the terms 'FAIR' and 'FAIRification' and focused on publications having type 'Report', 'Project Deliverable', and 'Project Milestone'. The term 'FAIR' brings in a lot of false positives; thus, we excluded terms like 'trade', 'value', 'play', 'treatment', 'price', 'indivisibility', 'division', 'africa', 'tax', 'equality', 'trial', 'admission', 'agreement', 'payment'. We also restricted the time range of the results to the period 2016-2021, because the FAIR principles were officially published in 2016.
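As a purely illustrative sketch (not the scripts actually used in this study), the selection logic just described amounts to a simple filter over candidate records; the record structure assumed below (with 'title', 'type', and 'year' fields) is hypothetical and stands in for whatever export format the search platform provides.

```python
# Illustrative sketch (not the authors' actual code): filtering candidate
# records by publication type, exclusion terms, and time range, assuming each
# record carries 'title', 'type', and 'year' fields.

EXCLUDED_TERMS = {
    "trade", "value", "play", "treatment", "price", "indivisibility",
    "division", "africa", "tax", "equality", "trial", "admission",
    "agreement", "payment",
}
ALLOWED_TYPES = {"Report", "Project Deliverable", "Project Milestone"}


def is_candidate(record: dict) -> bool:
    """Keep records matching the FAIR/FAIRification query criteria."""
    title = record.get("title", "").lower()
    mentions_fair = "fair" in title or "fairification" in title
    is_false_positive = any(term in title for term in EXCLUDED_TERMS)
    in_time_range = 2016 <= record.get("year", 0) <= 2021
    return (
        mentions_fair
        and not is_false_positive
        and record.get("type") in ALLOWED_TYPES
        and in_time_range
    )


records = [
    {"title": "D5.2 FAIRification of marine datasets", "type": "Project Deliverable", "year": 2021},
    {"title": "Fair trade price agreements", "type": "Report", "year": 2020},
]
corpus = [r for r in records if is_candidate(r)]  # keeps only the first record
```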
This process resulted in 487 publications. To improve the recall and the precision of the search, we contacted OpenAIRE colleagues to obtain the complete list of keywords and subjects accompanying the selected publications. After revising this list of keywords, we asked for the papers annotated with the chosen keywords. Compared with the previous results, this "snowballing" allowed us to enrich the initial corpus: the number of identified publications became 567, plus 389 software entries explicitly annotated with the selected keywords and subjects.

Figure 1: Papers in the corpus by publication year

The resulting corpus of grey literature and software is documented by a file accompanying this paper [10]. Not knowing the extent of the references to tools employed for achieving the FAIRness of resources in the identified corpus, which ranges in date from 2016 to 2021, we decided to conduct the analysis backwards, starting from the documents published in 2021, until obtaining a significant set of entries that would allow us to build and test a taxonomy of tools. The sample on which the taxonomy is based consists of 477 unique elements, further reduced to 277 items of immediate interest to the analysis. The difference between the number of items collected and the elements considered valid for the study depends on the exclusion of tools usually cited in contexts unrelated to FAIRification, as well as of elements currently not accessible, not identifiable by their name or acronym, removed, or simply accompanied by descriptions that did not allow us to assess the relevance of their functions to the FAIR principles.

3. Analysis

The analysis is based on the manual scrutiny of a refined and final corpus consisting of 225 publications published between 2020 and 2021 (we observed that publications preceding 2020 mainly referred either to tools already cited by publications in the corpus or to tools no longer existing or superseded by newer ones), and 95 software entries, which produced a list of 477 tools and services related to the FAIRification process of (meta)data, software and workflows. The entries were initially organised using the self-declared information characterising the resources, as provided by the resource owner, if any. In particular: (i) the tool or service category; (ii) the FAIR guiding principle the tool or service responds to, referenced by the corresponding identifier; (iii) the domain in which the tool or service is used; (iv) the FAIRness scope of the tool or service, differentiating among (meta)data [1], software [3, 4], and workflows [5].

Because of the heterogeneity of the results obtained, it became necessary to create an initial classification for normalising the tool/service categories, which became the basis for the development of the taxonomy. As for the normalisation of the declared domains, the Frascati framework is used [11] (see Figure 2 for the actual domain distribution of the tools and services deemed valid for creating the taxonomy).

Figure 2: Domain distribution of tools and services

3.1. Initial FAIR principle-based categories

Each FAIR principle was analysed from the point of view of the class of tools and services required for its implementation (see Table 1 below). The discussions and clarifications given in [6] were taken into account.
Eight of the fifteen FAIR principles (namely, F1, F2, F3, F4, I1, R1.1, R1.2, and R1.3) led us to create five candidate classes: (i) GUPRI creation and management service; (ii) Metadata helper; (iii) Indexing and discovery service; (iv) Licence helper; (v) Converter. To these five FAIR principle-driven classes, we added (vi) Assessment tool, which includes the tools and services used to assess a resource's overall FAIRness and, consequently, concerns all the FAIR recommendations (see Table 2).

Table 1
FAIR principles and corresponding initial categories of tools and services

FAIR Principle | Tool/service category
F1: (Meta)data are assigned globally unique and persistent identifiers | GUPRI helper
F2: Data are described with rich metadata | Metadata helper
F3: Metadata clearly and explicitly include the identifier of the data they describe | Metadata helper
F4: (Meta)data are registered or indexed in a searchable resource | Indexing & discovery service
A1: (Meta)data are retrievable by their identifier using a standardised communication protocol | -
A1.1: The protocol is open, free and universally implementable | -
A1.2: The protocol allows for an authentication and authorisation where necessary | -
A2: Metadata should be accessible even when the data is no longer available | -
I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation | Metadata helper, Converter
I2: (Meta)data use vocabularies that follow the FAIR principles | -
I3: (Meta)data include qualified references to other (meta)data | Metadata helper
R1: (Meta)data are richly described with a plurality of accurate and relevant attributes | -
R1.1: (Meta)data are released with a clear and accessible data usage licence | Licence helper
R1.2: (Meta)data are associated with detailed provenance | Metadata helper
R1.3: (Meta)data meet domain-relevant community standards | Metadata helper, Converter

Table 2
Initial main categories of tools and services

Tool/service main category | FAIR Principle reference
GUPRI helper | F1
Metadata helper | F2, F3
Indexing and discovery service | F4
Metadata helper, Converter | I1, I3
Licence helper | R1.1
Metadata helper | R1.2
Converter | R1.3
Assessment tool | FAIR

The following sections present the resulting taxonomy and describe the identified classes.

Figure 3: Taxonomy of FAIR tools

3.2. Taxonomy of FAIR tools

By analysing and integrating the 477 tool and service entries obtained through the scrutiny of the grey literature, we have created a taxonomy of FAIR tools structured in seven main classes: (i) GUPRI helper; (ii) Metadata helper; (iii) Indexing and discovery service; (iv) Converter; (v) Licence helper; (vi) Assessment tool; (vii) DMP tool. The seventh category, DMP tool, was added at a later stage (i.e. during the analysis of the collected items) to accommodate the tools that allow the creation of data management plans. Although debatable, this decision was taken because, despite the fact that these tools do not directly impact the FAIRness of a resource, they nevertheless provide a coherent reference framework for its management and, consequently, for its adherence to the FAIR guidelines. The seven main classes are generally disjoint, except for Metadata helper and Converter, since there are tools that fall under the first category while also including a converter function, i.e.
a metadata helper can also be a converter, although this functionality is not its focal role. Figure 3 also associates the categories of tools with the FAIR principles they contribute to.

3.2.1. GUPRI helper

Globally unique, persistent and machine-resolvable identifiers (GUPRIs) are the primary elements to be assigned to both data and metadata, as suggested by the F1 principle. Every FAIRification activity is called to use tools and approaches that help generate unambiguous identifiers which continue to work even when the "asset" targeted by the FAIRification activity disappears and is thus no longer available. Because of this role, services and technologies for GUPRIs should guarantee the long-term availability of the identifiers assigned to the FAIRified assets. The need for identifiers with specific characteristics was discussed in previous works [12, 13, 14]. In particular, these earlier works highlighted how web-based identifiers were with us well before the advent of FAIR and that their core role and efficacy are guaranteed only under certain conditions (namely, the sustainability and governance of the specific system).

This category consists of 33 instances divided between (i) GUPRI creation and management service (32 entries) and (ii) GUPRI finder (1 entry). The first subclass includes the providers offering GUPRI registration services, with Handle System implementations (e.g. DOI, ePIC, EUDAT B2HANDLE) and the Open Researcher and Contributor ID (ORCID) being the most cited. The latter consists of the services that offer a registry providing indexing and search capabilities for finding a GUPRI-related provider, with the PID Services Registry [15] being its only instance.

The Handle System [16] is a proprietary registry of the Corporation for National Research Initiatives (CNRI), administered by the DONA Foundation, providing a resolution system upon which single Handle.Net software instances are built. The DOI System [17], developed and maintained by the International DOI Foundation, is the most cited implementation of the Handle System and the most cited GUPRI in the corpus. It ensures the persistence of the handles through a federation of registration agencies, among which the most cited in the documents are Crossref [18] and DataCite [19], the latter mainly via Zenodo.

3.2.2. Metadata helper

While, ontologically speaking, the existence of a GUPRI is enough to assert that something unique exists, and its resolvability also allows that something to be located, metadata make it possible to find and identify the associated resource by filtering among similar resources through the accumulation of characteristics. Metadata helpers allow the metadata accompanying a resource to be defined, or already existing metadata to be altered, either manually or automatically, whether they are embedded or stored as a separate file. They also allow possible metadata to be identified directly from the resource to enrich its description, appropriate values to be suggested for a metadata field, or the conformance of the existing metadata to be checked against a standard. Based on the 65 tools and services we found in the grey literature and on the functions above, we distinguish five subclasses of metadata helpers, namely (i) Metadata editor (38 entries); (ii) Metadata extractor (6 entries); (iii) Metadata tracker (13 entries); (iv) Metadata validator (7 entries); and (v) Metadata assistant (1 entry).
Following the order of the FAIR principles, at the most basic level metadata editors allow uncontrolled user input, enabling F2, F3 and R1.2, but they can include functions that enable I1, I3 and/or R1.3 by referring to knowledge representation languages and community standards and by validating and/or suggesting the input. They can also include functions to convert between knowledge representation languages, metadata schemas or file formats, thus also pertaining to the converter category. OpenRefine [20], for example, can be seen as a metadata editor, allowing a description template to be simply altered, or, by using an RDF (Resource Description Framework) extension, as a metadata converter, since it enables its user to export the structured data to RDF.

The metadata editor category is quite heterogeneous, since it encompasses tools ranging from spreadsheet template generators (like the SIOS Excel template generator [21] for creating Darwin Core-compliant descriptions) to mapping tools (like the CIDOC CRM-oriented FORTH-ICS Mapping Memory Manager (3M) [22], which uses the 3M Editor to assist in the creation of the mappings, suggesting and validating the user input). Moreover, it is not uncommon for metadata editors to offer an integrated validation tool for checking the compliance of the output with a metadata schema, and more generally for metadata helpers to share functions. For these reasons, the subclasses of the metadata helpers are usually not disjoint; their instances are classified on a prime-function basis.

Metadata extractor encompasses the tools that enable the parsing, identification and extraction of metadata, ranging from general-purpose extractors like Apache Tika [23] to metadata-specific ones, like the Nanopub JupyterLab Extension [24], which helps to extract nanopublications from a Python notebook.

The metadata tracker category is strictly linked to the R1.2 principle, since it consists of tools that allow metadata collection in concomitance with the data workflow, including information on data acquisition, generation and processing. While the functionalities of these tools may vary, all of them share the capability of enabling an environment that fosters traceability and reproducibility at least at some point during the data lifecycle. The category includes nbcomet [25], a Jupyter notebook extension that regularly saves snapshots and logs every action performed in the notebook, but also workflow managers like Taverna [26] and data management systems like openBIS (open Biology Information System) [27], which automates (meta)data ingestion while providing data provenance tracking and is designed to be integrated with workflow managers.

Metadata validators check the conformance of a resource to a metadata standard, contributing to the enablement of the I1 and R1.3 principles. This category encompasses single-standard validators, for instance (a) the METS Validator [28] provided by the Finnish Digital Preservation Service for Research Data, which checks a METS file against the national digital preservation specification, or (b) the CF (Climate and Forecast) Checker [29], which validates a netCDF file against the CF metadata conventions, as well as multi-format checkers like OCTOPUS [30], which validates against the SeaDataNet ODV, netCDF and MedAtlas standards. The latter is also an example of a multifunctional tool, combining a validator, a converter from and to SeaDataNet standards, an editor for splitting a SeaDataNet file, and an extractor.
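To give a concrete, if simplified, picture of what this class of tools automates, the following sketch (purely illustrative, not modelled on any of the validators above) checks a metadata record against a hypothetical required-field profile; real validators do the same kind of conformance checking against full standards such as METS or the CF conventions.

```python
# Minimal, illustrative metadata validator (not an existing tool): it checks a
# record against a hypothetical required-field profile and reports violations.

REQUIRED_FIELDS = {
    "identifier": str,   # e.g. a DOI or Handle (F1/F3)
    "title": str,
    "creator": str,
    "license": str,      # R1.1
    "date": str,
}


def validate(record: dict) -> list[str]:
    """Return a list of human-readable conformance errors (empty = valid)."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type) or not record[field]:
            errors.append(f"field '{field}' must be a non-empty {expected_type.__name__}")
    return errors


record = {"identifier": "10.5281/zenodo.6037508", "title": "A taxonomy of tools", "creator": "Mangione et al."}
for error in validate(record):
    print(error)   # reports the missing 'license' and 'date' fields
```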
The last subclass of the metadata helper category is the metadata assistant. This category includes tools that help insert metadata based on controlled vocabularies, contributing to the standardisation of the descriptions produced and, consequently, to the fulfilment of the R1.3 principle. While some metadata editors include an auto-completion feature, the only instance found in the examined literature that is dedicated to this task is CEDAR OnDemand [31], a browser extension that helps standardise repository descriptions by recognising the input fields and suggesting relevant terms from NCBO (National Center for Biomedical Ontology) BioPortal vocabularies.

3.2.3. Indexing and discovery service

The Indexing and discovery service category is by far the most populated category in the taxonomy, consisting of 123 entries distributed across three subclasses: (i) Registry (54 entries); (ii) Repository (67 entries); and (iii) Indexing and discovery service finder (4 entries). It includes all the services and tools used for indexing metadata and for discovering the related resources, ultimately enabling the F4 principle and covering accessibility as a whole, as well as those that allow the indexing and discovery of such services themselves (e.g. re3data [32], enabling the discovery of data repositories).

These tools are characterised by the modality of access to the resources they provide and by the type of resources managed. By modality of access, it is possible to distinguish between repositories (e.g. Zenodo [33] or FigShare [34]), which store, index and allow the discovery of resources, and registries (e.g. BARTOC.org [35] or bio.tools [36]), which index metadata and allow the discovery of resources from different repositories. Registries do not store resources on their own; rather, they refer to the specific repositories for access and ultimately constitute a catalogue consisting of the metadata describing the resources. The distinction between repositories and registries becomes blurred when referring to semantic artefacts [37]. In this case, the metadata registry and metadata repository categories tend to overlap, since metadata schemas can be stored in a database by registering the element sets and their constituent elements. Still, access to the single elements of a semantic artefact was adopted as the distinguishing characteristic of a semantic artefact repository. Based on this distinction, the above-mentioned BARTOC.org (Basic Register of Thesauri, Ontologies & Classifications) is considered a registry for semantic artefacts, while the NERC Vocabulary Server (NVS) [38] is categorised as a semantic artefact repository.

The analysis showed that there are four types of resources managed by indexing and discovery services, namely (i) data, (ii) semantic artefacts, defined as "machine-actionable and -readable formalisation of a conceptualisation, enabling sharing and reuse by humans and machines" [37] (e.g. thesauri, ontologies), (iii) software, and (iv) workflows. It is possible to specialise the repository and registry classes further to highlight the types of resources they focus on, although catch-all ones exist. WorkflowHub [39] is an example of a registry dedicated to computational workflows, while GitHub [40] is a representative case of a software repository. Zenodo is, without any doubt, the most cited repository solution in the corpus. It is an example of a catch-all repository accepting all of the mentioned types of resources.
By assigning a DOI to every registered resource, it is also used as a FAIRifying solution in combination with other services like GitHub, which does not reserve a GUPRI for the deposited code, or ARGOS [41], which uses Zenodo for publishing DMPs.

3.2.4. Converter

The converter category includes the tools and services that convert data or metadata between models and formats, enabling the transition to community-adopted standards and the combination of resources across different domains and organisations. This class of tools is linked to the I1 and R1.3 principles. It consists of 39 tools that, following the distinction between (meta)data formats and data models, are arranged in two subclasses: (i) data converter (26 entries), transforming data between file formats and enabling R1.3, and (ii) metadata converter (13 entries), which enables the I1 and/or R1.3 principles by allowing the transformation to and between knowledge representation languages. Examples of the first subclass are ImageMagick, which among its functions allows images to be converted from and to a multitude of formats, and Tabula [42], which transforms the data rows of a textual PDF file into a CSV. Examples of the second subclass are OpenRefine and the similar excel2rdf [43], which allows an Excel-based vocabulary to be converted into a SKOS RDF one, as well as tools converting between metadata formats, like CMD2DC [44], which transforms the CMD resource descriptions used by CLARIN's Component Metadata Infrastructure (CMDI) into Dublin Core.

3.2.5. Licence helper

This category, consisting of 4 entries, includes the tools that help choose a licence for a resource by answering a questionnaire, facilitating the R1.1 principle. The questionnaire can be more or less detailed, depending on the number of licences taken into consideration, covering aspects such as the type of resource, actual ownership, identification, and use and distribution requirements, and in some instances providing the possibility to obtain a machine-readable version of the chosen licence. For example, the Creative Commons License Chooser [45] lets users select the most appropriate licence among the six Creative Commons licence types. In contrast, the EUDAT B2SHARE license selector [46] allows users to choose among twenty-two different licences, also distinguishing between software- and data-oriented licences. Both produce a machine-readable version of the chosen licence, the former in XMP format, the latter in JSON.

3.2.6. Assessment tool

The assessment of the FAIRness of a resource is not strictly a FAIRifying function per se, as it does not contribute to implementing any of the fifteen guiding principles. However, assessment tools support the uptake of the FAIR principles by validating the overall conformity of a resource to a set of criteria or metrics. Depending on the clarity, granularity and measurability of the implemented metrics, and ultimately on their machine-actionability, the evaluation process can be automated or manual. In both cases, it follows a questionnaire-like approach. For automated tools, the objectivity of the evaluation criteria allows effective feedback on the resource's FAIRness. For manual tools, which rely on self-assessment, it is more a matter of encouraging data curators' awareness of the FAIR principles.
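As a purely illustrative sketch (the metrics below are hypothetical and are not those of F-UJI, FAIR-Aware, or any tool in the corpus), an automated checker of this kind boils down to evaluating a set of principle-aligned tests over the machine-accessible metadata of a resource and aggregating the outcomes into a score:

```python
# Illustrative automated FAIRness check (hypothetical metrics): each metric is
# a predicate over the harvested metadata, and the score is the fraction of
# metrics satisfied.

METRICS = {
    "F1: has a globally unique, persistent identifier": lambda m: bool(m.get("identifier")),
    "F2: has a minimally rich description": lambda m: len(m.get("description", "")) >= 100,
    "I1: metadata available in a formal representation": lambda m: m.get("format") in {"JSON-LD", "RDF/XML", "Turtle"},
    "R1.1: has a clear usage licence": lambda m: bool(m.get("license")),
}


def assess(metadata: dict) -> float:
    """Print each metric outcome and return the fraction of satisfied metrics."""
    passed = 0
    for name, test in METRICS.items():
        ok = test(metadata)
        passed += ok
        print(("PASS" if ok else "FAIL"), name)
    return passed / len(METRICS)


score = assess({
    "identifier": "https://doi.org/10.5281/zenodo.6037508",
    "description": "Dataset accompanying the taxonomy of FAIRification tools, listing the 477 tools, their categories, and the FAIR principles they address.",
    "format": "JSON-LD",
})
print(f"FAIRness score: {score:.2f}")  # 0.75: only the licence metric fails
```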
Tools and services in this category are consequently cross-principle and are specialised into (i) Automated assessment tool (4 entries), when the machine-accessible metadata of the resource are automatically compared with predefined metrics following the submission of the GUPRIs identifying the resources; (ii) Manual assessment tool (7 entries), when the FAIRness score of a resource is based on manually filling in a questionnaire; and (iii) Assessment tool finder (1 entry), encompassing the services dedicated to the discovery of assessment tools.

F-UJI and FAIR-Aware [47], both developed by FAIRsFAIR, exemplify well the first two categories respectively. F-UJI is an automated assessment tool that evaluates the FAIRness of datasets against the FAIRsFAIR Data Object Assessment Metrics, fifteen core metrics that, being aligned with the FAIR principles and the CoreTrustSeal requirements, systematically measure the extent to which research data objects are FAIR; the evaluation is performed on the resource metadata aggregated through GUPRI creation and management services and repository indexing and discovery services. FAIR-Aware helps researchers understand how to increase the FAIRness of a dataset before depositing it in a repository by means of a ten-step questionnaire. Finally, the assessment tool finder category encompasses the indexing and discovery services dedicated to FAIR assessment tools, with FAIRassist [48] (developed as a component of FAIRsharing [49]) being the only FAIR assessment tool registry found in the examined corpus.

3.2.7. DMP tool

As previously mentioned, the inclusion in the presented taxonomy of a DMP tool category may be questionable, since its services do not act directly on the resources. However, we decided to incorporate it because, by creating a reference for resource lifecycle management and establishing how the resources have to be stored, curated, shared and preserved, these tools affect the FAIRness of a resource, particularly its reusability [50, 51].

This class includes eight services that share the same main function of providing step-by-step guidance in creating a data management plan by filling in annotated forms. Still, they may vary appreciably in additional functions, for instance by supporting machine actionability, allowing simultaneous collaboration on the DMP, offering customisable templates, or providing a platform to share them. For example, the ARGOS tool, developed by OpenAIRE and based on OpenDMP, allows the collaborative creation of a machine-actionable DMP by supporting its versioning, the assignment of a licence and a DOI (through Zenodo), and its publishing. They can also vary in domain coverage, like the ARIADNEplus DMP Researcher Template for Archaeological Datasets, which is manifestly domain-specific.

4. Discussion

The analysis conducted in this study highlights how FAIRness approaches are envisaged and implemented in recent projects and initiatives. Although it is not possible to claim that the conducted research is exhaustive with respect to the topic of FAIRness, the implemented methodology guarantees high coverage of FAIRness initiatives and related tools. The problem addressed by the study is intrinsically complex, because interpretations of the FAIR principles and solutions often depend on the settings characterising the application context.
A deeper analysis would need to complement the knowledge obtained from the selected documents with details collected "in the field" on how the various communities have practically decided to organise themselves to implement the FAIR principles. The "interpretability" of the FAIR principles [6] introduces vagueness, making it challenging to compare diverse experiences and uses of the tools. Continuous evolution represents another element to be taken into account: communities' plans and approaches might be rethought to better meet the FAIR principles and the expectations resulting from early attempts to make data (more generally, resources) FAIR and to exploit the released data. As a consequence of these evolutions, new tools may emerge.

The nature and heterogeneity of the examined material made it possible to observe trends that would otherwise be difficult to appreciate. There is a tendency to include concepts that are not strictly pertinent to the FAIR principles, although related. For example, accessibility often overlaps with open accessibility, a concept related to open access that is not directly acknowledged in FAIRness, since FAIR data does not mean open data [7]. Open accessibility is linked more to the licences associated with the resources, and hence to reusability, than to accessibility itself, which is defined without mentioning open access. Similarly, the concept of reusability is often intuitively associated with preservation, pursued by depositing the resources into a repository, even though the latter is not mentioned in the definition of reuse given by the principles. Privileging certain elements of FAIRness over others is another clear trend. For instance, in the case of reuse, point R1.1, and therefore the aspect related to the licence, is often mentioned while the others are neglected. It is also possible to see unexpected tools mentioned in a FAIRness context, e.g. tools linked to collaboration or dissemination, including content management systems, wikis, infrastructures and journals.

By not specifying implementations, the fifteen points in which the guidelines are articulated remain substantially open to interpretation [52], and this study is no exception. The categories in which the taxonomy is structured result from an analysis of the FAIR principles, which led us to establish a correspondence between tool functions and the fifteen guidelines. While the F2 principle states that the concept of rich metadata is defined in R1, thus creating a direct link between findability and reusability, our interpretation of the F2 principle limits the "rich metadata" to only those metadata that allow a resource to be found and distinguished from similar ones. In fact, it is arguable that the metadata enabling findability should be a subset of those contributing to the reusability of a resource and, as such, constitute the minimum requirements for its description, to be defined on a community basis. This view is based on a vision of the FAIR principles as incremental tasks mainly built upon the metadata associated with the resources. Therefore, the F2-related tools and services are not always also associated with reusability. Similarly, the distinction between tools enabling I1 and those that foster R1.3 is founded on the difference between knowledge representation languages, including their serialisations, and metadata formats. Following this rationale, a tool enabling machine-actionability without specifying the metadata standard used is considered just I1-related.
Moreover, some principles are just cited as a general reference, as in the case of accessibility, or are not associated with any tool, such as I2. That is because accessibility depends directly on F1 and on the protocols and policies implemented by registries and repositories rather than on a category of tools, while I2 introduces a recursive element among the principles without creating the need for additional categories.

An interesting point highlighted by this study, albeit not unexpected given the number of different implementation contexts and the inherent complexity of the matter, is the lack of unique solutions for ensuring the FAIRness of resources. Even the most integrated solutions found in the corpus (for example, Fairdata.fi [53], a nationwide solution promoted by the Finnish Ministry of Education and Culture providing services for storing, describing, indexing, searching and publishing research data) cannot cover all of the specific needs arising from the different resource types and scientific workflows. The need to support the sharing of FAIRness implementation experiences and solutions is real, and concrete approaches are emerging, as demonstrated by [52, 54]. The FAIR Convergence Matrix [52] is a collaborative online resource aiming at creating machine-actionable descriptions of the FAIR implementation choices made by different domain communities. The concept of a FAIR Implementation Profile [54] aims at capturing, as a FAIR object itself, the comprehensive set of implementation choices made at the discretion of individual communities of practice. These tools are envisaged to be used to track the evolving landscape of FAIR implementations and to inform about them.

5. Conclusion and Prospects

Several initiatives are promoting the FAIRification of science assets to improve their findability, accessibility, interoperability and reusability for both humans and machines. Communities responsible for these assets are implementing diverse strategies and approaches to respond to the FAIR guiding principles. This paper reported the results of a systematic collection and analysis of the tools developed and exploited by scientific communities in their FAIRification activities. A taxonomy organising the various tools into seven major classes is introduced and discussed. This taxonomy is a helpful instrument for understanding the state of the art of FAIRification activities, for supporting communities of practice confronted with FAIRification tasks, and for helping to develop innovative solutions and strategies that improve FAIRification processes and fill existing gaps. To further develop the taxonomy, it is planned to systematically assess its efficacy in concrete settings by establishing a dialogue with communities of practice and initiatives engaged in FAIRness activities and by systematically analysing the FAIRness approaches documented by FAIR Implementation Profiles [54].

Acknowledgments

This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Blue Cloud project (grant agreement No. 862409), the DESIRA project (grant agreement No. 818194), the EOSC-Pillar project (grant agreement No. 857650), and SoBigData-PlusPlus (grant agreement No. 871042).
Author Contributions

According to the CRediT taxonomy, the authors contributed as follows: DM performed Methodology, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization; LC performed Conceptualization, Methodology, Writing - Review & Editing, Supervision, and Funding acquisition; DC performed Conceptualization, Writing - Review & Editing, Supervision, and Funding acquisition.

References

[1] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. G. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. C. 't Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, B. Mons, The FAIR guiding principles for scientific data management and stewardship, Scientific Data 3 (2016) 160018. doi:10.1038/sdata.2016.18.
[2] A. Tani, L. Candela, D. Castelli, Dealing with metadata quality: The legacy of digital library efforts, Inf. Process. Manag. 49 (2013) 1194–1205. URL: https://doi.org/10.1016/j.ipm.2013.05.003. doi:10.1016/j.ipm.2013.05.003.
[3] A.-L. Lamprecht, L. Garcia, M. Kuzak, C. Martinez, R. Arcila, E. Martin Del Pico, V. Dominguez Del Angel, S. van de Sandt, J. Ison, P. A. Martinez, P. McQuilton, A. Valencia, J. Harrow, F. Psomopoulos, J. L. Gelpi, N. Chue Hong, C. Goble, S. Capella-Gutierrez, Towards FAIR principles for research software, Data Science 3 (2020) 37–59. doi:10.3233/DS-190026.
[4] D. S. Katz, M. Gruenpeter, T. Honeyman, Taking a fresh look at FAIR for research software, Patterns 2 (2021) 100222. doi:10.1016/j.patter.2021.100222.
[5] C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M. R. Crusoe, K. Peters, D. Schober, FAIR Computational Workflows, Data Intelligence 2 (2020) 108–121. doi:10.1162/dint_a_00033.
[6] A. Jacobsen, R. de Miranda Azevedo, N. Juty, D. Batista, S. Coles, R. Cornet, M. Courtot, M. Crosas, M. Dumontier, C. T. Evelo, C. Goble, G. Guizzardi, K. K. Hansen, A. Hasnain, K. Hettne, J. Heringa, R. W. Hooft, M. Imming, K. G. Jeffery, R. Kaliyaperumal, M. G. Kersloot, C. R. Kirkpatrick, T. Kuhn, I. Labastida, B. Magagna, P. McQuilton, N. Meyers, A. Montesanti, M. van Reisen, P. Rocca-Serra, R. Pergl, S.-A. Sansone, L. O. B. da Silva Santos, J. Schneider, G. Strawn, M. Thompson, A. Waagmeester, T. Weigel, M. D. Wilkinson, E. L. Willighagen, P. Wittenburg, M. Roos, B. Mons, E. Schultes, FAIR Principles: Interpretations and Implementation Considerations, Data Intelligence 2 (2020) 10–29. doi:10.1162/dint_r_00024.
[7] B. Mons, C. Neylon, J. Velterop, M. Dumontier, L. O. B. da Silva Santos, M. D. Wilkinson, Cloudy, increasingly FAIR; revisiting the FAIR data guiding principles for the European Open Science Cloud, Information Services & Use 37 (2017) 49–56. doi:10.3233/ISU-170824.
[8] A. Jacobsen, R. Kaliyaperumal, L. O. B. da Silva Santos, B. Mons, E. Schultes, M. Roos, M. Thompson, A Generic Workflow for the Data FAIRification Process, Data Intelligence 2 (2020) 56–65. doi:10.1162/dint_a_00028.
[9] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, EASE'08, BCS Learning & Development Ltd., Swindon, GBR, 2008, pp. 68–77.
[10] D. Mangione, L. Candela, D. Castelli, A taxonomy of tools and approaches for FAIRification, 2022. doi:10.5281/zenodo.6037508.
[11] OECD, Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, OECD Publishing, 2015. URL: https://www.oecd-ilibrary.org/content/publication/9789264239012-en. doi:10.1787/9789264239012-en.
[12] J. A. McMurry, N. Juty, N. Blomberg, T. Burdett, T. Conlin, N. Conte, M. Courtot, J. Deck, M. Dumontier, D. K. Fellows, A. Gonzalez-Beltran, P. Gormanns, J. Grethe, J. Hastings, J.-K. Hériché, H. Hermjakob, J. C. Ison, R. C. Jimenez, S. Jupp, J. Kunze, C. Laibe, N. Le Novère, J. Malone, M. J. Martin, J. R. McEntyre, C. Morris, J. Muilu, W. Müller, P. Rocca-Serra, S.-A. Sansone, M. Sariyar, J. L. Snoep, S. Soiland-Reyes, N. J. Stanford, N. Swainston, N. Washington, A. R. Williams, S. M. Wimalaratne, L. M. Winfree, K. Wolstencroft, C. Goble, C. J. Mungall, M. A. Haendel, H. Parkinson, Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data, PLOS Biology 15 (2017) 1–18. URL: https://doi.org/10.1371/journal.pbio.2001414. doi:10.1371/journal.pbio.2001414.
[13] N. Juty, S. M. Wimalaratne, S. Soiland-Reyes, J. Kunze, C. A. Goble, T. Clark, Unique, Persistent, Resolvable: Identifiers as the Foundation of FAIR, Data Intelligence 2 (2020) 30–39. URL: https://doi.org/10.1162/dint_a_00025. doi:10.1162/dint_a_00025.
[14] J. Klump, R. Huber, 20 years of persistent identifiers – which systems are here to stay?, Data Science Journal 16 (2017). doi:10.5334/dsj-2017-009.
[15] DataCite, PID Services Registry, 2022. URL: https://pidservices.org.
[16] Corporation for National Research Initiatives, Handle.Net Registry, 2022. URL: https://handle.net.
[17] International DOI Foundation, The DOI System, 2022.
[18] G. Hendricks, D. Tkaczyk, J. Lin, P. Feeney, Crossref: The sustainable source of community-owned scholarly metadata, Quantitative Science Studies 1 (2020) 414–427. doi:10.1162/qss_a_00022.
[19] J. Brase, DataCite - a global registration agency for research data, in: 2009 Fourth International Conference on Cooperation and Promotion of Information Resources in Science and Technology, 2009, pp. 257–261. doi:10.1109/COINFO.2009.66.
[20] R. Verborgh, M. De Wilde, Using OpenRefine, Packt, 2013.
[21] Svalbard Integrated Arctic Earth Observing System, Nansen Legacy Excel template generator, 2022. URL: https://sios-svalbard.org/cgi-bin/darwinsheet/index.cgi.
[22] Y. Marketakis, N. Minadakis, H. Kondylakis, K. Konsolaki, G. Samaritakis, M. Theodoridou, G. Flouris, M. Doerr, X3ML mapping framework for information integration in cultural heritage and beyond, International Journal on Digital Libraries 18 (2017) 301–319. URL: https://doi.org/10.1007/s00799-016-0179-1. doi:10.1007/s00799-016-0179-1.
[23] C. A. Mattmann, J. L. Zitting, Tika in Action, Manning, 2011.
[24] R. Richardson, NanopubJL, 2021. URL: https://github.com/fair-workflows/NanopubJL/tree/v0.3.0.
[25] A. Rule, nbcomet, 2022. URL: https://github.com/activityhistory/nbcomet.
[26] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, P. Li, Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics 20 (2004) 3045–3054. URL: https://doi.org/10.1093/bioinformatics/bth361. doi:10.1093/bioinformatics/bth361.
[27] A. Bauch, I. Adamczyk, P. Buczek, F.-J. Elmer, K. Enimanev, P. Glyzewski, M. Kohler, T. Pylak, A. Quandt, C. Ramakrishnan, C. Beisel, L. Malmström, R. Aebersold, B. Rinn, openBIS: a flexible framework for managing and analyzing complex data in biology research, BMC Bioinformatics 12 (2011) 468. URL: https://doi.org/10.1186/1471-2105-12-468. doi:10.1186/1471-2105-12-468.
[28] National Digital Preservation Services, METS Validation Tool, 2022. URL: https://www.digitalpreservation.fi/en/mets-validator.
[29] R. Hatcher, CF Checker, 2022. URL: https://github.com/cedadev/cf-checker.
[30] SeaDataNet, OCTOPUS, 2022. URL: https://www.seadatanet.org/Software/OCTOPUS.
[31] S. A. C. Bukhari, M. Martínez-Romero, M. J. O'Connor, A. L. Egyedi, D. Willrett, J. Graybeal, M. A. Musen, K.-H. Cheung, S. H. Kleinstein, CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata, BMC Bioinformatics 19 (2018) 268. URL: https://doi.org/10.1186/s12859-018-2247-6. doi:10.1186/s12859-018-2247-6.
[32] H. Pampel, P. Vierkant, F. Scholze, R. Bertelmann, M. Kindling, J. Klump, H.-J. Goebelbecker, J. Gundlach, P. Schirmbacher, U. Dierolf, Making research data repositories visible: The re3data.org registry, PLOS ONE 8 (2013) 1–10. URL: https://doi.org/10.1371/journal.pone.0078080. doi:10.1371/journal.pone.0078080.
[33] European Organization For Nuclear Research, OpenAIRE, Zenodo, 2013. URL: https://www.zenodo.org/. doi:10.25495/7GXK-RD71.
[34] M. Hahnel, figshare, 2022. URL: https://figshare.com.
[35] J. Waeber, A. Ledl, A semantic web SKOS vocabulary service for open knowledge organization systems, in: E. Garoufallou, F. Sartori, R. Siatri, M. Zervas (Eds.), Metadata and Semantic Research, Springer International Publishing, Cham, 2019, pp. 3–12. doi:10.1007/978-3-030-14401-2_1.
[36] J. Ison, K. Rapacki, H. Ménager, M. Kalaš, E. Rydza, P. Chmura, C. Anthon, N. Beard, K. Berka, D. Bolser, T. Booth, A. Bretaudeau, J. Brezovsky, R. Casadio, G. Cesareni, F. Coppens, M. Cornell, G. Cuccuru, K. Davidsen, G. D. Vedova, T. Dogan, O. Doppelt-Azeroual, L. Emery, E. Gasteiger, T. Gatter, T. Goldberg, M. Grosjean, B. Grüning, M. Helmer-Citterich, H. Ienasescu, V. Ioannidis, M. C. Jespersen, R. Jimenez, N. Juty, P. Juvan, M. Koch, C. Laibe, J.-W. Li, L. Licata, F. Mareuil, I. Mičetić, R. M. Friborg, S. Moretti, C. Morris, S. Möller, A. Nenadic, H. Peterson, G. Profiti, P. Rice, P. Romano, P. Roncaglia, R. Saidi, A. Schafferhans, V. Schwämmle, C. Smith, M. M. Sperotto, H. Stockinger, R. S. Vařeková, S. C. Tosatto, V. de la Torre, P. Uva, A. Via, G. Yachdav, F. Zambelli, G. Vriend, B. Rost, H. Parkinson, P. Løngreen, S. Brunak, Tools and data services registry: a community effort to document bioinformatics resources, Nucleic Acids Research 44 (2015) D38–D47. URL: https://doi.org/10.1093/nar/gkv1116. doi:10.1093/nar/gkv1116.
[37] W. Hugo, Y. Le Franc, G. Coen, J. Parland-von Essen, L. Bonino, D2.5 FAIR Semantics Recommendations Second Iteration, Project Deliverable D2.5, FAIRsFAIR, 2020. doi:10.5281/zenodo.4314320.
[38] British Oceanographic Data Centre, The NERC Vocabulary Server, 2022. URL: https://vocab.nerc.ac.uk/.
[39] R. F. d. Silva, L. Pottier, T. Coleman, E. Deelman, H. Casanova, WorkflowHub: Community framework for enabling scientific workflow research and development, in: 2020 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS), 2020, pp. 49–56. doi:10.1109/WORKS51914.2020.00012.
[40] GitHub, 2022. URL: https://github.com.
[41] OpenAIRE, ARGOS, 2022. URL: https://argos.openaire.eu.
[42] M. Aristarán, M. Tigas, J. B. Merrill, Tabula, 2022. URL: https://tabula.technology/.
[43] J. Graybeal, N. Vasiljevic, excel2rdf-template, 2022. URL: https://github.com/fair-data-collective/excel2rdf-template.
[44] University of Tübingen, CMDI to Dublin Core transformer, 2022. URL: https://weblicht.sfs.uni-tuebingen.de/converter/Cmdi2DC/.
[45] Creative Commons, License Chooser, 2022. URL: https://creativecommons.org/choose/.
[46] EUDAT, License Selector, 2022. URL: https://github.com/ufal/public-license-selector.
[47] A. Devaraju, M. Mokrane, L. Cepinskas, R. Huber, P. Herterich, J. de Vries, V. Akerman, H. L'Hours, J. Davidson, M. Diepenbroek, From conceptualization to implementation: FAIR assessment of research data objects, Data Science Journal 20 (2021). doi:10.5334/dsj-2021-004.
[48] FAIRsharing, FAIRassist, 2022. URL: https://fairassist.org.
[49] S.-A. Sansone, P. McQuilton, P. Rocca-Serra, A. Gonzalez-Beltran, M. Izzo, A. L. Lister, M. Thurston, the FAIRsharing Community, FAIRsharing as a community approach to standards, repositories and policies, Nature Biotechnology 37 (2019) 358–367. doi:10.1038/s41587-019-0080-8.
[50] N. A. Smale, K. Unsworth, G. Denyer, E. Magatova, D. Barr, A review of the history, advocacy and efficacy of data management plans, International Journal of Digital Curation 15 (2020). doi:10.2218/ijdc.v15i1.525.
[51] S. Jones, R. Pergl, R. Hooft, T. Miksa, R. Samors, J. Ungvari, R. I. Davis, T. Lee, Data management planning: How requirements and solutions are beginning to converge, Data Intelligence 2 (2020). doi:10.1162/dint_a_00043.
[52] H. P. Sustkova, K. M. Hettne, P. Wittenburg, A. Jacobsen, T. Kuhn, R. Pergl, J. Slifka, P. McQuilton, B. Magagna, S.-A. Sansone, M. Stocker, M. Imming, L. Lannom, M. Musen, E. Schultes, FAIR Convergence Matrix: Optimizing the Reuse of Existing FAIR-Related Resources, Data Intelligence 2 (2020) 158–170. doi:10.1162/dint_a_00038.
[53] Ministry of Education and Culture, Finland, Fairdata services, 2022. URL: https://fairdata.fi.
[54] E. Schultes, B. Magagna, K. M. Hettne, R. Pergl, M. Suchánek, T. Kuhn, Reusable FAIR implementation profiles as accelerators of FAIR convergence, in: G. Grossmann, S. Ram (Eds.), Advances in Conceptual Modeling, Springer International Publishing, Cham, 2020, pp. 138–147. doi:10.1007/978-3-030-65847-2_13.