<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ensuring FAIRness in Machine Learning Projects</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Efeoğlu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zongxiong Chen</string-name>
          <email>zongxiong.chen@fokus.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonja Schimmler</string-name>
          <email>sonja.schimmler@fokus.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer FOKUS</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Freie Universität Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technische Universität Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Subsymbolic approaches like machine learning (ML), deep learning, and Large Language Models (LLMs) have significantly advanced Artificial Intelligence, excelling in tasks such as question answering and ontology matching. Despite their success, the lack of openness in LLMs' training datasets and source codes poses challenges. For instance, some ML-based models do not share training data, limiting transparency. Current standards like schema.org provide a framework for dataset and software metadata but lack ML-specific guidelines. This position paper addresses this gap by proposing a comprehensive schema for ML model metadata aligned with the FAIR (Findability, Accessibility, Interoperability, Reusability) principles. We aim to provide insights into the necessity of an essential metadata format for ML models, demonstrate its integration into ML repository platforms, and show how this schema, combined with dataset metadata, can evaluate an ML model's adherence to the FAIR principles, fostering FAIRness in ML development.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>FAIR ML</kwd>
        <kwd>ML metadata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CEUR Workshop Proceedings</title>
      <p>ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Subsymbolic approaches such as machine learning (ML), deep learning, and recently, Large
Language Models (LLMs) have illustrated outstanding advances in Artificial Intelligence. LLMs
have achieved remarkable results in downstream tasks, such as Question Answering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
Ontology Matching [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Image Caption Generation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Recent research in these downstream
tasks uses either general-purpose LLMs directly or fine-tunes them with specific datasets.
      </p>
      <p>
        FAIRness plays an important role in the reproducibility of experiments in scientific research.
Wilkinson et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] propose a guideline that clearly explains the Findability, Accessibility,
Interoperability, and Reusability (FAIR) principles. To satisfy these principles, approaches
using LLMs should provide their weights, source code along with its settings, and datasets.
In recent research, Croissant 1 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an ML-ready metadata format for datasets based on the
Dataset schema of schema.org, has been proposed. Its integration as a plugin in the Hugging
Face platform provides metadata about the datasets used in ML models, such as their validation,
train, and test splits, size, and description. Additionally, to reproduce machine learning
research, Vanschoren et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] recommend that three requirements be fulfilled
together: (i) open source software, (ii) open data, and (iii) open access paper. Therefore,
ML-based models cannot be evaluated solely by checking whether their software or datasets are
open in isolation; to determine whether they are truly open, all their components must be
considered comprehensively.
      </p>
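      <p>The three requirements above only hold jointly, never separately; a minimal sketch of such a joint check (the field names are our illustrative choices, not part of any standard):</p>

```python
# Minimal sketch: an ML project counts as "open" only when all three
# requirements of Vanschoren et al. hold together, not separately.
# The field names below are illustrative, not part of any standard.

def is_open(project: dict) -> bool:
    """Return True only if software, data, and paper are all open."""
    requirements = ("open_source_software", "open_data", "open_access_paper")
    return all(project.get(r, False) for r in requirements)

project = {
    "open_source_software": True,
    "open_data": True,
    "open_access_paper": False,  # paper behind a paywall
}
print(is_open(project))  # False: one closed component is enough to fail
```
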
      <p>
        1. Datasets: The Croissant metadata format has been developed for ML-ready datasets
based on the Dataset type 2 under CreativeWork at schema.org [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The Croissant
metadata format provides all metadata necessary for using a dataset in an ML model
and has been integrated into Hugging Face 3, a well-known platform in the ML field.
Challenges: The dataset schema alone is not sufficient for evaluating ML models from a
FAIRness perspective. Additionally, Raza et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] introduce a pipeline for LLM FAIRness
in terms of datasets. However, open datasets alone are insufficient to achieve
FAIRness in the development of ML models.
      </p>
      <p>
        2. Source Code: With regard to source code, schema.org provides schemas under
CreativeWork for Software Application 4 and Software Source Code 5. Challenges: These cannot
be used directly for ML-based software and should be extended with configuration and evaluation
results, such as metrics. The performance of an ML model on a dataset is measured with metrics
such as F1 score and precision, depending on the specific task; these should also be included
in an ML schema to adhere to the FAIR principles.
      </p>
      <p>
        3. Models: The model weights are learned during the training process. To facilitate the
repetition of a model’s evaluation on the test split of a dataset, the model must be
open and accessible [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This transparency ensures that the evaluation process can
be independently verified and replicated. Challenges: Since ML models rely on stochastic
elements, e.g., normal or uniform distributions for model weight initialization
and various sampling strategies for post-processing, evaluation results may vary across runs.
To ensure clarity and reproducibility in experimental models, it is essential to explicitly
state the mean, maximum, and minimum of the metrics used in the experiments, as well as the
standard error and the number of repetitions for each experiment [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ]. Unfortunately,
schema.org currently lacks metadata to represent these crucial details in its existing
framework.
      </p>
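      <p>The reporting requirement above can be sketched with standard-library statistics; the F1 values below are hypothetical:</p>

```python
import statistics

def summarize_runs(scores: list[float]) -> dict:
    """Aggregate a metric over repeated experiment runs, as recommended
    for reproducible reporting: mean, maximum, minimum, standard error,
    and the number of repetitions."""
    n = len(scores)
    stderr = statistics.stdev(scores) / n ** 0.5 if n > 1 else 0.0
    return {
        "mean": statistics.mean(scores),
        "max": max(scores),
        "min": min(scores),
        "standard_error": stderr,
        "repetitions": n,
    }

# Hypothetical F1 scores from five runs with different random seeds.
f1_runs = [0.81, 0.79, 0.83, 0.80, 0.82]
print(summarize_runs(f1_runs))
```
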
      <p>
        MLDCAT-AP has recently introduced the Machine Learning Model entity 6; however, it
lacks specifications for the hardware requirements crucial for ensuring the reusability of ML
projects. There is no complete schema defined to represent all the metadata (or terminology) of
ML projects, such as hardware requirements [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this position paper, we aim to provide insights
into the necessity of metadata for ML projects. We further provide first guidelines on how
to evaluate an ML project according to the Findability, Accessibility, Interoperability, and
Reusability (FAIR) principles [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] within a layered architecture in Section 2.2. In the remainder
of this paper, we first present existing metadata for ML models in Section 2.1, then discuss
a layered architecture for an ML model metadata format in Section 2.2, and finally summarize
the benefits and future directions for FAIRness in ML projects in Section 3.
      </p>
      <sec id="sec-2-1">
        <title>Footnotes</title>
        <p>2. Dataset of schema.org: https://schema.org/Dataset; 3. Hugging Face: https://huggingface.co/; 4. Software Application: https://schema.org/SoftwareApplication; 5. Software Source Code: https://schema.org/SoftwareSourceCode; 6. MLDCAT-AP: https://semiceu.github.io/MLDCAT-AP/releases/2.0.0/#MachineLearningModel</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. FAIRness for Machine Learning</title>
      <p>This section first evaluates and discusses existing metadata related to machine learning
(ML) models, as presented in Section 2.1. Subsequently, it proposes a layered architecture for
ML projects and explores its potential integration into an ML repository platform, as detailed
in Section 2.2.</p>
      <sec id="sec-3-1">
        <title>2.1. Existing Metadata for Machine Learning Models</title>
        <p>This section first discusses which metadata can be included from Software Source Code and
Software Application schemas of schema.org. Afterwards, we discuss which metadata available
in Hugging Face model cards can be utilized.</p>
        <p>Schema.org
Schema.org has already proposed vocabulary for Software Application and Source Code under
CreativeWork. However, ML source code includes unique configurations that differentiate it
from regular software source code. Therefore, these terms are not sufficient to represent the
metadata of ML models, and an extension of their schema is required. We classified which
metadata from the SoftwareSourceCode and SoftwareApplication schemas should be included in
an ML schema, as can be seen in Table 1. However, there is no metadata in these schemas
to represent the configurations of ML models, e.g., hyperparameters, evaluation metrics, and
datasets. Therefore, the SoftwareSourceCode and SoftwareApplication schemas are not rich
enough to represent the metadata of ML models.</p>
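        <p>To illustrate the gap, a SoftwareSourceCode record could be extended with the missing ML-specific fields; the ml:* property names below are our own illustrative placeholders, not registered schema.org or MLDCAT-AP terms, and the URLs are hypothetical:</p>

```python
import json

# Sketch of a schema.org SoftwareSourceCode record extended with the
# ML-specific fields the schemas cannot currently express. The "ml:*"
# keys are illustrative placeholders, not registered schema.org terms,
# and the repository URL is hypothetical.
record = {
    "@context": "https://schema.org",
    "@type": "SoftwareSourceCode",
    "codeRepository": "https://example.org/repo",  # hypothetical URL
    "programmingLanguage": "Python",
    "runtimePlatform": "PyTorch",
    "license": "https://spdx.org/licenses/MIT",
    # Extensions the existing schemas lack:
    "ml:hyperparameters": {"learning_rate": 1e-4, "weight_decay": 0.01},
    "ml:dataset": "imdb",
    "ml:evaluation": {"metric": "F1", "value": 0.81},
}
print(json.dumps(record, indent=2))
```
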
        <p>Hugging Face Model Cards
Hugging Face provides three ways for including model-specific metadata in a model card 7:
(i) using the metadata user interface (UI) illustrated in Figure 1, (ii) editing the YAML section
of the README.md file in a model card and (iii) via the huggingface_hub 8 Python library.
We aim to combine the metadata available in the UI of Hugging Face (See Figure 1) with
model configurations from the README in the model card. This README might include
configurations such as hardware requirements, hyperparameters, and platform details for
reproducibility. Taking into account only the metadata UI is insufficient to determine whether
the model is reproducible. Additionally, in Table 2 we examined which metadata is semantically
similar to the metadata in the SoftwareSourceCode and SoftwareApplication schemas.</p>
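        <p>Option (ii) stores the metadata as a YAML block delimited by --- lines at the top of README.md; a minimal standard-library sketch that extracts the flat key: value entries (a real model card should be parsed with a YAML library):</p>

```python
def parse_card_metadata(readme: str) -> dict:
    """Extract flat key: value pairs from the YAML front matter of a
    Hugging Face model card README (the block between the leading
    '---' delimiters). Nested YAML is ignored in this sketch; a real
    parser should use a YAML library."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter
            break
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

readme = """---
license: mit
language: en
library_name: transformers
---
# My model
"""
print(parse_card_metadata(readme))
```
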
        <sec id="sec-3-1-1">
          <title>Footnotes</title>
          <p>7. Model card metadata: https://huggingface.co/docs/hub/model-cards#model-card-metadata; 8. huggingface_hub: https://huggingface.co/docs/huggingface_hub/index</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. A Layered Architecture for Machine Learning Projects</title>
        <p>
          We propose a layered architecture for an ML model metadata format inspired by Croissant [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
to support both datasets and ML model source code across the following four layers:
Metadata Layer: This layer contains general information about the dataset and the ML model
source code with its settings, including its name, description, runtime requirements, software
requirements, and license. Both Table 2 and Figure 1 provide information about the existing
metadata available in the schema.org format and on the Hugging Face platform.
Resources Layer: This layer describes dataset resources, the source code of the ML model,
and the learned model weights obtained during the training process using the dataset.
Structure Layer: This layer describes and organizes the structure of the resources, adopting the
data structure defined in the Croissant ML-ready format [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Additionally, the configuration of
the ML model, including the neural network architecture and corresponding hyperparameters
(e.g., learning rate and weight decay), is detailed in this layer. The aim of the model
configuration, including hyperparameters, is to support various ML frameworks, such as PyTorch [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
and TensorFlow [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], allowing for flexible network instantiation and evaluation. These ML
frameworks are described as software requirements and are defined by library names on the
Hugging Face Model Card (see Figure 1 and Table 2).
        </p>
        <p>Semantic Layer: This layer bridges ML-specific data and model interpretations with semantics,
describing the required metadata. The Semantic Layer details the requirements of ML models
to repeat experiments, the datasets used in evaluations, tasks like text-to-text generation, and
evaluation results, including metrics and their detailed values. It is designed to be extendable,
catering to the evolving needs of the ML community and supporting domain-specific application
endpoints.</p>
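        <p>The four layers could be serialized as a single nested record; in the following sketch every field name and URL is an illustrative placeholder, as the concrete format remains future work:</p>

```python
import json

# Sketch of an ML project metadata record organized by the four
# proposed layers. All field names and URLs are illustrative
# placeholders; the concrete format is future work.
project_metadata = {
    "metadata": {                       # Metadata Layer: general info
        "name": "example-classifier",
        "description": "Hypothetical text classifier.",
        "license": "MIT",
        "softwareRequirements": ["PyTorch"],
    },
    "resources": {                      # Resources Layer: artifacts
        "dataset": "https://example.org/data",       # hypothetical URL
        "sourceCode": "https://example.org/repo",    # hypothetical URL
        "modelWeights": "https://example.org/ckpt",  # hypothetical URL
    },
    "structure": {                      # Structure Layer: configuration
        "architecture": "transformer",
        "hyperparameters": {"learning_rate": 1e-4, "weight_decay": 0.01},
    },
    "semantic": {                       # Semantic Layer: interpretation
        "task": "text-to-text generation",
        "evaluation": {"metric": "F1", "mean": 0.81, "repetitions": 5},
    },
}
print(json.dumps(project_metadata, indent=2))
```
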
        <p>Consequently, the aforementioned layers propose a comprehensive framework to evaluate
the FAIRness of ML models within an ML project. Each layer defines a different perspective of
the ML project with respect to its resources in terms of FAIRness. An ML project that provides
metadata for these layers can easily be evaluated using various FAIR evaluation criteria, such as
the evaluation system defined in [11].</p>
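        <p>A project described by the four layers can be checked mechanically; the following sketch scores one illustrative presence check per layer (real FAIR evaluation criteria, such as those in [11], are far richer):</p>

```python
# Minimal sketch of a FAIRness check over the four proposed layers.
# One illustrative presence test per layer; real evaluation criteria
# are far richer than this.
LAYER_CHECKS = {
    "metadata": lambda m: bool(m.get("name")) and bool(m.get("license")),
    "resources": lambda m: bool(m.get("dataset")) and bool(m.get("sourceCode")),
    "structure": lambda m: "hyperparameters" in m,
    "semantic": lambda m: "evaluation" in m,
}

def fair_score(project: dict) -> float:
    """Fraction of layers whose illustrative check passes."""
    passed = sum(
        1 for layer, check in LAYER_CHECKS.items()
        if check(project.get(layer, {}))
    )
    return passed / len(LAYER_CHECKS)

example = {
    "metadata": {"name": "example-classifier", "license": "MIT"},
    "resources": {"dataset": "https://example.org/data"},  # no sourceCode
    "structure": {"hyperparameters": {"learning_rate": 1e-4}},
    "semantic": {"evaluation": {"metric": "F1"}},
}
print(fair_score(example))  # 0.75: the resources check fails
```
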
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Conclusion</title>
      <p>In this paper, we propose a machine learning (ML) model metadata format to enhance the
FAIRness of ML models, addressing key challenges in adhering to FAIR principles, particularly
in sharing datasets, source code, and the ML model metadata. We expect this format to evolve
based on user feedback and the rapidly changing needs of the ML field, which is shaping the
future of artificial intelligence. Additionally, we offer insights into the development of a layered
architecture for ML project metadata, focusing on distinct aspects of evaluating FAIRness.
The format also introduces primitives for linking the ML project metadata across existing
vocabularies, fostering interoperability. We plan to extend the MLDCAT-AP 9 by incorporating
the metadata outlined here to support the representation of ML projects on platforms like
Hugging Face 10.</p>
      <sec id="sec-4-1">
        <title>Acknowledgments</title>
        <p>This work has been funded by the German Research Foundation (DFG) under project number
460234259 (NFDI4DataScience).</p>
        <p>Footnotes: 9. MLDCAT-AP: https://semiceu.github.io/MLDCAT-AP/releases/2.0.0/; 10. Hugging Face Models: https://huggingface.co/models</p>
        <p>[11] B. Wentzel, F. Kirstein, T. Jastrow, R. Sturm, M. Peters, S. Schimmler, An extensive
methodology and framework for quality assessment of DCAT-AP datasets, in: I. Lindgren,
C. Csáki, E. Kalampokis, M. Janssen, G. Viale Pereira, S. Virkar, E. Tambouris, A. Zuiderwijk
(Eds.), Electronic Government, Springer Nature Switzerland, Cham, 2023, pp. 262–278.</p>
        <sec id="sec-4-1-1">
          <title>Metadata for Models</title>
          <p>Metadata from the SoftwareSourceCode and SoftwareApplication schemas considered for an ML schema (Table 1): codeRepository, url, acquireLicensePage, license, accessibilityAPI, programmingLanguage, runtimePlatform, softwareRequirements, memoryRequirements.</p>
          <p>Descriptions of Hugging Face model card metadata (Table 2): any valid license identifier; a list of ISO 639-1 codes for languages; the libraries used in models; the domain used in models; the datasets listed under models, e.g., imdb; the evaluation metrics for a model, along with their values; the type of task the model is intended for, e.g., Question Answering; the model's evaluation results.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Taffa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <article-title>Leveraging llms in scholarly knowledge graph question answering</article-title>
          , in: QALD/SemREC@ISWC,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>OLaLa: Ontology matching with large language models</article-title>
          ,
          <source>in: Proceedings of the 12th Knowledge Capture Conference</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Celona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Donzella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Napoletano</surname>
          </string-name>
          ,
          <article-title>Improving image captioning descriptiveness by ranking and llm-based fusion</article-title>
          ,
          <source>arXiv preprint arXiv:2306.11593</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bouwman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Brookes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dumon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edmunds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Evelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Finkers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>GonzalezBeltran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Grethe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heringa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A. C.</given-names>
            <surname>'t Hoen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hooft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Lusher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Martone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Packer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Persson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rocca-Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>van Schaik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-A.</given-names>
            <surname>Sansone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schultes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sengstag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Slater</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Strawn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Swertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van der Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>van Mulligen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Velterop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waagmeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wittenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mons</surname>
          </string-name>
          ,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ) 160018. URL: https://www.nature.com/articles/sdata201618. doi:10.1038/sdata.2016.18.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Benjelloun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Conforti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gijsbers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Giner-Miguelez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kuchnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marcenac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maskey</surname>
          </string-name>
          , et al.,
          <article-title>Croissant: A metadata format for ml-ready datasets</article-title>
          ,
          <source>in: Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanschoren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <article-title>Open Science in Machine Learning</article-title>
          , in:
          <source>Implementing Reproducible Research</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Raza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghuge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dolatabadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pandya</surname>
          </string-name>
          ,
          <article-title>Fair enough: How can we develop and assess a fair-compliant dataset for large language models' training?</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2401.11033. arXiv:2401.11033.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O. E.</given-names>
            <surname>Gundersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kjensmo</surname>
          </string-name>
          ,
          <article-title>State of the art: Reproducibility in artificial intelligence</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>32</volume>
          (
          <year>2018</year>
          ). URL: https://ojs.aaai.org/index.php/AAAI/article/view/11503. doi:10.1609/aaai.v32i1.11503.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          , G. Chanan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          , et al.,
          <article-title>Pytorch: An imperative style, high-performance deep learning library</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brevdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Citro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Devin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Irving</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jozefowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kudlur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Levenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mané</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Monga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shlens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Talwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Viégas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Warden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wattenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          , et al.,
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous systems</article-title>
          ,
          <year>2015</year>
          . URL: https://www.tensorflow.org/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>