<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>MLentory, an FDO registry for machine learning models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dhwani Solanki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nelson Quiñones</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dietrich Rebholz-Schuhmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leyla Jael Castro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Cologne</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ZB MED Information Centre for Life Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Here we introduce MLentory, an FDO registry for Machine Learning models and their corresponding workflows, from creation to deployment. MLentory relies on FAIR Digital Objects (FDOs) to improve Findability, Accessibility, Interoperability, and Reusability while also improving reproducibility and transparency. MLentory aggregates, harmonizes and FAIRifies data from various ML model and model-related repositories and platforms. Here we present the initial architecture for data extraction, transformation, and loading.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine learning models</kwd>
        <kwd>FAIR</kwd>
        <kwd>FDOs</kwd>
        <kwd>registry</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Background</title>
    </sec>
    <sec id="sec-2">
      <title>2. MLentory</title>
      <p>
        MLentory aims to provide a registry (also known as a directory or inventory) for ML models and
their corresponding workflows, from creation to deployment. MLentory relies on FAIR Digital
Objects (FDOs) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to improve Findability, Accessibility, Interoperability, and Reusability (the FAIR
layer in FDOs) while also improving reproducibility and transparency (the operations layer in
FDOs). It will rely on metadata agreements reached by the aforementioned communities,
mapped (whenever needed) to schema.org [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a lightweight approach to semantics already
considered by the scientific community for datasets and software. Here we introduce the
MLentory architecture together with an initial proposal for ML model metadata based on
schema.org. MLentory harvests data from third-party platforms and passes it through
aggregation and harmonization modules that give the ML model FDOs their final shape. A
scheduler keeps the inventory continuously updated. The data is stored as an RDF graph,
with an Elasticsearch module for indexing and for communication with the front-end
interface and the corresponding RESTful services.
      </p>
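      <p>The harmonization and aggregation steps described above can be illustrated with a minimal Python sketch. It is not the MLentory implementation: the source field names, the FIELD_MAP table, the functions harmonize and aggregate, and the use of the schema.org type CreativeWork as a stand-in for an ML model type are all assumptions made for illustration only.</p>
      <preformat preformat-type="code">
```python
import json

# Hypothetical mapping from one platform's field names to schema.org properties.
# Neither the source field names nor the selected target properties are taken
# from MLentory; they are placeholders for illustration.
FIELD_MAP = {
    "model_id": "identifier",
    "model_name": "name",
    "summary": "description",
    "license": "license",
    "last_modified": "dateModified",
    "source_url": "url",
}


def harmonize(record, field_map=FIELD_MAP):
    """Map one harvested record (a dict) to a schema.org-style JSON-LD dict."""
    # CreativeWork is an assumed stand-in type, not the type MLentory uses.
    doc = {"@context": "https://schema.org", "@type": "CreativeWork"}
    for source_key, target_key in field_map.items():
        value = record.get(source_key)
        if value not in (None, ""):  # drop missing or empty fields
            doc[target_key] = value
    return doc


def aggregate(docs):
    """Merge documents sharing an identifier; earlier documents win on conflicts."""
    merged = {}
    for doc in docs:
        key = doc.get("identifier")
        if key is None:
            continue  # records without an identifier cannot be matched
        target = merged.setdefault(key, {})
        for prop, value in doc.items():
            target.setdefault(prop, value)
    return list(merged.values())


if __name__ == "__main__":
    # Two made-up records describing the same model on different platforms.
    harvested = [
        {"model_id": "m1", "model_name": "Demo model", "license": "MIT"},
        {"model_id": "m1", "source_url": "https://example.org/m1"},
    ]
    print(json.dumps(aggregate([harmonize(r) for r in harvested]), indent=2))
```
      </preformat>
      <p>In this sketch the first platform to report a property takes precedence; a real registry would more likely need an explicit policy for conflicting values, and the merged documents would then be loaded into the RDF graph and indexed for search.</p>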
    </sec>
    <sec id="sec-3">
      <title>3. Conclusions and future work</title>
      <p>We have outlined the MLentory architecture for collecting, aggregating and harmonizing
ML model reporting, together with initial considerations for a possible metadata
schema. We aim to share our framework to improve ML model metadata, paving the way
for more robust and transparent ML practices.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by NFDI4DataScience, a consortium funded by the
German Research Foundation (DFG), project number 460234259.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Mitchell</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>Model Cards for Model Reporting</article-title>
          .
          <source>Proceedings of the Conference on Fairness, Accountability, and Transparency</source>
          .
          <year>2019</year>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3287560.3287596</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bender</surname>
            <given-names>EM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            <given-names>B</given-names>
          </string-name>
          .
          <article-title>Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          .
          <year>2018</year>
          . doi:
          <pub-id pub-id-type="doi">10.1162/tacl_a_00041</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>De Smedt</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koureas</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wittenburg</surname>
            <given-names>P</given-names>
          </string-name>
          .
          <article-title>FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units</article-title>
          .
          <source>Publications</source>
          .
          <year>2020</year>
          ;
          <volume>8</volume>
          :
          <fpage>21</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.3390/publications8020021</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Guha</surname>
            <given-names>RV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brickley</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macbeth</surname>
            <given-names>S</given-names>
          </string-name>
          .
          <article-title>Schema.org: evolution of structured data on the web</article-title>
          .
          <source>Communications of the ACM</source>
          .
          <year>2016</year>
          ;
          <volume>59</volume>
          :
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/2844544</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>