<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study of the Categories used in 'Papers with Code'</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jenifer Tabita Ciuciu-Kiss</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Garijo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Politécnica de Madrid, Boadilla del Monte</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>An increasing number of machine learning developers share research software online to support their scientific investigations. In order to improve software findability, the scientific community has developed domain-specific taxonomies. However, are these taxonomies appropriate for software classification? This paper explores this question through a case study on Papers with Code, a popular platform where authors share their publications together with their software implementations. We define and apply a comparative framework with state-of-the-art text similarity techniques (TF-IDF, Sentence-BERT, CLIP), and we assess the level of overlap between different software categories defined in the platform, based on the method descriptions contained in them. Our results show significant category overlap, which may limit the effectiveness of classification algorithms. While community-defined categories provide a useful foundation, they may require refinement, such as subcategories or refined definitions, to better capture interdisciplinary methods and improve classification accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Research Software Classification</kwd>
        <kwd>Clustering Quality Analysis</kwd>
        <kwd>FAIR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In parallel with the adoption of the Findable, Accessible, Interoperable and Reusable (FAIR) guiding
principles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], research software [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has gained increasing recognition as a first-class research output [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Classification of research software is key for supporting findability, improving the discoverability of
software tools in scientific research, and promoting the reuse of existing solutions. With the exponential
growth in the number of software tools available, the process of finding the most appropriate and
relevant software has become more challenging for researchers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        A well-structured taxonomy is essential to ease software findability, as it provides an agreed
framework that organizes software into distinct common categories based on functionality, domain, or other
relevant characteristics. This ensures that both researchers and automated systems can effectively filter
and compare tools, making valuable research software easier to locate and apply in diverse contexts. To
this end, different communities have proposed various taxonomies [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ] of their own for manual
or curated research artifact classification. However, it is often unclear whether the choice of selected
categories is appropriate for research software classification (i.e., are two categories too similar or
redundant?).
      </p>
      <p>
        In this paper, we examine this issue through a case study on Papers with Code [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a popular platform
designed to capture scientific articles and their corresponding implementations in the Machine Learning
domain. Papers with Code contains a crowdsourced software taxonomy with hundreds of different
software categories, which have been used to feed several existing methods for research software
classification [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ]. Our contributions include: 1) a methodology for evaluating the coherence
and separability of research software categories, including how the research software categories were
analyzed using text embeddings and clustering techniques; 2) the results of the case study, which aims to
address whether the level of noise in the categories affects their suitability for classification, highlighting
the extent of category overlap and its impact on clustering performance. The implementation of our
analysis and dataset is available on GitHub [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].<sup>1</sup>
      </p>
      <p>2nd International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2025), co-located
with ESWC 2025, June 01–02, 2025, Portorož, Slovenia</p>
      <p>Email: jenifer.ciuciu-kiss@alumnos.upm.es (J. T. Ciuciu-Kiss); daniel.garijo@upm.es (D. Garijo)
URL: https://jeniferciuciukiss.com/ (J. T. Ciuciu-Kiss); https://dgarijo.com/ (D. Garijo)
ORCID: 0000-0002-3170-6730 (J. T. Ciuciu-Kiss); 0000-0003-0454-7145 (D. Garijo)</p>
      <p>© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>[Figure: methodology workflow diagram. Data preparation (data collection from Papers with Code: method name, method description); vectorization (text embeddings: TF-IDF, Sentence-BERT, CLIP); cluster quality analysis (evaluation metrics: Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index); visualization (dimensionality reduction with t-SNE). Legend: dataflow, analysis step, technical detail, step input/output.]</p>
      <p>The remainder of the paper is structured as follows. Section 2 reviews existing approaches for
assessing category similarity and taxonomies for research software classification. Section 3 describes
the dataset, text embedding techniques, and clustering quality analysis metrics used to evaluate category
coherence. Section 4 presents the clustering quality analysis and visualization, highlighting both the
separability and overlap of community-defined categories. Finally, Section 5 summarizes our key
findings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Various tools have been proposed to assess category similarity in research software classification and
knowledge organization. Ontology alignment tools compare structured taxonomies through category
labels, descriptions, and hierarchical structures to compute similarity scores [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. Knowledge
graph-based approaches leverage structured data and embeddings to identify conceptual relationships between
research topics and software categories [
        <xref ref-type="bibr" rid="ref5">5, 15</xref>
        ].
      </p>
      <p>
        Research software taxonomies help structure classification systems for retrieval and organization.
For example, the Computer Science Ontology (CSO) [16] has been proposed to structure scientific
publications and overlaps with software-related topics. The Software Ontology (SWO) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Bio.tools
[17] provide domain-specific software categorization, particularly for biomedical applications using the
EDAM ontology [17].
      </p>
      <p>
        More general taxonomies include Papers with Code [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which categorizes software implementations
in Machine Learning and explicitly links research papers to their implementations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Additionally,
Science Knowledge Graphs, such as the AI Knowledge Graph (AI-KG) [18] and OpenAIRE [19],
contain software entities but focus primarily on documenting relationships between scientific concepts,
publications, and datasets.
      </p>
      <p>Despite these efforts, little work has systematically evaluated whether community-defined research
software categories align with natural groupings in category definitions. Existing taxonomies are often
created based on expert knowledge rather than empirical validation, raising questions about their
effectiveness for classification. This study aims to bridge this gap by assessing the coherence of research
software categories using text embeddings and clustering techniques.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p><sup>1</sup> https://github.com/kuefmz/pow_categories/tree/main</p>
      <p>a given set of research software categories (e.g., community-defined) provide a foundation for software
classification.</p>
      <sec id="sec-3-1">
        <title>3.1. Data sources</title>
        <p>We adopt the Papers with Code platform, which has emerged as a key resource within the research
community, particularly in machine learning and artificial intelligence. This platform integrates research
publications with their respective software implementations, offering a holistic approach that bridges
the gap between research and application. Its stated mission is "to create a free and open resource with
Machine Learning papers, code, datasets, methods and evaluation tables."<sup>2</sup></p>
        <p>In addition, Papers with Code categorizes paper-code links manually into different categories. Due
to its popularity and widespread use, it provides access to a manually curated set of categories and
their descriptions, which is the focus of our study. This manual curation ensures the high quality and
relevance of categories, which significantly aids in exploring the research landscape.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>As shown in Figure 1, the dataset was collected following the data model shown in Figure 2 to create
the ‘Methods Dataset’. The dataset is organized around methods associated with specific research
areas (e.g., Computer vision, Natural language processing). Each entry in this dataset includes a
method name and a detailed description, annotated under a particular research area by the community.
This structured organization provides a taxonomy of methods, allowing us to examine whether these
community-defined categories naturally form coherent clusters when represented through textual
attributes.</p>
        <p>The dataset contains 1,064 methods sourced from the Papers with Code platform. Each method
is categorized into a specific research area: Computer vision (665 methods), Natural language
processing (119 methods), Graphs (104 methods), Reinforcement learning (88 methods), Sequential
(53 methods), and Audio (35 methods). Each entry includes the name, description, and associated
research area of a method. The dataset was retrieved from Papers with Code on October 12, 2024, using
a publicly available JSON file.<sup>3</sup></p>
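        <p>The per-area counts reported above can be checked and converted into shares with a few lines of Python. This is a minimal sketch that uses only the numbers stated in this section; the variable names are illustrative and are not taken from the authors' repository.</p>
        <preformat><![CDATA[
```python
# Category sizes as reported in this section.
counts = {
    "Computer vision": 665,
    "Natural language processing": 119,
    "Graphs": 104,
    "Reinforcement learning": 88,
    "Sequential": 53,
    "Audio": 35,
}

# The per-area counts should add up to the reported dataset size.
total = sum(counts.values())

# Share of the dataset covered by each research area.
shares = {area: n / total for area, n in counts.items()}

# Ratio between the largest and the smallest research area.
size_ratio = max(counts.values()) / min(counts.values())

for area, n in counts.items():
    print(f"{area}: {n} methods ({shares[area]:.1%})")
print(f"total: {total}, largest/smallest ratio: {size_ratio:.1f}")
```
]]></preformat>
        <p>The counts sum to the reported 1,064 methods, and the shares make the uneven size of the research areas explicit, which is worth keeping in mind when reading label-based clustering metrics.</p>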
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Vectorization</title>
        <p>To convert textual attributes into numerical representations suitable for machine learning models, we
employ and compare three types of text embeddings:
• TF-IDF [20]: A lightweight approach that represents text by weighting terms according to
their frequency within a document and their rarity across the dataset. Despite its simplicity, TF-IDF highlights
the most relevant terms within each document, which can help in identifying keywords.
• Sentence-BERT (SBERT) [21]: An approach for generating dense, context-aware embeddings for
sentence-level text, such as abstracts and descriptions. By capturing semantic relationships within
sentences, SBERT provides a deeper understanding of the context and meaning of words relative
to each other, rather than treating them as isolated terms.
• CLIP [22]: Originally designed for multimodal learning, CLIP’s text encoder can still generate
meaningful, contextually rich embeddings for textual tasks. Because it was trained on a vast array of web
data, CLIP can recognize complex language patterns and associations,
which is useful for handling diverse text data.</p>
        <p><sup>2</sup> https://paperswithcode.com/about</p>
        <p><sup>3</sup> https://production-media.paperswithcode.com/about/methods.json.gz</p>
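        <p>The TF-IDF weighting described in the first item can be sketched in plain Python. This is a minimal textbook variant (term frequency multiplied by the logarithm of the inverse document frequency) and not necessarily the configuration used in the experiments; scikit-learn's TfidfVectorizer, for example, applies a smoothed IDF and L2 normalization by default. Whitespace tokenization is an assumption, and the three descriptions are invented for illustration.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vector = []
        for term in vocabulary:
            tf = term_counts[term] / len(tokens)       # relative frequency in this document
            idf = math.log(n_docs / doc_freq[term])    # rarity across the dataset
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical method descriptions, invented for illustration only.
descriptions = [
    "convolution layer for image features",
    "attention layer for text sequences",
    "graph convolution for node features",
]
vocabulary, vectors = tfidf_vectors(descriptions)

for description, vector in zip(descriptions, vectors):
    top_weight, top_term = max(zip(vector, vocabulary))
    print(f"{description!r}: highest-weighted term = {top_term!r} ({top_weight:.3f})")
```
]]></preformat>
        <p>A term that occurs in every description (here "for") receives weight zero, while a term unique to one description (here "image") receives the largest weight, which is the keyword-highlighting behaviour the item describes.</p>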
        <p>Comparing different embedding techniques is important because each technique captures
semantic information differently, which can significantly impact the performance and interpretability of
the clustering and classification tasks. These three techniques were chosen specifically because
they each address different aspects of textual representation: TF-IDF for term frequency-based
keyword extraction, SBERT for semantic understanding at the sentence level, and CLIP for capturing
broader, high-level contextual relationships. We used the following software versions in our
experiments to ensure reproducibility: sentence-transformers==3.1.1, transformers==4.45.1,
scikit-learn==1.4.1.post1. All experiments were conducted using Python 3.10.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Cluster quality analysis</title>
        <p>Cluster quality analysis is conducted to assess the natural grouping of research software categories
based on the textual embeddings of the analyzed method names and descriptions. The goal is to
examine whether community-defined categories form distinct clusters when represented by textual
attributes such as method names or descriptions. Clustering quality is evaluated using the following
metrics:
• Silhouette Score (SS) [23]: Measures cluster separation by comparing, for each sample, the mean distance to
the members of its own cluster with the mean distance to the members of the nearest other cluster.
The score ranges from −1 to +1, where higher scores indicate more distinct clusters.
A value close to +1 suggests that samples are well-matched to their own cluster and poorly
matched to neighboring clusters, while a value near 0 implies overlapping clusters. Negative
values indicate that samples may have been assigned to the wrong cluster.
• Calinski-Harabasz Index (CHI) [24]: Reflects the ratio of between-cluster dispersion
to within-cluster dispersion, each normalized by its degrees of freedom. The CHI is nonnegative and
increases as the clusters become more compact and better
separated, so higher values imply dense, well-separated clusters, which is
ideal for clustering performance.
• Davies-Bouldin Index (DBI) [25]: Averages, over all clusters, the similarity between each cluster and
the cluster most similar to it, where similarity is the ratio of within-cluster scatter to the distance
between cluster centroids. The score ranges from 0 upwards, where lower values indicate better
separation and more distinct clusters. A DBI score closer to 0 implies low similarity between
clusters, suggesting effective clustering, while higher values indicate clusters that overlap or are
poorly separated.</p>
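        <p>The Silhouette Score can be made concrete with a small pure-Python sketch. Euclidean distance is an assumption here (the distance used in the experiments is not stated in this section), and the toy points and labels are invented for illustration. In practice a library routine such as scikit-learn's silhouette_score would normally be used; the sketch only shows what is being computed when category labels are treated as cluster assignments.</p>
        <preformat><![CDATA[
```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster
        # (the distance of the point to itself is 0, hence len(own) - 1).
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append(0.0 if denominator == 0 else (b - a) / denominator)
    return sum(scores) / len(scores)


# Toy 2-D "embeddings": two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Labels that follow the geometry versus labels that cut across it.
well_separated = silhouette_score(points, [0, 0, 1, 1])
mixed = silhouette_score(points, [0, 1, 0, 1])

print(f"labels follow the groups:    {well_separated:.3f}")
print(f"labels cut across the groups: {mixed:.3f}")
```
]]></preformat>
        <p>When the labels follow the geometric groups the score is close to +1; when they cut across the groups it becomes negative, which is the situation the metric flags as likely misassignment.</p>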
        <p>These three metrics were selected for their complementary strengths in assessing clustering quality.
The Silhouette Score evaluates how well each sample matches its own cluster versus neighboring ones,
providing insight into cluster separation. The Calinski-Harabasz Index measures cluster compactness
and separation, indicating well-defined clusters with higher values. Finally, the Davies-Bouldin Index
assesses distinctness by evaluating similarity between clusters, with lower values reflecting minimal
overlap. Together, these metrics offer a balanced view of clustering performance by capturing separation,
cohesion, and distinctness.</p>
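        <p>For comparison with the Silhouette Score, the Calinski-Harabasz and Davies-Bouldin indexes can be sketched in the same style. This is an illustrative pure-Python version under stated assumptions: Euclidean distance, at least two clusters, non-zero within-cluster dispersion, and distinct centroids. It is not the authors' implementation, and the toy data are invented; scikit-learn exposes equivalent routines as calinski_harabasz_score and davies_bouldin_score.</p>
        <preformat><![CDATA[
```python
import math


def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(
        len(members) * distance(centroid(members), overall) ** 2
        for members in clusters.values()
    )
    within = sum(
        distance(p, centroid(members)) ** 2
        for members in clusters.values()
        for p in members
    )
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = group_by_label(points, labels)
    centers = {label: centroid(members) for label, members in clusters.items()}
    scatter = {
        label: sum(distance(p, centers[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / distance(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Toy 2-D "embeddings": two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
aligned = [0, 0, 1, 1]   # labels follow the groups
crossed = [0, 1, 0, 1]   # labels cut across the groups

chi_aligned = calinski_harabasz(points, aligned)
dbi_aligned = davies_bouldin(points, aligned)
chi_crossed = calinski_harabasz(points, crossed)
dbi_crossed = davies_bouldin(points, crossed)

print(f"aligned labels: CHI = {chi_aligned:.2f}, DBI = {dbi_aligned:.2f}")
print(f"crossed labels: CHI = {chi_crossed:.2f}, DBI = {dbi_crossed:.2f}")
```
]]></preformat>
        <p>The two indexes move in opposite directions: labels that follow the groups give a high CHI and a low DBI, while labels that cut across the groups give a CHI near zero and a large DBI.</p>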
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Visualization</title>
        <p>t-SNE [26], a dimensionality reduction technique, is used to visualize the embeddings and assess whether
the research software attributes align with the community-defined categories. These visualizations
provide qualitative support for the quantitative evaluation of clustering and classification by illustrating
the distinctiveness of each attribute in capturing category diferences. By examining the visual clustering
patterns, we gain insights into how well the embeddings represent natural groupings, complementing
the quantitative metrics with a visual assessment of the category separability.</p>
        <p>[Figure 3: t-SNE visualization of the Sentence-BERT embeddings of method descriptions. Axes: Dimension 1 and Dimension 2. Legend (research area): Audio, Computer Vision, Graphs, Natural Language Processing, Reinforcement Learning, Sequential.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In this study, we define a coherent cluster as a group of method descriptions that are closely grouped
in the embedding space and belong to the same category, with minimal overlap with other categories.
This coherence is indicative of a well-separated and semantically meaningful category.</p>
      <p>We used the dataset presented in Section 3.2 to determine whether community-defined categories,
represented by method names and descriptions, form distinct clusters that may serve as a solid foundation
for future classification tasks.</p>
      <p>Table 1 shows the clustering quality analysis over method names and descriptions, using different
embedding techniques. Method descriptions, particularly when embedded using Sentence-BERT, provide
better clustering of the community-defined research software categories than method names. The
Sentence-BERT embeddings for descriptions achieved the highest Calinski-Harabasz Index (67.37) and
the lowest Davies-Bouldin Index (3.56). However, the low Silhouette Scores across all embeddings
indicate weak separation between categories, suggesting that category boundaries may not be clearly
defined. This overlap likely reduces the classification signal, especially for categories such as Graphs and
Sequential, which show low inter-category distinction. These findings point to potential redundancy or
ambiguity in the current taxonomy. For example, a method used in Natural language processing (NLP)
may also be applicable in Computer vision (CV) when dealing with multimodal data that combines text
and images. Such examples highlight the challenge of achieving clear-cut clusters, as certain methods
are inherently versatile and cross-disciplinary.</p>
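      <p>One simple way to inspect which pairs of categories overlap is to compare category centroids in the embedding space. The sketch below computes pairwise cosine similarity between centroids. It is an illustrative assumption and not the analysis reported in Table 1; the embeddings, the labels, and the function names are all hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid_similarities(embeddings, labels):
    """Cosine similarity between the centroids of every pair of categories."""
    groups = {}
    for vector, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(vector)
    centroids = {label: mean_vector(vectors) for label, vectors in groups.items()}
    names = sorted(centroids)
    return {
        (first, second): cosine_similarity(centroids[first], centroids[second])
        for i, first in enumerate(names)
        for second in names[i + 1:]
    }


# Hypothetical 2-D embeddings for three made-up categories.
embeddings = [
    [1.0, 0.0], [0.9, 0.1],   # category_a
    [0.8, 0.2], [1.0, 0.2],   # category_b, close to category_a
    [0.0, 1.0], [0.1, 0.9],   # category_c, far from both
]
labels = ["category_a", "category_a", "category_b", "category_b", "category_c", "category_c"]

similarities = centroid_similarities(embeddings, labels)
for pair, value in sorted(similarities.items(), key=lambda item: -item[1]):
    print(f"{pair[0]} vs {pair[1]}: {value:.3f}")
```
]]></preformat>
      <p>Pairs with centroid similarity close to 1 are candidates for closer inspection, for example as potential subcategories of one another or as areas whose definitions need sharpening.</p>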
      <p>The results indicate that while the current categories provide a starting point for classification
tasks, their effectiveness is limited due to significant overlap, which introduces noise and reduces
their reliability. This suggests that classifications in this space should be interpreted with caution, as
the boundaries between categories are not well-defined. Rather than relying solely on the existing
taxonomy, further efforts are needed to improve category definitions by incorporating clearer textual
descriptions and more representative examples. Such refinements may help mitigate ambiguity and
better capture the nuances of methods that span multiple areas, potentially enhancing classification
accuracy while acknowledging the inherent limitations of the current structure.</p>
      <p>To further explore how well the categories are visually distinct, we applied t-SNE to the
Sentence-BERT embeddings of the method descriptions, as these achieved the best clustering performance based
on the Calinski-Harabasz and Davies-Bouldin indexes. The t-SNE visualization in Figure 3 shows that
there is some overlap between categories, although certain areas, such as Computer Vision, tend to form
dense clusters, suggesting that some categories are more distinguishable than others. Natural language
processing and Reinforcement learning show more dispersion, reflecting the challenge in categorizing
methods that may span multiple domains. Overall, the visualization provides additional insight into
the structure of the embeddings, illustrating both the strengths and limitations of the current category
definitions.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>Automated classification of research software is essential for improving findability and supporting
reuse, particularly as research outputs become increasingly available on the Web. In this study, we
examined whether community-defined categories in Papers with Code form distinct clusters that
align with natural groupings in the data. Our results indicate that some categories, such as Computer
Vision and Natural Language Processing, exhibit clear separation, while others, including Graphs and
Reinforcement Learning, show substantial overlap. This overlap suggests that classification based on
these categories may introduce noise, even when state-of-the-art methods achieve high performance.</p>
      <p>Our clustering results indicate that while the existing taxonomy provides a useful foundation, its
effectiveness is hindered by ambiguous category boundaries. The presence of overlapping categories
suggests that certain research methods span multiple fields, making strict classification challenging.
Rather than relying solely on predefined categories, future classification efforts may explore refining
taxonomies by introducing additional subcategories or restructuring category definitions based on
empirical clustering results. Additionally, incorporating richer metadata, such as method usage context
and domain-specific relationships, may enhance classification accuracy. Furthermore, category
refinement may directly improve the usability of platforms like Papers with Code by offering more precise
filtering options for users. Incorporating subcategories or supporting multi-label assignments would
accommodate interdisciplinary methods, reducing misclassification and improving discoverability.</p>
      <p>Our future work will evaluate the impact of category refinement on classification performance by
testing machine learning models under different category structures. Another direction is to investigate
methods for systematically identifying and resolving overlapping categories, such as hierarchical
clustering approaches or semi-supervised learning techniques that integrate expert feedback. More broadly,
improving research software classification aligns with the FAIR principles, particularly Findability, by
ensuring that software tools are categorized in a way that accurately reflects their purpose and
functionality. Addressing these challenges will contribute to more reliable and interpretable research software
classification, supporting both automated discovery systems and the broader scientific community.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar checks and rewording. After
utilizing these tools, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
      <p>hybrid matching strategies, in: Lecture Notes in Computer Science, 2013.</p>
      <p>[15] A. A. Salatino, F. Osborne, E. Motta, The cso classifier: Ontology-driven detection of research
topics in scholarly articles, International Journal on Digital Libraries (2020).</p>
      <p>[16] A. A. Salatino, T. Thanapalasingam, A. Mannocci, F. Osborne, E. Motta, The computer science
ontology: a large-scale taxonomy of research areas, in: International Semantic Web Conference,
Springer, 2018, pp. 187–205.</p>
      <p>[17] J. Ison, M. Kalas, I. Jonassen, D. Bolser, M. Uludag, H. McWilliam, J. Malone, R. Lopez, S. Pettifer,
P. Rice, Tools and data services registry: A community effort to document and share bioinformatics
resources, Nucleic Acids Research 44 (2016) D38–D47. doi:10.1093/nar/gkv1116.</p>
      <p>[18] M. Al-Ahmad, et al., Ai knowledge graph: Large-scale knowledge graph for ai research, Journal of
Web Semantics (2021). URL: https://link.springer.com/article/10.1007/s10586-021-03211-4.</p>
      <p>[19] P. Manghi, C. Atzori, A. Bardi, M. Baglioni, H. Dimitropoulos, S. La Bruzzo, I. Foufoulas, A.
Mannocci, M. Horst, K. Iatropoulou, A. Kokogiannaki, M. De Bonis, M. Artini, A. Lempesis, A.
Ioannidis, N. Manola, P. Principe, T. Vergoulis, S. Chatzopoulos, Openaire graph dataset, 2024.
doi:10.5281/zenodo.12819872.</p>
      <p>[20] A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2011. URL:
https://infolab.stanford.edu/~ullman/mmds/book.pdf.</p>
      <p>[21] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, in:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019,
pp. 3982–3992.</p>
      <p>[22] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin,
J. Clark, et al., Learning transferable visual models from natural language supervision, in:
International conference on machine learning, PMLR, 2021, pp. 8748–8763.</p>
      <p>[23] P. J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,
Journal of Computational and Applied Mathematics 20 (1987) 53–65.</p>
      <p>[24] T. Caliński, J. Harabasz, A dendrite method for cluster analysis, Communications in Statistics 3
(1974) 1–27.</p>
      <p>[25] D. L. Davies, D. W. Bouldin, A cluster separation measure, IEEE transactions on pattern analysis
and machine intelligence (1979) 224–227.</p>
      <p>[26] L. Maaten, G. Hinton, Visualizing data using t-sne, Journal of Machine Learning Research (2008)
2579–2605.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>da Silva Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bouwman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Brookes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dumon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edmunds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Evelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Finkers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Beltran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Grethe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heringa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A. C.</given-names>
            <surname>'t Hoen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hooft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Lusher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Martone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Packer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Persson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rocca-Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>van Schaik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-A.</given-names>
            <surname>Sansone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schultes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sengstag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Slater</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Strawn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Swertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van der Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>van Mulligen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Velterop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waagmeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wittenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mons</surname>
          </string-name>
          ,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <elocation-id>160018</elocation-id>
          . doi: 10.1038/sdata.2016.18.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gruenpeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Honeyman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Struck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niehues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rabemanantsoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Chue Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martinez-Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sesink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lifers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Fouilloux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Peroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martinez Lavanchy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <source>Defining Research Software: a controversial discussion</source>
          ,
          <year>2021</year>
          . doi: 10.5281/zenodo.5504016.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Chue Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Psomopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Harrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gruenpeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Honeyman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Struck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Loewe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>van Werkhoven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Plomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Genova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shanahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hellström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kuzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Herterich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-A.</given-names>
            <surname>Sansone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pollard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. D.</given-names>
            <surname>Atmojo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Czerniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niehues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Fouilloux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Desinghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Richard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Erdmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nüst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tartarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ranguelova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Anzt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McNally</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moldon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Burnett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garrido-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sesink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Tovani-Palone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servillat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lifers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Miljković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lynch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martinez Lavanchy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gesing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Martinez Cuesta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Peroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rabemanantsoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sochat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yehudi</surname>
          </string-name>
          ,
          <collab>RDA FAIR4RS WG</collab>
          ,
          <source>FAIR Principles for Research Software (FAIR4RS Principles)</source>
          ,
          <year>2022</year>
          . doi: 10.15497/RDA00068.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hucka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <article-title>Software search is not a science, even among scientists: A survey of how scientists and engineers find software</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>141</volume>
          (
          <year>2018</year>
          )
          <fpage>171</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Reforgiato Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
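          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          ,
          <article-title>AI-KG: an automatically generated knowledge graph of artificial intelligence</article-title>
          , in:
          <source>International Semantic Web Conference</source>
          , Springer,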
          <year>2020</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Malone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Lister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Ison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Parkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation</article-title>
          ,
          <source>J. Biomed. Semant.</source>
          <volume>5</volume>
          (
          <year>2014</year>
          )
          <elocation-id>25</elocation-id>
          . URL: http://dblp.uni-trier.de/db/journals/biomedsem/biomedsem5.html#MaloneBLIHPS14.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>Meta AI</collab>
          ,
          <source>Papers with Code</source>
          ,
          <year>2024</year>
          . URL: https://paperswithcode.com.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tsay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Braz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hirzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shinnar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mummert</surname>
          </string-name>
          ,
          <article-title>AIMMX: Artificial intelligence model metadata extractor</article-title>
          , in:
          <source>Proceedings of the 17th International Conference on Mining Software Repositories</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>HiGitClass: Keyword-driven hierarchical classification of GitHub repositories</article-title>
          , in:
          <source>2019 IEEE International Conference on Data Mining (ICDM)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>876</fpage>
          -
          <lpage>885</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <article-title>Linked Papers With Code: the latest in machine learning as an RDF knowledge graph</article-title>
          ,
          <source>arXiv preprint arXiv:2310.20475</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salatino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <article-title>CSO Classifier 3.0: a scalable unsupervised method for classifying documents in terms of research topics</article-title>
          ,
          <source>International Journal on Digital Libraries</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Ciuciu-Kiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <article-title>Implementation of the analysis and dataset for research software classification</article-title>
          , https://github.com/kuefmz/pow_categories,
          <year>2025</year>
          . doi: 10.5281/zenodo.15230833, accessed: April 17,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <article-title>AgreementMaker: A flexible and efficient ontology matching system</article-title>
          ,
          <source>Journal of Web Semantics</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          , AgreementMakerLight: Boosting ontology alignment through
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>