<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Are Scientific Annotations Consistently Represented across Science Knowledge Graphs?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-3170-6730</contrib-id>
          <string-name>Jenifer Tabita Ciuciu-Kiss</string-name>
          <email>jenifer.ciuciu-kiss@alumnos.upm.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0003-0454-7145</contrib-id>
          <string-name>Daniel Garijo</string-name>
          <email>daniel.garijo@upm.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Politécnica de Madrid, Boadilla del Monte</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>3977</volume>
      <fpage>345</fpage>
      <lpage>359</lpage>
      <abstract>
        <p>Scientific Knowledge Graphs (SKGs) are increasingly used to annotate and interlink research outputs. However, little is known about how consistently they annotate the same publication. This paper presents a comparative analysis of category annotations across four major SKGs (ORKG, OpenAlex, OpenAIRE, and Papers with Code) using a manually curated gold-standard dataset of 70 AI-related papers. We examine differences in annotation coverage, granularity, and semantic alignment, highlighting frequent inconsistencies such as label mismatches, overly generic terms, and coverage gaps. Our analysis reveals that manual curation offers high-quality but sparse annotations, while automated systems achieve broader coverage at the cost of precision. This work contributes insights into the reliability of SKG metadata and outlines pathways for improving interoperability and annotation practices.</p>
      </abstract>
      <kwd-group>
        <kwd>Science Knowledge Graphs</kwd>
        <kwd>Comparative Analysis</kwd>
        <kwd>Metadata Quality</kwd>
      </kwd-group>
      <conference>
        <conf-name>5th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment</conf-name>
        <conf-date>Nov 2024</conf-date>
        <conf-loc>Nara, Japan</conf-loc>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, Scientific Knowledge Graphs (SKGs) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] have become essential infrastructures for
representing scholarly information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in a machine-readable format [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. By linking research
entities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] such as publications, datasets, software, authors, and their associated annotations, SKGs
enable advanced services for scientific discovery [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], evaluation, and reuse. However, these
infrastructures do not yet fully realize all aspects of the FAIR principles [
        <xref ref-type="bibr" rid="ref8 ref9 ref10">8, 9, 10</xref>
        ], falling short particularly
in interoperability and reusability. Annotation vocabularies are rarely harmonized across platforms,
documentation of classification pipelines is often incomplete, and provenance metadata is inconsistently
recorded. Nonetheless, they contribute toward findability and partial interoperability through metadata
enrichment [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], standardized identifiers, and the homogenization [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] of scholarly records. Examples
include OpenAlex [13], OpenAIRE [14, 15, 16, 17], the Open Research Knowledge Graph (ORKG) [18],
AI-KG [19], Crossref [20], and Papers with Code (PwC), among others, each offering its own approach
to structuring and classifying research outputs.
      </p>
      <p>A core functionality of these graphs is the annotation of research publications [21, 22], typically
through labels such as subjects [16], concepts [13], or tasks [18], with or without a hierarchy. These
annotations are key for enabling semantic search [23], recommendation systems [24], benchmarking
platforms [25], and large-scale meta-analyses [26]. However, SKGs differ substantially in how they
generate and apply such labels, ranging from manual curation [18] to automated topic modeling [13, 18],
resulting in inconsistent representations of the same scientific work among various sources.</p>
      <p>While prior studies have explored overlaps between SKGs through quantitative methods [27], such
as measuring lexical similarity between annotations, there remains limited understanding of how SKG
category annotations differ in practice when describing the same publication across multiple SKGs.
In this paper we explore these differences, which may stem from divergent modeling assumptions,
annotation pipelines, and classification goals. We present a comparative analysis of 70 AI-research
papers from recent years, annotated across four major SKGs: ORKG, OpenAlex, OpenAIRE, and PwC.
We examine the types of inconsistencies that emerge when annotating the same publications, such as
mismatches in granularity, terminology, and coverage, and discuss their implications for interoperability,
metadata quality, and downstream applications.</p>
      <p>To better understand SKG-based annotations, we formulate the following research questions, each
paired with the main contribution that addresses it:
• RQ1: How do category annotations differ across SKGs?
We construct and release a manually curated dataset of 70 AI-related publications, each annotated
across four major SKGs: ORKG, OpenAlex, OpenAIRE, and PwC. These annotations include tasks,
methods, subjects, and other topical labels. We use this dataset to compare how SKGs differ in
annotation scope, specificity, and structural conventions. Throughout this paper, we refer to such
labels as (category) annotations.
• RQ2: How accurate are these annotations compared to a manually curated standard?
We manually reviewed the title and abstract of each paper to determine whether the
SKG-assigned annotations accurately reflected the paper’s content. Based on this expert validation, we
constructed a gold-standard dataset and evaluated each SKG’s annotations in terms of precision,
recall, and F1-score [28].
• RQ3: What types of annotation inconsistencies occur most frequently?
We conduct a comparative evaluation across the four SKGs, identifying frequent inconsistencies
such as mismatches in granularity, label ambiguity, incomplete coverage, and semantic
misalignment. We further analyze these issues through representative examples and quantitative
summaries.</p>
      <p>The remainder of this paper is structured as follows. Section 2 reviews related work on SKG-based
classification and metadata annotation. Section 3 outlines our methodology, including dataset
construction, annotation guidelines, and evaluation metrics. Section 4 describes the initial and
gold-standard datasets used for the comparative analysis. Section 5 presents empirical results on annotation
coverage, accuracy, and overlap across SKGs. Section 6 discusses the findings in light of key annotation
challenges and provides representative examples. Finally, Section 7 concludes with a summary of
insights and recommendations for improving annotation practices in scientific knowledge graphs.</p>
      <sec id="sec-1-1">
        <title>1.1. Background: Scientific Knowledge Graphs (SKGs)</title>
        <p>SKGs are structured representations of scholarly knowledge [29] that encode entities (e.g. publications
and concepts) and their semantic relationships in a graph-based format. Their primary aim is to
support advanced search, integration, and analysis of scientific information by making research outputs
machine-interpretable and interlinked. Depending on their design goals, SKGs differ in scope, domain
coverage, and update mechanisms, ranging from large-scale, automatically constructed graphs to
smaller, community-curated platforms. In the following, we discuss the key characteristics of some of
the most widely used SKGs that serve as the foundation for our analysis.</p>
        <sec id="sec-1-1-1">
          <title>1.1.1. OpenAlex</title>
          <p>OpenAlex [13] is an open catalog of scholarly entities that emerged as the successor of the Microsoft
Academic Graph [30]. It compiles metadata on publications, authors, institutions, venues, concepts, and
more. OpenAlex applies machine learning (ML) models trained on titles, abstracts, and citation contexts
to assign fine-grained topic annotations from a curated ontology of over 60,000 concepts. These topic
concepts are assigned probabilistically, with each publication receiving a primary concept and possibly
several secondary ones, each associated with confidence scores. The classification pipeline is
documented and accessible through the OpenAlex API (https://docs.openalex.org/api-entities/topics).
As one of the largest open scholarly KGs, OpenAlex
prioritizes breadth and scalability but, due to its automated nature, it may introduce inconsistencies in
category granularity and semantic relevance, particularly across disciplines.</p>
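          <p>For illustration, the following minimal Python sketch retrieves the topic annotations of one
publication from the OpenAlex works endpoint named above. The response field names (topics,
display_name, score) follow the public API documentation but should be treated as assumptions here,
not as a description of our pipeline.</p>
          <preformat>
# Minimal sketch: fetch topic annotations for one publication from OpenAlex.
# The endpoint is the one cited in the text; the "topics"/"score" response
# fields are assumptions based on the public API documentation.
import requests

def openalex_topics(doi: str) -> list[tuple[str, float]]:
    work = requests.get(f"https://api.openalex.org/works/doi:{doi}",
                        timeout=30).json()
    # Each topic is assigned probabilistically with a confidence score.
    return [(t["display_name"], t.get("score", 0.0))
            for t in work.get("topics", [])]

if __name__ == "__main__":
    for name, score in openalex_topics("10.48550/arxiv.2403.05530"):
        print(f"{score:.2f}  {name}")
          </preformat>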
        </sec>
        <sec id="sec-1-1-2">
          <title>1.1.2. OpenAIRE - Open Access Infrastructure for Research in Europe</title>
          <p>Open Access Infrastructure for Research in Europe (OpenAIRE) [14, 15, 16, 17] is a major European
Open Science infrastructure designed to foster open scholarship and improve the accessibility and
reusability of scientific knowledge. The OpenAIRE Knowledge Graph aggregates metadata from a
broad spectrum of sources, including publications, datasets, projects, and research organizations across
Europe, thus providing an integrated view of the research landscape. The OpenAIRE APIs offer access
to this aggregated metadata, enabling structured queries over scholarly content. In this work, we
used the Search API to retrieve category-level annotations for individual publications. OpenAIRE’s
metadata model is enriched with subject classifications based on taxonomies such as the OECD Fields
of Science (FOS), SCINOBO, and others integrated across its pipeline. As a component of the European
Open Science Cloud, OpenAIRE benefits from frequent updates and ongoing standardization efforts. It
supports the discoverability and interoperability of scientific content through harmonized metadata
ingestion from compliant repositories and data providers.</p>
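          <p>As a minimal sketch, subject annotations can be retrieved for a single DOI from the Search API
mentioned above; scanning the response for subject elements is an assumption about the returned
XML rather than a documented schema.</p>
          <preformat>
# Minimal sketch: retrieve subject annotations for one DOI from the OpenAIRE
# Search API named in the text. Matching elements whose tag ends in "subject"
# is an assumption about the response format, not a fixed schema.
import requests
import xml.etree.ElementTree as ET

def openaire_subjects(doi: str) -> list[str]:
    resp = requests.get("https://api.openaire.eu/search/publications",
                        params={"doi": doi}, timeout=30)
    root = ET.fromstring(resp.content)
    return [el.text.strip() for el in root.iter()
            if el.tag.endswith("subject") and el.text and el.text.strip()]
          </preformat>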
        </sec>
        <sec id="sec-1-1-3">
          <title>1.1.3. ORKG - The Open Research Knowledge Graph</title>
          <p>ORKG [18] offers a semantic infrastructure for representing individual research contributions using
RDF and structured templates. Unlike fully automated SKGs, ORKG relies on manual,
community-driven annotations where users describe publications through semantically rich triples that capture
the problem, method, and result of a study. Annotations are made using predefined templates that
align with scholarly discourse elements, enabling fine-grained semantic modeling of contributions.
The system is designed to increase interpretability and transparency of research metadata, supporting
both manual entry and semi-automated extraction tools. While its manual approach limits coverage
compared to large-scale automated graphs, the semantic depth and precision of ORKG annotations
make it especially valuable for comparative analyses.</p>
        </sec>
        <sec id="sec-1-1-4">
          <title>1.1.4. PwC - Papers with Code</title>
          <p>PwC is a domain-specific SKG focused on the AI/ML research landscape. It integrates scientific
publications, benchmark datasets, evaluation results, and source code into a coherent, task-driven knowledge
graph. Each paper is linked to tasks and methods, with annotations derived via a hybrid pipeline
combining automated extraction and human curation. PwC sources its papers primarily from arXiv and
Crossref, and then connects them to relevant benchmarks and method families, drawing from curated
taxonomies that reflect the evolving state of the field. The PwC dataset is regenerated daily, ensuring
that new papers and updated annotations are continuously incorporated. Labels are reviewed and
maintained by moderators and contributors from the research community, which supports high-quality and
ifne-grained annotations useful for reproducibility studies and trend analysis. Metadata and category
information are available through the PwC platform and associated GitHub repositories.9 Previous
work has examined the consistency and accuracy of existing method quality in PwC, highlighting both
the strengths and the limitations in coverage and granularity [31].</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section describes the diverse resource categorization techniques employed by different SKGs,
highlighting their methodological differences and implications for consistency in annotations. SKGs employ
varied annotation strategies to assign semantic categories to scholarly works. These strategies differ in
automation level, interpretability, and domain specificity. Below, we categorize these approaches into
four main types: rule-based/metadata, NER/linking, topic modeling/classification, and hybrid/manual
curation.</p>
      <sec id="sec-2-1">
        <title>2.1. Metadata- and taxonomy-based classification</title>
        <p>Platforms like OpenAIRE [16, 17] and Crossref [20] rely on repository metadata and established
taxonomies (e.g., OECD Fields of Science [32], SCINOBO [33], ACM CCS [34, 35]) to assign broad subject
categories to publications [14, 15]. This strategy enables scalable and harmonized annotation of research
outputs using well-established classification schemes, contributing to metadata interoperability and
integration across repositories. However, it typically produces coarse-grained labels that may lack
domain specificity. Notably, many SKGs using metadata-based strategies provide limited
documentation about how repositories map local tags to global taxonomies, which introduces opacity into the
categorization pipeline.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Named-entity recognition (NER) and entity linking</title>
          <p>Some SKGs extract entities directly from unstructured text using NER and linking to external knowledge
bases such as Wikidata [36] and MeSH [37] (https://www.nlm.nih.gov/mesh/meshhome.html) [38]. This approach enables direct annotation of domain-specific
entities (e.g., genes, diseases, methods), which is particularly useful in specialized fields like biomedicine;
for instance, biomedical SKGs often detect gene, disease, or method mentions via concept recognition
tools, followed by normalization to identifiers. These annotations can enhance semantic granularity
and support knowledge integration. However, such pipelines are unevenly documented, with some
SKGs omitting details about their training data and linking heuristics [19, 39]. This opacity complicates
reproducibility and comparison across systems.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Topic modeling and supervised classification</title>
        <p>Large-scale SKGs such as OpenAlex apply ML to cluster publications and assign topics based on features
extracted from titles, abstracts, venue names, and citation networks. For example, OpenAlex’s topic
pipeline uses network clustering to define topic communities, labels them via LLMs, and employs a deep learning
classifier to annotate works (https://docs.openalex.org/api-entities/topics). Such approaches can surface emerging topics and latent structure in
scientific literature. However, they may result in inconsistent granularity, with some topics being overly
broad and others overly specific.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Hybrid human-in-the-loop annotation</title>
          <p>Domain-specific SKGs, such as PwC, combine automated matching of papers to predefined task/method
taxonomies with manual curation by community moderators. This hybrid approach leverages the
scalability of automation while incorporating expert validation to improve annotation accuracy. Such
strategies strike a balance between breadth and quality: automated systems offer scalability, while
manual review ensures semantic precision. Similarly, ORKG uses user-defined templates (e.g., method,
result, etc.), filled manually, to produce highly structured and semantically rich metadata (https://orkg.org/stats). These
annotations provide detailed and interpretable representations of research contributions. This yields precise
results, but coverage remains limited, and the user-dependent process is not uniformly documented
across contributions. Moreover, manual curation may introduce subjective bias, as moderators may
apply labels based on individual interpretations, experience, or familiarity with specific research areas.</p>
        <sec id="sec-2-4-1">
          <title>Implications of strategy diversity on annotation consistency</title>
          <p>The diversity of annotation strategies across SKGs introduces both strengths and challenges for metadata consistency. Metadata-based
systems produce generalized labels with limited thematic depth; topic modeling yields probabilistic but
uneven concept assignments, and NER/linking systems vary based on entity recognition quality and
KB integration. Hybrid manual approaches deliver semantically rich labels but lack scalability and full
traceability, particularly when documentation of contributor workflows is absent. Understanding the
trade-offs of each strategy is essential for improving interoperability and annotation quality. Our work
contributes to this area by performing a fine-grained, paper-level comparison across SKGs, focusing on
semantic overlap, divergence, and contextual usage of categories.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>To assess the consistency and semantic validity of annotations across SKGs, we designed a two-stage
evaluation pipeline: (1) data collection and preparation, and (2) comparative analysis based on manual
validation of annotation correctness. All annotations were verified by a domain expert against the
content of each publication by manually inspecting titles and abstracts. To mitigate potential bias in
manual validation, we followed predefined rules (see Section 3.1), avoided adding new annotations,
and applied a conservative exclusion strategy. Future work will incorporate multiple annotators and
inter-annotator agreement to strengthen the reproducibility of our results.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Collection</title>
        <p>In the first stage of our methodology, we assembled a dataset of AI-related research papers that are jointly
annotated across four prominent Scientific Knowledge Graphs (SKGs): ORKG, OpenAlex, OpenAIRE,
and PwC. These SKGs were selected to represent a spectrum of annotation strategies: from fully manual
(ORKG), to hybrid human-in-the-loop (PwC), to fully automated systems (OpenAlex and OpenAIRE).
This diversity allowed us to conduct a balanced comparison of annotation behavior across different
design paradigms.</p>
        <p>Each SKG defines its own scope and indexing strategy, focusing on different disciplines, sources,
or publication types, which naturally leads to variations in which papers are included and how they
are annotated. As a result, it is common for a paper to appear in one SKG but not another, or to be
indexed without any associated annotations. This diversity in coverage is expected and reflects the
design priorities of each graph rather than inconsistencies.</p>
        <p>To enable a controlled comparative analysis, we selected only papers that were indexed and annotated
by all four SKGs. Although the SKGs use distinct terminology for annotations, such as “tasks” and
“methods” (PwC), “research problems” (ORKG), or broader “subjects” (OpenAIRE, OpenAlex), we refer
to all such labels uniformly as annotations throughout this paper.</p>
        <p>Constructing a dataset with full parallel annotations across multiple SKGs required scanning a large
candidate pool of papers and applying an iterative filtering process. Only those papers for which each
SKG provided at least one annotation were retained for the final analysis. The resulting dataset and its
properties are described in detail in Section 4.</p>
        <p>Paper Selection To build the dataset, we began by compiling a broad pool of AI-related research
papers published between 2023 and 2025. Candidate papers were selected based on the authors’ domain
expertise and covered a wide range of topics within Artificial Intelligence. For each paper, we attempted
to match entries across the four selected SKGs using persistent identifiers (primarily DOIs) and title
matching. Because each SKG has a different scope and indexing strategy, many papers were not fully
covered in all four. Some were absent from one or more SKGs, while others were indexed but lacked
relevant annotations. To ensure a fair and controlled comparison, we retained only papers that (1)
were indexed in all four SKGs and (2) had at least one annotation from each. This filtering process
was applied iteratively to an initial pool of approximately 200 papers, resulting in a final set of 70 papers that
satisfied the completeness criteria for comparative analysis.</p>
        <p>Categorization Retrieval Once the final set of papers was selected, we retrieved their corresponding
annotations from each of the four SKGs. The retrieval process was adapted to the access mechanisms
and data availability of each source. Specifically:
• For OpenAlex, we used the official API (https://api.openalex.org/works) to retrieve topic and concept annotations associated
with each paper’s DOI.
• For OpenAIRE, we queried the Search API (https://api.openaire.eu/search/publications) to obtain subject classifications based on the OECD
Fields of Science taxonomy.
• For ORKG, we used the public SPARQL endpoint (https://orkg.org/sparql) to extract structured annotations based
on predefined semantic templates. In particular, we retrieved the hasResearchProblem and
hasMethod fields from each research contribution (a query sketch follows this list).
• For PwC, we used a local data dump (https://paperswithcode.com/about) accessed on July 1, 2025, to extract annotations for tasks
and methods associated with each paper.</p>
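        <p>The ORKG retrieval step can be sketched as follows. The SPARQL endpoint is the one listed above,
but the graph pattern is illustrative: the concrete predicate IRIs behind hasResearchProblem and
hasMethod are not reproduced in this paper, so the sketch matches predicates by their labels instead.</p>
        <preformat>
# Minimal sketch: query the public ORKG SPARQL endpoint for the research
# problem / method annotations of one paper. The graph pattern is a
# placeholder: predicates are matched via rdfs:label because the concrete
# ORKG predicate IRIs are not given in the text.
import requests

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label WHERE {
  ?paper ?doiProp "%s" .            # hypothetical: DOI stored as a literal
  ?paper ?p ?annotation .
  ?p rdfs:label ?pLabel .
  FILTER(?pLabel IN ("has research problem"@en, "has method"@en))
  ?annotation rdfs:label ?label .
}
"""

def orkg_annotations(doi: str) -> list[str]:
    resp = requests.get("https://orkg.org/sparql",
                        params={"query": QUERY % doi},
                        headers={"Accept": "application/sparql-results+json"},
                        timeout=60)
    return [row["label"]["value"]
            for row in resp.json()["results"]["bindings"]]
        </preformat>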
        <p>Annotations were collected and stored separately for each SKG, preserving their original format,
structure, and terminology. No filtering or transformation was applied during this stage, to ensure that
the data remained faithful to its source. This raw annotation set served as the input for the normalization
and comparative analysis steps described in the following sections.</p>
        <p>Initial Dataset: Normalization The initial dataset was constructed directly from the raw
annotations retrieved from each SKG. To ensure basic consistency across sources, all annotation labels were
transformed to lowercase. In addition, whenever annotations were expressed using OECD Fields of
Science (FoS) codes or non-standard descriptors, these were replaced with their corresponding standard
FoS labels. No further transformations, filtering, or reformatting were applied at this stage. This version
of the dataset preserves the original annotation behavior of each SKG and is referred to as the initial
dataset in the rest of the paper.</p>
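        <p>A minimal sketch of this normalization step is shown below; the two FoS code mappings are
illustrative samples rather than the full OECD code list.</p>
        <preformat>
# Minimal sketch of the normalization step: lowercase every label and replace
# OECD FoS codes with their standard labels. The mapping shows two sample
# entries only; the full code list is not reproduced here.
FOS_LABELS = {
    "1.2": "computer and information sciences",
    "5.8": "media and communications",
}

def normalize(label: str) -> str:
    label = label.strip().lower()
    return FOS_LABELS.get(label, label)

assert normalize("Computer Vision") == "computer vision"
assert normalize("1.2") == "computer and information sciences"
        </preformat>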
        <p>Gold-Standard Dataset: Manual Validation To assess annotation correctness, we manually curated
a gold-standard by reviewing each paper’s title and abstract. Validation was performed by the first
author (a PhD researcher specializing in AI and SKGs with over 5 years of experience). Each annotation
was evaluated against three criteria: (i) semantic relevance to the paper’s research problem or method,
(ii) domain specificity (avoiding overly generic categories such as ‘science’), and (iii) contextual accuracy.
This rule-based approach was adopted to minimize subjective bias. Annotations that were relevant
were marked as correct and therefore kept in the final gold-standard dataset, while those that were
off-topic, overly generic, overly specific, or misleading were marked as incorrect. In borderline cases,
we adopted a conservative approach and excluded such annotations from the gold-standard.</p>
        <p>During this process, we also identified cases where the abstracts retrieved from certain SKGs were
incomplete, incorrect, or contained metadata artifacts. These cases were corrected manually using
the official abstracts from publisher websites or arXiv [40] to ensure that our validation was based on
accurate representations of the paper content.</p>
        <p>No new annotations were introduced during this step; the gold-standard only reflects corrections to
existing labels and underlying metadata. This version of the dataset [41] is used in our evaluation of
annotation accuracy in Section 5.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Comparative Analysis</title>
        <p>To evaluate the consistency and semantic appropriateness of category annotations across SKGs, we
conducted a comparative analysis that combines quantitative metrics with qualitative interpretation.
This twofold approach allowed us to assess both the overall annotation performance of each SKG and
the nature of discrepancies that emerge when multiple SKGs describe the same publication.</p>
        <p>Using the gold-standard dataset, we first evaluated annotation correctness in terms of precision,
recall, and F1-score for each SKG. These metrics quantify the alignment between the annotations and
expert-validated labels, forming the basis of the results presented in Section 5.</p>
        <p>To complement the evaluation, we then analyzed the types of inconsistencies that commonly arise
across SKGs. For this purpose, we read and interpreted the title and abstract of each paper to identify
recurring annotation issues and anomalies. In particular, we distinguish four main types of
inconsistencies (a minimal detection sketch for the first type follows this list):
• Coverage inconsistency: Cases where a paper was present in all four SKGs but one or more
SKGs provided no annotation. Importantly, no new categories were introduced during our process;
coverage was judged solely based on whether each SKG offered at least one valid label.
• Label mismatch: Use of different terms to describe the same concept (e.g., “NER” vs. “Named
Entity Recognition”), reflecting differences in vocabulary and annotation conventions.
• Granularity difference: One SKG uses broad categories (e.g., “Computer Vision”) while another
applies fine-grained concepts (e.g., “Panoptic Segmentation”), complicating direct comparison.
• Incorrect category assignment: A category is clearly misaligned with the content of the
paper—such as labeling an NLP paper as “Computer Vision”—often due to automatic inference
errors or misinterpreted metadata.</p>
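        <p>As a minimal sketch, coverage inconsistencies can be detected directly from the per-paper
annotation sets; the dict-of-sets representation and the label values below are assumptions of this
illustration, not a prescribed data model.</p>
        <preformat>
# Minimal sketch: flag coverage inconsistencies for one paper, given its
# normalized annotations per SKG as a dict of label sets (an assumed
# representation for this illustration).
def coverage_gaps(annotations: dict[str, set[str]]) -> list[str]:
    """Return the SKGs that provided no annotation for this paper."""
    return [skg for skg, labels in annotations.items() if not labels]

paper = {
    "ORKG": {"question answering"},
    "OpenAlex": {"natural language processing", "topic modeling"},
    "OpenAIRE": set(),            # indexed, but no annotation retained
    "PwC": {"language modelling", "question answering"},
}
print(coverage_gaps(paper))       # ['OpenAIRE'] -> coverage inconsistency
        </preformat>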
        <p>Each inconsistency was documented at the paper level, and summary statistics were compiled to
capture their distribution across the dataset. Representative examples and edge cases are discussed
in Section 6 to illustrate common pitfalls, semantic drift, and limitations in current SKG annotation
practices. All code used for dataset construction and analysis is available on GitHub [42] and also
published as a snapshot on Zenodo [41].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>The two datasets used in our analysis were constructed following the methodology described in
Section 3. Both the initial dataset and the manually curated gold-standard dataset are publicly available
on Zenodo [41]. Both datasets include the exact same set of 70 AI-related research papers from
2023–2025, each annotated by all four SKGs. Table 1 presents key statistics for the initial dataset and
the manually curated gold-standard dataset.</p>
      <p>The initial dataset reflects the raw annotations retrieved from each SKG, with only minimal normalization
applied, such as converting to lowercase and replacing classification codes when necessary. This version
of the dataset contains a total of 2,756 annotations, which corresponds to an average of 39.37 annotations
per paper and 9.84 annotations per paper per SKG. The dataset includes 728 unique category labels,
illustrating the broad topical coverage and terminological diversity across SKGs. However, this volume
also introduced substantial noise, redundancy, and inconsistency, particularly in cases where overly
generic or highly specific terms inflated the annotation count.</p>
      <p>The gold-standard dataset builds on the initial version by incorporating manual validation of each
annotation. Using the title and abstract of each paper, we assessed whether the assigned categories
accurately reflected the main research topic or contribution. Annotations deemed off-topic, overly
broad, overly specific, or ambiguous were removed. In a few cases, missing/incorrect abstracts were also
corrected manually. This refinement reduced the dataset to 1,046 total annotations—an average of 14.94
annotations per paper and 3.78 per paper per SKG. The number of unique category labels decreased to
300, resulting in a cleaner and more semantically coherent label set suitable for evaluation purposes.</p>
      <p>The contrast between the two datasets highlights the tendency of automated and hybrid SKG pipelines
to overgenerate annotations. While the initial dataset captures the full breadth of current SKG outputs,
the gold-standard version provides a human-validated benchmark that filters out noise and prioritizes
interpretability.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section presents the results of our analysis, structured around the research questions (RQs)
introduced in Section 1. Each subsection restates the corresponding RQ and provides a detailed response
based on comparative findings on the proposed gold-standard dataset consisting of 70 AI-related
research papers, annotated across the four SKGs.</p>
      <sec id="sec-5-1">
        <title>5.1. RQ1: How do annotation strategies differ across SKGs?</title>
        <p>To examine how annotation practices vary across SKGs, we analyzed the average number of annotations
per paper and the number of unique category labels for each graph, both before and after gold-standard
curation. Table 2 summarizes these results. The initial dataset reveals significant differences in
annotation strategies across SKGs. PwC assigns the highest number of categories per paper (16.73 on
average), followed by OpenAlex (12.39), OpenAIRE (7.43), and ORKG (2.83). While OpenAlex exhibits
the broadest vocabulary with 277 unique categories, ORKG, despite its lower per-paper average,
maintains 133 distinct labels, pointing to a focused yet diverse annotation strategy. After manual curation,
annotation counts dropped significantly across all SKGs, ranging from 4.66 (PwC) to 1.93 (ORKG),
reflecting reductions of over 50% in all cases. Overall, the results confirm that SKGs differ widely in both
the volume and nature of their annotations. Automated systems like OpenAlex and OpenAIRE aim for
broad coverage, but differ in granularity and topical precision. OpenAlex produces more annotations
and a broader vocabulary, while OpenAIRE remains more conservative. PwC, despite being hybrid,
assigns the highest number of categories per paper, suggesting a bias toward exhaustive labeling that
introduces redundancy. In contrast, ORKG assigns far fewer annotations on average, but maintains a
high number of distinct category labels, indicating a more targeted and semantically diverse annotation
strategy.</p>
        <p>To further explore how SKGs align in their annotation choices, we computed the number of
overlapping categories assigned per paper for each pair and triplet of SKGs. Table 3 summarizes the total
number of shared annotations observed across 70 papers. Overlaps were relatively sparse, underscoring
the inconsistency in how different SKGs annotate the same paper. The highest pairwise agreement
occurred between OpenAlex and OpenAIRE (71 overlapping categories), likely due to their shared
reliance on automated subject classification at a broad scope. In contrast, overlap between ORKG and
OpenAIRE was negligible (1 overlap), reflecting their distinct coverage and semantic focus. Notably,
there were zero papers where all four SKGs assigned at least one identical category, and only two triple
combinations (PwC–OpenAlex–ORKG and OpenAlex–OpenAIRE–ORKG) resulted in even a single
shared category across 70 papers. These results reinforce the conclusion that while SKGs may annotate
the same papers, they do so using divergent taxonomies and strategies, limiting interoperability and
semantic alignment.</p>
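        <p>For reference, these overlap counts can be computed in a few lines; the sketch below assumes the
same dict-of-sets representation per paper introduced earlier and counts exact label matches after
normalization, mirroring the totals reported in Table 3.</p>
        <preformat>
# Minimal sketch of the overlap computation behind Table 3: for every pair,
# triple, and the full quadruple of SKGs, sum the per-paper intersections of
# their (normalized) label sets.
from itertools import combinations

SKGS = ["ORKG", "OpenAlex", "OpenAIRE", "PwC"]

def shared_counts(papers: list[dict[str, set[str]]]) -> dict[tuple, int]:
    totals = {}
    for r in (2, 3, 4):
        for combo in combinations(SKGS, r):
            totals[combo] = sum(
                len(set.intersection(*(paper[s] for s in combo)))
                for paper in papers)
    return totals
        </preformat>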
      </sec>
      <sec id="sec-5-2">
        <title>5.2. RQ2: How accurate are the annotations compared to a manually curated gold-standard?</title>
        <p>To assess annotation correctness, we evaluated how well the labels assigned by each SKG aligned with
the manually curated gold-standard. For each of the 70 AI-related papers, the gold-standard contains
only those annotations that were present in the original SKG outputs and judged to be semantically
correct based on the paper’s title and abstract. No new categories were added—evaluation was performed
purely within the set of originally retrieved labels.</p>
        <p>As shown in Table 4, SKGs vary significantly in annotation quality. PwC and OpenAlex achieve nearly
perfect recall (0.99 and 1.00, respectively), meaning most relevant labels are included in their original
outputs. However, both suffer from low precision, 0.27 for PwC and 0.39 for OpenAlex, indicating a
high proportion of irrelevant, redundant, or overly specific annotations. These results reflect their
emphasis on breadth and automated category expansion.</p>
        <p>ORKG, in contrast, produces far fewer annotations but with significantly higher semantic alignment,
yielding the highest precision (0.66) and F1-score (0.79). OpenAIRE falls between the two extremes,
offering a trade-off between coverage and correctness with a precision of 0.35 and recall of 0.72.</p>
        <p>Labeling methodology. An annotation was marked as correct if it matched a label in the
gold-standard exactly (case-insensitive). All comparisons were made using string matching; minor spelling
differences (e.g., modeling vs. modelling) were considered incorrect. Duplicates were collapsed and not
penalized. Importantly, the evaluation focuses solely on correctness relative to the gold-standard; it
does not assess whether the remaining categories are optimal or comprehensive. For instance, PwC
might include appropriate categories that are semantically correct yet filtered out due to being overly
specific or redundant.</p>
        <p>Metric computation. Metrics were computed using the scikit-learn functions precision_score,
recall_score, and f1_score, with zero_division=0. For each SKG and paper, we created a binary
vector over all labels (union of predicted and gold) to indicate presence/absence. These vectors were
concatenated across the 70 papers and aggregated into global metrics, offering a strict but comparable
evaluation of annotation quality across SKGs.</p>
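        <p>A minimal sketch of this computation is shown below, using the scikit-learn functions named
above; the per-paper label sets in the example are toy values for illustration.</p>
        <preformat>
# Minimal sketch of the metric computation described above: per paper, build
# binary vectors over the union of predicted and gold labels, concatenate
# them across papers, and compute global precision/recall/F1.
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(pairs: list[tuple[set[str], set[str]]]) -> tuple[float, float, float]:
    """pairs: (predicted_labels, gold_labels) per paper, for one SKG."""
    y_true, y_pred = [], []
    for predicted, gold in pairs:
        for label in sorted(predicted | gold):   # union of both label sets
            y_true.append(int(label in gold))
            y_pred.append(int(label in predicted))
    return (precision_score(y_true, y_pred, zero_division=0),
            recall_score(y_true, y_pred, zero_division=0),
            f1_score(y_true, y_pred, zero_division=0))

# Toy example: one paper with three predicted labels, one of them in gold.
print(evaluate([({"nlp", "meteorology", "law"}, {"nlp"})]))  # (0.33, 1.0, 0.5)
        </preformat>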
      </sec>
      <sec id="sec-5-3">
        <title>5.3. RQ3: What types of annotation inconsistencies occur most frequently?</title>
        <p>Since the gold-standard dataset was created by manually filtering the initial annotations, the most
frequent inconsistency was overannotation—labels that did not align with the paper’s content. No new
categories were added; instead, irrelevant, redundant, overly generic, or excessively specific annotations
were removed. A small number of coarse-grained categories were refined into more precise terms, and
typographical variants (e.g., British vs. American spelling) were harmonized for consistency.</p>
        <p>Table 5 summarizes the extent of label filtering per SKG. For each graph, we report the number
of initial annotations, retained gold-standard annotations, incorrect labels removed, and the average
number of incorrect annotations per paper. Except for two isolated cases, all papers contained at least
one correct annotation per SKG, validating the application of the evaluation metrics.</p>
        <p>The results show that PwC and OpenAlex contributed the most noise, with 779 and 490 incorrect
annotations respectively—averaging over 12 and nearly 8 errors per paper. OpenAIRE followed with
moderate overannotation, while ORKG was the most conservative, averaging fewer than one incorrect
label per paper. Overall, 1,581 out of 2,482 initial annotations (64%) were removed, highlighting the need
for improved quality control, context-sensitive categorization, and curated vocabularies to enhance
SKG reliability.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This section reflects on the findings in light of the three RQs, emphasizing key patterns and providing
representative examples to illustrate coverage, accuracy, and consistency differences across SKGs.</p>
      <sec id="sec-6-1">
        <title>6.1. RQ1 – Category annotations across SKGs</title>
        <p>The SKGs in our study differed significantly in annotation strategies and category granularity. PwC and
OpenAlex assigned more categories per paper (17 and 12 on average, respectively), while OpenAIRE
provided broader disciplinary coverage, and ORKG applied minimal but more conservative annotations.</p>
        <p>A central finding was the trade-off between quantity and specificity. For instance, the paper “Gemini
1.5” (DOI: 10.48550/arxiv.2403.05530) received rich model-specific labels from PwC, while ORKG applied
only “finding pre-trained large language model” and OpenAIRE relied on broad terms such as “computer”
and “information sciences”. Similarly, the paper “MiniCPM” (DOI: 10.48550/arxiv.2404.06395) was
annotated with task-specific terms like “domain adaptation” in PwC, but only “generic” in ORKG.</p>
        <p>In hybrid SKGs like PwC, taxonomy cleanliness emerged as a concern. For example, “Generating
Benchmarks for Factuality Evaluation” (DOI: 10.48550/arxiv.2307.06908) was annotated with both
“language modeling” and “language modelling”, highlighting the absence of normalization. Duplicate
entries like these complicate downstream semantic analysis.</p>
        <p>Surface-level term matching occasionally led to significant misclassifications. The paper “Pride and
Prejudice: LLM Amplifies Self-Bias” (DOI: 10.18653/v1/2024.acl-long.826) triggered legal and political
science categories in OpenAlex due to title keywords like “prejudice”, despite being an ML study.</p>
        <p>Annotation coverage, specificity, and taxonomic coherence differ substantially across SKGs. While
automated and hybrid systems offer breadth, they introduce term redundancy and thematic drift. Manual
systems like ORKG provide focused but sparse coverage. Context-aware disambiguation and taxonomy
standardization are needed to improve interoperability. Notably, annotation types differ substantially
across SKGs. ORKG emphasizes problem/method/result triples, OpenAIRE and OpenAlex assign broad
subject categories, while PwC offers task- and method-level keywords. Although ORKG also provides
research field classifications, we excluded them as they overlap strongly with OpenAIRE and OpenAlex
and would bias the comparison.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. RQ2 – Accuracy compared to the gold-standard</title>
        <p>To assess annotation relevance, we compared each SKG to a manually curated gold-standard. PwC
achieved the highest recall but suffered from low precision (27%). ORKG, though sparse, demonstrated
the highest precision (66%), followed by OpenAIRE and OpenAlex.</p>
        <p>Many misclassifications stemmed from overgeneralization or superficial keyword matching. For
example, OpenAlex labeled the paper “Enhancing Text-Based Knowledge Graph Completion” (DOI:
10.1016/j.knosys.2024.112155) with “paleontology” and “mechanical engineering”—labels likely
derived from unrelated co-occurring terms. In contrast, PwC provided precise annotations like “graph
embedding” and “contrastive learning”.</p>
        <p>Another case is “OLMo” (DOI: 10.18653/v1/2024.acl-long.841), where OpenAlex provided both accurate
and noisy categories such as “topic modeling” and “meteorology”. However, the presence of confidence
values in OpenAlex allowed for assessing reliability—an advantage over other SKGs.</p>
        <p>High annotation density does not guarantee high relevance. Systems like PwC maximize coverage
but include considerable noise, while ORKG captures essential concepts at the expense of completeness.
Confidence scoring and validation pipelines are key to improving accuracy.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. RQ3 – Types of Annotation Inconsistencies</title>
        <p>Our comparative analysis identified four primary types of annotation inconsistencies:
• Coverage inconsistency: Some SKGs failed to annotate otherwise well-covered papers. For
instance, ORKG labeled the paper “MiniCPM” only as “generic”, missing the technical depth
captured by PwC and OpenAlex.
• Incorrect assignment: Irrelevant labels were particularly common in automated systems.
OpenAlex annotated “MiniCPM” with “meteorology” and “geography”, while the actual topic
concerns model scaling strategies.
• Granularity mismatch: Labels ranged from very general (e.g., “computer science”) in
OpenAIRE to extremely specific (e.g., “contrastive learning”) in PwC, complicating comparisons and
integration.
• Label noise and duplication: PwC’s community-driven labels sometimes included errors. The
paper “Phi-3 Technical Report” (DOI: 10.48550/arxiv.2404.14219) was accurately annotated with
terms like “attention mechanisms” but also included the nonsensical category “15 ways to contact
how can I speak to someone at delta airlines”, likely a mislabeling or spam entry.</p>
        <p>These inconsistencies were tracked per paper and summarized across the dataset. Over 60% of
all raw annotations were filtered out during gold-standard construction, underscoring the need for
quality assurance. They mainly stem from taxonomy misalignment, inconsistent curation, and limited
validation. A key ongoing research challenge is therefore how to ensure robust quality control, category
normalization, and richer annotation metadata to improve trust in these systems.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>This study provides an in-depth comparison of categories (i.e., task and method annotations) across four
prominent Scientific Knowledge Graphs (SKGs) (ORKG, OpenAlex, OpenAIRE, and PwC) on a shared set
of 70 AI-related publications. By analyzing a manually curated dataset with parallel annotations from
each SKG, we reveal substantial variation in category annotation coverage, granularity, and semantic
alignment. Our dataset is limited in size but, to our knowledge, this is the first paper-level comparison
of category annotations across SKGs.</p>
      <p>Our findings suggest that PwC offers the most comprehensive and fine-grained annotations, likely
due to its hybrid strategy combining automated extraction with manual validation. OpenAIRE, in
contrast, employs a coarse-grained taxonomy that emphasizes domain-level categorization. ORKG
exhibits high semantic precision where annotations are present, but its reliance on manual input results
in limited coverage. OpenAlex provides broad topic coverage through automated classification, but
occasionally assigns contextually inappropriate labels, likely due to literal keyword matching and
limited disambiguation.</p>
      <p>Although more work is needed to confirm our findings outside the pool of articles selected in our
dataset, the discrepancies found highlight the challenges of aligning annotations across SKGs and
point to broader research challenges in annotation design, vocabulary standardization, and semantic
interoperability for consistent categorization and improved cross-graph metadata integration.</p>
      <p>Future work will expand our dataset beyond 70 AI-related publications to include additional domains,
enabling broader generalization. We plan to compute inter-SKG agreement metrics and apply clustering
to uncover structural inconsistencies. We also aim to assess inter-annotator agreement for the gold
labels to validate their reproducibility. Finally, we plan to explore schema mapping and ontology
alignment strategies to reconcile differences in labeling schemes and abstraction levels.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>All research content and ideas are original to the authors. We acknowledge the use of ChatGPT for
supervised grammar checks and minor paragraph rewording.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>The authors would like to thank the EVERSE project (GA 101129744) under the European Union’s
Horizon Europe Programme (HORIZON-INFRA-2023-EOSC-01-02).</p>
    </sec>
    <sec id="sec-10">
      <title>References [13]–[27]</title>
      <p>[13] J. Priem, H. Piwowar, R. Orr, OpenAlex: A fully-open index of scholarly works, authors, venues,
institutions, and concepts, 2022. URL: http://arxiv.org/abs/2205.01833. doi:10.48550/arXiv.2205.01833.</p>
      <p>[14] N. Rettberg, B. Schmidt, OpenAIRE - building a collaborative open access infrastructure for
European researchers, LIBER Quarterly: The Journal of the Association of European Research Libraries
22 (2012) 160–175. URL: https://liberquarterly.eu/article/view/10641. doi:10.18352/lq.8110.</p>
      <p>[15] N. Rettberg, B. Schmidt, OpenAIRE: Supporting a European open access mandate, College &amp;
Research Libraries News 76 (2015) 306–310. URL: https://crln.acrl.org/index.php/crlnews/article/view/9326.
doi:10.5860/crln.76.6.9326.</p>
      <p>[16] P. Manghi, A. Bardi, C. Atzori, M. Baglioni, N. Manola, J. Schirrwagen, P. Principe, The OpenAIRE
research graph data model, 2019. URL: https://doi.org/10.5281/zenodo.2643199. doi:10.5281/zenodo.2643199.</p>
      <p>[17] P. Manghi, C. Atzori, A. Bardi, M. Baglioni, J. Schirrwagen, H. Dimitropoulos, S. La Bruzzo,
I. Foufoulas, A. Mannocci, M. Horst, A. Czerniak, K. Iatropoulou, A. Kokogiannaki, M. De Bonis,
M. Artini, A. Lempesis, A. Ioannidis, N. Manola, P. Principe, T. Vergoulis, S. Chatzopoulos,
D. Pierrakos, OpenAIRE graph dump, 2022. URL: https://doi.org/10.5281/zenodo.7488618.
doi:10.5281/zenodo.7488618.</p>
      <p>[18] M. Y. Jaradeh, A. Oelen, K. E. Farfar, M. Prinz, J. D’Souza, G. Kismihók, M. Stocker, S. Auer, Open
research knowledge graph: Next generation infrastructure for semantic scholarly knowledge, in:
Proceedings of the 10th International Conference on Knowledge Capture, K-CAP ’19, Association
for Computing Machinery, New York, NY, USA, 2019, pp. 243–246. URL: https://doi.org/10.1145/3360901.3364435.
doi:10.1145/3360901.3364435.</p>
      <p>[19] D. Dessì, F. Osborne, D. Reforgiato Recupero, D. Buscaldi, E. Motta, H. Sack, AI-KG: An
automatically generated knowledge graph of artificial intelligence, in: The Semantic Web – ISWC 2020:
19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings, Part II,
Springer-Verlag, Berlin, Heidelberg, 2020, pp. 127–143. URL: https://doi.org/10.1007/978-3-030-62466-8_9.
doi:10.1007/978-3-030-62466-8_9.</p>
      <p>[20] G. Hendricks, D. Tkaczyk, J. Lin, P. Feeney, Crossref: The sustainable source of community-owned
scholarly metadata, Quantitative Science Studies 1 (2020) 414–427. URL: https://doi.org/10.1162/qss_a_00022.
doi:10.1162/qss_a_00022.</p>
      <p>[21] J.-P. Vergne, T. Wry, Categorizing categorization research: Review, integration, and future
directions, Journal of Management Studies 51 (2014) 56–94. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/joms.12044.
doi:10.1111/joms.12044.</p>
      <p>[22] E. Rosch, Principles of categorization, in: E. Rosch, B. B. Lloyd (Eds.), Cognition and Categorization,
Lawrence Erlbaum Associates, 1978, pp. 27–48.</p>
      <p>[23] R. Guha, R. McCool, E. Miller, Semantic search, in: Proceedings of the 12th International Conference
on World Wide Web, WWW ’03, Association for Computing Machinery, New York, NY, USA, 2003,
pp. 700–709. URL: https://doi.org/10.1145/775152.775250. doi:10.1145/775152.775250.</p>
      <p>[24] G. Shani, A. Gunawardana, Evaluating recommendation systems, in: Recommender Systems
Handbook, 2011. URL: https://api.semanticscholar.org/CorpusID:435521.</p>
      <p>[25] R. Dattakumar, R. Jagadeesh, A review of literature on benchmarking, Benchmarking: An
International Journal 10 (2003) 176–209. URL: https://doi.org/10.1108/14635770310477744.
doi:10.1108/14635770310477744.</p>
      <p>[26] T. A. Trikalinos, G. Salanti, E. Zintzaras, J. P. A. Ioannidis, Meta-analysis methods, Advances in
Genetics 60 (2008) 311–334. doi:10.1016/S0065-2660(07)00413-0.</p>
      <p>[27] J. T. Ciuciu-Kiss, D. Garijo, Assessing the overlap of science knowledge graphs: A quantitative
analysis, in: Natural Scientific Language Processing and Research Knowledge Graphs: First
International Workshop, NSLP 2024, Hersonissos, Crete, Greece, May 27, 2024, Proceedings,
Springer-Verlag, Berlin, Heidelberg, 2024, pp. 171–185. URL: https://doi.org/10.1007/978-3-031-65794-8_11.</p>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. A. Salatino, A. Mannocci, F. Osborne, Detection, analysis, and prediction of research topics with scientific knowledge graphs, in: Predicting the dynamics of research impact, Springer, 2021, pp. 225–252. URL: https://doi.org/10.1007/978-3-030-86668-6_11. doi:10.1007/978-3-030-86668-6_11.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] P. Manghi, A. Mannocci, F. Osborne, D. Sacharidis, A. Salatino, T. Vergoulis, New trends in scientific knowledge graphs and research impact assessment, Quantitative Science Studies 2 (2021) 1296–1300. URL: https://doi.org/10.1162/qss_e_00160. doi:10.1162/qss_e_00160.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] H. Shema, J. Bar-Ilan, M. Thelwall, Research blogs and the discussion of scholarly information, PLoS ONE 7 (2012) e35869. URL: https://doi.org/10.1371/journal.pone.0035869. doi:10.1371/journal.pone.0035869. Epub 2012 May 11.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Dodge, R. Kitchin, Codes of life: Identification codes and the machine-readable world, Environment and Planning D: Society and Space 23 (2005) 851–881. URL: https://doi.org/10.1068/d378t. doi:10.1068/d378t.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Reforgiato</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <article-title>CS-KG: A large-scale knowledge graph of research entities and claims in computer science</article-title>
          , in:
          <source>The Semantic Web - ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23-27, 2022, Proceedings</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2022</year>
          , p.
          <fpage>678</fpage>
          -
          <lpage>696</lpage>
          . URL: https://doi.org/10.1007/978-3-031-19433-7_39. doi:10.1007/978-3-031-19433-7_39.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schickore</surname>
          </string-name>
          ,
          <article-title>Scientific discovery</article-title>
          , in:
          <string-name>
            <given-names>E. N.</given-names>
            <surname>Zalta</surname>
          </string-name>
          (Ed.),
          <source>The Stanford Encyclopedia of Philosophy</source>
          , winter 2022 ed., Metaphysics Research Lab, Stanford University,
          <year>2022</year>
          . URL: https://plato.stanford.edu/entries/scientific-discovery/, first published March 6, 2014; substantive revision October 31, 2022.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Langley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Bradshaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Zytkow</surname>
          </string-name>
          ,
          <source>Scientific discovery: computational explorations of the creative process</source>
          , MIT Press, Cambridge, MA, USA,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jacobsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>de Miranda Azevedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Juty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Courtot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Evelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasnain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hettne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heringa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Hooft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Imming</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Jeffery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaliyaperumal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Kersloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Kirkpatrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Labastida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McQuilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meyers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montesanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Reisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rocca-Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pergl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-A.</given-names>
            <surname>Sansone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Strawn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waagmeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weigel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Willighagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wittenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schultes</surname>
          </string-name>
          ,
          <article-title>FAIR principles: Interpretations and implementation considerations</article-title>
          ,
          <source>Data Intelligence</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>10</fpage>
          -
          <lpage>29</lpage>
          . URL: https://doi.org/10.1162/dint_r_00024. doi:10.1162/dint_r_00024. arXiv:https://direct.mit.edu/dint/article-pdf/2/1-2/10/1893430/dint_r_00024.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kuzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arcila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M. D.</given-names>
            <surname>Pico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D. D.</given-names>
            <surname>Angel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>van de Sandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McQuilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Valencia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Harrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Psomopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Gelpi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Capella-Gutierrez</surname>
          </string-name>
          ,
          <article-title>Towards FAIR principles for research software</article-title>
          ,
          <source>Data Science</source>
          <volume>3</volume>
          (
          <year>2020</year>
          )
          <fpage>37</fpage>
          -
          <lpage>59</lpage>
          . URL: https://doi.org/10.3233/DS-190026. doi:10.3233/DS-190026.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P. C.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martinez-Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Psomopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Harrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gruenpeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Honeyman</surname>
          </string-name>
          ,
          <article-title>Introducing the FAIR principles for research software</article-title>
          ,
          <source>Scientific Data</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <fpage>622</fpage>
          . URL: https://doi.org/10.1038/s41597-022-01710-x. doi:10.1038/s41597-022-01710-x.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hagedorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chemudugunta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <article-title>Subject metadata enrichment using statistical topic models</article-title>
          , in:
          <source>Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries</source>
          , JCDL '07, Association for Computing Machinery, New York, NY, USA,
          <year>2007</year>
          , p.
          <fpage>366</fpage>
          -
          <lpage>375</lpage>
          . URL: https://doi.org/10.1145/1255175.1255248. doi:10.1145/1255175.1255248.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tartar</surname>
          </string-name>
          ,
          <source>The General Theory of Homogenization: A Personalized Introduction, volume 7 of Lecture Notes of the Unione Matematica Italiana</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2010</year>
          . doi:10.1007/978-3-642-05195-1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>