<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Graph for Social Sciences and Humanities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andre Valdestilhas</string-name>
          <email>a.valdestilhas@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronald Siebes</string-name>
          <email>r.m.siebes@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacco van Ossenbruggen</string-name>
          <email>jacco.van.ossenbruggen@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Social Sciences, Humanities, Knowledge Graph, SSHOC-NL</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1105, 1081 HV Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Social Sciences and (Digital) Humanities (SSH) increasingly benefit from the online availability of datasets, services, and tools provided by their peers and governmental organizations. However, the fragmented and heterogeneous nature of the metadata makes it difficult to assess the complexity of replicating results and the potential reuse of various parts (methodology, tools, and data) for follow-up research. For example, a significant effort is required to check the access requirements and quality of the data and to check if the format is compatible with tools that a researcher is familiar with. This paper outlines the vision, structure, and objectives of the Social Science and Humanities Open Cloud for the Netherlands (SSHOC-NL), focusing on its role in building and leveraging a national SSH knowledge graph. By developing a unified data environment, semantic integration tools, and interactive Data Stories, this SSHOC-NL effort transforms disparate data into interconnected scientific knowledge. This work aims to equip Dutch researchers with the tools and skills necessary to navigate complex societal challenges through a rich, semantically linked data landscape, representing a critical step towards realizing the promise of the European Open Science Cloud (EOSC) within the Dutch SSH domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The proliferation of digital data has catalyzed a profound transformation in the Social Sciences and
Humanities (SSH). Researchers are no longer limited to traditional qualitative and quantitative methods
but are increasingly leveraging large-scale, complex, and often sensitive datasets to investigate social and
historical phenomena with unprecedented depth and scale [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This data-driven paradigm, however,
depends critically on the availability of robust, accessible, and well-managed research infrastructures
capable of transforming raw data into structured, interconnected knowledge. A knowledge graph
represents entities, their attributes, and their relationships in a structured format. Combined with
a scalable, effective, and user-friendly interface, it offers a powerful approach to achieving this
integration and enhancing scientific discovery.
      </p>
      <p>
        In line with global movements towards open science, the FAIR guiding principles - which state
that data should be findable, accessible, interoperable and reusable - have become the cornerstone of
modern scientific data management [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Adherence to these principles, and the assessment of that adherence [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], is essential
for ensuring transparency, reproducibility, and the cumulative nature of scientific inquiry, all of which
are prerequisites for building effective knowledge graphs. Furthermore, the European Open Science
Cloud (EOSC) initiative seeks to federate existing research data infrastructures to provide a seamless
environment for data storage, management, analysis, and reuse across disciplines and borders [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
with knowledge graphs emerging as a key enabler for semantic interoperability within such federated
systems.
      </p>
      <p>Despite these ambitions, the Dutch SSH data landscape is characterized by significant fragmentation
and heterogeneity, which impedes the construction and utilization of a unified knowledge graph. To
address this, the Social Science and Humanities Open Cloud for the Netherlands (SSHOC-NL) was
conceived.</p>
      <p>This paper details the rationale, design, and demonstration of the SSHOC-NL Knowledge Graph. We
show the potential of Linked Data schemas and vocabularies for semantically enriching and connecting
research output, including papers, tools, and research data, within an integrated national infrastructure
designed to facilitate the creation and exploitation of a comprehensive knowledge graph and to empower
the next generation of SSH research in the Netherlands.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Knowledge Graph and Data Stories</title>
      <p>The website kg.odissei.nl serves as a central hub for the SSHOC-NL knowledge graph. This platform is
designed to house and present data stories, making complex social science data, publications, projects,
and tools more accessible, connected, and understandable.</p>
      <p>The kg.odissei.nl website currently functions as a container for 22 data stories, which are built
upon 23 datasets and utilize 242 queries. These numbers reflect the current scale of the knowledge
graph, which aims to integrate disparate data sources to facilitate comprehensive research.</p>
      <sec id="sec-2-1">
        <title>2.1. Schemas, vocabularies and graphs</title>
        <p>To enhance interoperability, our Knowledge Graph (KG) integrates well-known and domain-specific
schemas and vocabularies, including the Simple Knowledge Organization System (SKOS)<sup>1</sup>, the
Bibliographic Ontology (BIBO)<sup>2</sup>, and DCMI Metadata Terms<sup>3</sup>.</p>
        <p>A practical example of a query, available online at https://kg.odissei.nl/odissei/-/queries/A-toolsClariahOdissei/9, demonstrates a real case that uses common vocabulary from Schema.org
and DCMI Metadata Terms. This query retrieves information from two distinct graphs: the Clariah
project<sup>4</sup> and the ODISSEI code library<sup>5</sup>. All schemas, vocabularies, and graphs used in our KG are
available online on the kg.odissei.nl website introduced above.</p>
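        <p>As an illustration of how a single query can draw on two named graphs with shared vocabularies, the following minimal sketch assembles a SPARQL query string in Python. It is not the query published on the platform: the use of the two footnoted URLs as named-graph IRIs, the Schema.org namespace form, the properties schema:name and dcterms:description, and the function names are all assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
"""Sketch: build a SPARQL query that combines two named graphs.

Assumptions (not taken from the paper): the graph IRIs below reuse the URLs
cited for each source, and tools are described with schema:name and
dcterms:description.
"""

# Hypothetical named-graph IRIs, borrowed from the URLs cited for each source.
CLARIAH_GRAPH = "https://tools.clariah.nl/data/"
CODELIB_GRAPH = "https://w3id.org/odissei/ns/kg/graph/codelib/cbs"

# Namespace prefixes; the Schema.org form (https vs http) is an assumption.
PREFIXES = {
    "schema": "https://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",
}


def graph_pattern(graph_iri: str, source_label: str) -> str:
    """Return one GRAPH block that selects tool names from a single graph."""
    return (
        "  {\n"
        f"    GRAPH <{graph_iri}> {{\n"
        "      ?tool schema:name ?name .\n"
        "      OPTIONAL { ?tool dcterms:description ?description . }\n"
        "    }\n"
        f'    BIND("{source_label}" AS ?source)\n'
        "  }"
    )


def build_tools_query(limit: int = 25) -> str:
    """Combine both graphs with UNION so rows from either source are returned."""
    prefix_lines = [f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items()]
    patterns = "\n  UNION\n".join(
        [
            graph_pattern(CLARIAH_GRAPH, "CLARIAH"),
            graph_pattern(CODELIB_GRAPH, "ODISSEI code library"),
        ]
    )
    return "\n".join(
        prefix_lines
        + [
            "SELECT ?source ?tool ?name ?description WHERE {",
            patterns,
            "}",
            "ORDER BY ?name",
            f"LIMIT {limit}",
        ]
    )


if __name__ == "__main__":
    print(build_tools_query())
```
]]></preformat>
        <p>Because both sources are assumed to share the same vocabulary terms, one graph pattern can be reused for each graph, and the ?source variable records which graph a row came from. The resulting string could be pasted into any SPARQL editor for inspection.</p>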
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Stories</title>
        <p>The data stories showcased on the platform demonstrate how various sources of information can be
combined to answer complex research questions. Here are three examples:</p>
        <list list-type="bullet">
          <list-item>
            <p>The SSHOC-NL Knowledge Graph<sup>6</sup>: This data story provides an in-depth overview
and a clearer understanding of the type of information available within the graph. It highlights
how queries can be constructed to combine otherwise isolated information sources, thereby
revealing new insights. This story serves as an introduction to the power and potential
of the SSHOC-NL knowledge graph itself.</p>
          </list-item>
          <list-item>
            <p>Exploring the Media Content Analysis Lab (MCAL) dataset<sup>7</sup>: This data story focuses on
the systematic analysis of large corpora of digital media content. This includes various forms
of media from the Netherlands, such as traditional outlets (television, newspapers), new media
(online news outlets), and social media platforms (Twitter, Instagram, Facebook). The MCAL
dataset enables researchers to delve into the nuances of media representation and discourse.</p>
          </list-item>
          <list-item>
            <p>Tutorial SSHOC-NL<sup>8</sup>: This data story aims to introduce knowledge graphs to members of the
ODISSEI and SSHOC-NL projects. The goal is to improve understanding of the fundamental
concepts, applications, and benefits of knowledge graphs in research. In this tutorial, we will
explore how knowledge graphs can effectively organize and represent complex data relationships,
allowing for more efficient data analysis and insights. Whether you are new to the concept or
looking to deepen your understanding, this resource will serve as a valuable starting point in
your journey with knowledge graphs.</p>
          </list-item>
        </list>
        <list list-type="simple">
          <list-item><label>1</label><p>http://www.w3.org/2004/02/skos/core</p></list-item>
          <list-item><label>2</label><p>https://dcmi.github.io/bibo/</p></list-item>
          <list-item><label>3</label><p>purl.org/dc/terms</p></list-item>
          <list-item><label>4</label><p>https://tools.clariah.nl/data/</p></list-item>
          <list-item><label>5</label><p>https://w3id.org/odissei/ns/kg/graph/codelib/cbs</p></list-item>
          <list-item><label>6</label><p>https://kg.odissei.nl/odissei/-/stories/ODISSEI-Knowledge-Graph-the-story</p></list-item>
          <list-item><label>7</label><p>https://kg.odissei.nl/odissei/-/stories/mcal-basic-data-overview-story</p></list-item>
        </list>
        <p>The primary objective of the SSHOC-NL platform is to simplify the process of finding, accessing,
reusing, and fully exploiting the potential of available information. By addressing the challenge of
isolated data through the abstract concept of a knowledge graph, SSHOC-NL aims to foster more
interconnected and impactful social science research.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. The proof of concept</title>
        <p>The SSHOC-NL Knowledge Graph (KG) is a prototype designed to explore the potential of knowledge
graphs to establish future research infrastructures. It serves as a structured repository that connects
various pieces of information, aiming to facilitate research and data sharing within the ODISSEI
ecosystem.</p>
        <p>The data integrated into the ODISSEI KG originates from diverse sources, as illustrated in fig. 1.
This includes information on research projects conducted within the CBS<sup>9</sup> remote access environment,
research articles that have resulted from these projects, and research software developed for the LISS<sup>10</sup>
and CBS<sup>11</sup> projects. Furthermore, all datasets accessible through the Dataverse API<sup>12</sup> are incorporated
into the knowledge graph, ensuring a comprehensive collection of relevant research data. The main
concepts are illustrated in fig. 2 and are also available online<sup>13</sup>.</p>
        <p>The organization of the ODISSEI KG is guided by several key design principles, including the reuse of
existing linked data standards and persistent identifiers. For instance, SKOS is used for vocabularies
and Dublin Core for publication metadata, while persistent identifiers such as DOIs for papers and
datasets, ORCIDs for authors, and RORs for institutes are employed. Data are processed and organized
through an ETL (extract, transform, and load) pipeline, which uses specific scripts to ingest and transform
raw data into a linked data format. This pipeline is managed via a GitLab CI/CD system<sup>14</sup>.</p>
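        <p>To make the transform step more concrete, the following minimal sketch converts one raw publication record into N-Triples, expressing DOIs, ORCIDs, and RORs as resolvable IRIs and using Dublin Core terms for publication metadata. It does not reproduce the project's actual scripts: the input field names, the choice of dcterms:title, dcterms:creator, and schema:affiliation, the resolver prefixes, and the sample identifiers are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
"""Sketch of the 'transform' step of an ETL pipeline: raw record to N-Triples.

Assumptions (illustrative, not the project's actual scripts): the input field
names, the properties dcterms:title / dcterms:creator / schema:affiliation,
and the resolver prefixes used to turn identifiers into IRIs.
"""

DCTERMS = "http://purl.org/dc/terms/"
SCHEMA = "https://schema.org/"

# Resolver prefixes for the persistent identifier schemes named in the text.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "ror": "https://ror.org/",
}


def pid_to_iri(scheme: str, value: str) -> str:
    """Normalise a bare or already-resolved identifier to a full IRI."""
    prefix = RESOLVERS[scheme]
    value = value.strip()
    if value.startswith(prefix):
        return value
    return prefix + value


def literal(text: str) -> str:
    """Escape a string as an N-Triples literal (minimal escaping only)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def transform(record: dict) -> list[str]:
    """Turn one raw publication record into a list of N-Triples lines."""
    paper = pid_to_iri("doi", record["doi"])
    triples = [f"<{paper}> <{DCTERMS}title> {literal(record['title'])} ."]
    for author in record.get("authors", []):
        person = pid_to_iri("orcid", author["orcid"])
        triples.append(f"<{paper}> <{DCTERMS}creator> <{person}> .")
        if "ror" in author:
            institute = pid_to_iri("ror", author["ror"])
            triples.append(f"<{person}> <{SCHEMA}affiliation> <{institute}> .")
    return triples


if __name__ == "__main__":
    # Hypothetical record; the identifiers are placeholders, not real PIDs.
    raw = {
        "doi": "10.1234/example",
        "title": 'A "linked" study',
        "authors": [
            {"orcid": "0000-0000-0000-0000", "ror": "https://ror.org/00example"}
        ],
    }
    print("\n".join(transform(raw)))
```
]]></preformat>
        <p>Keying every resource on its persistent identifier means that records about the same paper, author, or institute arriving from different sources would map to the same node, which is one way the reuse of identifiers can support integration.</p>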
        <p>More information about the SSHOC-NL Knowledge Graph, such as documentation, queries,
instructions for reproducing experiments, and data stories, is available at https://kg.odissei.nl/.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Discussion</title>
      <p>The development of SSHOC-NL, with its central focus on a national knowledge graph, represents a
transformative step for the Dutch research landscape. It is designed from the ground up to be a key
component of the European Open Science Cloud. By adopting international semantic standards and
fostering interoperability, SSHOC-NL aims to ensure that Dutch researchers can contribute to and
benefit from the broader European knowledge ecosystem. However, the success of the project will
depend not only on the technology but also on the active participation of the research community. The
focus on training in knowledge graph technologies and community building is therefore critical for
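embedding semantic web principles and Open Science practices into the culture of SSH research.</p>
      <list list-type="simple">
        <list-item><label>8</label><p>https://kg.odissei.nl/odissei/-/stories/Tutorial-SSHOC-NL</p></list-item>
        <list-item><label>9</label><p>https://www.cbs.nl/</p></list-item>
        <list-item><label>10</label><p>https://portal.odissei.nl/dataverse/liss</p></list-item>
        <list-item><label>11</label><p>https://portal.odissei.nl/dataverse/cbs</p></list-item>
        <list-item><label>12</label><p>https://dataverse.nl/</p></list-item>
        <list-item><label>13</label><p>https://github.com/odissei-data/odissei-kg/blob/acceptance/static/mainConcepts.ttl</p></list-item>
        <list-item><label>14</label><p>Publicly available on GitHub as well, at https://github.com/odissei-data/odissei-kg</p></list-item>
      </list>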
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>SSHOC-NL is a timely and essential investment in the future of the Social Sciences and Humanities in
the Netherlands, specifically through its commitment to building a national knowledge graph. It directly
addresses the critical bottleneck of data fragmentation and semantic disconnect by delivering a unified,
secure, and user-centric cloud infrastructure centered around interconnected knowledge. Through its
integrated portal, semantic integration efforts, and demonstrations, we hope that the SSHOC-NL KG
will provide researchers with unprecedented access to the nation’s rich SSH data resources, tools, and
research output. Our future plans include the use of the Croissant metadata format<sup>15</sup>, integration with
ORKG<sup>16</sup>, OpenCitations<sup>17</sup>, and the SODA code<sup>18</sup>. Additional information is available on the project
website https://sshoc.nl/.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors utilized Grammarly to correct and spell-check, as well
as to improve the text’s grammatical readability. After using the tool, the authors reviewed and edited
the content as needed to take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Borgman</surname>
          </string-name>
          ,
          <article-title>Big data, little data, no data: Scholarship in the networked world</article-title>
          , MIT Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Valdestilhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Windhouwer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Siebes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Comparing FAIR assessment tools and their alignment with FAIR implementation profiles using digital humanities datasets</article-title>
          , in: International Workshop of Semantic Digital Humanities,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>European Commission</collab>
          ,
          <article-title>Turning FAIR into reality: Final report and action plan from the European Commission expert group on FAIR data</article-title>
          ,
          <source>Technical Report, Publications Office of the European Union</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>