<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Provenance Research with Linked Data: A Visual Approach to Knowledge Discovery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sarah Binta Alam Shoilee</string-name>
          <email>s.b.a.shoilee@vu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annastiina Ahola</string-name>
          <email>annastiina.ahola@aalto.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heikki Rantala</string-name>
          <email>heikki.rantala@aalto.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eero Hyvönen</string-name>
          <email>eero.hyvonen@aalto.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor de Boer</string-name>
          <email>v.de.boer@vu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacco van Ossenbruggen</string-name>
          <email>jacco.van.ossenbruggen@vu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Susan Legene</string-name>
          <email>s.legene@vu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalto University, Semantic Computing Research Group (SeCo)</institution>
          ,
          <addr-line>Konemiehentie 2, 02150 Espoo</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Helsinki, Helsinki Centre for Digital Humanities (HELDIG)</institution>
          ,
          <addr-line>Unioninkatu 40, 00170 Helsinki</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1105, 1081 HV Amsterdam</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Provenance research is critical for understanding the historical trajectories of cultural objects housed in museums, yet it is often hindered by fragmented, ambiguous, or missing data. With the increasing adoption of Linked Data (LD) in cultural heritage, new possibilities emerge for analysing provenance metadata. This paper presents the PM-Sampo demonstrator, a structured approach to analysing provenance data through Linked Data methodologies and visualisation techniques. By connecting historical events, places, and actors to object collections and analysing data with visualisation tools, PM-Sampo aims to facilitate large-scale provenance analysis, enabling domain researchers to detect patterns, inconsistencies, and hidden connections that could otherwise go unnoticed. A case study on objects from Dutch museums associated with the Aceh War (1873-1914), an armed conflict between the Netherlands and the Muslim sultanate of Aceh, illustrates the functionalities of the demonstrator, revealing gaps in acquisition records, unexpected geographical distributions, and acquisition timelines extending well beyond the formal end of the conflict. The establishment of actor-connections further brings to the surface overlooked relationships between individuals and institutions, while provenance visualisation highlights the need for more comprehensive provenance documentation by domain experts. The study underscores the opportunities of data-driven approaches in provenance research, demonstrating how visualisation tools can aid in knowledge discovery and exploring knowledge gaps.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>Visualisation</kwd>
        <kwd>Provenance Research</kwd>
        <kwd>Cultural Heritage</kwd>
        <kwd>Knowledge Discovery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Provenance research is a critical discipline in cultural heritage studies, focusing on tracing the origins,
ownership history, and movement of objects across time and space [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It provides essential insights into
ethical, legal, and historical contexts of collections, particularly in cases of contested ownership, colonial
acquisitions, and restitution claims [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By reconstructing an object’s past, provenance research helps
museums, scholars, and policymakers make informed decisions about collections, ensuring transparency,
accountability, and historical justice. Beyond ethical and legal considerations, provenance research
plays a crucial role in deepening our understanding of the historical, social, and economic forces that
shaped the circulation of cultural objects.
      </p>
      <p>However, provenance research is inherently complex due to the fragmented nature of historical
records, inconsistent documentation practices, and the large scale of data sources involved. Provenance
information is often dispersed across archival materials, museum databases, and historical texts, making
it a time-consuming job to consolidate and analyse systematically.</p>
      <p>With the increasing adoption of Linked Data (LD) in cultural heritage, new possibilities are emerging for structuring, connecting, and
analysing provenance metadata at scale. LD enables the integration of heterogeneous datasets, linking
historical events, places, and actors to object collections across multiple institutions. This interconnected
approach facilitates a more comprehensive analysis of provenance information, allowing researchers to
detect patterns, inconsistencies, and previously overlooked connections. This research aims to address
some of the challenges of provenance research through Linked Data visualisation techniques, offering
interactive tools, such as timeline analysis, redirect links, geospatial mapping, and faceted search, to
untangle the intricate relationships between objects, collectors, places, events, and time periods.</p>
      <p>
        The current research is conducted within the framework of Pressing Matter<sup>1</sup>, a project investigating
the ownership, value, and historical significance of colonial heritage in museums. In this context,
provenance research is approached through Actor-Network Theory (ANT) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which conceptualises
objects as relational entities interconnected with people, places, and events that shape their histories. In
a broader context, this research contributes to ongoing efforts to establish a framework for dealing with
colonial heritage in museums, moving beyond traditional notions of ownership to examine broader
themes of exchange, violence, evidence, and historical accountability.
      </p>
      <p>This paper introduces the PM-Sampo demonstrator, a structured approach that leverages Linked Data
methodologies and visualisation techniques to support large-scale provenance analysis. The aim of
PM-Sampo is to enable researchers in the domain to explore the relationships between objects, collectors,
institutions, and historical contexts in a more systematic and interactive manner. The demonstrator
provides tools such as geospatial mapping, timeline analysis, and connection visualisations to enhance
provenance interpretation and knowledge discovery. To illustrate its functionalities, this study applies
PM-Sampo to a case study on objects from the Wereldmuseum<sup>2</sup> associated with the Aceh War (1873–1914),
a significant colonial conflict between Dutch military forces and the Sultanate of Aceh, Indonesia.</p>
      <p>Concretely, this paper contributes: (1) a structured approach for querying provenance information
across people, places, events, time, and objects through faceted search, (2) a demonstration of
visualisation interfaces, such as geospatial-object networks, acquisition timelines, and connection visualisation
among entities in provenance records, and (3) a Linked Data-driven web application to enhance object
provenance reconstruction and to aid knowledge discovery. This research aims to support museums,
researchers, and policymakers in addressing the complexities of provenance studies.</p>
      <p>The remainder of this paper is structured as follows: Section 2 reviews related work, Section 3
describes the dataset used, Section 4 details the implementation of PM-Sampo, and Section 5 explores
its functionality in provenance research. Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Knowledge Discovery for provenance research intersects with multiple fields, including Knowledge
Discovery in Databases (KDD), Digital Humanities, and Computational Cultural Heritage. The concept
of KDD, defined by Fayyad et al., is a process of identifying valid, novel, and potentially useful patterns
in data through data mining techniques such as classification, clustering, and association rule mining
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the context of Linked Data, knowledge discovery can be approached through graph data mining,
where data patterns are explained through interconnected links [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Another relevant approach involves
utilising background knowledge to navigate networks and uncover new information [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Both methods
are valuable for digital humanities, where computational techniques can complement expert knowledge
to facilitate discovery and interpretation in provenance research.
      </p>
      <p>
        Heritage object metadata is inherently complex, making knowledge graphs (KGs) a suitable
representational framework for structuring and analysing these relationships. However, it remains an open
challenge to assess the extent to which relational learning models can support knowledge discovery
in cultural heritage [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While humanities research has traditionally been skeptical of automated and
quantitative approaches [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], recent studies highlight the benefits of data-driven methodologies in
expanding research perspectives and scalability [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        The increasing availability of cultural heritage datasets from museums, libraries, and archives has
fuelled the integration of computational approaches within digital humanities [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. Big Data
methodologies now allow for large-scale analysis of historical records, shifting digital humanities
research beyond simple data visualisation to more autonomous problem-solving tools [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In the context
of provenance research, relational search (RS), that is, uncovering meaningful semantic associations between
entities, could be a promising approach. RS has been applied in diverse domains, including national
security [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], medical research [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and cultural heritage studies [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. By employing RS techniques,
provenance research can move beyond static metadata exploration to reveal dynamic relationships
among objects, people, and events.
      </p>
      <p>
        In existing work, the Sampo model [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] provides a Linked Data-driven approach to cultural
heritage research. The Sampo-UI framework [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ] is a semantic portal framework that allows easy
integration of heterogeneous datasets, enabling advanced search, visualisation, and analysis across
interconnected entities. Previous implementations, such as WarSampo (for World War II data),
BiographySampo (for biographical data), and MMM Sampo (for mapping manuscript migrations), demonstrate
its applicability in structuring and exploring cultural heritage data through knowledge graphs [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ].
By employing semantic technologies, the Sampo model enables relational searches, facilitating the discovery
of hidden connections between people, places, and events. The current research is based on the Sampo
model, adapting its principles for provenance analysis to uncover patterns in object histories, acquisition
networks, and collector relationships. Here we present a Linked Data-driven demonstrator, PM-Sampo,
aimed at uncovering the various ways in which heritage objects could be associated with specific
historical events, places, and times through their collectors’ acquisition patterns.
      </p>
      <fn-group>
        <fn id="fn1"><label>1</label><p>Pressing Matter project homepage: https://pressingmatter.nl</p></fn>
        <fn id="fn2"><label>2</label><p>The Wereldmuseum is a Dutch museum; a large portion of the collections it houses has a colonial past.</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-3">
      <title>3. Data Description</title>
      <p>The PM-Sampo demonstrator primarily utilises object collection data from the Wereldmuseum<sup>3</sup>,
published through the Colonial Collections Data Hub<sup>4</sup> initiative. The Wereldmuseum is a conglomerate
comprising Wereldmuseum Leiden, Wereldmuseum Amsterdam, and Wereldmuseum Rotterdam. A
large part of the collections housed within these museums originates from colonial-era commissions by
the Dutch Ministry of Colonies and scientific expeditions to former Dutch colonial territories. Many
of these objects were acquired as part of the larger Colonial Institute in the late 19th and early 20th
centuries.</p>
      <p>The Colonial Collections Data Hub previously published the Wereldmuseum datasets in Linked Data
format, together with a SPARQL endpoint<sup>5</sup>, making object metadata more accessible
and structured according to a shared domain ontology, namely CIDOC-CRM [19], and following
the Linked Art recommendations<sup>6</sup>. The dataset published through the portal originates from the
Wereldmuseum’s internal collection management systems and has been refined for structured analysis
in provenance research. It captures diverse aspects of provenance, historical context, and object
metadata through multiple structured graphs.</p>
      <p>The PM-Sampo demonstrator builds upon this openly available dataset, simplifying and enriching
it where necessary to improve usability for large-scale provenance analysis. To integrate the existing
dataset into the PM-Sampo demonstrator and to reduce query time, lightweight ontologies (“facet
ontologies”) were developed to shorten and simplify the paths from key concepts, such as acquisition records,
to historical events and their geographic locations<sup>7</sup>. Furthermore, data enrichment processes,
namely GeoNames data extraction, were applied to enhance geospatial mapping by associating latitude and
longitude coordinates with relevant production places. The new version of the data can be accessed
through the SPARQL endpoint: http://ldf.fi/pm-sampo/sparql</p>
      <fn-group>
        <fn id="fn3"><label>3</label><p>Wereldmuseum webpages: https://{amsterdam/leiden/rotterdam}.wereldmuseum.nl</p></fn>
        <fn id="fn4"><label>4</label><p>Colonial Collections Data Hub portal: https://data.colonialcollections.nl</p></fn>
        <fn id="fn5"><label>5</label><p>Wereldmuseum Linked Data endpoint: https://api.colonialcollections.nl/datasets/nmvw/collection-archives/sparql</p></fn>
        <fn id="fn6"><label>6</label><p>Linked Art webpage: https://linked.art</p></fn>
        <fn id="fn7"><label>7</label><p>The data conversion process is documented in the GitHub repository along with the current schema: https://github.com/Shoilee/PM-SampoDataManager</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-4">
      <title>4. PM-Sampo Implementation</title>
      <p>
        The data service is completely separated from the PM-Sampo demonstrator; only the external SPARQL
endpoint is used to access the data within the portal. The development of the PM-Sampo semantic portal
is supported by the Sampo-UI framework [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. Sampo-UI is a framework that offers software
developers a starting base for building a JavaScript web application, which can be customised with minimal
effort to create LD applications. On the technical side, the framework consists of two main components:
(1) a client-side interface built using the well-established React<sup>8</sup> and Redux<sup>9</sup> libraries and (2) a Node.js<sup>10</sup>
back-end developed with the Express<sup>11</sup> framework.
      </p>
      <p>To benefit from the Sampo-UI framework, development of the PM-Sampo demonstrator
started from an existing portal demonstration<sup>12</sup>, whose configuration was then modified declaratively
to meet the requirements of provenance analysis and the data models of the current dataset. In our
case, a first small demo that provided a search and browsing interface to inspect the data was actually
implemented during the first project meeting. Subsequent improvements and alterations were made
to meet the requirements of provenance researchers obtained from previous research [20]. In the
PM-Sampo application, the user first arrives at a landing page offering several application perspectives
on the data. Perspectives are based on target entity classes of the underlying knowledge graph, i.e.,
Objects, Provenance Events, Historical Events, and Actors. The PM-Sampo demonstrator is available on
the Web at https://pmsampo.demo.seco.cs.aalto.fi/en/ and the source code for the demonstrator is
published on GitHub: https://github.com/Shoilee/PM-Sampo/releases/tag/v1.0</p>
      <p>Following the Sampo model principles, the usage cycle of each perspective can be divided into
two steps: 1) filter instances of the class(es) corresponding to the perspective and 2) create different
visualisations to analyse the result instances. The data is filtered using the faceted semantic search [21]
tools provided by the portal, where the properties of the perspective class are used as facets. The results
and facet options, including hit counts, are updated after each selection of a facet, making it possible for
the user to precisely filter the end-result entities by different properties. The hit counts help the end
user to direct the search towards promising facet selections and prevent the user from ending up in dead
ends with no results (hits). After filtering the data to a desired subset, the user can analyse the result
set, i.e., a set of instances of the class corresponding to the application perspective, with integrated
data-analytic tools available as tabs on the application perspective page. In the same way, data-analytic
tabs can be integrated with instance pages that aggregate information about the individual entities of
the application perspective.</p>
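      <p>The filter-then-count cycle of faceted search can be sketched in a few lines of Python. This is an in-memory illustration only, under the assumption that each instance is a flat dictionary of facet values; the actual portal computes results and hit counts with SPARQL queries against the endpoint, and the sample records here are invented.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Invented sample instances; keys play the role of facets.
objects = [
    {"id": "o1", "type": "weapon", "place": "Aceh"},
    {"id": "o2", "type": "weapon", "place": "India"},
    {"id": "o3", "type": "textile", "place": "Aceh"},
]

def apply_facets(items, selections):
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]

def hit_counts(items, facet):
    """Count how many of the current results carry each value of `facet`."""
    return Counter(item[facet] for item in items if facet in item)

# Step 1: filter. Step 2: recompute hit counts on the remaining results.
results = apply_facets(objects, {"type": "weapon"})
result_ids = [item["id"] for item in results]
place_counts = dict(hit_counts(results, "place"))
print(result_ids, place_counts)
```
]]></preformat>
      <p>Because the counts are recomputed on the filtered set, a facet value is only offered when it has at least one hit, which is how dead ends with no results are avoided.</p>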
      <p>A key enhancement in PM-Sampo is an advanced faceted search that extends filtering beyond
direct entity properties to include other related entities. For example, objects can be filtered based not
just on object attributes but also on associated historical events and actors involved in acquisitions,
offering a more context-aware exploration of provenance data. In addition, targeted data-analytic
visualisations were integrated, including a faceted search results table, a summarisation pie chart on
facet filters, a production places map for the object perspective, a provenance tab for object instances
listing provenance events with the aim of creating an object biography, and a provenance events timeline
for chronological insights.</p>
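      <p>Filtering through related entities, followed by a pie-chart style summary, might look as follows. This is a sketch under assumptions: acquisitions are modelled as plain dictionaries that point to an object, an actor, and an event, and all names and values are hypothetical placeholders rather than records from the dataset.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Hypothetical objects and acquisition records.
objects = {
    "o1": {"type": "weapon"},
    "o2": {"type": "textile"},
    "o3": {"type": "weapon"},
}
acquisitions = [
    {"object": "o1", "actor": "Actor A", "event": "Event X"},
    {"object": "o2", "actor": "Actor B", "event": "Event X"},
    {"object": "o3", "actor": "Actor A", "event": None},
]

def filter_by_related(objects, acquisitions, **criteria):
    """Select ids of objects whose acquisition records match all criteria."""
    matching = {
        acq["object"]
        for acq in acquisitions
        if all(acq.get(key) == value for key, value in criteria.items())
    }
    return sorted(oid for oid in objects if oid in matching)

def facet_shares(object_ids, objects, facet):
    """Return the share of each facet value, as a pie chart would show it."""
    counts = Counter(objects[oid][facet] for oid in object_ids)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

hits = filter_by_related(objects, acquisitions, event="Event X")
shares = facet_shares(hits, objects, "type")
print(hits, shares)
```
]]></preformat>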
      <p>Another notable addition is the Related tab, which infers new relationships not explicitly defined
in the knowledge graph. Two key new connections were introduced: (1) actor-to-actor links through
shared objects and (2) actor-to-historical event links, established through SPARQL queries when actors
contributed to the acquisition of objects that are associated with historical events. These relationships
are visualised through lists and explained through intermediary objects.</p>
      <fn-group>
        <fn id="fn8"><label>8</label><p>React webpage: https://reactjs.org</p></fn>
        <fn id="fn9"><label>9</label><p>Redux webpage: https://redux.js.org</p></fn>
        <fn id="fn10"><label>10</label><p>Node.js webpage: https://nodejs.org/en</p></fn>
        <fn id="fn11"><label>11</label><p>Express webpage: https://expressjs.com</p></fn>
        <fn id="fn12"><label>12</label><p>Sampo-UI information page: https://seco.cs.aalto.fi/tools/sampo-ui/</p></fn>
      </fn-group>
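      <p>The two inferred relationship types can be sketched in Python as simple joins over shared objects. In the portal these links are established with SPARQL queries; the version below is only an in-memory approximation, and the actor, object, and event names are hypothetical. Each inferred link keeps the intermediary objects that explain it.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (actor, object) involvement pairs and object-to-event links.
involvement = [
    ("Actor A", "obj:1"),
    ("Actor B", "obj:1"),
    ("Actor B", "obj:2"),
    ("Actor C", "obj:2"),
    ("Actor A", "obj:3"),
]
events_by_object = {"obj:1": ["Event X"]}

def related_actors(pairs):
    """Map each unordered actor pair to the objects that connect them."""
    actors_by_object = defaultdict(set)
    for actor, obj in pairs:
        actors_by_object[obj].add(actor)
    links = defaultdict(set)
    for obj, actors in actors_by_object.items():
        for a, b in combinations(sorted(actors), 2):
            links[(a, b)].add(obj)
    return {pair: sorted(objs) for pair, objs in links.items()}

def actor_event_links(pairs, events_by_object):
    """Link an actor to an event when one of the actor's objects is tied to it."""
    links = defaultdict(set)
    for actor, obj in pairs:
        for event in events_by_object.get(obj, ()):
            links[(actor, event)].add(obj)
    return {pair: sorted(objs) for pair, objs in links.items()}

actor_links = related_actors(involvement)
event_links = actor_event_links(involvement, events_by_object)
print(actor_links)
print(event_links)
```
]]></preformat>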
    </sec>
    <sec id="sec-5">
      <title>5. Use-cases</title>
      <p>The PM-Sampo demonstrator provides a structured way to analyse provenance data by linking
historical events, places, and actors to object collections and visualising these links. Objects related to the historical event
“Aceh Oorlog (Sumatra)” (Aceh War, 1873–1914) serve as an ideal case for demonstrating the various
functions of PM-Sampo. This subset of data has been thoroughly studied and contains high-quality
metadata that aids in identifying connections with the war in Aceh, Indonesia, in the 19th and 20th
centuries.</p>
      <sec id="sec-5-1">
        <title>5.1. Geographic Distribution of Objects</title>
        <p>One of the initial observations made through the PM-Sampo demonstrator is the geographic spread of objects
related to the Aceh War, which is visualised in Figure 1. Although the war itself occurred in Aceh,
the dataset shows that the objects were produced in diverse locations, including India
and the United States. A visual representation using a place-object mapping function communicates
these interesting connections, highlighting the possible circulation of objects produced beyond Aceh.
This challenges the common assumption that war loot originates only from the location of the conflict and
emphasises the need to investigate object migration patterns.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Temporal Analysis: Acquisition Timeline</title>
        <p>Although the Aceh War formally ended in 1914, the PM-Sampo demonstrator reveals that object
collection activities associated with this historical event extended until 2010. This finding raises
significant questions about post-war circulation and reassessment of war-related artifacts. By visualising
an acquisition timeline (in Figure 2), PM-Sampo allows researchers to observe trends in collection
histories, revealing both inconsistencies with common assumptions and continuities in provenance data.</p>
        <p>Notably, the timeline highlights significant spikes in acquisition events in the years 1907 and 1959. A
closer examination of the data through the portal reveals that in 1907, a substantial proportion of these
acquisitions (550 out of 564 objects, about 97.5%) originated from Theodorus Jacobus Veltman. This finding
aligns with archival records indicating that in 1907, Veltman, a former soldier in the Dutch colonial
forces, sold 753 objects, primarily from Indonesia, to the Museum Volkenkunde (the predecessor of
Wereldmuseum). The provenance of these objects raises further questions: while 550 of the 753 objects
are linked to the Aceh War, the remaining 203 objects from the same acquisition remain unaccounted for
in this context. This discrepancy invites further investigation into the documentation of these objects,
as well as the broader mechanisms of collection and documentation in colonial-era acquisitions.</p>
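        <p>The arithmetic behind such a timeline reading can be made explicit with a short Python sketch. The per-object records below are invented placeholders, not portal data; only the three figures in the final lines (550, 564, and 753) are taken from the text above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Invented (year, source actor) pairs, one per acquired object.
records = (
    [(1907, "Collector A")] * 5
    + [(1907, "Collector B")]
    + [(1959, "Museum X")] * 3
    + [(1920, "Collector C")]
)

def acquisitions_per_year(records):
    """Count acquired objects per year, as plotted on a timeline."""
    return Counter(year for year, _ in records)

def actor_share(records, year, actor):
    """Share of a year's acquisitions that came from one actor."""
    in_year = [a for y, a in records if y == year]
    return in_year.count(actor) / len(in_year) if in_year else 0.0

per_year = acquisitions_per_year(records)
peak_year, peak_count = per_year.most_common(1)[0]
share = actor_share(records, peak_year, "Collector A")
print(peak_year, peak_count, round(share, 3))

# Figures reported in the text: 550 of 564 objects in 1907; 753 objects sold.
reported_share = round(550 / 564 * 100, 1)
unaccounted = 753 - 550
print(reported_share, unaccounted)
```
]]></preformat>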
        <p>The spike in acquisitions recorded in 1959 can be attributed to the incorporation of the Ethnographic
Museum Justinus van Nassau in Breda as a subsidiary of the Volkenkunde Museum in Leiden (a
predecessor of Wereldmuseum). When the museum in Breda was permanently closed in 1993, its ethnographic
collection was transferred to the Volkenkunde Museum, explaining the recorded acquisition events from
this period. However, it is crucial to note that while the metadata documents the transfer of objects between
institutions, it does not capture the historical context of the displacement of these objects from their places
of origin. One of the insightful features of PM-Sampo is the provenance visualisation (cf. Figure 3a),
which makes this gap in the record apparent. The visualisation also makes evident that two
attributes of an acquisition event, time and actor, are never part of the same instance,
which is clearly a gap in the record (further explanation is given in Section 5.5). Using a visual interface,
such as provenance timelines, the demonstrator enables researchers to systematically track how objects
moved through various hands, institutions, and contexts.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Actor-Network Visualisation</title>
        <p>PM-Sampo plays a crucial role in establishing and analysing actor networks, which is valuable in
provenance research. For example, objects collected by specific individuals, such as H.T. (Henri Titus)
Damsté, are well documented in relation to historical events due to his connection with the Aceh War.
Objects associated with his wife, I.F. (Isabella Franciska) Damsté-Muller, may not be categorised with
the same historical context. (However, in this particular case, both Mr. and Mrs. Damsté are explicitly
connected to the Aceh War.) Their connection, which is established through the chain of custody within
PM-Sampo (see Figure 3b), underscores the demonstrator’s ability to reveal hidden relationships, such
as family bonds, which could otherwise remain undetected in standard provenance metadata.</p>
        <p>Figure 3: (a) Provenance visualisation of an object from Volkenkunde Museum Justinus van Nassau. The
acquisition date and the acquisition actor are not attributed to the same acquisition event. The metadata
lacks history prior to museum acquisition. (b) Visualisation of an actor-actor relationship. Through shared
objects, PM-Sampo established a link between H.T. (Henri Titus) Damsté and his wife I.F. (Isabella
Franciska) Damsté-Muller, visualised in the Related Actors tab.</p>
        <p>Future work. Creating a visualisation of the actor network could significantly enhance
the understanding of these connections. A network representation would allow researchers to identify
hidden relationships, detect inconsistencies in metadata, and analyse the broader context of object
circulation more effectively. This approach would provide a more intuitive way to explore provenance
data, ensuring a clearer representation of how individuals, institutions, and objects are interlinked over
time. Implementing such a visualisation will therefore be a key focus of future development.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Scaling Up Provenance Research</title>
        <p>The ability to conduct large-scale provenance analysis and hypothesis generation is one of the key
objectives of PM-Sampo. Without the support of data analysis, provenance research often remains
limited to individual case studies due to the overwhelming amount of historical data. By leveraging
Linked Data methodologies and existing metadata, PM-Sampo makes it possible to analyse thousands of
objects simultaneously, visualising patterns and inconsistencies and raising new research questions.</p>
        <p>One of the critical advantages of this approach is its ability to visualise trends that might otherwise
go unnoticed. For instance, a significant increase in object acquisitions occurring simultaneously with
the onset of a historical event in a specific geographic region may raise questions about how these
artifacts entered museum collections and whether their provenance has been accurately documented.
For example, while the Aceh War formally began in 1873, provenance data reveals a considerable
spike in acquisition events from museums in Aceh, Lombok, and Bali around the same period. Even if
these objects are not explicitly documented as being connected to the Aceh War, the visualisation of
acquisition patterns allows researchers to identify potential correlations that may not be immediately
evident in textual metadata. This capability is particularly valuable for domain experts, as it enables them
to assess whether spikes in acquisitions align with contemporary historical events and to investigate
possible links between these events and the circulation of cultural objects.</p>
        <p>Future work. Predicting historical events based on multidimensional patterns could
provide insights into undocumented occurrences by analysing correlations between persons, places,
time periods, and events, ultimately refining existing provenance narratives. Furthermore, deducing
historical events from geographical and temporal trends would allow historical occurrences to be reconstructed.
Although the visualisation techniques facilitated by PM-Sampo can assist in inferring new knowledge and
generating hypotheses, the usefulness of these inferences is still subject to further study.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Missing or Ambiguous Data</title>
        <p>A major challenge in provenance research is the ambiguity and incompleteness of historical records.
PM-Sampo communicates this through facets that categorise objects based on property
values. Analysing these property values, it becomes apparent that the same level of granularity has not been
maintained across the dataset. For example, some objects are only linked to a broad origin (e.g., “Indonesia”)
rather than a precise location (e.g., “Aceh”). Similarly, certain collectors can only be identified at the
organisational level rather than individually. By providing an overview of the distribution of determined
versus undetermined provenance attributes, a pie chart visualisation could communicate the extent of
missing data, guiding further research priorities.</p>
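        <p>The distribution that such a pie chart would show can be computed as in the following Python sketch. The classification rule is an assumption made for illustration: a fixed set of country-level labels is treated as a broad origin, anything else as precise, and an absent value as missing. The sample records are invented.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Assumption: these labels denote a broad, country-level origin.
BROAD_ORIGINS = {"Indonesia"}

def classify_origin(place):
    """Label an origin value as 'missing', 'broad', or 'precise'."""
    if place is None:
        return "missing"
    return "broad" if place in BROAD_ORIGINS else "precise"

def granularity_shares(objects):
    """Return the share of objects per granularity label."""
    counts = Counter(classify_origin(obj.get("origin")) for obj in objects)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Invented sample: one precise, two broad, one without any origin.
sample = [
    {"origin": "Aceh"},
    {"origin": "Indonesia"},
    {"origin": "Indonesia"},
    {},
]
origin_shares = granularity_shares(sample)
print(origin_shares)
```
]]></preformat>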
        <p>Visualisation of provenance data reveals several significant gaps and inconsistencies in acquisition
records. In particular, acquisition sites are almost never recorded, making it difficult to trace the
geographical trajectory of objects before their entry into museum collections. Additionally, acquisition events
mainly document when objects were acquired by museums and rarely capture their prior movements
through different hands. When examining provenance linked to actors, it becomes evident
that these records predominantly list organisations on the receiving end of an acquisition rather than
individual collectors, further obscuring the specific pathways through which objects circulated. In
addition, acquisition dates and acquisition actors are never attributed to the same acquisition instance,
creating a disconnect in the metadata that complicates efforts to establish a comprehensive and continuous
chain of custody for these objects. Through visual communication, PM-Sampo makes it easy to identify
which metadata needs to be recorded to improve the findability and accessibility of provenance records and to
foster future research.</p>
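        <p>The disconnect between acquisition dates and acquisition actors can be checked mechanically. The following Python sketch is illustrative only: the acquisition instances, their field names and the function <monospace>disconnected_objects</monospace> are assumptions made for this example and do not reflect the actual PM-Sampo data model. It flags objects for which both a date and an actor are recorded, but never on the same acquisition instance.</p>
        <preformat>
```python
from collections import defaultdict

# Hypothetical acquisition instances; None marks a missing value.
acquisitions = [
    {"id": "acq-1", "object": "obj-1", "date": "1904", "actor": None},
    {"id": "acq-2", "object": "obj-1", "date": None, "actor": "org-1"},
    {"id": "acq-3", "object": "obj-2", "date": "1910", "actor": None},
    {"id": "acq-4", "object": "obj-3", "date": "1912", "actor": "org-2"},
]


def disconnected_objects(acquisitions):
    """Objects with a date and an actor recorded, but never on one instance."""
    flags = defaultdict(lambda: {"date": False, "actor": False, "both": False})
    for acq in acquisitions:
        has_date = acq.get("date") is not None
        has_actor = acq.get("actor") is not None
        entry = flags[acq["object"]]
        entry["date"] = entry["date"] or has_date
        entry["actor"] = entry["actor"] or has_actor
        entry["both"] = entry["both"] or (has_date and has_actor)
    return sorted(
        obj for obj, f in flags.items()
        if f["date"] and f["actor"] and not f["both"]
    )


print(disconnected_objects(acquisitions))
```
        </preformat>
        <p>In this toy data only <monospace>obj-1</monospace> is flagged: its date and its actor sit on different instances. The instance <monospace>acq-4</monospace>, which carries both, is included purely as a contrast case.</p>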
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The PM-Sampo demonstrator shows its potential to be a valuable tool for provenance research, offering
new ways to explore historical objects’ provenance, acquisition patterns, and hidden relationships
among collectors, events, and places. The use-case analysis reveals critical gaps in acquisition records,
unexpected geographical distributions of collected objects, and acquisition timelines extending well
beyond the formal end of the conflict. Additionally, the establishment of actor connections uncovers
previously unnoticed relationships between individuals and institutions involved in the movement of
these objects. The findings underscore the importance of structured provenance data and highlight the
need for more comprehensive documentation practices.</p>
      <p>By visualising provenance gaps, linking actors, and uncovering inconsistencies, PM-Sampo facilitates
a more comprehensive understanding of colonial-era collections. These insights provide a strong
foundation for further historical analysis and ethical reassessment of museum holdings related to contested
colonial histories. At the same time, by demonstrating the advantages of publishing provenance
metadata as Linked Open Data (LOD), this paper advocates for its wider adoption. Through visualisation,
we illustrate how structured, open data can support provenance research, encourage interdisciplinary
analysis, and contribute to the broader cultural heritage domain.</p>
      <p>Acknowledgments. This research was supported by the NWA-funded project Pressing Matter
(NWA.1292.19.419), by the Research Council of Finland FIN-CLARIAH funding from the European
Union NextGenerationEU instrument, and by the Aalto Science Institute (ASCI) Visiting Doctoral
Researcher Programme. Computing resources provided by the CSC – IT Center for Science were used
in our work.</p>
      <p>Declaration on Generative AI. The authors acknowledge the use of Writefull’s AI tools embedded in
the Overleaf environment during the writing of the manuscript, solely for grammar and spelling checks.</p>
      <p>[19] M. Doerr, The CIDOC conceptual reference module: an ontological approach to semantic
interoperability of metadata, AI Magazine 24 (2003) 75–75.</p>
      <p>[20] S. B. A. Shoilee, V. de Boer, J. van Ossenbruggen, Polyvocal knowledge modelling for ethnographic
heritage object provenance, in: Knowledge Graphs: Semantics, Machine Learning, and Languages,
volume 56, IOS Press, Leipzig, Germany, 2023, pp. 127–143.</p>
      <p>[21] D. Tunkelang, Faceted search, Synthesis Lectures on Information Concepts, Retrieval, and Services,
Morgan &amp; Claypool, 2009.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tompkins</surname>
          </string-name>
          , Provenance Research Today, Lund Humphries,
          <year>2021</year>
          . doi:10.1111/cura.12528.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sarr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Savoy</surname>
          </string-name>
          , The Restitution of African Cultural Heritage, Ministère de la Culture,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Latour</surname>
          </string-name>
          ,
          <source>Reassembling the social: an introduction to actor-network-theory</source>
          , Clarendon lectures in management studies
          , Oxford University Press, Oxford ; New York,
          <year>2005</year>
          . OCLC: ocm58054359.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U.</given-names>
            <surname>Fayyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Piatetsky-Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <article-title>From data mining to knowledge discovery in databases</article-title>
          ,
          <source>AI Magazine</source>
          <volume>17</volume>
          (
          <year>1996</year>
          )
          <fpage>37</fpage>
          . doi:10.1609/aimag.v17i3.1230.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Tiddi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <article-title>Data patterns explained with linked data</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>May</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gavalda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spiliopoulou</surname>
          </string-name>
          (Eds.),
          <source>Machine Learning and Knowledge Discovery in Databases</source>
          , Springer International Publishing, Cham,
          <year>2015</year>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zuckerman</surname>
          </string-name>
          ,
          <article-title>Tracking looted art with graphs</article-title>
          ,
          <source>Graphs and Networks in the Humanities 2022 Conference, February 3-4</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Using the semantic web in digital humanities: Shift from data publishing to data-analysis and serendipitous knowledge discovery</article-title>
          ,
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>187</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Kirschenbaum</surname>
          </string-name>
          ,
          <article-title>The remaking of reading: Data mining and the digital humanities</article-title>
          , in:
          <source>The National Science Foundation symposium on next generation of data mining and cyber-enabled discovery for innovation, Baltimore, MD</source>
          , volume
          <volume>134</volume>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Falkenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Breitenbücher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brügmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Joos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leymann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wurster</surname>
          </string-name>
          ,
          <article-title>Pattern research in the digital humanities: how data mining techniques support the identification of costume patterns</article-title>
          ,
          <source>Computer Science - Research and Development</source>
          <volume>32</volume>
          (
          <year>2017</year>
          )
          <fpage>311</fpage>
          -
          <lpage>321</lpage>
          . doi:10.1007/s00450-016-0331-6.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>McCarty</surname>
          </string-name>
          ,
          <source>Humanities Computing</source>
          , International series of monographs on physics, Palgrave Macmillan, London
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gardiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Musto</surname>
          </string-name>
          ,
          <source>The Digital Humanities: A Primer for Students and Scholars</source>
          , Cambridge University Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aleman-Meza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. B.</given-names>
            <surname>Arpinar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bertram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Warke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Halaschek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Anyanwu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Avant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Arpinar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kochut</surname>
          </string-name>
          ,
          <article-title>Semantic association identification and knowledge discovery for national security applications</article-title>
          ,
          <source>Journal of Database Management on Database Technology</source>
          <volume>16</volume>
          (
          <year>2005</year>
          )
          <fpage>33</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Viswanathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ilango</surname>
          </string-name>
          ,
          <article-title>Ranking semantic relationships between two entities using personalization in context specification</article-title>
          ,
          <source>Information Sciences</source>
          <volume>207</volume>
          (
          <year>2012</year>
          )
          <fpage>35</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Using the semantic web in digital humanities: Shift from data publishing to data-analysis and serendipitous knowledge discovery</article-title>
          ,
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>187</fpage>
          -
          <lpage>193</lpage>
          . doi:10.3233/SW-190386.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rantala</surname>
          </string-name>
          ,
          <article-title>Knowledge-based relational search in cultural heritage linked data</article-title>
          ,
          <source>Digital Scholarship in the Humanities (DSH)</source>
          <volume>36</volume>
          (
          <year>2021</year>
          )
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          . doi:10.1093/llc/fqab042.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Digital humanities on the Semantic Web: Sampo model and portal series</article-title>
          ,
          <source>Semantic Web</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>729</fpage>
          -
          <lpage>744</lpage>
          . doi:10.3233/SW-223034.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ikkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rantala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koho</surname>
          </string-name>
          ,
          <article-title>Sampo-UI: A full stack JavaScript framework for developing semantic portal user interfaces</article-title>
          ,
          <source>Semantic Web</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>69</fpage>
          -
          <lpage>84</lpage>
          . doi:10.3233/SW-210428.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rantala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ikkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>How to create easily a data analytic semantic portal on top of a SPARQL endpoint: introducing the configurable Sampo-UI framework</article-title>
          , in:
          <source>Proceedings of 8th International Workshop on the Visualization and Interaction for Ontologies and Linked Data co-located with the 22nd International Semantic Web Conference (ISWC 2023) in Athens, Greece</source>
          , CEUR Workshop Proceedings, Vol.
          <volume>3508</volume>
          ,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3508/paper3.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>