CEUR-WS Vol-3110, paper 5: https://ceur-ws.org/Vol-3110/paper5.pdf
      Feast and Famine: The Problem of
      Sources for Linked Data Creation

                                Rebecca Kahn1

                                  Rainer Simon2

           1 Alexander von Humboldt Institute for Internet and Society

                                   Berlin, Germany
                      2 AIT Austrian Institute of Technology

                                   Vienna, Austria



                                   Abstract


  In this article, we reflect on some of the challenges that are encountered
  when applying Linked Data within the context of museum collections. On the
  one hand, we discuss the more ostensible challenges of data integration,
  vocabulary mapping, licensing, and attribution. On the other, we seek to
  stimulate a wider discussion of the complexities of publishing museum data
  (which often contains multiple layers of provenance and copyright
  information, as well as rich contextual metadata) using a combination of the
  triple-based structure of RDF and standard ontologies. We will show that
  while ideal best-practice applications do exist, they are not always used in
  the case of museum data ’in the wild.’ This disjuncture has significant
  implications for the responsible publication of museum data, especially in
  the light of recent efforts within the museum world to decolonize
  collections by reconsidering the provenance and historical contexts of
  objects and data.




        Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: Tara Andrews, Franziska Diehr, Thomas Efer, Andreas Kuczera and Joris van Zundert
(eds.): Graph Technologies in the Humanities - Proceedings 2020, published at
http://ceur-ws.org.




1        Introduction
The creation of annotations is one of the basic functions of scholarly
practice across disciplines (Blanke and Hedges, 2013). In the digital space,
such annotations can emulate analog practices, assuming the role of
note-taking, summarizing, or highlighting, but they can also go much further
and extend existing practice by becoming a mechanism for collaboration and the
sharing of information. Semantic annotation (i.e. the marking up of text and
images with references to controlled vocabularies) enables the linking of
annotations to each other and to secondary sources stored in repositories and
collections across the web. In this paper, we reflect on the challenges and
opportunities of linking online collections by means of digital annotation. We
take a two-part approach based on our experience of working with Linked Open
Data (LOD) annotations of historical sources. We start with a practical
perspective and consider the technical and methodological difficulties of
enabling these linkages, in particular when creating direct links between
objects and documents. We then explore the theoretical implications of this
method and ask whether the ability to link objects should be tempered by
questions of data provenance and data models. These questions are framed by
reference to museum collections, which do not always lend themselves easily to
the interconnectedness and openness required to make LOD useful.

2        Background
Pelagios (Isaksen et al., 2014) is an international digital humanities network
that facilitates the linking of online resources documenting the past via the
place names that occur in them. In 2014, Pelagios created Recogito,1 an online
environment for the geographic annotation of historical sources (Simon et al.,
2019). Fundamental to Recogito is a graph model that allows scholars to
annotate sources with Linked Data identifiers provided by gazetteers, thus
building connections between documents that refer to the same places. This
tool has been enthusiastically taken up by scholars, who have been able to
transfer their analog practice of annotation into the digital sphere and extend
it by leveraging the Linked Data connections to navigate between sources that
share relations to place. One of the aspects users appreciate most is the ease
of use: an essential goal of the developers was to create a tool that was
generic enough to cater to different research needs, without being engineered
beyond the requirements of the average researcher, who may not have much
interest in how Recogito works, as long as it does.
    1
        https://recogito.pelagios.org
   As the user base has grown, we have increasingly realized how much the
ability to create useful connections depends on the availability of suitable
Linked Data vocabularies and name authorities. With regard to geographic
annotation, for example, Recogito can only be truly useful to a specific
research community if it integrates a critical mass of the right gazetteers:
Linked Data authorities that cover that community’s domain appropriately in
terms of geographic extent, granularity, temporal and cultural focus, etc.
While such scholarly name authorities have been successfully established in
some communities (e.g. in the Classics, with the Pleiades gazetteer2), other
communities (e.g. scholars of Early Modern history) lack established scholarly
gazetteers. There are various reasons for this discrepancy, some of which are
rooted in the period in question; for example, as McDonough and van der Camp
point out, place names in early modern France were in political and
geographical flux in the years before and after the French Revolution, making
it difficult to match pre- and post-Revolution counterparts (McDonough and van
der Camp, 2017). In other cases, a lack of technical resources, such as
Natural Language Processing tools for particular languages, or textual
geoparsing tools that are limited to particular historical periods or regions
(Murrieta-Flores and Gregory, 2015), has restricted the development of
authoritative gazetteers. Without these, scholars wanting to use Recogito are
left with two options: either they work with a set of generic place names and
LOD identifiers provided by online sources such as GeoNames3 or Wikidata,4 or
they bootstrap their own customized gazetteers, which somewhat defeats the
purpose of using Linked Data in the first place. Finding ways to link people,
events, or sources is equally, if not more, challenging.
   2
     https://pleiades.stoa.org/
   3
     https://www.geonames.org/
   4
     https://www.wikidata.org/

   In addition to the problem of Linked Data authorities, there remains a
question about the model for representing the data that we aim to link through
them. The Resource Description Framework (RDF) is the fundamental “markup
language” for encoding Linked Data. But is its minimal triple structure helpful
as a foundation and mental model when conceptualizing humanities data?
Challenges in this regard relate, among others, to the issue of resource
identity and to the distinction between information and non-information
resources. Born-digital materials may or may not have a physical counterpart.
If they do, they may carry hidden assumptions and biases in the way they
represent or describe physical items, or they may agglomerate information from
multiple sources without making this explicit. While the “shape” of Linked
Data is ideal for pulling together these multiple sources and perspectives, the
subject-predicate-object construction of the RDF triple does not make it easy
to include the essential contextualizing data that may accompany these
information objects, at least not in a straightforward manner. Bechhofer et al.
(2013) describe in detail how, in medical research, publishing triples without
contextualizing data may be counterproductive to scientific research
methodologies, which depend on provenance data that aids the interpretation and
trust of results, and on methods that support reproducibility. They argue that
publishing results as LOD poses the risk of confounding the flow of rights to
the researcher. Although we are dealing with different material in this paper,
similar questions can be asked when considering the use of RDF as a tool for
representing and publishing some types of cultural heritage sources. In the
following sections, we describe some of the contexts of humanities scholarship
in which it is appropriate to ask these questions, and explore the implications
for digital humanities work.
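As a minimal illustration of this point, the sketch below contrasts a bare subject-predicate-object statement with the same statement placed in a named graph (the quad pattern of RDF 1.1 datasets), which is one common way of attaching provenance to statements. It uses plain Python tuples rather than an RDF library. All identifiers (the `ex:` resources, the 1923 catalogue, the accession register) are hypothetical, and `prov:wasDerivedFrom` and `dcterms:created` are borrowed from the PROV-O and Dublin Core vocabularies purely for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph grouping statements that share one context


# A bare triple: nothing records who asserted it, or on what evidence.
bare = Triple("ex:object/123", "ex:acquiredFrom", "ex:person/456")

# The same statement placed in a hypothetical named graph.
claim = Quad(*bare, graph="ex:graph/catalogue-1923")

# Statements about that graph, kept in a separate metadata graph.
context = [
    Quad("ex:graph/catalogue-1923", "prov:wasDerivedFrom",
         "ex:source/accession-register", graph="ex:graph/meta"),
    Quad("ex:graph/catalogue-1923", "dcterms:created", "1923",
         graph="ex:graph/meta"),
]
store = [claim, *context]


def context_of(statement: Quad, quads: list) -> dict:
    """Collect what the store says about the graph a statement belongs to."""
    return {q.predicate: q.obj for q in quads if q.subject == statement.graph}


print(context_of(claim, store))
```

Calling `context_of(claim, store)` returns the source and date recorded for the graph, while the bare triple carries no such information. The trade-off is that consumers must know to look for the graph-level statements; a tool that only reads triples will silently drop them.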

3        The Feast
The digitization of cultural heritage data is creating previously unimagined
possibilities for humanities researchers. Since the Semantic Web was first
presented as a theoretical concept in the late 1990s (Berners-Lee, 1999),
cultural heritage institutions have been quick to realize the transformative
potential offered by Linked Data and semantic modeling, recognizing it as a
way to create connections between previously siloed collections of data.
Museum, archive, and library materials are now available online, and the
growth of large-scale digital infrastructure projects such as Europeana5 and
the European Holocaust Research Infrastructure6 allows scholars to overcome
many of the logistical challenges of using primary sources, such as the
fragmentation and geographic dispersal of collections. Collaborative projects
such as Nomisma.org have linked hundreds of thousands of archival records and
knowledge objects into networks and infrastructures, thereby enabling
comprehensive views of previously disparate collections. The success of these
initiatives, however, should not mask the difficulty involved in transforming
collections of data that were previously stored in relational databases into
knowledge graph models, or the compromises between interoperability and
specificity that need to be made with regard to the data itself. Furthermore,
it remains unclear whether the minimal triple structure of RDF is a good
starting point when conceptualizing some types of humanities data,
specifically sources that are stored in cultural heritage collections such as
galleries, libraries, archives, and museums (GLAMs). While the triple model
facilitates the creation of complex networks, and shared authority files enable
the opening up of data silos, the question is whether such a model prioritizes
connections over complexity, and what the middle ground between these two data
ideals might be.
    5
        https://classic.europeana.eu/portal/en
    6
        https://www.ehri-project.eu/

3.1      Difficulties in Mapping Data
Mapping museum data to RDF is a considerable task. Many large museums have
long histories of acquisition, which have often resulted in idiosyncratic
record-keeping and documentation practices (Blagoev et al., 2018). These
practices have produced legacy data that is often unique to individual
institutions, and in some cases to individual catalogers within those
institutions, with different approaches to documentation being employed across
different curatorial departments and object types. The British Museum, for
example, holds artworks by Rembrandt in the same collections management system
as multiple terracotta lamps created by unnamed artisans in antiquity. These
are different objects, which require different approaches to describing and
recording their details and the knowledge that has emerged from their study.
In terms of content, museum data may be messy as a result of many years of
labor, mistakes, and revisions by many different hands. Many collections
feature text fields containing a great deal of unstructured metadata in the
form of free-text strings, which provide additional data about an object, its
provenance, or associated individuals and groups. Finally, when mapping museum
data onto an ontology, it is not uncommon to be faced with data that arrives in
a range of different formats, from CSV to XML and JSON (Knoblock et al., 2017).
   As discussed above, the move from a flat, relational data model to a graph
that can supplement and extend information objects by expressing complex
relationships within the data offers great promise. However, getting the data
to fit into such a format often involves extensive standardization and
cleaning. Standardization into ontologies or controlled vocabularies is vital
for the transformation of the data in question, but, as Kelly Davis argues, it
is a valid process only if the result remains usable within the Linked Data
model (Davis, 2019). This argument is echoed by the experience of the American
Art Collaborative project,7 which used Karma,8 an intermediary tool, to map the
data from 14 art museums to Linked Data, using the de facto standard museum
ontology, the CIDOC Conceptual Reference Model (CIDOC CRM).9 In this case,
there was no consensus among the CIDOC experts on the project as to how
certain data should be mapped, which resulted in the mapping being suspended
until agreement could be reached (Knoblock et al., 2017). This complexity, and
a lack of expertise among museum staff to resolve it, is often a contributing
factor to the overall scarcity of museum data in the Linked Data ecosystem
(Geddes, 2019).
   7
       https://americanart.si.edu/about/american-art-collaborative
   8
       https://usc-isi-i2.github.io/karma/
   When developing Recogito, we took a pragmatic approach to the problem of
making data that differ vastly in size, content type, and theme uniformly
accessible under a single user interface. Since it was unlikely that we would
be able to convince all of our partners to agree on one method of representing
their data, Pelagios instead provides a set of lightweight conventions for
expressing links between the data and the things described in it, which in
most cases were places. We refer to this approach as “connectivity through
common references” (Simon et al., 2019). However, this solution can only be
operationalized if a critical mass of material is already available as LOD,
which is not yet the case for ethnographic museum data.
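The idea of connectivity through common references can be sketched as an inverted index: documents are never compared with each other directly, only through the gazetteer identifiers their annotations point to. The following Python is a simplified sketch under that assumption, not Recogito's implementation; the document IDs and `example.org` URIs are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: document ID -> gazetteer URIs tagged in it.
annotations = {
    "doc:itinerary": {"http://example.org/gazetteer/1",
                      "http://example.org/gazetteer/2"},
    "doc:coin-hoard": {"http://example.org/gazetteer/2"},
    "doc:map": {"http://example.org/gazetteer/3"},
}


def index_by_place(annos):
    """Invert document -> places into place -> documents."""
    index = defaultdict(set)
    for doc, places in annos.items():
        for uri in places:
            index[uri].add(doc)
    return index


def linked_documents(annos):
    """Map each pair of documents to the place URIs they have in common."""
    links = defaultdict(set)
    for uri, docs in index_by_place(annos).items():
        for a, b in combinations(sorted(docs), 2):
            links[(a, b)].add(uri)
    return dict(links)


print(linked_documents(annotations))
# {('doc:coin-hoard', 'doc:itinerary'): {'http://example.org/gazetteer/2'}}
```

Here `doc:map` stays unconnected because no other document shares its identifier, which is the practical consequence of lacking a critical mass of annotated material or a suitable gazetteer.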

4         Changing Data
Unlike library records, which provide the metadata that facilitates access to
an information object (Gilliland, 2016), museum records typically contain a
great deal of contextualizing and descriptive information at the individual
item level, including descriptive, rights, technical, and preservation data.
Typically, a museum object record is also the place where any new research
about the object is recorded, including its exhibition history. In this sense,
a museum record is never truly complete, since the information it contains is
liable to be constantly changed and updated. In the context of ethnographic
museum collections, this presents a particular set of challenges. One
significant tension that has to be managed is between the volume of data and
the individual, item-level object biographies. Scale is an essential aspect of
Linked Data,10 since the model is designed to work best when there are multiple
connections across many different data repositories. This system depends on
shared terminology, usually mapped to an authority file, which allows
semantically similar terms to be recognized and the links to be made. However,
the historical nature of these materials and their documentation means that,
even in the analog versions, there is often very little standardization across
fields. This is particularly evident in the notes fields of many collections,
which contain information on the object itself, details about provenance, or
extensive notes provided by the curators, who are often subject experts
(Griffiths, 2010). This data presents a conundrum: either it is excluded from
what is visible in the aggregator (as is the case with Recogito), although it
may still be found by following the links to the original source record in the
data provider’s repository, or it has to be manually remodeled in order to be
included, as was the case with the American Art Collaborative. Both options
have distinct benefits and drawbacks.
    9
         http://www.cidoc-crm.org/
    10
         https://www.w3.org/standards/semanticweb/data
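The dependence on shared terminology can be illustrated with a small reconciliation sketch. It assumes a hypothetical authority file and a hand-made table of variant spellings; production reconciliation services are considerably more elaborate. The point is that any term that does not resolve to an authority identifier simply falls out of the linking, which is what happens to much unstandardized legacy vocabulary.

```python
import unicodedata

# Hypothetical authority file: preferred term -> authority identifier.
AUTHORITY = {"mask": "auth:0001", "headrest": "auth:0002"}

# Hypothetical variant forms already reconciled by a cataloger.
VARIANTS = {"masque": "mask", "head-rest": "headrest", "head rest": "headrest"}


def normalize(term: str) -> str:
    """Strip accents and surrounding whitespace, and case-fold the term."""
    decomposed = unicodedata.normalize("NFKD", term)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.strip().casefold()


def reconcile(terms):
    """Split free-text terms into authority matches and unmatched leftovers."""
    matched, unmatched = {}, []
    for term in terms:
        key = normalize(term)
        key = VARIANTS.get(key, key)  # map known variants to the preferred term
        if key in AUTHORITY:
            matched[term] = AUTHORITY[key]
        else:
            unmatched.append(term)
    return matched, unmatched


print(reconcile(["Mask", "Head-rest", "Masqué", "curio"]))
```

In this example the first three terms resolve to authority identifiers and `curio` is left unmatched; deciding what to do with such leftovers is precisely the manual remodeling problem described above.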

5        Managing the Mass
In recent years, there has been increasing discussion among museum scholars
and professionals about the need to decolonize their collections. In many
cases, these discussions focus on individual objects, their display and
documentation, although they also affect the data infrastructures. Inclusive
terminologies, which reflect the practices of the communities in which the
objects originated, are being added to many records, and cultural sensitivity
warnings are becoming increasingly common in online collections databases.11
Many of these efforts are codified in the 2013 International Council of Museums
(ICOM) Code of Ethics, which provides recommendations for the acquisition,
storage, research, and display of culturally sensitive material. But when
museum records are converted to Linked Data, and an institution provides access
to many thousands of records via a SPARQL endpoint or a data dump, managing
changes like these can be a practical challenge, particularly if the revisions
are made retrospectively, after the LOD workflow has been implemented.
   One illustrative example is a selection of records of human remains
originating from what was then the Belgian Congo (today the Democratic Republic
of the Congo), which are now part of the collection of the Museum of
Ethnography in Stockholm. The museum’s collections are aggregated by the
national LOD aggregator Swedish Open Cultural Heritage (SOCH),12 which
currently contributes over 2,400,000 records from around 45 institutions to
Europeana.13 Filtering by contributing institution shows that the Museum of
Ethnography has contributed 238,138 objects to Europeana via SOCH. Within this
subset, a search for the terms ‘human remains’ (mänskliga kvarlevor in Swedish)
returns 1,010 results. Of these, the majority of the records display a
grayed-out image panel stating ‘Human Remains, image has been blocked.’
However, six items do have visible accompanying images, which show human skulls
in various states of completeness, some adorned with shells and feathers. All
six images and records are freely available for reuse, under either Creative
Commons CC BY licenses, which require only attribution of the original source,
or CC0 (public domain) dedications. No contextualizing information on how the
materials came to be in the collection, or where, when, and under what
circumstances they were obtained, is to be found in the accompanying metadata.
At no point in the record is there a statement from either Europeana or the
museum which acknowledges that these are human remains, addresses how they are
displayed or documented, or explains why they are visible when other items in
the subset are not. This is despite the fact that the ICOM Code of Ethics,
subsection 4.3, states that: “Human remains and materials of sacred
significance must be displayed in a manner consistent with professional
standards and, where known, taking into account the interests and beliefs of
members of the community, ethnic or religious groups from whom the objects
originated. They must be presented with great tact and respect for the feelings
of human dignity held by all peoples.”14 This example represents a very small
percentage of the overall number of objects from the museum’s collection that
have been added to Europeana, and is not meant to imply that the Museum of
Ethnography, Swedish Open Cultural Heritage, or Europeana are knowingly sharing
images of human remains on their websites or portals. Rather, it illustrates
the complexity of managing large sets of heterogeneous data characterized by
significant variance in their descriptive terminologies, and the need for
additional precautionary measures when certain types of material are part of
those datasets.
   11
      See, for example, the cultural warnings for anthropology museums such as the Pitt
Rivers Museum in Oxford
(https://prm.web.ox.ac.uk/terms-use-pitt-rivers-museum-database-object-collections) or the
South Australian Museum (https://www.samuseum.sa.gov.au/cultural-sensitivity-warning).
   12
      https://www.raa.se/in-english/digital-services/about-soch/
   13
      https://pro.europeana.eu/organisation/swedish-open-cultural-heritage
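One possible precautionary measure is to screen records at the point of aggregation. The sketch below is a hypothetical example, not a description of the actual SOCH or Europeana pipelines: it flags records whose descriptions mention terms from an assumed screening list (including the Swedish `mänskliga kvarlevor`) and withholds their images until a person has reviewed them.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical screening list; a real one would be curated with the museum.
SENSITIVE_TERMS = ("human remains", "mänskliga kvarlevor", "skull")


@dataclass(frozen=True)
class Record:
    record_id: str
    description: str
    image_url: Optional[str]
    sensitive: bool = False


def needs_review(record: Record) -> bool:
    """True if the description mentions any screening term (case-insensitive)."""
    text = record.description.casefold()
    return any(term in text for term in SENSITIVE_TERMS)


def prepare_for_aggregator(record: Record) -> Record:
    """Withhold the image of flagged records and mark them for review."""
    if needs_review(record):
        return replace(record, image_url=None, sensitive=True)
    return record


skull = Record("r1", "Skull adorned with shells and feathers",
               "https://example.org/img/r1.jpg")
print(prepare_for_aggregator(skull))
```

Keyword screening of this kind only catches records described with the expected terms, so the variance in descriptive terminology noted above is also what limits it; it is a safety net rather than a substitute for curatorial review.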

6         The Famine
The famine in the title of this paper refers not only to the lack of
gazetteers as a source for creating semantic links between sources, but also
to the inherent difficulties of managing large sets of data, which,
ironically, can result in a dearth of contextualized, rich, and usable Linked
Data at the disposal of researchers.15 The cultural heritage sources that form
the basis of much humanities scholarship are diverse and complex. Digitized
sources may include transcribed or OCRed texts, scanned images,
high-resolution photographs or stills, musical scores, and census or survey
data. The digitization process may enrich these knowledge objects with
supplementary metadata drawn from a range of different sources. Meanwhile,
born-digital sources (such as the annotations in Recogito) may be completely
new or only vaguely resemble the original, as they agglomerate multiple
information sources. Re-aggregating this data into triples means that, by
default, some information is not carried over into the digital source.
Persistent provenance data, for example, which aids the interpretation and
trust of results and supports reproducibility, is not natively included in the
model (Sikos and Philip, 2020). This omission has significant implications for
linked cultural heritage data, where provenance information is essential for
providing context to the object, as is particularly evident in the case of
ethnographic collections and of collections containing objects that may be
considered looted or stolen art.
    14
         https://icom.museum/en/resources/standards-guidelines/code-of-ethics/
    15
         https://linked.art/loud/

6.1   Copyright and Permission Problems
One of the core conceptual drivers behind the Semantic Web is that the data
being shared is openly licensed, enabling datasets to be linked and shared in
non-proprietary formats. As the example above illustrates, even this most basic
of assumptions is complicated in the context of museum data collections, which
often contain a range of different copyright statements within one collection.
In many museum collections, it is extremely difficult to identify the copyright
status of individual items, and identifying rights holders in order to obtain
clearance can be complex and time consuming, if not impossible (Wallace and
Euler, 2020). Limited or incomplete information is often all that is available
for works, or they may have multiple rights holders. These conditions make the
management of copyright data within an LOD framework extremely complex,
particularly when not all of the data within a dataset can be openly licensed
without risking an infringement of the copyrights pertaining to it. A 2018
study (Blijden, 2018) on the accuracy of the rights statements in collections
included in Europeana found that 19% of the collections examined (a total of 10
collections) had inaccurate edm:rights values, while the accuracy of a further
25% (15 collections) could not be determined. Bearing in mind that each
collection contains many hundreds of thousands of objects, this means that a
significant set of data has been released onto the web with inaccurate or
incomplete rights information. The researchers also found that, in general, the
more heterogeneous the collections, the more likely the data was to be
incomplete or inaccurate. This example highlights the complexities involved in
managing important metadata for large collections of linked open cultural
heritage data, and the risks posed by releasing material with missing or
incomplete copyright documentation.
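Part of this problem can at least be surfaced automatically. The sketch below assumes a small allow-list of rights URIs of the kind used as edm:rights values (Creative Commons and RightsStatements.org URIs); the full list accepted by Europeana is longer and should be taken from its documentation. Such a check only detects missing or unrecognized values: it cannot tell whether a well-formed statement is accurate, which is what the study cited above examined.

```python
# Assumed subset of rights URIs; consult Europeana's documentation for the full list.
KNOWN_RIGHTS = {
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://creativecommons.org/licenses/by/4.0/",
    "http://rightsstatements.org/vocab/InC/1.0/",
    "http://rightsstatements.org/vocab/CNE/1.0/",
}


def check_rights(records):
    """Sort record IDs by whether their rights value is missing, unknown, or recognized."""
    report = {"missing": [], "unknown": [], "ok": []}
    for record_id, rights in records.items():
        if not rights or not rights.strip():
            report["missing"].append(record_id)
        elif rights.strip() not in KNOWN_RIGHTS:
            report["unknown"].append(record_id)
        else:
            report["ok"].append(record_id)
    return report


sample = {
    "obj-1": "http://creativecommons.org/licenses/by/4.0/",
    "obj-2": None,
    "obj-3": "CC BY",  # free text instead of a rights URI
}
print(check_rights(sample))
```

A record carrying a valid but wrong license would pass this check unnoticed, so it complements rather than replaces manual rights review.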



6.2   Ethnographic Collections

The recent move towards critical self-reflection in museum practice has
highlighted the need for museums to both engage with and communicate how their
collections came into being. Many of the world’s great encyclopedic museums
were created as part of larger national and imperial expansion projects during
the nineteenth and early twentieth centuries (Turner, 2015). The ways in which
material was collected, cataloged, and described, but also the language and
terminology used in the collections records, reflect these historical
realities: ethnographic objects were seen as scientific specimens rather than
as individual works of art or human remains, and the biographies of those
objects and their creators often went unrecorded (Beltrame, 2016). Today, many
collections around the world are beginning to recognize and acknowledge these
histories, and the role that plunder, coercion, and political violence may have
played in their formation. Efforts are now being undertaken in institutions
around the world to revise these collections and their records, to reconsider
their display, both online and in-gallery, and to re-evaluate institutional
responsibilities to a range of audiences (Taylor and Gibson, 2017). Actions
such as the removal of certain items from display, the return of sacred objects
to source communities, and the supplementing of records with additional
contextual information about how materials came to be included in museum
collections create a dialogue between the viewer and the object in question:
the interaction with it is historicized and complicated, which in turn has
profound implications for its use as a source of scholarship (Geismar, 2018).
But as it stands, this process risks being truncated when the object is
represented as RDF. While RDF provides a structure for representing the data,
supplementary contextual information is modeled using other semantic tools,
such as ontologies, controlled vocabularies, or authority files. Unless this
information is also readily accessible to a querying tool, we risk treating
museum objects that have been incorporated into the Linked Data ecosystem as
uniform and reducible to the conceptual logic of code, which ultimately entails
a reduction in their complexity.
   The ethnographic nature of materials in many museum collections presents a
particular set of difficulties when it comes to standardizing data using
controlled vocabularies or thesauri. These challenges can be broken into two
distinct types: those related to the terminologies used when the records were
created, and those related to the information content of the records
themselves. The first set of challenges is emerging as curators, archivists,
and librarians have begun to approach their institutional practices and
inherited records from reflective critical positions. Over the last decade or
so, scholarship on libraries, archives, and museums has begun to address the
historical cataloging and documentation strategies of these institutions. The
naming and classification of people and objects through the application of
what in the eighteenth and nineteenth centuries were considered to be
scientific methods has shaped the field of knowledge in museums and persists in
many of the records today. While there is a growing awareness among museum
professionals that they need to address, acknowledge, and ameliorate these
histories, particularly as they are manifested in their records, there is also
a question of stewardship. On the one hand, these data collections are the
ideal context in which to approach these questions, since ethnographic museum
data meets several of the technical and conceptual requirements for an
in-depth inquiry: it is complex and heterogeneous, often irregular, and many
different metadata schemas are in use in the field. Museum data also requires
context in order to be understood and reused in ways that are sensitive to the
origins and history of much of the material, which adds a level of urgency to
the process. On the other hand, scholars working on the creation of these
links should also ask themselves whether everything that can be linked should
be. The collections data in many of these institutions is complicated and not
always comfortable, but hiding this data is not a constructive response. As
scholarship around responsible data stewardship develops (Coleman, 2020), it
becomes imperative that we consider this tension, particularly as large
cultural heritage institutions in Europe continue to open their collections.
This is not an argument against the use of RDF as a technical representation
format. Rather, it is a question of which models to use, so that we are able to
preserve the complexity and ambiguity that make museums important sites of
study without impeding interoperability. Models for both data and platforms
that replicate the siloed nature of earlier data structures risk rendering
Linked Data unusable, negating its purpose and squandering its key benefits.16
   This state of affairs also raises the problem of how to include attribution
in an RDF-based knowledge graph. In their study of the publication of medical
data, Bechhofer et al. (2013) argue that publishing scientific results as LOD
poses the risk of confounding the flow of rights to the researcher. This
resonates with the use of RDF as a tool for representing and publishing some
types of cultural heritage data. In this context, the question of rights is
particularly significant, since copyright regimes and provenance metadata for
cultural heritage collections are often extremely complex, as outlined in
Section 4.1. The issues of attribution and contextualizing data are also
significant for researchers and institutions who work on the provenance of
looted and stolen artworks. This community was quick to realize the value of
LOD as a tool for managing the documentation trails which are essential for
tracking lost or stolen artworks and, if necessary, establishing legal claims
for ownership and provenance (Fink et al., 2014). While the linking of
collections has increased the ability of researchers to follow certain
individuals involved in looted or stolen art markets, there is at present
little in the way of a centralized authority, like the gazetteers used by
Recogito, against which to resolve their identities. Collaborative community
initiatives such as Open Art Data are using Linked Data tools to run initial
searches across collections and to highlight, or ‘red flag’, the names of
collectors and art dealers who were associated with the theft of artworks.
However, provenance tracking in these situations is extremely complex, as
false provenance information is regularly inserted into the documentation of
looted and stolen works.17 In these cases, without the addition of
contextualizing information, the creation of semantic links between
collections may have the undesired effect of perpetuating falsehoods about
the ownership and/or origin of objects, which risks confounding the ability
of researchers to conduct digital network analysis of the markets for looted
and stolen art.
   16 https://www.w3.org/standards/semanticweb/data
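The problem of keeping attribution and contextualizing information attached to links, rather than losing it once triples are merged, can be illustrated with a minimal sketch. It assumes a named-graph approach, which is only one of several options for provenance-aware representation (cf. Sikos and Philip, 2020): each statement is stored together with the graph that asserted it, and a separate record per graph holds its source, its license and whether it has been flagged as contested. The names `Quad`, `GraphContext`, `statements_with_context` and `uncontested`, as well as all `ex:` identifiers and example values, are hypothetical, and plain Python structures stand in for a real triple store.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Tuple


class Quad(NamedTuple):
    """A subject-predicate-object statement plus the named graph it belongs to."""
    subject: str
    predicate: str
    obj: str
    graph: str


@dataclass(frozen=True)
class GraphContext:
    """Hypothetical context record for one named graph (not a standard vocabulary)."""
    source: str
    license: str
    contested: bool = False
    note: str = ""


def statements_with_context(
    quads: List[Quad], contexts: Dict[str, GraphContext], subject: str
) -> List[Tuple[str, str, GraphContext]]:
    """Return every statement about `subject`, paired with the context of its graph."""
    return [(q.predicate, q.obj, contexts[q.graph]) for q in quads if q.subject == subject]


def uncontested(quads: List[Quad], contexts: Dict[str, GraphContext]) -> List[Quad]:
    """Keep only statements whose source graph has not been flagged as contested."""
    return [q for q in quads if not contexts[q.graph].contested]


# Hypothetical example data: all identifiers below are illustrative placeholders.
QUADS = [
    Quad("ex:object42", "ex:formerOwner", "ex:dealerA", "ex:graph/auction1938"),
    Quad("ex:object42", "ex:formerOwner", "ex:collectorB", "ex:graph/museumCatalogue"),
    Quad("ex:object42", "ex:title", "Untitled landscape", "ex:graph/museumCatalogue"),
]
CONTEXTS = {
    "ex:graph/auction1938": GraphContext(
        source="auction catalogue (hypothetical)",
        license="unknown",
        contested=True,
        note="provenance entry suspected to be falsified",
    ),
    "ex:graph/museumCatalogue": GraphContext(
        source="museum catalogue export (hypothetical)",
        license="CC BY 4.0",
    ),
}

if __name__ == "__main__":
    # Print each statement with its source and license, marking contested ones.
    for predicate, obj, ctx in statements_with_context(QUADS, CONTEXTS, "ex:object42"):
        flag = " [CONTESTED: " + ctx.note + "]" if ctx.contested else ""
        print(f"{predicate} {obj} (source: {ctx.source}, license: {ctx.license}){flag}")
```

Run as a script, the sketch lists each statement about `ex:object42` with its source and license and marks the hypothetical auction entry as contested, so the link is published together with the warning instead of being silently merged. Named graphs attach context to whole sets of statements; RDF reification or RDF-star annotations would attach it to individual triples instead.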

7 Summary and Conclusion
In this paper, we have tried to assess the use of Linked Data from cultural
heritage sources from both a practical and a theoretical perspective, and to
show how data consumers are often confronted with a situation where they have
both too much available data and not enough useful data. We have shown
how Linked Data can provide a meaningful mechanism for creating connections
between sources and for facilitating scholarly practices. But such an
approach is only possible if there is a critical mass of data available, and if
the connections between sources can be established through reliable, open,
persistent authorities, such as gazetteers. Without these verification
mechanisms, the mass of heritage LOD quickly becomes unnavigable and unusable,
resulting in a famine of usable data. We have also looked at the practical
difficulties of managing and integrating contextualizing metadata, such as
provenance information, alongside RDF triples, and have argued that this
can be a significant problem for certain heritage data types, such as
ethnographic collections or looted and stolen artworks. One of the implications
of such difficulties is that data which is incorrect, culturally insensitive, or
contains outdated terminologies can inadvertently be released into the LOD
ecosystem, without a tempering mechanism to draw users’ attention to the
issues at hand. This state of affairs is particularly troubling for large
collections of heritage data, which are proliferating as more and more
institutions open up their collections. At the same time, this proliferation
of data belies the significant effort and complexity museums face when they
undertake the process of converting and mapping their data into a LOD schema
and verifying the result. These difficulties are particularly pronounced in
ethnographic collections, where the current turn towards reevaluating
collections and their colonial past requires augmentations to the collections’
documentation and, consequently, the available data. We have also shown how
content licensing that facilitates open sharing, one of the essential
components of Linked Data, can be imprecise and difficult to manage, which
creates a risk of data loss within the ecosystem.
    17 https://www.openartdata.org/2020/07/how-to-track-falsification-of-provenance.html

References
Bechhofer, S., Buchan, I., De Roure, D., Missier, P., et al. (2013). Why
  Linked Data is Not Enough for Scientists. Future Generation Computer
  Systems, 29(2):599–611, DOI: 10.1016/j.future.2011.08.004.

Beltrame, T. (2016). Creating New Connections: Objects, People and Digital
  Data at the Musée du Quai Branly. Anuac, 4(2):106–129, DOI:
  10.7340/anuac2239-625X-1980.

Berners-Lee, T. (1999). Realising the Full Potential of the Web. Technical
  Communication, 46(1):79–82, https://www.learntechlib.org/p/85551.

Blagoev, B., Felten, S., and Kahn, R. (2018). The Career of a Catalogue:
  Organizational Memory, Materiality and the Dual Nature of the Past at
  the British Museum (1970-Today). Organization Studies, 39(12):1757–1783,
  DOI: 10.1177/0170840618789189.

Blanke, T. and Hedges, M. (2013). Scholarly Primitives: Building Institutional
  Infrastructure for Humanities E-Science. Future Generation Computer
  Systems, 29(2):654–661, DOI: 10.1016/j.future.2011.06.006.

Blijden, J. (2018). The Accuracy of Rights Statements on Europeana.eu.
   Technical report, Kennisland.

Coleman, N. (2020). Managing Bias When Library Collections Become
  Data. International Journal of Librarianship, 5(1):8–19, DOI:
  10.23974/ijol.2020.vol5.1.162.

Davis, K. (2019). Old Metadata in a New World: Standardizing the Getty
  Provenance Index for Linked Data. Art Libraries Journal, 44(4):162–166,
  DOI: 10.1017/alj.2019.24.

Fink, E., Szekely, P., and Knoblock, C. (2014). How Linked Open Data
  Can Help in Locating Stolen or Looted Cultural Property. In Ioannides,
  M., Magnenat-Thalmann, N., Fink, E., Žarnić, R., et al., editors, Digital
  Heritage. Progress in Cultural Heritage: Documentation, Preservation,
  and Protection, pages 228–237. Springer International Publishing,
  Cham, DOI: 10.1007/978-3-319-13695-0_22.

Geddes, M. (2019). Strategies to Support Wider Adoption of Linked Open
  Data in Smaller Museums. http://jhir.library.jhu.edu/handle/1774.2/62123.

Geismar, H. (2018). Museum Object Lessons for the Digital Age. UCL Press,
  London.

Gilliland, A. (2016). Setting the Stage. In Baca, M., editor, Introduction
  to Metadata. Getty Publications, https://www.getty.edu/publications/intrometadata/setting-the-stage/.

Griffiths, A. (2010). Collections Online: The Experience of the British
  Museum. Master Drawings, 48(3):356–367, https://www.jstor.org/stable/25767237.

Isaksen, L., Simon, R., Barker, E. T., and de Soto Cañamares, P. (2014).
  Pelagios and the Emerging Graph of Ancient World Data. In Proceedings
  of the 2014 ACM Conference on Web Science, WebSci ’14, pages
  197–201, New York, NY. Association for Computing Machinery, DOI:
  10.1145/2615569.2615693.

Knoblock, C. A., Szekely, P., Fink, E., Degler, D., et al. (2017). Lessons
  Learned in Building Linked Data for the American Art Collaborative. In
  d’Amato, C., Fernandez, M., Tamma, V., Lecue, F., et al., editors, The
  Semantic Web – ISWC 2017, pages 263–279. Springer International, DOI:
  10.1007/978-3-319-68204-4_26.

McDonough, K. and van der Camp, M. (2017). Mapping the Encyclopédie:
  Working Towards an Early Modern Digital Gazetteer. In Proceedings of
  the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities -
  GeoHumanities’17, pages 16–22. ACM Press, DOI: 10.1145/3149858.3149861.

Murrieta-Flores, P. and Gregory, I. (2015). Further Frontiers in GIS: Extending
  Spatial Analysis to Textual Sources in Archaeology. Open Archaeology,
  1(1):166–175, DOI: 10.1515/opar-2015-0010.

Sikos, L. F. and Philip, D. (2020). Provenance-Aware Knowledge
  Representation: A Survey of Data Models and Contextualized Knowledge
  Graphs. Data Science and Engineering, 5:293–316, DOI:
  10.1007/s41019-020-00118-0.

Simon, R., Vitale, V., Kahn, R., Barker, E., et al. (2019). Revisiting Linking
  Early Geospatial Documents with Recogito. e-Perimetron, 14(3):150–
  163.

Taylor, J. and Gibson, L. K. (2017). Digitisation, Digital Interaction
  And Social Media: Embedded Barriers to Democratic Heritage.
  International Journal of Heritage Studies, 23(5):408–420, DOI:
  10.1080/13527258.2016.1171245.

Turner, H. (2015). Decolonizing Ethnographic Documentation: A Critical
  History of the Early Museum Catalogs at the Smithsonian’s National
  Museum of Natural History. Cataloging & Classification Quarterly,
  53(5/6):658–676, DOI: 10.1080/01639374.2015.1010112.

Wallace, A. and Euler, E. (2020). Revisiting Access to Cultural Heritage in
  the Public Domain: EU and International Developments. IIC - International
  Review of Intellectual Property and Competition Law, 51:823–855,
  DOI: 10.2139/ssrn.3575772.