=Paper=
{{Paper
|id=Vol-3262/paper15
|storemode=property
|title=Wikidata and Wikibase as complementary research data management services for cultural heritage data
|pdfUrl=https://ceur-ws.org/Vol-3262/paper15.pdf
|volume=Vol-3262
|authors=Lozana Rossenova,Paul Duchesne,Ina Blümel
|dblpUrl=https://dblp.org/rec/conf/semweb/RossenovaDB22
}}
==Wikidata and Wikibase as complementary research data management services for cultural heritage data==
Lozana Rossenova 1, Paul Duchesne 1 and Ina Blümel 1

1 TIB – Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167 Hannover, Germany

Abstract

The NFDI (German National Research Data Infrastructure) consortia are associations of various institutions within a specific research field, which work together to develop common data infrastructures, guidelines, best practices and tools that conform to the principles of FAIR data [1, 2]. Within the NFDI, a common question is: What is the potential of Wikidata to be used as an application for science and research [3]? In this paper, we address this question by tracing current research use-cases and applications for Wikidata, its relation to standalone Wikibase instances [4], and how the two can function as complementary services to meet a range of research needs. This paper builds on lessons learned through the development of open data projects and software services within the Open Science Lab at TIB, Hannover, in the context of NFDI4Culture – the consortium including participants from across the broad spectrum of the digital libraries, archives, and museums field, and the digital humanities [5, 6].

Keywords

Wikidata, Wikibase, research data management, cultural heritage, open science, NFDI, NFDI4Culture

1. Introduction

Wikidata was released in 2012 and was originally intended to resolve concrete issues pertaining to Wikipedia [7]. It aimed to reduce data redundancy and serve as the language-agnostic data source for the infoboxes that are now ubiquitous across Wikipedia pages. Wikidata is distinct from its sister project, Wikipedia, in that it stores structured data and makes that data accessible via a SPARQL endpoint, providing a machine-readable service in contrast to Wikipedia’s primarily human-readable interface.
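The machine-readable access described above amounts to a single HTTP request against the public query service. A minimal sketch (the query itself is only illustrative; wdt:P31 "instance of" and wd:Q7725634 "literary work" are real Wikidata identifiers):

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: five items that are instances of "literary work",
# with English labels resolved by the label service.
QUERY = """
SELECT ?work ?workLabel WHERE {
  ?work wdt:P31 wd:Q7725634 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

def build_request_url(endpoint: str, query: str) -> str:
    # Results come back as JSON when the request carries an
    # Accept: application/sparql-results+json header.
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(WDQS_ENDPOINT, QUERY)
```

Any HTTP client (or a browser) can then issue the GET request; the same pattern works against the SPARQL endpoint of a standalone Wikibase.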
As of July 2022, it stores well over a billion statements and 99 million items on subjects across a vast range of knowledge domains [8, 9]. It highlights the power of a centralized and distributed approach [10]: a vast amount of information is accessible through a central endpoint in a standardized linked open data format, all items have PIDs (persistent identifiers), and all of that information is crowdsourced through collaborative editing efforts. Edits made by both human users and bots are tracked in a reliable version control system with clear provenance and the ability to discuss, debate and (sometimes) revoke each edit [11]. These qualities make Wikidata an attractive environment for data storage, curation and extraction. It is already widely used across many domains of knowledge management, including scientometrics (e.g. the WikiCite [12] and Scholia [13] initiatives) and academic research data management. Communities of researchers in the life sciences [10], computer science and digital preservation [14], as well as cultural heritage [15], among other fields, have already documented their work and experience using Wikidata as research infrastructure.

The 3rd Wikidata Workshop, Workshop for the scientific Wikidata community, @ ISWC 2022, 24 October 2022.
Lozana.Rossenova@tib.eu (L. Rossenova); Paul.Duchesne@tib.eu (P. Duchesne); Ina.Bluemel@tib.eu (I. Blümel)
0000-1111-2222-3333 (L. Rossenova); 0000-0002-3075-7640 (I. Blümel)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).
But they have also documented where issues have become problematic in the context of established scientific practice: dealing with bottom-up ontology design and curation [16], the burden that sheer scale places on performance [17], the restriction of Wikidata to being a secondary database (as opposed to a repository for original research) [18], and, ultimately, trustworthiness [19]. Despite these potential disadvantages, the growing use of Wikidata – both as a repository to upload data to, and as a rich resource on the linked open data (LOD) cloud to federate with – can also be presented as an exciting ‘proof of concept’. In this paper, we discuss how its underlying software suite (Wikibase) can fulfil many of the core requirements of scientists and researchers dealing with structured data, while at the same time removing some of the issues born out of scale, governance policy, and ontology particularities. Furthermore, we showcase how independent Wikibase instances can be effectively deployed as research infrastructure for cultural data that complements and can further enrich, but also benefit from, Wikidata’s already rich network of connections.

2. The drawbacks of an open editing policy for research data

The primary strength of Wikidata is currently its size, both in terms of subject-area coverage and the sheer scale of data points. The desire to include data pertaining to all areas of human knowledge is ambitious, reflected in the Wikimedia observation that it “only works in practice. It could never work in theory” [20].
This growth has been enabled by a relatively low barrier to entry for contributors: anyone with internet access can be granted the ability to edit production data.2 This low barrier to entry has been purposefully designed as a means of encouraging community involvement.3 One effect is that no inherent preference is given to a user’s expertise in a specific area, which means that a domain expert could find their contribution overwritten by a user with only cursory knowledge of the subject. In practice this appears to be mitigated by obscure specialist data ‘hiding in plain sight’, and by the fact that vandalism mostly targets highly visible entries (circumstances for which some page-protection tools have been developed [21]). This openness is not limited to human users: there are many resources for setting up discrete editing bots which amend data based on preconfigured logic, and which can be authenticated to operate autonomously once granted permission by the community [22]. The use of these services does require a prior manual editing history to prove the ‘good’ intent of the individual. Furthermore, the ability to revoke data edits based on contributor information is one of the platform’s strong points. There are, however, no mechanisms to assert authority based on access to physical evidence (which is particularly relevant to cultural institutions, i.e. access to the artefacts themselves) or first-hand experiences [23].

2 ‘Production’ is used here in the sense defined by the Agile development model: http://www.agiledata.org/essays/sandboxes.html
3 Wikidata’s vandalism strategy is mostly reparative, not preventative, relying on extensive versioning information to roll back undesirable edits. See: https://www.wikidata.org/wiki/Wikidata:Vandalism
The combination of these factors makes an institution’s decision to contribute to and/or rely on Wikidata a complex one, as it could easily find itself in the role of ‘digital gardener’, not just contributing data but having to defend its statements against alteration [19]. There are also general issues with citation on the platform. There is a well-supported method for applying a reference link to any statement – most often a web link to a secondary source which supports the claim. However, this is not a mandatory attribute, so many statements are simply presented ‘as-is’ – without any justification or means of verification. The heavy reliance on web links also means that ‘link rot’ is a concern. A promising area of investigation is the potential future use of ‘signed statements’4 to allow institutions and authority sources to authenticate and endorse claims which are displayed and attributed to them. This would allow subject-matter experts to assert their domain knowledge and would elevate their verification of a statement over the claims of a casual user.

3. The trouble with standard ontologies

The open editing policy also leads to challenges related to ontological coherence. Wikidata was conceived with an inherently flat structure in relation to creative works, due to its primary function of facilitating data exchange between different-language Wikipedia resources, which resulted in an initial one-to-one relationship between the two platforms. Increasingly there has been community-driven work to define multi-tiered structures for cultural data (such as WikiProject Books [24]), but there are issues with this approach which can be illustrated with a closer look at the representation of data related to literature. Contemporary models for cataloguing (for example FRBR [25], BIBFRAME [26] or FIAF [27]) rely on structures generally involving at least three tiers.
This begins with a ‘work’ or ‘expression’ to represent the artwork as an abstract entity, a ‘manifestation’ or ‘edition’ for each interchangeable batch of physical material, and an ‘item’ to represent an individual physical artefact. For example, Alice’s Adventures in Wonderland (meaning specifically the book by Lewis Carroll) is an identified singular artistic ‘work’, individual ISBNs5 delineate different ‘editions’, and an ‘item’ is a physical copy sitting on a shelf. It is unreasonable to expect individual ‘items’ to be represented in Wikidata unless they are especially notable,6 although it is worth observing that some disciplines place a great deal of importance on a specific item being identified.7 The primary tension arises from attempting to enforce the ‘work’ and ‘edition’ levels of the schema as distinct ontological elements on a platform not initially intended to facilitate any strict ontological order of classes and subclasses. The wide reach of Wikidata, coupled with finite infrastructure resources, means that it is not possible to incorporate highly granular data for every represented field.

4 See the proposal and community discussion for this cryptographic method for endorsing and verifying claims on Wikidata (note, however, that this is still under discussion and not a method already implemented in Wikidata): https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Signed_Statements_(T138708)
5 ISBNs indeed provide a rare example where a URI is already ‘minted’, attached to a physical item and ready to be reused as a persistent identifier for that edition.
6 One of the few examples of an ‘item’-level book entity is the Lincoln Bible. See: https://www.wikidata.org/wiki/Q1816474
7 There is great variety amongst extant artefacts of early cinema, in many cases due to distinct coloring or modification processes performed on individual prints.

This further contributes to
preventing adoption as a primary research platform, as it is precisely this detail which would make it a more valuable resource for many researchers. Another area which requires attention is the enforcement of class-specific schemas. Due to a lack of restrictive mechanisms, it is currently possible to make data statements which are completely nonsensical. This appears to be mostly avoided thanks to a lack of interest in this form of vandalism, in addition to an active community that pursues edits deemed incorrect. Still, a significant step towards becoming a more trusted resource would be the inclusion of strict data validation, which could build upon the existing EntitySchema extension if it is scaled beyond its current application of serving mostly as a guideline [28].

4. Wikibase as research data management (RDM) service

Thanks to many of the features developed for Wikidata, Wikibase is already more than a single tool. It can instead be considered an umbrella of services [29], including:

• A graphical user interface (GUI) to a database with collaborative and version control features;
• A triplestore, editable down to the level of individual triple statements (thanks to Wikidata’s requirements to serve as a verifiable source and secondary database, every triple statement can be edited and enriched with qualifying statements and references to external sources – all achievable via the GUI);
• A SPARQL endpoint with its own GUI;
• An API for read/write programmatic access;
• A wide range of script libraries (PyWikibot, WikidataIntegrator), as well as additional user-friendly tools (QuickStatements, Cradle, the OpenRefine reconciliation service), for data import/export.

Private Wikibase instances rarely have to deal with performance issues born out of Wikidata’s scale, at least not until they grow to the magnitude of many millions of items.
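The API for programmatic access listed above is the standard MediaWiki Action API with Wikibase modules. A minimal read-access sketch, assuming a hypothetical instance URL (for Wikidata itself the base would be https://www.wikidata.org/w/api.php):

```python
from urllib.parse import urlencode

# Hypothetical base URL of a standalone Wikibase instance.
API_URL = "https://example-wikibase.org/w/api.php"

def entity_request_url(entity_id: str) -> str:
    # wbgetentities is the standard Wikibase API module for fetching an
    # entity's labels, descriptions, claims and sitelinks as JSON.
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return API_URL + "?" + urlencode(params)

url = entity_request_url("Q42")
```

Write access follows the same pattern with modules such as wbeditentity, which is what libraries like PyWikibot wrap under the hood.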
Importantly, private Wikibase instances can store original research data, as they are not bound by Wikidata’s policy of being only a secondary database. In that sense, individual Wikibase instances can serve as primary sources for data to be later referenced in Wikidata (this type of decentralization is in fact in alignment with the long-term strategy and vision behind Wikidata itself and the broader Wikimedia movement [30, 31]). Private Wikibase instances can also hold data under various licenses, or remain entirely closed off from the open web, depending on the nature of the research and the need for privacy control (note that another restriction of Wikidata is the need for data to be licensed CC0). When installed ‘out-of-the-box’, Wikibase provides users with an experience nearly identical to that of Wikidata. There are minimal options for custom branding, but the default interface follows the same templates for creating, editing, or simply viewing item and property pages as Wikidata [32]. The SPARQL endpoint GUI is also identical to Wikidata’s query service. Additional tools from the Wikidata ecosystem (such as import/export tools) can also be adapted to Wikibase instances [33, 34]. This provides ease of transition from one service to the other. Since there is already extensive documentation and training material for Wikidata users [35], much of the same material and many of the same approaches to training can be adopted when introducing Wikibase in research and/or cultural institution environments. Thus it can be argued that Wikibase is well suited to fulfil the need for end-to-end services to ‘LOD’-ify research data, while at the same time easing the learning curve of working with LOD compared to other existing knowledge graph tools [36]. However, Wikibase is not entirely free of the issues born out of Wikidata’s open approach to defining a data model and ontology and its simultaneous lack of validation mechanisms.
Wikibase follows the conventions for data structuring (and PID generation) set out by Wikidata (all items receive a Q number, whereas properties receive a P number); however, there is no requirement to follow Wikidata’s upper ontology [16], and Wikibase users can define their own entities (items) and relations (properties) from scratch. Here, the researchers deploying a Wikibase instance for their own datasets are in control of how data is curated and described, while taking advantage of the familiar graphical user interface for data entry and editing. There is no need to deal with the ‘messiness’ of collaborating with a vast and largely anonymous international community, which expands Wikidata’s vocabulary primarily in response to the needs of the Wikidata and Wikipedia projects.

4.1. Addressing the ‘open world’ scenario of modelling data in Wikibase

The possibility of defining new LOD models with the help of a GUI can be a liberating prospect for disciplines that defy conventions and do not fit neatly into previously established metadata standards (a particularly good case in point being contemporary art data [36]). But this can also be an issue for researchers who want (and in some cases may be required) to work within an established domain standard and to reuse ontologies maintained elsewhere on the semantic web. Although there are ways to map local Wikibase properties and classes to external ontologies, the native RDF structure remains relatively ‘flat’ and lacks the semantics needed for more sophisticated reasoning operations over its triplestore [3].8 Furthermore, there are no formal constraints or validation rules that can be applied to Wikibase’s flat data structure as is.9 An established tradition in free and open source software (FOSS) culture is that if a tool doesn’t perform to user expectations, users can either find workarounds or write a patch or extension and submit a pull request for the issue at hand.
As a FOSS tool, Wikibase benefits from this ethos, and its community has been collaborating on finding workarounds (e.g. see Figure 1) or developing entirely new features. Mapping local properties and items to Wikidata, for example, has been piloted and shown to deliver successful results in query federation [37] and data syncing [38]. Developing data modelling principles and import/export pipelines for Wikibase that explicitly match established schemas or ontologies, such as the standard ontology for cultural heritage data CIDOC-CRM [39], has also shown promising results in moving away from the complete ‘open world’ paradigm [40, 41].

8 Still, Wikidata and Wikibase offer some capabilities out-of-the-box that are less ‘flat’ than other RDF resources, for example the possibility to attach references or qualifiers to individual triple statements. In addition, it is worth noting that Wikidata contains properties like wdt:P31 and wdt:P279, which are direct analogues of rdf:type and rdfs:subClassOf. Waagmeester [10] has demonstrated the potential of CONSTRUCT queries to obtain the semantics needed for reasoning in Wikidata as well, e.g. see this example query: https://w.wiki/4x49
9 This is somewhat addressed by the EntitySchema extension [28], which can bring machine-readable boundaries to any Wikibase, though the values in statements defined by an entity schema are still not validated beyond data type; e.g. a statement with a property calling for a country can still be (wrongly) populated with a city item, because the only constraint is to add a Wikidata item in the value field associated with the country property.

Figure 1: Generalizable SPARQL template for identifier translation. These simple SPARQL examples show how identifiers of any biological type can easily be translated using SPARQL queries. Image by Waagmeester et al. (2020) [10].
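A generic sketch of the CONSTRUCT approach mentioned in footnote 8 (not a reproduction of the linked w.wiki example): projecting Wikidata’s wdt:P31 / wdt:P279 statements onto rdf:type / rdfs:subClassOf, so that standard RDFS tooling can reason over the result.

```python
# wdt:P31 ("instance of") and wdt:P279 ("subclass of") are real Wikidata
# properties; the query shape is illustrative.
CONSTRUCT_QUERY = """
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
  ?item a ?class .
  ?class rdfs:subClassOf ?super .
} WHERE {
  ?item wdt:P31 ?class .
  OPTIONAL { ?class wdt:P279 ?super . }
}
"""
```

Run against the query service, this yields a small RDF graph expressed in standard vocabulary, which can then be loaded into any off-the-shelf reasoner.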
While the above can be described as workarounds or mitigation tactics, there are also efforts among user communities to redevelop the explicit logic of how RDF is written to the Wikibase triplestore. Within NFDI4Culture, the team at FIZ Karlsruhe have experimented with methods that write semantic RDF directly into the triplestore, thus circumventing the limitations of the MySQL database that is connected to the Wikibase user frontend and serves as the default source for RDF statements written to the triplestore [3]. The issue with these methods is that such statements are not bi-directional and cannot be used when continuous edits need to be performed via the frontend user interface. An alternative plan – for an extension that creates explicit mappings between Wikibase items and properties and standard OWL or SKOS classes and properties – is being discussed and prepared by a Wikibase community group dedicated to new (re)developments supplementing the official product roadmap set by Wikimedia Germany (the lead maintainers of Wikibase) [42]. While this extension alone does not solve all the issues regarding the use of formal semantics and standard ontologies in Wikibase, it illustrates the capacity of FOSS communities to respond to user needs – an important factor to consider when researchers select tools and infrastructures for long-term projects.

4.2. Wikibase RDF extension

The Wikibase RDF extension [43], currently in development, works by implementing a graphical configuration window in each Wikibase entity page, which enables users to specify correlations between Wikibase entities and externally defined RDF resources, using commonly used OWL and SKOS methods of declaring two resources to be ‘same’ or ‘similar’ (see Figure 2). The extension will also transfer these correlation statements into the triplestore.
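To make the mapping idea concrete, the following is an illustration of the kind of triples such an extension could write to the triplestore. Since the extension is still in development, the local base URI and the IDs P7/Q12 are hypothetical; owl:equivalentProperty and skos:exactMatch are standard OWL/SKOS ways of declaring two resources ‘same’ or ‘similar’, and P2_has_type / Q5 are a real CIDOC-CRM property and Wikidata item.

```python
# Hypothetical entity namespace of a local Wikibase instance.
LOCAL = "https://example-wikibase.org/entity/"

# (subject, mapping predicate, external resource) triples.
MAPPINGS = [
    (LOCAL + "P7", "owl:equivalentProperty",
     "http://www.cidoc-crm.org/cidoc-crm/P2_has_type"),
    (LOCAL + "Q12", "skos:exactMatch",
     "http://www.wikidata.org/entity/Q5"),
]

def to_turtle(triples):
    # Serialize the mappings as simple Turtle statements.
    lines = [
        "@prefix owl:  <http://www.w3.org/2002/07/owl#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
    ]
    for s, p, o in triples:
        lines.append(f"<{s}> {p} <{o}> .")
    return "\n".join(lines)
```

Once such triples sit in the triplestore, a suitably configured reasoner can treat local properties as equivalents of the external ontology terms they are mapped to.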
However, the triplestore’s default configuration currently precludes leveraging any of the inferencing or knowledge-generation features present in modern knowledge graphs. The next step is to reconfigure the Blazegraph triplestore to incorporate and appropriately process these newly generated mapping triples, and also to explore routing data generated by this extension directly into other triplestores. At the Open Science Lab, we are actively supporting work on this extension and the triplestore reconfiguration, with the aim of allowing independent Wikibase instances to be used both as practical tools facilitating an introduction to the semantic web, and as fully featured knowledge graphs meeting the needs of the NFDI research consortia.

Figure 2: Design mockup of the new graphical interface for property mapping via the Wikibase RDF extension. Courtesy of Dragan Espenschied (Wikibase Stakeholder Group).

5. Linked research data: increasing granularity and specificity

So far we have examined some of the respective strengths and weaknesses of Wikidata and Wikibase, but comparing one against the other as a service is not the aim. Rather, the approach we see as most productive for a comprehensive research data management infrastructure is one where the two services are utilized side by side to produce richly linked research data at various levels of granularity and specificity, while retaining licensing and a degree of ‘openness’ appropriate to each use-case’s context. In this approach, Wikidata remains a ‘hub’ service, linking together a family of independently maintained Wikibases, each containing a vast expansion of detail for a given subject area.
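The hub model described above is what SPARQL federation makes practical. A sketch of such a federated query, assuming a hypothetical local prefix and local link-out property ex:P2; the SERVICE URL, wdt:P31 (“instance of”) and wd:Q23413 (“castle”) are real Wikidata constructs:

```python
# A local Wikibase queries its own triples, then pulls matching items
# and labels from Wikidata via the SERVICE keyword.
FEDERATED_QUERY = """
PREFIX ex:  <https://example-wikibase.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?localItem ?wdItem ?wdItemLabel WHERE {
  ?localItem ex:P2 ?wdItem .              # hypothetical link-out property
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdItem wdt:P31 wd:Q23413 .           # instance of: castle
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
}
"""
```

The local instance thus stays small and specific, while general knowledge (labels, classifications, coordinates) is fetched live from the hub.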
If we take the previously used example – Alice’s Adventures in Wonderland – the book remains a single entity on Wikidata (the ‘work’), connected via external identifiers with other Wikibases which need to link their resources (most obviously in this instance, Wikibases run by libraries).10 The decision to either use Wikidata as the primary data repository, or deploy an independent Wikibase and then interlink with relevant Wikidata entities, ultimately depends on the granular nature of the data itself. At the Open Science Lab we have two ongoing cultural data projects which provide a good illustration of the decisions involved in choosing either path. We discuss these use cases below to illustrate how both projects support the above-mentioned approach to data management.

10 There are already a number of library initiatives working in this direction. See: https://www.wikidata.org/wiki/Wikidata:WikiProject_LD4_Wikidata_Affinity_Group and https://www.wikimedia.de/the-wikilibrary-manifesto/

5.1. Case study A: DigAMus Award

To illustrate the advantages of structured, networked information that can be edited by anyone, we first discuss the ‘DigAMus goes Wikidata’ project. The DigAMus Award, which honors successful digital offerings from museums in German-speaking countries, grew out of a grassroots movement and continues to be organized on a voluntary basis [44]. The first call for submissions in 2020 received 129 entries – far more than expected. Building on our experience with similar projects with the potential to facilitate work through communities [45], we suggested curating structured data about DigAMus in Wikidata, instead of a simple spreadsheet. We were able to demonstrate the benefits to museums, who can themselves expand the Wikidata entries on an ongoing basis with further information on their projects. We also explained the benefits for museums of maintaining information about them and their activities as LOD, and thus accessible for further applications [46].
Crucially, the decision to choose Wikidata was informed by the fact that there was no need to create any new items or properties in order to represent the knowledge as desired.11 We created and mapped an appropriate data model,12 imported data via the easily accessible, open source tool OpenRefine [47], and provided pre-built queries13 to demonstrate the possibilities of the networked data. Visualizations of the query results were a particularly valuable outcome of the project – presenting a comprehensive and visually appealing overview of all submitted projects. For legal reasons [48], we were not allowed to simply create thumbnail images from the project web pages, store them in Wikimedia Commons and link them to Wikidata so that they could be displayed in the visualizations. This was a learning experience for the Award organizers, and the submission of a CC BY-licensed thumbnail image for the respective project was requested for the second call in 2021. The projects from the second year’s DigAMus Award were added to Wikidata right at the point of submission so that they could already be searched via queries, along with former projects. Despite the successful track record of Wikidata adoption in GLAMs,14 Wikidata and especially its SPARQL endpoint at first sight appear challenging to many museum staff members who are potential contributors. Therefore, we emphasized community building throughout this project and distributed information on contributions via a dedicated WikiProject page [49].
Together with DigAMus, we offered an online hands-on workshop explaining how to contribute to the project and how to make one’s own institution more visible via Wikidata and Wikimedia Commons.15 Unexpected feedback from the workshop was that, due to the deliberately chosen low entry level, the participants “dared to ask very basic questions for the first time” – an important step towards a much-needed increase in the adoption of LOD and FAIR data principles in the cultural heritage and humanities fields [50, 51].

11 See a list of the items and properties we utilised at: https://www.wikidata.org/wiki/Wikidata:WikiProject_Digital_projects_of_museums/DigAMus_Award#DigAMus_Award_-_Wikidata_items
12 A graphical draw.io data model for mapping the DigAMus data in Wikidata is available at: https://www.wikidata.org/wiki/Wikidata:WikiProject_Digital_projects_of_museums/DigAMus_Award#Datamodel_visualization
13 Selected queries are listed at: https://www.wikidata.org/wiki/Wikidata:WikiProject_Digital_projects_of_museums/DigAMus_Award#Querys
14 See the numerous GLAM-institutional pages related to their work in Wikidata, which are linked from: https://www.wikidata.org/wiki/Wikidata:GLAM

Figure 3: Image showing museum projects in context. Query available from the WikiProject page at: https://bit.ly/3Ju2O3E

Following the model proposed in this paper – of Wikidata acting as a hub service – individual museums could easily run dedicated Wikibase instances for specific projects that may be included in DigAMus in the future. Such Wikibases would contain far more granular data – needed for a specific project’s execution, for example – and then link out to the more general, overview data about all the Award submissions and related institutions in Wikidata. The DigAMus Award project was well suited to highlighting the importance of LOD for cultural data, given that the Award itself promotes the exploration of varied applications of digital technology in museums.

5.2.
Case study B: Semantic annotation of 3D models, a minimum viable product (MVP)

In the context of our work within NFDI4Culture’s ‘Task Area 1: Data Capture and Enrichment’, we have identified the need for open and collaborative digital infrastructure that allows for the storage, access and annotation of various digitized cultural heritage objects, including complex 3D media files. Such an infrastructure needs to address the challenges specific to 3D media, e.g. copyright licenses that preclude 3D files which utilize special textures from being uploaded to open, public resources like Wikimedia Commons. Furthermore, the need for scholarly annotations presents a good illustration of the granularity argument. Unlike the more general data needed to describe the digital museum projects presented in the previous case, individual annotations created by different scholars to describe a single 3D model and to make a scholarly argument are far too granular to store in a data repository such as Wikidata. Hence the need to store data in a dedicated Wikibase instance, which can be linked to relevant, less specific items already stored in Wikidata, such as physical buildings, geographic locations, historical figures, artistic styles, and more. To address these concrete infrastructural challenges, we developed an integrated toolchain that consists of three main open source software components (see Figure 4): 1) OpenRefine – for data reconciliation and batch upload; 2) Wikibase – for linked open data storage; and 3) Kompakkt – for rendering and annotating 3D models and other 2D and AV media files [52]. All components of the toolchain feature graphical user interfaces, aiming to lower the barrier to participation for a wide range of cultural practitioners and researchers.

15 See the workshop documentation: https://docs.google.com/document/d/12KmJqfEIRTj3DCQTygDMc5A67SnRmN6RvxivaO82YVY/edit
In Phase 1 of this project, we developed an MVP which works with a specific art and architectural data research project – reconstruction work and 3D modelling of the Weikersheim castle and its baroque ceiling paintings [53]. This very specific dataset allowed us to work with real-world data and develop all aspects of the toolchain with concrete user requirements in mind. At the same time, the combination of art historical data, architectural data, 2D images, 3D models and attendant metadata, as well as annotations, provides a very rich sample of highly heterogeneous data, serving as proof of concept that this toolchain can be deployed in multiple instances and applied beyond the field of architecture to a wider range of related cultural disciplines. During Phase 2 of the project development, we are working to develop a common data model around media files, their annotations and their connections to objects in the physical world that can fit different cultural digitization use cases [54]. The first version of this data model intentionally features significant overlap with Wikidata properties, to facilitate federated querying (see Figure 5). These property mappings only cover relations concerning physical objects and historical figures. For annotations of digital media, we have developed a custom mapping between Kompakkt’s data model, which follows the W3C Web Annotation standard [55], and Wikibase. Once it is possible to deploy the new Wikibase RDF extension, we will be able to map this data model in RDF-compliant ways to the W3C Annotation standard, as well as to more established standard schemas and ontologies commonly used in the cultural field, such as CIDOC-CRM, and better integrate widely used vocabularies and thesauri, such as the Getty’s Art and Architecture Thesaurus [56] and Iconclass [57], among others.
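For readers unfamiliar with the W3C Web Annotation standard mentioned above, a minimal annotation in its JSON-LD serialization looks roughly as follows. The target URL and selector value are hypothetical placeholders; the @context, the "Annotation" type and the body/target structure come from the W3C data model:

```python
import json

# A minimal W3C Web Annotation: a textual note attached to a region
# of a (hypothetical) digitized 3D model.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "Baroque ceiling painting, south wing",
        "format": "text/plain",
    },
    "target": {
        "source": "https://example.org/kompakkt/entity/weikersheim-model",
        "selector": {
            "type": "FragmentSelector",
            "value": "xywh=10,20,120,80",
        },
    },
}

serialized = json.dumps(annotation, indent=2)
```

Mapping such annotations into Wikibase then amounts to deciding which of these keys become items, properties and qualifiers in the instance’s data model.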
Thanks to the capacity to store data in RDF format and the public SPARQL endpoint of any Wikibase instance, deploying multiple instances of our toolchain for multiple project partners does not mean that their resources will be siloed. Data from multiple collections, representing different degrees of granularity and specificity within a given cultural domain, will remain interoperable by following our common, generalized data model with links to Wikidata – the latter retaining its role as the hub service – and with mappings to common ontologies and thesauri.

Figure 4: Diagrammatic representation of the MVP toolchain architecture.

Figure 5: Map visualization of a sample federated query showing castles built in the Renaissance architectural style within 100 km of Weikersheim: https://tinyurl.com/26q6bl5j

6. Outlook and long-term sustainability

As the many projects we have cited in this paper and the two case studies from the Open Science Lab attest, both Wikidata – as a public, centralized and distributed vast repository of knowledge – and Wikibase – with the possibility of being deployed as a stand-alone software product – can serve as working, albeit imperfect, solutions for the needs of scientists and researchers managing heterogeneous datasets. Furthermore, deploying Wikibase is becoming increasingly easy given the focus on providing containerized packages and installation templates [58], with community engagement and growth being a goal explicitly stated in the Wikimedia Linked Open Data Strategy [31]. As with all ecosystems, the likelihood of this model being fully realized depends on adoption, and on the willingness of relevant organizations to engage with this vision. It is also worth noting that developing any infrastructure (whether a Wikibase instance or another triplestore) in the context of a research project is tied to (and limited by) grant funding cycles. At the end of a research project, infrastructure often decays.
Using Wikibase offers seamless alignment with Wikidata, the latter being a stable infrastructure independent of such funding cycles. Data originally stored in a Wikibase instance can easily be exported and made permanently available on Wikidata. It can also live on in its RDF form on many other RDF platforms. In short, Wikibase can act as a ‘proxy’ in the research data landscape that enables the long-term sustainability of the acquired knowledge [59].

Acknowledgements

NFDI4Culture is funded by the Deutsche Forschungsgemeinschaft (DFG) under grant no. 441958017.

References

[1] NFDI, Homepage, 2022. URL: https://www.nfdi.de/.
[2] Mark Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, et al., The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). doi: https://doi.org/10.1038/sdata.2016.18.
[3] Ina Blümel, Paul Duchesne, Lozana Rossenova, Harald Sack, NFDI InfraTalk: Wikibase – knowledge graphs for RDM in NFDI4Culture (7 March 2022). URL: https://www.youtube.com/watch?v=RPMkuDxHJtI&ab_channel=NFDIDirektorat.
[4] Wikibase, Homepage, 2022. URL: https://wikiba.se/.
[5] Open Science Lab, Homepage, 2022. URL: https://www.tib.eu/en/research-development/research-groups-and-labs/open-science.
[6] NFDI4Culture, Homepage, 2022. URL: https://nfdi4culture.de/.
[7] Denny Vrandečić, Markus Krötzsch, Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM 57 (10) 78–85 (2014). doi: https://doi.org/10.1145/2629489.
[8] Wikidata, Stats Homepage, 2022. URL: https://wikidata-todo.toolforge.org/stats.php.
[9] Wikidata, Homepage, 2022. URL: https://www.wikidata.org/wiki/Wikidata:Main_Page.
[10] Andra Waagmeester, et al., Science Forum: Wikidata as a knowledge graph for the life sciences. eLife 9:e52614 (2020). doi: https://doi.org/10.7554/eLife.52614.
[11] Alessandro Piscopo, Structuring the World’s Knowledge: Socio-Technical Processes and Data Quality in Wikidata. PhD thesis, University of Southampton, UK, 2019.
[12] WikiCite, Homepage, 2022. URL: https://meta.wikimedia.org/wiki/WikiCite.
[13] Scholia, Homepage, 2022. URL: https://scholia.toolforge.org/.
[14] Katherine Thornton, Kenneth Seals-Nutt, Euan Cochrane, Carl Wilson, Wikidata for Digital Preservation, in: Proceedings of iPRES’18, Cambridge, MA, USA, September 24–27, 2018.
[15] Effie Kapsalis, Wikidata: Recruiting the Crowd to Power Access to Digital Archives. Journal of Radio & Audio Media 26 (2019) 134–142.
[16] Lydia Pintscher, Silvan Heintze, Ontology issues in Wikidata, in: Data Quality Days, online, 2021. URL: https://commons.wikimedia.org/w/index.php?title=File%3ADataQualityDaysontologyissues.pdf.
[17] Mike Pham, et al., Scaling Wikidata Query Service – unlimited access to all the world’s knowledge for everyone is hard, in: WikidataCon 2021, online, 2021. URL: https://www.youtube.com/watch?v=oV4qelj9fxM&ab_channel=wikimediaDE.
[18] Daniel Mietchen, Gregor Hagedorn, Egon Willighagen, et al., Enabling Open Science: Wikidata for Research (Wiki4R). Research Ideas and Outcomes 1: e7573 (2015). doi: https://doi.org/10.3897/rio.1.e7573.
[19] Martin Zeinstra, Returning Commons Community Metadata Additions and Corrections to Source, Swedish National Heritage Board, 2019. URL: https://meta.wikimedia.org/wiki/File:Research_Report_–_Returning_commons_community_metadata_additions_and_corrections_to_source.pdf.
[20] Foundation-l Mailing List, The problem with Wikipedia, 2010. URL: https://lists.wikimedia.org/pipermail/foundation-l/2010-June/059273.html.
[21] Wikidata, Protection Policy, 2022. URL: https://www.wikidata.org/wiki/Wikidata:Protection_policy.
[22] Wikidata, Requests for permissions, 2022. URL: https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot.
[23] Philip Roth, An Open Letter to Wikipedia. The New Yorker (September 6, 2012). URL: https://www.newyorker.com/books/page-turner/an-open-letter-to-wikipedia.
[24] WikiProject Books, Homepage, 2022.
URL: https://www.wikidata.org/wiki/Wikidata:WikiProject_Books.
[25] IFLA, Functional Requirements for Bibliographic Records, 2009. URL: https://www.ifla.org/publications/functional-requirements-for-bibliographic-records.
[26] Library of Congress, Bibliographic Framework Initiative, 2022. URL: https://www.loc.gov/bibframe/.
[27] Natasha Fairbairn, Maria Assunta Pimpinelli, Thelma Ross, The FIAF Moving Image Cataloguing Manual, International Federation of Film Archives (2016). URL: https://www.fiafnet.org/pages/E-Resources/Cataloguing-Manual.html.
[28] MediaWiki, EntitySchema Extension, 2021. URL: https://www.mediawiki.org/wiki/Extension:EntitySchema.
[29] Renat Shigapov, RaiseWikibase: Towards fast data import into Wikibase, in: 2nd Workshop on Wikibase in Knowledge Graph based Research Data Management (NFDI) Projects, online, 2021. URL: https://madoc.bib.uni-mannheim.de/60059/1/29.07.2021-RaiseWikibase-Shigapov.pdf.
[30] Lydia Pintscher, et al., Strategy for the Wikibase Ecosystem (2019). URL: https://upload.wikimedia.org/wikipedia/commons/c/cc/Strategy_for_Wikibase_Ecosystem.pdf.
[31] Wikimedia, Strategy 2021: Wikibase ecosystem, 2021. URL: https://meta.wikimedia.org/wiki/LinkedOpenData/Strategy2021/Wikibase.
[32] Lozana Rossenova, ArtBase Archive—Context and History: Discovery Phase and User Research 2017–2019, 2020. URL: https://lozanaross.github.io/phd-portfolio/docs/1_Report_ARTBASE-HISTORY_2020.pdf.
[33] Lozana Rossenova, Lucia Sohmen, Using OpenRefine with arbitrary Wikibase instances, in: WikidataCon 2021, online, 2021. URL: https://pretalx.com/wdcon21/talk/XDNW9A.
[34] Alexander Derveaux, Demo of upload process for a Wikibase instance, in: WikidataCon 2021, online, 2021. URL: https://pretalx.com/wdcon21/talk/LERPAG.
[35] Wikidata, Training, 2022. URL: https://www.wikidata.org/wiki/Wikidata:Training.
[36] Sandra Fauconnier, Dragan Espenschied, Lyndsey Moulds, Lozana Rossenova, Many Faces of Wikibase: Rhizome’s Archive of Born-Digital Art and Digital Preservation, Wikimedia Blog (2018). URL: https://wikimediafoundation.org/news/2018/09/06/rhizome-wikibase/.
[37] Rhizome, Welcome to the ArtBase Query Service: Federation and Advanced Queries, 2021. URL: https://artbase.rhizome.org/wiki/Query#Federation_and_Advanced_Queries.
[38] Dennis Diefenbach, Max de Wilde, Samantha Alipio, Wikibase as an Infrastructure for Knowledge Graphs: the EU Knowledge Graph, in: ISWC 2021, France, online, 2021. URL: https://hal.archives-ouvertes.fr/hal-03353225/document.
[39] CIDOC-CRM, Homepage, 2022. URL: https://www.cidoc-crm.org/.
[40] David Fichtmueller, Using Wikibase as a Platform to Develop a Semantic Biodiversity Standard, in: 1st NFDI Wikibase Workshop, online, 2021. URL: https://docs.google.com/presentation/d/1i91OB9xPZVVovd8c7Cm2sOglQLM8CEeZed8grdVaFwU/edit.
[41] Jose Emilio Labra Gayo, et al., Representing the Luxembourg Shared Authority File based on CIDOC-CRM in Wikibase, in: SWIB 2021, online, 2021. URL: https://swib.org/swib21/slides/05-03-gayo.pdf.
[42] Wikibase Stakeholder Group, Homepage, 2022. URL: https://wbstakeholder.group/.
[43] ProfessionalWiki GitHub, Wikibase RDF Extension, 2022. URL: https://github.com/ProfessionalWiki/WikibaseRDF.
[44] DigAMus Award, Homepage, 2022. URL: https://digamus-award.de.
[45] Ina Blümel, Lucia Sohmen, Nils Casties, Integration of Wikidata 4OpenGLAM into data and information science curricula, in: WikidataCon 2021, online, 2021. URL: https://pretalx.com/wdcon21/talk/MK3ZBH/.
[46] DigAMus Award, DigAMus goes Wikidata, 2021. URL: https://digamus-award.de/2021/07/29/digamus-goes-wikidata/.
[47] OpenRefine, Homepage, 2022. URL: https://openrefine.org/.
[48] Wikimedia Commons, Commons: Copyright rules, 2021. URL: https://commons.wikimedia.org/wiki/Commons:Copyright_rules.
[49] DigAMus Award, WikiProject Digital projects of museums, 2022.
URL: https://www.wikidata.org/wiki/Wikidata:WikiProject_Digital_projects_of_museums.
[50] Ulrike Wuttke, Here be dragons: Open Access to Research Data in the Humanities, 2019. URL: https://ulrikewuttke.wordpress.com/2019/04/09/open-data-humanities/.
[51] Erzsebet Tóth-Czifra, Ulrike Wuttke, Loners, Pathfinders, or Explorers? How are the Humanities Progressing in Open Science?, 2019. doi: https://doi.org/10.25815/x516-wf23.
[52] Kompakkt, Homepage, 2022. URL: https://kompakkt.de/home.
[53] Bayerische Akademie der Wissenschaften, Corpus der barocken Deckenmalerei in Deutschland, 2021. URL: https://deckenmalerei.badw.de/.
[54] NFDI4Culture 3D Data Enrichment MVP, Data Model, 2022. URL: https://wikibase.semantic-kompakkt.de/wiki/Data_Model.
[55] Rob Sanderson, Paolo Ciccarese, Benjamin Young (Eds.), Web Annotation Data Model: W3C Recommendation 23 February 2017, 2017. URL: https://www.w3.org/TR/annotation-model/.
[56] The Getty Research Institute, Getty Vocabularies. URL: https://www.getty.edu/research/tools/vocabularies/.
[57] Iconclass, Homepage, 2022. URL: https://iconclass.org/.
[58] WMDE GitHub, Wikibase Release Pipeline, 2022. URL: https://github.com/wmde/wikibase-release-pipeline.
[59] Andra Waagmeester, in: Email conversation with the author Lozana Rossenova, March 14, 2022.