Aligning Patterns to the Wikibase Model*

Andrew Eells¹[0000-0001-6357-6646], Cogan Shimizu¹[0000-0003-4283-8701], Lu Zhou¹[0000-0002-0453-9965], Pascal Hitzler¹, Seila Gonzalez², and Dean Rehberger²

¹ Data Semantics Laboratory, Kansas State University, USA
² MATRIX, Michigan State University, USA

* Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. When developing a knowledge graph, it is important to promote its (re)usability, accessibility, and persistence. One way of accomplishing (re)usability is through the principled use (i.e., using a structured development methodology) of a schema that describes and documents the relations between concepts in the knowledge graph. With respect to accessibility and persistence, one can consider exposing a SPARQL endpoint and allowing interested parties to query against it. While this is a very flexible approach, it makes it difficult to explore the data. On the other hand, one could consider exposing data through a framework such as Wikibase. In this paper, we provide a small library of patterns that link between a traditional ontology design pattern and the underlying Wikibase data model.

1 Introduction

When developing a knowledge graph, there are many aspects to consider during its deployment. These range from the usability of its interfaces (both human and programmatic), the (re)usability of the data that it contains, its accessibility (both in terms of uptime and its interfaces), its transparency (relating to provenance and trustworthiness), and its persistence. These characteristics are neatly summarized in the FAIR manifesto [10]. One way of accomplishing (re)usability of the data is through the principled use (i.e., using a structured development methodology) of a schema that describes and documents the relations between concepts in the knowledge graph. With respect to accessibility and persistence, one can consider exposing a SPARQL endpoint and allowing interested parties to query against it. While this is a very flexible approach, it makes it difficult to explore the data. On the other hand, one could consider exposing data through a framework such as Wikibase. In this paper, we explore how the Modular Ontology Modeling methodology (MOMo [7]) can be applied in such a way that eventual deployment of the graph data to the Wikibase model is seamless.

Modular Ontology Modeling (MOMo) specifies the development of a module for sets of tightly bound key notions that will be included in a given ontology [7].³ When developing a module, it is generally suggested to identify applicable ontology design patterns (ODPs) [2] and adapt them to the use case at hand through template-based instantiation [3]. During this process it is good practice to consult existing collections of ODPs, such as the ODP Community Wiki⁴ or MODL [8].

As one of the largest publicly editable and accessible knowledge bases, Wikidata is an immense, crowdsourced knowledge base with persistent data that is available for public use and consumption. While it would be very difficult to build an ontology of everything, Wikidata is probably close enough for this purpose. It contains millions of pieces of knowledge from many different domains. In addition, Wikidata serves as the structured data hub for all of Wikimedia's projects (e.g., Wikipedia, Wikivoyage, Wiktionary, and Wikisource).
As such, when modeling an ontology, it makes sense to consider ease of integration with resources like Wikidata, among other Linked Data Platforms.⁵ Wikibase is the underlying technology for these Wiki projects.

The Enslaved Ontology [9] was modeled using the nascent MOMo methodology and is used as the schema for the resulting knowledge base. The Enslaved Hub is an innovative and compelling centralized location for engaging with historical slave trade data from a variety of sources, and it is supported by an underlying installation of the Wikibase platform. During development, we had assumed that it would be relatively easy to adapt the modular Enslaved ontology to the Wikibase model. Unfortunately, between differing semantics for validating data to be put into the knowledge base and an unclear mapping between different notions of provenance, it was not as straightforward as we had expected. In the meantime, we developed simple and complex mapping patterns between the Enslaved modular ontology and the Wikibase model, established in the Enslaved benchmark [11], which has been incorporated into the complex track of the Ontology Alignment Evaluation Initiative (OAEI).

This paper takes an alternative approach by starting at the beginning of the modeling process: it utilizes a library of ontology design patterns that have been specifically engineered to explicitly represent how Wikibase models data "under the hood," thus ensuring that the resulting ontology is optimally structured for interoperability with Wikibase. Any organization can use this library to model its own internal and proprietary knowledge graphs and to align them with the Wikibase model, an important tool for augmenting those knowledge graphs with new information. In particular, this paper provides the first⁶ set of such patterns for representing basic modeling problems, alongside an intuitive graphical notation for these patterns.

³ We focus on the MOMo paradigm as it is closely aligned with our use case, but any pattern-based methodology, such as eXtreme Design [5], would work similarly.
⁴ See https://ontologydesignpatterns.org/.
⁵ See https://www.wikidata.org/.
⁶ We are unaware of any existing patterns that address this aspect of modeling with respect to the Wikibase schema.

1.1 The Graphical Syntax

We have developed two layers of notation. The first layer, which we refer to as abbreviated, focuses on what an end user would input if they were adding an entry to Wikidata via the web interface. The second layer, referred to as expanded, builds on the abbreviated notation by including the additional context information that Wikibase adds to statements, depending on their type and content.

Our abbreviated notation, shown in Figure 1, has the following syntax:

1. Gold boxes with solid borders refer to a Wikibase entity or item.
2. Arrows between nodes denote the type and direction of the relationship. In the diagrams, each relationship has a prefix, which gives the type of the relationship, followed by a colon and then the name of the property. The different prefixes are explained in Section 2.
3. Blue ovals denote a specific datatype.
4. Sometimes a property label may emit arrows; these represent context about the relationship. This is explained in detail in Section 3.

Figure 2 shows our expanded notation. We follow the syntax of the abbreviated notation, with the following additions:
1. Gold boxes of the form "prefix:" (e.g., s:) refer to hash nodes created by Wikibase in order to attach context statements to the relationship.
2. Purple boxes are explicit typing information added by Wikibase to give the type of the hash node.
3. Arrows pointing to a purple box are rdf:type.

The rest of this paper is organized as follows. Section 2 briefly outlines the standard Wikibase model in order to provide context for the rest of the paper. Section 3 describes our developed patterns with their schema diagrams. Section 4 concludes with future work.

Fig. 1: Abbreviated Notation

Fig. 2: Expanded Notation

2 The Wikibase Model

A Wikibase knowledge base is a collection of entities. Entities are the basic elements of the knowledge base, which can be described and referenced using the Wikibase data model. The Wikibase data model describes the structure of the data that is handled in Wikibase. In particular, it specifies which kinds of information users can contribute to the system. In this section, we introduce how Wikibase models data in RDF and some basic concepts that will be used in the rest of the paper; a code sketch following the list below illustrates these concepts.

Fig. 3: Wikidata Model [1]

There are several predefined concepts in the Wikibase data model, introduced as follows:

– Items: Items are the way Wikibase refers to anything of interest. For every Item, there is some basic information that clarifies what the Item is about, such as the link to a Wikipedia page in some language. Each Item has a unique identifier of the form Q followed by a number, used with the prefix wd:, which is an abbreviation of http://www.wikidata.org/entity/ (e.g., wd:Q1234 in Figure 3). There are also human-readable labels and short descriptions that help Wikidata users find the right Item.
– Statements: Statements are descriptions that users have entered about the Item. The RDF format represents statements in two forms. The first is the truthy statement; these are simple triples that assert facts. The other form is the full statement, which is used to represent all data about the statement in the system. To differentiate them, Wikidata uses specific namespaces in specific positions in truthy and full statements. For example, in Figure 3, the namespace http://www.wikidata.org/prop/direct/, abbreviated with the prefix wdt:, is used for the predicate of a truthy statement, and the range of the predicate is the simple value of the statement. The full statement is represented as a separate node using property reification, with the prefix wds: (http://www.wikidata.org/entity/statement/) followed by the ID of the statement (e.g., wds:12345678 in Figure 3); there is no guaranteed format or meaning for the statement ID. Statements are linked to the entity by the predicate with prefix p: (http://www.wikidata.org/prop/) and the name of the property. The simple value is represented by the predicate with prefix ps: (http://www.wikidata.org/prop/statement/) and the name of the property, while the full value is represented by the predicate with prefix psv: (http://www.wikidata.org/prop/statement/value/).
– Qualifiers: A statement has at most one value, but it can have multiple qualifiers and references. Qualifiers are used to further describe or refine the value of a property given in a statement. They are represented by predicates with prefix pq: (http://www.wikidata.org/prop/qualifier/), and the object is the simple value of the qualifier. The full value is represented by the predicate with prefix pqv: (http://www.wikidata.org/prop/qualifier/value/), and the range of the predicate is the full value node.
– References: References offer a source that supports the given claim. They are represented by the predicate prov:wasDerivedFrom, with the object being the reference node. Similar to qualifiers, the simple value of the reference is represented by predicates with prefix pr: (http://www.wikidata.org/prop/reference/), while the full value is represented by the predicate with prefix prv: (http://www.wikidata.org/prop/reference/value/).
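As an illustration (not part of the pattern library itself), the following minimal Python sketch uses the rdflib library to emit both the truthy and the full form of a single statement under the namespaces just introduced. All entity, property, and statement identifiers (Q1234, Q5678, P31, 12345678) are placeholders.

    # Minimal sketch of the truthy vs. full statement forms described above.
    # All entity, property, and statement identifiers are placeholders.
    from rdflib import Graph, Namespace

    WD  = Namespace("http://www.wikidata.org/entity/")
    WDS = Namespace("http://www.wikidata.org/entity/statement/")
    WDT = Namespace("http://www.wikidata.org/prop/direct/")
    P   = Namespace("http://www.wikidata.org/prop/")
    PS  = Namespace("http://www.wikidata.org/prop/statement/")

    g = Graph()
    for prefix, ns in [("wd", WD), ("wds", WDS), ("wdt", WDT), ("p", P), ("ps", PS)]:
        g.bind(prefix, ns)

    item, value = WD.Q1234, WD.Q5678
    stmt = WDS["12345678"]   # statement IDs carry no guaranteed format or meaning

    # Truthy statement: one triple in the wdt: namespace giving the simple value.
    g.add((item, WDT.P31, value))

    # Full statement: the item links (p:) to a reified statement node (wds:),
    # which carries the simple value via ps: .
    g.add((item, P.P31, stmt))
    g.add((stmt, PS.P31, value))

    print(g.serialize(format="turtle"))   # rdflib 6+ returns a string here

Serialized as Turtle, the graph contains one wdt: triple for the truthy form and two triples (p: and ps:) for the reified form; qualifiers and references attach to the same wds: statement node, as detailed in Section 3.1.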
3 The Patterns

In what follows we outline the patterns we have designed. In each pattern diagram, we follow the notation syntax outlined in Section 1.1 and shown in Figure 1 and Figure 2. For each pattern, we show our two notations:

– Our abbreviated notation shows what an end user might provide in the front-end interface of Wikibase if they were to add an item of that datatype (string, dateTime, quantity, etc.) and usage (statement value vs. qualifier).
– Our expanded notation shows the additional context information that Wikibase adds to the given datatype/usage combination.

The patterns provided here are only a starting point. Wikidata and Wikibase offer native support for an ever-growing list of datatypes, and even more can be defined with various extensions. We have chosen to focus initially on three core datatypes that are defined in OWL [4] and offer the most general usability and greatest impact.

3.1 Statement with Qualifier and Reference

Fig. 4: Statement with Qualifier and Reference ((a) Abbreviated; (b) Expanded)

The essential components of Wikibase, and what makes it an intensely powerful platform for knowledge management, are the statements that describe each entity. Each statement makes a claim about the entity and may or may not have a qualifier, which provides additional information about the claim (such as when the claim was valid), or a reference (where the claim was sourced from). These statements link and describe entities in a structured way that we have modeled in Figure 4.

The abbreviated model for this (Figure 4a) is fairly straightforward. We have an entity, seen here in the gold box containing wd:Entity, linked by a property shown in the green box to another entity. Context information is then provided for that specific Entity-Property-Entity relationship, which may include one or more qualifiers or references. Each qualifier is linked to the statement by a pq:PXXX identifier, with the prefix pq: here denoting that the relationship is that of a property qualifier. References are linked via prov:wasDerivedFrom, which will be expanded upon below.

The expanded model, seen in Figure 4b, shows not only the information described above but also further context information provided by Wikibase for each statement. Starting again from our main wd:Entity node, we see it linked by a property identified by a property ID of the form p:PXXX to a hashed node, identified by s:. For the property, the prefix p: denotes that the type of the relationship is that of a property, while the PXXX provides the property ID. On the hashed statement node, s: denotes that the node is a statement. Coming off the s: node, we see the link to another wd:Entity via a property statement of the form ps:PXXX. As with the abbreviated diagram, qualifiers are linked via pq:PXXX.

The three nodes on the left, prefixed with wikibase:, are information added by Wikibase to describe the statement. The first, wikibase:Statement, types the hashed node as a statement, rather than a qualifier or a reference. The other two, wikibase:BestRank and wikibase:NormalRank, provide the style of ranking for the statement and denote whether or not this statement is the "best" statement for this Entity-Property-Entity relationship. For example, if an entity is linked to two other entities, both by the same property, one of them can be ranked higher than the other, denoting it as the "best" or preferred value for the statement.

If a reference is provided for a given statement, Wikibase creates another hashed node linked via prov:wasDerivedFrom [6], shown with the form ref:. The prefix ref: denotes the node as a reference node, which is also indicated by the wikibase:Reference node coming off the ref: node. Lastly, the ref: node is linked to the corresponding entity for the reference via pr:PXXX. The qualifier example in Figure 4b, as with many datatype values, has a complex structure that we will elaborate on in the next section.
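Written out as triples, the expanded pattern of Figure 4b looks roughly as follows. This is an illustrative rdflib sketch rather than a normative serialization: the Q- and P-identifiers and hash fragments are placeholders, the reference hash node (drawn as ref: in the diagram) is placed here in the http://www.wikidata.org/reference/ namespace used by Wikidata's RDF dumps, and the rank is recorded with wikibase:rank plus the additional wikibase:BestRank type.

    # Illustrative sketch of the expanded pattern in Figure 4b, built with rdflib.
    # All Q- and P-identifiers and hash fragments are placeholders.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, XSD

    WD    = Namespace("http://www.wikidata.org/entity/")
    WDS   = Namespace("http://www.wikidata.org/entity/statement/")
    WDREF = Namespace("http://www.wikidata.org/reference/")
    P     = Namespace("http://www.wikidata.org/prop/")
    PS    = Namespace("http://www.wikidata.org/prop/statement/")
    PQ    = Namespace("http://www.wikidata.org/prop/qualifier/")
    PR    = Namespace("http://www.wikidata.org/prop/reference/")
    PROV  = Namespace("http://www.w3.org/ns/prov#")
    WIKIBASE = Namespace("http://wikiba.se/ontology#")

    g = Graph()
    for prefix, ns in [("wd", WD), ("wds", WDS), ("wdref", WDREF), ("p", P),
                       ("ps", PS), ("pq", PQ), ("pr", PR),
                       ("prov", PROV), ("wikibase", WIKIBASE)]:
        g.bind(prefix, ns)

    entity, other = WD.Q1234, WD.Q5678
    stmt = WDS["Q1234-ABC123"]

    # wd:Entity --p:PXXX--> hashed statement node --ps:PXXX--> wd:Entity
    g.add((entity, P.P31, stmt))
    g.add((stmt, PS.P31, other))

    # Typing and ranking information added by Wikibase.
    g.add((stmt, RDF.type, WIKIBASE.Statement))
    g.add((stmt, RDF.type, WIKIBASE.BestRank))         # marks the preferred ("best") statement
    g.add((stmt, WIKIBASE.rank, WIKIBASE.NormalRank))

    # A qualifier, shown here only in its simple (pq:) form; the full-value
    # form (pqv:) is elaborated in Section 3.2.
    g.add((stmt, PQ.P585, Literal("2021-06-01T00:00:00Z", datatype=XSD.dateTime)))

    # A reference: a second hashed node typed wikibase:Reference,
    # reached via prov:wasDerivedFrom and pointing to its source via pr: .
    ref = WDREF["abc123"]
    g.add((stmt, PROV.wasDerivedFrom, ref))
    g.add((ref, RDF.type, WIKIBASE.Reference))
    g.add((ref, PR.P248, WD.Q9000))

    print(g.serialize(format="turtle"))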
3.2 The DateTime Pattern

In the patterns that follow, we will see that it matters whether a given datatype is used as the value of a statement or as a qualifier. Let us first look at the use of a dateTime, shown in Figures 5 and 6.

Fig. 5: dateTime as Statement Value ((a) Abbreviated; (b) Expanded)

Fig. 6: dateTime as Qualifier ((a) Abbreviated; (b) Expanded)

The abbreviated notation is quite straightforward: as a statement value, an entity is linked to a dateTime value via a property. But, as shown in Figure 5b and as mentioned in Section 3.1 above, Wikibase adds several data points that make each statement much more useful. In addition to the information added to the s: node, including a simple representation of the value via ps:PXXX, Wikibase adds another hashed node; this added node houses the additional context information about the value. The statement node is linked to the v: value node via psv:PXXX, with the prefix psv: denoting the relationship as that of a property statement value.

Wikibase types the value node explicitly as wikibase:TimeValue. In this instance, given that the datatype is a dateTime value, Wikibase adds a link to the Wikidata entity for the calendar model used (e.g., Gregorian) via wikibase:timeCalendarModel. Also provided are a number representing the timezone as an offset from UTC, via wikibase:timeTimezone, and a number representing the time precision, via wikibase:timePrecision.⁷ Lastly, the value node links back to the simple value of the dateTime via wikibase:timeValue.

When a dateTime is used in a qualifier, the internal structure of the v: node remains unchanged. However, rather than linking to the statement hash node via ps:PXXX and psv:PXXX, it is linked via pq:PXXX and pqv:PXXX, respectively.

⁷ There are 15 levels of time precision in Wikibase, with 0 corresponding to "billions of years" and 14 corresponding to seconds.
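The dateTime-as-statement-value case (Figure 5b) can be sketched with rdflib as follows. Again, this is illustrative only: P585, Q1234, and the hash fragments are placeholders, the value node (drawn as v:) is placed in the http://www.wikidata.org/value/ namespace used by Wikidata's dumps, and Q1985727 denotes the (proleptic) Gregorian calendar item on Wikidata.

    # Illustrative sketch of the dateTime-as-statement-value pattern (Figure 5b).
    # P585, Q1234, and the hash fragments are placeholders.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, XSD

    WD  = Namespace("http://www.wikidata.org/entity/")
    WDS = Namespace("http://www.wikidata.org/entity/statement/")
    WDV = Namespace("http://www.wikidata.org/value/")
    P   = Namespace("http://www.wikidata.org/prop/")
    PS  = Namespace("http://www.wikidata.org/prop/statement/")
    PSV = Namespace("http://www.wikidata.org/prop/statement/value/")
    WIKIBASE = Namespace("http://wikiba.se/ontology#")

    g = Graph()
    for prefix, ns in [("wd", WD), ("wds", WDS), ("wdv", WDV), ("p", P),
                       ("ps", PS), ("psv", PSV), ("wikibase", WIKIBASE)]:
        g.bind(prefix, ns)

    entity = WD.Q1234
    stmt   = WDS["Q1234-ABC123"]
    value  = WDV["def456"]                        # the hashed v: node from the diagram
    when   = Literal("2021-06-01T00:00:00Z", datatype=XSD.dateTime)

    g.add((entity, P.P585, stmt))
    g.add((stmt, RDF.type, WIKIBASE.Statement))
    g.add((stmt, PS.P585, when))                  # simple value
    g.add((stmt, PSV.P585, value))                # link to the full value node

    g.add((value, RDF.type, WIKIBASE.TimeValue))
    g.add((value, WIKIBASE.timeValue, when))                  # back to the simple value
    g.add((value, WIKIBASE.timeCalendarModel, WD.Q1985727))   # calendar model item
    g.add((value, WIKIBASE.timeTimezone, Literal(0, datatype=XSD.integer)))
    g.add((value, WIKIBASE.timePrecision, Literal(11, datatype=XSD.integer)))  # 11 = day

    print(g.serialize(format="turtle"))
    # For the qualifier variant (Figure 6b), replace PS/PSV with the pq:/pqv:
    # namespaces, i.e. .../prop/qualifier/ and .../prop/qualifier/value/ .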
3.3 Quantity

Fig. 7: Quantity as Statement Value ((a) Abbreviated; (b) Expanded)

Fig. 8: Quantity as Qualifier ((a) Abbreviated; (b) Expanded)

The quantity pattern (Figures 7 and 8) mimics some of the features of dateTime, but Wikibase then adds some data specific to quantities. When the quantity is used as the value of a statement, the abbreviated notation remains quite simple: a wd:Entity is linked to an xsd:decimal via a property. Expanding that model out, we see that Wikibase again adds a hashed node (v:). The statement hash links to the simple value of the quantity via ps:PXXX and to the value node via psv:PXXX. When the quantity is a qualifier, our abbreviated notation (Figure 8a) shows that the quantity is linked from the property via pq:PXXX. In its expanded form (Figure 8b), Wikibase creates the same v: node mentioned previously; however, it links to the corresponding nodes via pq:PXXX and pqv:PXXX.

Wikibase adds several data points for a quantity, whether it is a statement value or a qualifier. First, it types the hash node as wikibase:QuantityValue. It also links to the Wikidata entity for the unit of measurement of that quantity via wikibase:quantityUnit. Finally, the value node links back to the simple value of the quantity via wikibase:quantityAmount.

3.4 String

Fig. 9: String as Statement Value ((a) Abbreviated; (b) Expanded)

Fig. 10: String as Qualifier ((a) Abbreviated; (b) Expanded)

In a slight deviation from the previous datatypes, Wikibase only adds ranking (wikibase:rank) and typing (wikibase:Statement) information for a statement with a string (Figures 9 and 10). As the abbreviated notation shows, when used as the value of a statement, an entity is linked to a string via a property. In the case of a string being used as a qualifier, a property is linked to a string via pq:PXXX. Notably, no value node is created for a string.
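To close out the datatype patterns, the following rdflib sketch contrasts the quantity and string cases: the quantity statement receives a full value node typed wikibase:QuantityValue, while the string statement receives only typing and ranking information. As before, the identifiers (P1114, P2561, Q11573, and the hash fragments) are placeholders chosen for illustration.

    # Illustrative sketch contrasting the quantity and string patterns.
    # P1114, P2561, Q11573, and the hash fragments are placeholders.
    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, XSD

    WD  = Namespace("http://www.wikidata.org/entity/")
    WDS = Namespace("http://www.wikidata.org/entity/statement/")
    WDV = Namespace("http://www.wikidata.org/value/")
    P   = Namespace("http://www.wikidata.org/prop/")
    PS  = Namespace("http://www.wikidata.org/prop/statement/")
    PSV = Namespace("http://www.wikidata.org/prop/statement/value/")
    WIKIBASE = Namespace("http://wikiba.se/ontology#")

    g = Graph()
    for prefix, ns in [("wd", WD), ("wds", WDS), ("wdv", WDV), ("p", P),
                       ("ps", PS), ("psv", PSV), ("wikibase", WIKIBASE)]:
        g.bind(prefix, ns)

    entity = WD.Q1234
    amount = Literal("42", datatype=XSD.decimal)

    # Quantity as statement value: the statement node carries the simple value (ps:)
    # and links to a full value node (psv:) typed wikibase:QuantityValue.
    qstmt, qvalue = WDS["Q1234-AAA111"], WDV["bbb222"]
    g.add((entity, P.P1114, qstmt))
    g.add((qstmt, RDF.type, WIKIBASE.Statement))
    g.add((qstmt, PS.P1114, amount))
    g.add((qstmt, PSV.P1114, qvalue))
    g.add((qvalue, RDF.type, WIKIBASE.QuantityValue))
    g.add((qvalue, WIKIBASE.quantityAmount, amount))
    g.add((qvalue, WIKIBASE.quantityUnit, WD.Q11573))   # unit-of-measurement item

    # String as statement value: only typing and ranking information is added;
    # no value node is created.
    sstmt = WDS["Q1234-CCC333"]
    g.add((entity, P.P2561, sstmt))
    g.add((sstmt, RDF.type, WIKIBASE.Statement))
    g.add((sstmt, WIKIBASE.rank, WIKIBASE.NormalRank))
    g.add((sstmt, PS.P2561, Literal("an example string")))

    print(g.serialize(format="turtle"))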
4 Conclusion

When developing and deploying a knowledge graph, there are many obstacles to a persistent, transparent, and usable resource. One way to overcome these obstacles is to use the Wikibase framework. In this paper, we have represented several common modeling constructions in a graphical syntax that makes it clear how they map into the Wikibase context. This should allow ontology developers to create ontologies (or knowledge graph schemas) that are "Wikibase ready" more quickly, more accurately, and with reduced effort, thus improving the persistence and accessibility of the deployed knowledge graph. With a base understanding of how these frequently occurring design patterns are represented in the Wikibase model, there are many potential next steps. Of particular importance and urgency would be

1. the complete translation of the MODL [8] patterns;
2. the complete translation of the Enslaved Ontology patterns.

Regarding the former, this would facilitate the creation of arbitrary, modular ontologies that are intended to be deployed to the Wikibase framework, thus encouraging a persistent (if Wikidata is utilized) and usable deployment of the knowledge graph in question. For the latter, we will be able to examine the efficacy of our approach by comparing the re-implementation of the Enslaved Ontology patterns to the manually altered modules utilized in the Enslaved Portal (see https://enslaved.org/).

Acknowledgements. The authors acknowledge support by the National Science Foundation under Grant 2032628, EAGER: Open Science in Semantic Web Research, and Grant 2033521, A1: KnowWhereGraph: Enriching and Linking Cross-Domain Knowledge Graphs using Spatially-Explicit AI Technologies, as well as by the Mellon Foundation through Enslaved: Peoples of the Historic Slave Trade.

References

1. Wikibase/Indexing/RDF Dump Format - MediaWiki (Mar 2021), https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
2. Gangemi, A., Presutti, V.: Ontology design patterns. In: Staab, S., Studer, R. (eds.) Handbook on Ontologies, pp. 221–243. International Handbooks on Information Systems, Springer (2009). https://doi.org/10.1007/978-3-540-92673-3_10
3. Hammar, K., Presutti, V.: Template-based content ODP instantiation. In: Hammar, K., Hitzler, P., Krisnadhi, A., Lawrynowicz, A., Nuzzolese, A.G., Solanki, M. (eds.) Advances in Ontology Design and Patterns [revised and extended versions of the papers presented at the 7th edition of the Workshop on Ontology and Semantic Web Patterns, WOP@ISWC 2016, Kobe, Japan, 18th October 2016]. Studies on the Semantic Web, vol. 32, pp. 1–13. IOS Press (2016). https://doi.org/10.3233/978-1-61499-826-6-1
4. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. (eds.): OWL 2 Web Ontology Language Primer (Second Edition). W3C Recommendation, 11 December 2012 (2012), available from http://www.w3.org/TR/owl2-primer/
5. Presutti, V., Daga, E., Gangemi, A., Blomqvist, E.: eXtreme Design with content ontology design patterns. In: Blomqvist, E., Sandkuhl, K., Scharffe, F., Svátek, V. (eds.) Proceedings of the Workshop on Ontology Patterns (WOP 2009), collocated with the 8th International Semantic Web Conference (ISWC 2009), Washington D.C., USA, 25 October 2009. CEUR Workshop Proceedings, vol. 516. CEUR-WS.org (2009), http://ceur-ws.org/Vol-516/pap21.pdf
6. Sahoo, S., McGuinness, D., Lebo, T.: PROV-O: The PROV Ontology. W3C Recommendation, W3C (Apr 2013), http://www.w3.org/TR/2013/REC-prov-o-20130430/
7. Shimizu, C., Hammar, K., Hitzler, P.: Modular ontology modeling. Tech. rep. (2021), https://daselab.cs.ksu.edu/publications/modular-ontology-modeling, under review
8. Shimizu, C., Hirt, Q., Hitzler, P.: MODL: A modular ontology design library. In: Janowicz, K., Krisnadhi, A.A., Poveda-Villalón, M., Hammar, K., Shimizu, C. (eds.) Proceedings of the 10th Workshop on Ontology Design and Patterns (WOP 2019) co-located with the 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, October 27, 2019. CEUR Workshop Proceedings, vol. 2459, pp. 47–58. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2459/paper4.pdf
9. Shimizu, C., Hitzler, P., Hirt, Q., Shiell, A., Gonzalez, S., Foley, C., Rehberger, D., Watrall, E., Hawthorne, W., Tarr, D., Carty, R., Mixter, J.: The Enslaved Ontology 1.0: People of the Historic Slave Trade. Tech. rep., Michigan State University, East Lansing, Michigan (April 2019)
10. Wilkinson, M.D., Dumontier, M., et al.: The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3, 160018 (Mar 2016), https://doi.org/10.1038/sdata.2016.18
11. Zhou, L., Shimizu, C., Hitzler, P., Sheill, A.M., Estrecha, S.G., Foley, C., Tarr, D., Rehberger, D.: The Enslaved dataset: A real-world complex ontology alignment benchmark using Wikibase. In: d'Aquin, M., Dietze, S., Hauff, C., Curry, E., Cudré-Mauroux, P. (eds.) CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19–23, 2020, pp. 3197–3204. ACM (2020). https://doi.org/10.1145/3340531.3412768