<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Contextualization via Qualifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter F. Patel-Schneider</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nuance AI and Language Lab</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>A common method for contextualizing facts in knowledge graph formalisms is by adding property-value pairs, qualifiers, to the facts in the knowledge graph. Qualifiers work well for information that is additional to the base fact but pose an unwarranted burden on consumers of the information in knowledge graphs when the qualifier instead contextualizes the base fact, as in limiting the applicability of the fact to some time period or providing a confidence level for the fact. Contextualization should instead be done in a more principled manner and accompanied by tools that lessen the burden on consumers of knowledge graphs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Additive Qualifiers</title>
      <p>Consider first using qualifiers to represent the role played by a cast member of a film.
This information is somewhat awkward to represent in triple-based formalisms because
it is a three-place relationship. So Wikidata and schema.org use qualifiers to attach the
role played to the base information that an actor is in the cast of a film.</p>
      <p>
        RDF reification and singleton properties can use qualifiers for this as well. In RDF
reification [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ], the reified cast membership triple is represented as four triples, using
built-in RDF vocabulary, as in
_:rt rdf:type rdf:Statement .
_:rt rdf:subject :GWTW .
_:rt rdf:predicate :castMember .
_:rt rdf:object :ClarkGable .
      </p>
      <p>where _:rt is a statement node that represents the triple</p>
      <p>:GWTW :castMember :ClarkGable .</p>
      <p>The statement node can then have the other information associated with it, as in
_:rt :characterRole :RhettButler .</p>
      <p>It is important to note that the reification of a triple in RDF does not entail the triple,
i.e., it is possible to have the reification of a triple in RDF without implying that the
triple itself is true. So there is no implication from the above that Gone with the Wind
has Clark Gable as a cast member, nor that he plays Rhett Butler in the movie.</p>
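      <p>The lack of entailment can be made concrete with a minimal sketch that treats a graph as a plain set of (subject, predicate, object) tuples. The helper names reify and described_triples are hypothetical and only illustrate the point; this is not an RDF library API.</p>
      <preformat>
```python
# Triples are plain (subject, predicate, object) tuples; a graph is a set of them.
# Identifiers mirror the running example; the helper names are hypothetical.

def reify(triple, node):
    """Return the four RDF reification triples describing `triple` via `node`."""
    s, p, o = triple
    return {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }

def described_triples(graph):
    """Recover the triples that statement nodes describe (not necessarily asserted)."""
    found = set()
    nodes = {s for (s, p, o) in graph if p == "rdf:type" and o == "rdf:Statement"}
    for node in nodes:
        parts = {p: o for (s, p, o) in graph if s == node}
        try:
            found.add((parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"]))
        except KeyError:
            continue  # incomplete reification: skip it
    return found

base = (":GWTW", ":castMember", ":ClarkGable")
graph = reify(base, "_:rt") | {("_:rt", ":characterRole", ":RhettButler")}

# The reification describes the base triple but does not assert it.
print(base in graph)                     # False
print(base in described_triples(graph))  # True

# To state the fact itself, the base triple has to be added separately.
asserted = graph | {base}
print(base in asserted)                  # True
```
      </preformat>
      <p>A consumer that wants the cast membership as a fact therefore has to look for the base triple itself, not only for its reification.</p>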
      <p>
        Wikidata [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is built around items and property-value statements about them, thus
forming in essence subject-predicate-object facts. Each statement can have associated
qualifiers (see https://www.wikidata.org/wiki/Help:Qualifiers), which are predicate-object pairs. So Wikidata should contain the following
information (descriptive names are used here in Wikidata information structures, as opposed to the actual
opaque numeric identifiers, so that the meaning of the structures is easier to see)
      </p>
      <p>Gone with the Wind
  instance of: film
  cast member: Clark Gable
    character role: Rhett Butler</p>
      <p>The intended meaning of this is that Gable is a cast member of Gone with the Wind
and that the character role of this cast membership is Rhett Butler. Here the
underlying “triple” is indeed a fact, as opposed to the situation with RDF reification, but, as
Wikidata does not have a formal semantics, there is no formal statement of this
intended meaning. As qualifiers are placed directly on regular statements, any tools that
use Wikidata information don’t need any special code to handle the statements that have
attached qualifiers.</p>
      <p>
        Singleton properties [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is an extension to RDF that creates a new kind of property,
one that can only have one triple in its extension. This singleton property can then
be used as the subject of other triples that are considered not to be about the singleton
property itself but instead are about its single triple. So in singleton properties one might
say
:castMember_1 rdf:singletonPropertyOf :castMember .
:GWTW :castMember_1 :ClarkGable .
:castMember_1 :characterRole :RhettButler .
      </p>
      <p>This looks like a syntactic abbreviation for RDF reification but it has one important
difference: there is an entailment in singleton properties from the singleton property
to its regular property. That is, in the singleton properties semantics the above triples
entail
:GWTW :castMember :ClarkGable .</p>
      <p>Because singleton properties uses a special data structure (the singleton property), a tool
for singleton properties needs to have special code that makes the inference from triples
with singleton properties to the triple with the correct regular property.</p>
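      <p>The special code a singleton-property tool needs can be quite small. The following is a minimal sketch, again over a set of plain tuples; the function name singleton_closure is hypothetical, and the sketch covers only the one entailment described here, not the full singleton property semantics.</p>
      <preformat>
```python
SINGLETON_OF = "rdf:singletonPropertyOf"

def singleton_closure(graph):
    """Add (s, q, o) for every (s, p, o) where p is a singleton property of q."""
    # Map each singleton property to its regular property.
    regular = {s: o for (s, p, o) in graph if p == SINGLETON_OF}
    inferred = {
        (s, regular[p], o)
        for (s, p, o) in graph
        if p in regular
    }
    return graph | inferred

graph = {
    (":castMember_1", SINGLETON_OF, ":castMember"),
    (":GWTW", ":castMember_1", ":ClarkGable"),
    (":castMember_1", ":characterRole", ":RhettButler"),
}
closed = singleton_closure(graph)
print((":GWTW", ":castMember", ":ClarkGable") in closed)  # True
```
      </preformat>
      <p>Tools that skip this step see only the triple with :castMember_1 and miss the regular cast membership.</p>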
      <p>schema.org contains a special class Role (https://schema.org/Role) that
is used for additional information about a relationship (see
http://blog.schema.org/2014/06/introducing-role.html). Instead of a property directly
linking from a subject to an object, the property links from the subject to a role item
and then again from the role item to the object, as in (again recast slightly)
:GWTW :castMember _:r .
_:r rdf:type schema:Role .
_:r :castMember :ClarkGable .
_:r :characterName :RhettButler .</p>
      <p>The intended meaning of this is again that Gone with the Wind has Clark Gable as a
cast member, and that the character role of this cast membership is Rhett Butler. (Indeed
schema.org has a specialization of Role, PerformanceRole, that is designed specifically
for this particular kind of extra information.) So here again the underlying “triple” is
indeed a fact, as opposed to the situation with RDF reification. As schema.org does
not have a formal semantics, there is no formal statement of this intended meaning.
Any tool for schema.org data needs to handle Role items specially, short-circuiting the
double link to recover the underlying normal meaning.</p>
      <p>From this example it appears that Wikidata, singleton properties, and schema.org
all have qualifiers working very nicely. They all permit additional information to be
associated with their analogue of triples while retaining the underlying truth of the triple.
The special processing required to recover this underlying truth is not onerous. RDF
reification appears the loser here as it does not directly retain the underlying truth while
associating extra information with a triple. If the triple itself is supposed to be true then
it has to be asserted independently, and the additional information also has to be
connected to this triple.</p>
      <p>In the above example the qualifier is adding extra information about the fact, namely that
the cast member is playing a particular role. This extra information does not interfere
in any way with the cast membership, so tools (and queries) that are interested in cast
membership can safely ignore the qualifier so long as they handle the modifications made to
the underlying data structure to accommodate qualifiers. This is true even in RDF, so long as
the base triple is also present.</p>
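      <p>For the schema.org Role pattern, handling the modified data structure amounts to short-circuiting the double link through the role item. A minimal sketch, assuming triples are plain tuples and that role items are typed with schema:Role or schema:PerformanceRole (the function name short_circuit_roles is hypothetical):</p>
      <preformat>
```python
ROLE_CLASSES = {"schema:Role", "schema:PerformanceRole"}

def short_circuit_roles(graph):
    """Return the plain triples (s, p, o) hidden behind s -p-> role -p-> o links."""
    roles = {s for (s, p, o) in graph if p == "rdf:type" and o in ROLE_CLASSES}
    plain = set()
    for (s, p, r) in graph:
        if r in roles:
            # Follow the same property from the role item to the real object.
            plain |= {(s, p, o) for (r2, p2, o) in graph if r2 == r and p2 == p}
    return plain

graph = {
    (":GWTW", ":castMember", "_:r"),
    ("_:r", "rdf:type", "schema:Role"),
    ("_:r", ":castMember", ":ClarkGable"),
    ("_:r", ":characterName", ":RhettButler"),
}
print(short_circuit_roles(graph))  # {(':GWTW', ':castMember', ':ClarkGable')}
```
      </preformat>
      <p>The additive :characterName information stays on the role item and can be ignored by tools that only care about cast membership.</p>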
      <p>However, many, perhaps most, uses of qualifiers in Wikidata and schema.org
contextualize the underlying fact, i.e., they limit the contexts in which the underlying fact is
true. Of the ten most-used qualifier properties in Wikidata as of July 2018 (see
https://tools.wmflabs.org/sqid/#/browse?type=properties), four provide
temporal context (point in time, start time, valid in period, and end time), one provides spatial context
(chromosome), one can be considered to provide a certainty context (determination method),
three are “additive” (taxon author, points scored, and matches played), and one provides no
information content (stated as). One of the most important contextualizations is temporal contextualization, which
is generally handled in these schemes via a start and end qualifier.</p>
      <p>
        So in singleton properties, the temporal aspects of Bob Dylan’s marital
information would be represented as (adapted from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]):
:isMarriedTo#1 rdf:singletonPropertyOf :isMarriedTo .
:BobDylan :isMarriedTo#1 :SaraLownds .
:isMarriedTo#1 :hasStart "1965-11-22"^^xsd:date .
:isMarriedTo#1 :hasEnd "1977-06-29"^^xsd:date .
:isMarriedTo#2 rdf:singletonPropertyOf :isMarriedTo .
:BobDylan :isMarriedTo#2 :CarolDennis .
:isMarriedTo#2 :hasStart "1986-06-04"^^xsd:date .
:isMarriedTo#2 :hasEnd "1992-10-01"^^xsd:date .
      </p>
      <p>In Wikidata this is</p>
      <p>Bob Dylan
  spouse: Sara Dylan
    end time: 29 June 1977
    start time: 22 November 1965
  spouse: Carolyn Dennis
    end time: October 1992
    start time: 4 June 1986</p>
      <p>In singleton properties there is the entailment from the above triples to
:BobDylan :isMarriedTo :SaraLownds .
:BobDylan :isMarriedTo :CarolDennis .
i.e., that Bob Dylan is a bigamist. In Wikidata there is also the implied intent that there
are two spouses of Bob Dylan, but although this is somewhat misleading there is
perhaps not quite the same force to the conclusion of bigamy because of the name of the
relationship. As there is no formal semantics for Wikidata, there is no formal support
for drawing both conclusions. The qualifier methodology of schema.org would act the
same as Wikidata.</p>
      <p>RDF reification, on the other hand, can represent this information as in
:rt1 rdf:subject :BobDylan .
:rt1 rdf:predicate :isMarriedTo .
:rt1 rdf:object :SaraLownds .
:rt1 :hasStart "1965-11-22"^^xsd:date .
:rt1 :hasEnd "1977-06-29"^^xsd:date .
:rt2 rdf:subject :BobDylan .
:rt2 rdf:predicate :isMarriedTo .
:rt2 rdf:object :CarolDennis .
:rt2 :hasStart "1986-06-04"^^xsd:date .
:rt2 :hasEnd "1992-10-01"^^xsd:date .
which does not imply that Bob Dylan is a bigamist.</p>
    </sec>
    <sec id="sec-2">
      <title>The Problem with Contextual Qualifiers</title>
      <p>The underlying problem is that contextual information does not add to the base
information but instead modifies it, stating in which context (temporal or otherwise) the
information is true or providing information as to how likely the base information is to
be true. Consumers of the information always need to check whether the context
that they are (perhaps implicitly) working in falls outside the contexts delimited by the
qualifiers attached to a particular piece of information.</p>
      <p>This might not be so hard if the only contextual information is temporal and that
information is carried only by start and end dates. Tools (and queries against the
underlying data) working at a particular time point can explicitly exclude facts with temporal
qualifiers that do not cover that time point. Tools (and queries) working in an implicit
now can exclude any fact with an end time qualifier (assuming of course that no end
date is in the future, which could be a requirement for temporal qualifiers).</p>
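      <p>These two exclusion rules can be sketched as follows. The statement layout, the qualifier keys "start" and "end", and the treatment of both dates as inclusive are assumptions made for illustration; the end date for the second marriage uses the day given in the singleton property example.</p>
      <preformat>
```python
from datetime import date

# Each statement is (subject, property, value, qualifiers). The keys "start"
# and "end" are placeholders for the start time and end time qualifiers.
statements = [
    ("Bob Dylan", "spouse", "Sara Dylan",
     {"start": date(1965, 11, 22), "end": date(1977, 6, 29)}),
    ("Bob Dylan", "spouse", "Carolyn Dennis",
     {"start": date(1986, 6, 4), "end": date(1992, 10, 1)}),
]

def valid_at(statement, day):
    """True if the statement's start/end qualifiers (if any) cover `day`."""
    qualifiers = statement[3]
    start = qualifiers.get("start")
    end = qualifiers.get("end")
    if start is not None and start > day:
        return False
    if end is not None and day > end:
        return False
    return True

def valid_now(statement):
    """Implicit-now rule: exclude any statement that has an end qualifier."""
    return "end" not in statement[3]

def spouses(day):
    return [s[2] for s in statements if valid_at(s, day)]

print(spouses(date(1970, 1, 1)))                   # ['Sara Dylan']
print(spouses(date(1980, 1, 1)))                   # []
print([s[2] for s in statements if valid_now(s)])  # []
```
      </preformat>
      <p>Every tool or query that touches spouse statements has to apply one of these filters, otherwise both marriages appear to hold at once.</p>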
      <p>However, each and every contextual qualifier has to be considered by every tool
(and query against the underlying data). So there can be multiple exclusions, such as
for contextual location, confidence, and certainty, and tools (and queries) will have to
be updated whenever new such qualifiers are added. Wikidata has over one hundred and
fifty qualifiers. To determine whether a fact is true in a context in Wikidata, each and
every one has to be examined to see if it is a contextual qualifier and, for those that are,
their interaction with the current context or the implicit context has to be determined.
This is a high bar indeed, made even worse as new qualifiers are added on an ongoing
basis.</p>
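      <p>The maintenance burden can be illustrated with a minimal sketch of what each consumer would have to carry around: a classification of every qualifier as additive or contextual, plus a check for each contextual one. The registry contents, the context layout, and the name holds_in are all hypothetical.</p>
      <preformat>
```python
from datetime import date

# Hypothetical registry: one check per qualifier the tool knows to be contextual.
# Each check answers "is the fact applicable in this context?".
CONTEXTUAL_CHECKS = {
    "start time": lambda value, ctx: ctx["day"] >= value,
    "end time": lambda value, ctx: value >= ctx["day"],
}
# Qualifiers the tool knows to be additive and therefore safe to ignore.
ADDITIVE = {"character role"}

def holds_in(qualifiers, ctx):
    """Decide whether a fact with these qualifiers holds in context `ctx`."""
    for name, value in qualifiers.items():
        if name in ADDITIVE:
            continue
        if name not in CONTEXTUAL_CHECKS:
            # An unclassified qualifier might restrict the fact; the tool cannot tell.
            raise LookupError(f"unclassified qualifier: {name!r}")
        if not CONTEXTUAL_CHECKS[name](value, ctx):
            return False
    return True

ctx = {"day": date(1970, 1, 1)}
marriage = {"start time": date(1965, 11, 22), "end time": date(1977, 6, 29)}
print(holds_in(marriage, ctx))                            # True
print(holds_in({"character role": "Rhett Butler"}, ctx))  # True
try:
    holds_in({"valid in period": "1960s"}, ctx)
except LookupError as err:
    print(err)  # unclassified qualifier: 'valid in period'
```
      </preformat>
      <p>Each newly introduced qualifier lands in the unclassified branch until someone updates the registry, and that update has to be repeated in every tool and query.</p>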
    </sec>
    <sec id="sec-3">
      <title>Solutions for Contextual Qualifiers</title>
      <p>The right solution is to not show contextual qualifiers to consumers. Ideally,
contextual qualifiers would instead be replaced with a formal theory of the contexts so that basic tools
(contextualizers, reasoners) can be written that correctly take context into account.
Alternatively, low-level tools could be created that remove facts that are not valid in the contexts that
a consumer wants to use.</p>
      <p>The formal approach requires building logics and theories for the contextual
constructs. Building and implementing these formal theories is certainly not a trivial task:
handling temporal contexts, for example, requires a temporal logic. (If all time points are
constants on a single time line, the required temporal reasoning might be simple, but a
temporal logic should still be used to underpin the temporal reasoning algorithm.) The result, however, will
provide firm underpinnings for the constructs, avoiding problems with differing views
of their meaning by different consumers of the information. Making the complexities
inherent in contexts part of the base capabilities of knowledge graphs also means that
more of the complexities are handled by the producers of contextualized knowledge
graphs and fewer are inflicted on the consumers of information from knowledge graphs.</p>
      <p>
        The non-formal approach requires less initial effort. For example, instead of being
given full temporal reasoning tools, consumers would be given an interface that
produces only the information valid in a particular context or a particular range of contexts.
This is somewhat similar to RDF extracts produced from Wikidata [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the truthy
statements in the Wikidata RDF dumps described in
https://www.wikidata.org/wiki/Wikidata:Database_download/en, although these extracts are not
based on an interesting theory of contextual qualifiers and do not fully distinguish
between qualifiers that provide extra information and those that provide
contextualization. This interface, however, does not allow consumers to work in multiple contexts
(or ranges of contexts) at once, limiting the kind of work they can do.
      </p>
      <p>It is quite reasonable to combine the two approaches, providing extraction tools
that are based on a formal theory of the contexts. Consumers that only need simple
contextual processing can use the extraction tools; consumers that need more complex
contextual processing can use the full reasoning tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Carroll</surname>
          </string-name>
          , Christian Bizer, Pat Hayes, and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Stickler</surname>
          </string-name>
          .
          <article-title>Named graphs</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>247</fpage>
          -
          <lpage>267</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Richard</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , David Wood, and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          .
          <article-title>RDF 1.1 concepts and abstract syntax</article-title>
          .
          <source>W3C Recommendation</source>
          , http://www.w3.org/TR/rdf11-concepts,
          25 February
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Fredo</given-names>
            <surname>Erxleben</surname>
          </string-name>
          , Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić.
          <article-title>Introducing Wikidata to the linked data web</article-title>
          .
          In
          <source>Proceedings of the Thirteenth International Semantic Web Conference</source>
          , pages
          <fpage>50</fpage>
          -
          <lpage>65</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hayes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          .
          <article-title>RDF 1.1 semantics</article-title>
          .
          <source>W3C Recommendation</source>
          , http://www.w3.org/TR/rdf11-mt/,
          25 February
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Vinh</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Olivier Bodenreider, and
          <string-name>
            <given-names>Amit</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Don't like RDF reification? Making statements about statements using singleton property</article-title>
          .
          In
          <source>Proceedings of the 23rd International Conference on the World Wide Web (WWW 2014)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          .
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>57</volume>
          (
          <issue>10</issue>
          ):
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>