<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Focused Belief Measures for Uncertainty Quantification in High Performance Semantic Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cliff Joslyn</string-name>
          <email>cliff.joslyn@pnnl.gov</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesse Weaver</string-name>
          <email>jesse.weaver@pnnl.gov</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory</institution>
          ,
          <addr-line>Richland, WA 99354</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Security Directorate, Pacific Northwest National Laboratory</institution>
          ,
          <addr-line>Seattle, WA 98109</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <issue>164</issue>
      <fpage>8</fpage>
      <lpage>14</lpage>
      <abstract>
<p>In web-scale semantic data analytics there is a great need for methods which aggregate uncertainty claims, on the one hand respecting the information provided as accurately as possible, while on the other still being tractable. Traditional statistical methods are more robust, but only represent distributional, additive uncertainty. Generalized information theory methods, including fuzzy systems and Dempster-Shafer (DS) evidence theory, represent multiple forms of uncertainty, but are computationally and methodologically difficult. We require methods which provide an effective balance between the complete representation of the full complexity of uncertainty claims in their interaction, while satisfying the needs of both computational complexity and human cognition. Here we build on Jøsang's subjective logic to posit methods in focused belief measures (FBMs), where a full DS structure is focused to a single event. The resulting ternary logical structure is posited to be able to capture the minimally sufficient amount of generalized complexity needed at a maximum of computational efficiency. We demonstrate the efficacy of this approach in a web ingest experiment over the 2012 Billion Triple dataset from the Semantic Web Challenge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Many analytic domains face the problem of determining
the veracity of claims from multiple sources. The problem
can be further complicated by the presence of large numbers
of sources asserting large numbers of propositions over short
periods of time. Examples include intelligence gathering and
sensor networks. Such problems are only exacerbated on the
web by the constituent heterogeneity of the data, especially
when brought to web scale.</p>
      <p>
        While there are various logics for dealing with inconsistency
or uncertainty [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], to our knowledge, none have
achieved significant uptake in computational systems for large
data. Traditional statistical uncertainty representation (UR)
models fail to represent complex uncertainty situations
requiring imprecision or other forms of ambiguous judgements.
So-called Generalized Information Theory (GIT) approaches [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
such as fuzzy and Dempster-Shafer (DS) can represent these
complexities, but at the cost of high computational expense.
      </p>
      <p>Massive streaming or ingest problems in the semantic web
require UR strategies which provide both representation of
ambiguity and computational efficiency.</p>
      <p>
        Here we present Focused Belief Measures (FBMs), an
adaptation of Jøsang’s subjective logic (SL) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], as a candidate to
play this role. By modifying DS to focus on specific events
in a complex space, FBMs can model logical combinations
of complex beliefs involving imprecision, ambiguity, or total
ignorance using linear algorithms. This compromise promises
to support the minimal amount of generalized complexity
which may be nonetheless sufficient, but at a maximum of
computational efficiency.
      </p>
      <p>We begin by introducing FBMs in the context of both SL
and DS. We then demonstrate its utility in a web analytics
experiment involving the evaluation of a large RDF graph
drawn from the 2012 Semantic Web Challenge.</p>
      <p>II. FOCUSED BELIEF MEASURES (FBMS)</p>
      <p>
        UR methods and formalisms are legion, primarily rooted
in probability theory, logic, or their combination (e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]).
      </p>
      <p>For decades, UR researchers have struggled with two
competing imperatives. On the one hand, traditional UR methods,
including probabilistic (statistical) and logical approaches
require closed-world assumptions. For probability, we require
knowledge about likelihood distributions over a set of entities
which are both exhaustive and mutually exclusive in order to
guarantee mathematical additivity: summation of all modeled
probabilities to 1. Representing total uncertainty requires
assuming a uniform distribution over these choices. Similarly,
traditional logic represents only the two states for true (A)
and false (∼A, here taking ∼ for negation), according to an
excluded middle axiom.</p>
      <p>
        The large range of GIT methods, including fuzzy systems,
DS evidence theory, random sets, imprecise probabilities [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
and many others, support open world situations by allowing
some form of third option or remainder between True and
False, or non-additive, imprecise “overlap” between
nondisjoint options. They do this in different ways, but the basic
concept is the same, to generalize traditional approaches by
relaxing certain axioms, such as allowing probabilities to
sum to more or less than 1, or to recognize truth values
“between” True and False (different kinds of “Maybe”), in
order to represent more complex uncertainty structures other
than probabilistic likelihoods. In this way one can represent
inherent vagueness or imprecision of events, or second- or
higher-order uncertainties about other uncertainties, reflecting
the veracity of the information source, the confidence or
likelihood of claims, the uncertainty about a claim, or other
open world situations where “something else” we didn’t think
about could occur.
      </p>
      <p>Fig. 1. [Figure: the “remainder” (“neither A nor ∼A”) standing above the two choices A and ∼A.]</p>
      <p>These generalized mathematical structures are inherently hierarchical, since this “remainder” “stands above” its constituent choices. Fig. 1 shows the absolute simplest such case, where “neither A nor ∼A” is our “third choice” standing above A and ∼A themselves. This remainder can be used to represent total uncertainty or imprecision when positive weight is given to the remainder, but no weight is given to either of the two specific choices.</p>
      <p>
        Fig. 1 actually shows a tiny lattice structure, and the resulting methods require lattice-based computations arising from non-additivity. For example, classical probability has the condition Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B), which is a fully modular function on the subset lattice, allowing relatively simple calculations. But in non-additive formalisms, we can have Pr(A ∪ B) ≶ Pr(A) + Pr(B) − Pr(A ∩ B), which is sub- or super-modularity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Modularity allows probabilities of “bigger” events to be calculated from those of “smaller”, so non-modularity forces a high computational price. In big data, semantic web environments with massive ingest and streaming input applications, we need methods for representing such hybrid uncertainty, but which are both expressive and tractable.
      </p>
      <p>
        Our FBM approach is built on Jøsang’s SL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which is in turn based on DS. Consider a decision problem, like whether Alice, Bob, or Carol committed a crime. In probability, we need P = p(A) + p(B) + p(C) = 1 to hold. If P &gt; 1, then we have conflict and have to renormalize; while if P &lt; 1, then we have a remainder, represented as an uncertainty U = 1 − P &gt; 0 left over. In DS theory, we represent general probabilistic uncertainty by giving probabilities m not just to each of these n = 3 disjoint events A, B, C ∈ Ω, but to each of the 2^n subsets R ⊆ Ω of such events. Formally, we have m : 2^Ω → [0, 1], m(∅) = 0, and ∑_{R ⊆ Ω} m(R) = 1. We identify any R ⊆ Ω with m(R) &gt; 0 as focal.
      </p>
      <p>This supports modeling of imprecision and ignorance together with likelihood by assigning values to the completely imprecise event m(Ω), down to the most precise singletons m({A}), and everything in between, including composite events like m({A, B}) for “Alice and/or Bob did it”.</p>
      <p>The resultant belief measures b : 2^Ω → [0, 1], b(R) = ∑_{S ⊆ R} m(S), on any subset R ⊆ Ω capture a mixture of likelihood and imprecision, since claims m about subsets R cannot necessarily be disambiguated to knowledge about their constituent elements ω ∈ R. But considering (the middle of) Fig. 2 compared to Fig. 1, we see that we now need to support the full Boolean lattice representing the power set 2^Ω of all subsets. This comes at a huge computational cost, since we now have to work with 2^n rather than n claims, and moreover their interaction within the lattice structure.</p>
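      <p>The mass-to-belief computation can be made concrete with a minimal sketch in Python. The mass values here are hypothetical, chosen only so that the opinion focused on {A, B} reproduces the ⟨.4, .2, .4⟩ example of Fig. 2; the d and u components anticipate the focused definitions given below.</p>

```python
# A minimal sketch of a DS mass assignment m over subsets of Omega = {A, B, C}
# and the belief measure b(R), the sum of m(S) over focal S contained in R.
# The mass values are hypothetical, chosen only so that the opinion focused
# on {A, B} reproduces the (.4, .2, .4) example discussed with Fig. 2.

OMEGA = frozenset("ABC")
m = {
    frozenset("A"): 0.10,
    frozenset("AB"): 0.30,
    frozenset("C"): 0.20,
    frozenset("BC"): 0.15,
    OMEGA: 0.25,  # masses sum to 1; the empty set carries no mass
}

def belief(R, masses):
    """b(R): total mass of focal sets contained in R."""
    return sum(v for S, v in masses.items() if S.issubset(R))

def opinion(R, masses, omega=OMEGA):
    """Focused opinion (b(R), d(R), u(R)), with d(R) the belief of the complement."""
    b = belief(R, masses)
    d = belief(omega - R, masses)
    return (b, d, 1.0 - b - d)

print(tuple(round(x, 2) for x in opinion(frozenset("AB"), m)))  # (0.4, 0.2, 0.4)
```

      <p>Note that computing b(R) this way touches every focal set; it is exactly this power-set bookkeeping that the focusing step below avoids.</p>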
      <p>
        But Jøsang [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has noted that if we focus attention on a particular composite event R ⊆ Ω (like {A, B} = Alice and/or Bob did it), we can reduce the complexity to just three disjoint groups of subsets:
1) Those (like {A} = Alice did it) completely within R, supporting R itself;
2) Those (like {C} = Carol did it) completely disjoint from R, supporting ∼R (now taking ∼R = Ω \ R for set complement); and
3) The remainder (like {B, C} = Bob and/or Carol did it), providing information contradictory to or ambiguous with respect to both R and ∼R.
      </p>
      <p>These three groups reduce to a single “opinion” vector w(R) = ⟨b(R), d(R), u(R)⟩, where in addition to b(R) as the belief of R, we have d(R) = b(∼R) = ∑_{S ⊆ ∼R} m(S) as the belief of ∼R, that is the disbelief of R¹. Since b(R), d(R) ∈ [0, 1] and b(R) + d(R) ≤ 1, this allows us to define u(R) = ∑_{S : ∅ ≠ S∩R ≠ S} m(S) = 1 − b(R) − d(R) to serve elegantly as our generalized remainder, or uncertainty of R.</p>
      <p>As so specified, b(R) + d(R) + u(R) = 1. Thus b, d, and u exhaust all the options concerning R and ∼R, but do so while including a representation of the “remainder”, u, which is about “neither R nor ∼R”. This reflects the fact that while, for any R ⊆ Ω, the two sets R and ∼R partition Ω, it is rather the three classes (sets of subsets) {S ⊆ R}, {S ⊆ ∼R}, and 2^Ω \ ({S ⊆ R} ∪ {S ⊆ ∼R}) which partition the power set 2^Ω. It is this third class which is our remainder.</p>
      <p>¹We depart from Jøsang in using this formulation for d, rather than his (equivalent) d(R) = 1 − b(R) − ∑_{S : S∩R ≠ ∅, S ⊈ R} m(S). His is both formally more complex and conceptually less cogent, since it does not capture the sense in which b and d represent the overall belief in the set R and its “opposite”. Since our choice is to cast both b and d as beliefs, just in the set R and its opposite ∼R, and additionally since our formulation does not rely on anything inherently either “subjective” or “logical”, we choose to identify our formulation as focused belief measures rather than Jøsang’s “subjective logic”.</p>
      <p>Fig. 2. Probabilities about Alice, Bob, and Carol, and their combinations, need 2³ − 1 = 7 assessments. When focused on {A, B}, we reduce to three: w(AB) = ⟨b(AB), d(AB), u(AB)⟩ = ⟨.4, .2, .4⟩.</p>
      <p>Consider opinions wA(R), wA(S), and wB(S) as opinions from information sources A and B about propositions R and S. Also let wA(B) be source A’s opinion of source B. Jøsang then provides a series of algebraic operators for different combinations, including:
• Conjunction wA(R ∧ S), whose belief component is bA(R)bA(S), and disjunction wA(R ∨ S), whose belief component is bA(R) + bA(S) − bA(R)bA(S);
• A consensus operator wA(R) ⊕ wB(R), expressing the opinion of one proposition R by two sources A, B; and
• A series discounting operator wA(B) ⊗ wB(S) = ⟨ bA(B)bB(S), bA(B)dB(S), dA(B) + uA(B) + bA(B)uB(S) ⟩, expressing the discounted opinion about a base opinion wB(S) in light of another opinion wA(B), which we take to be A’s opinion about the agent B expressing the opinion wB(S).</p>
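      <p>The conjunction and disjunction operators can be sketched as follows. Only their belief components are given above; the disbelief and uncertainty components in this sketch follow Jøsang’s standard propositional forms and should be read as an assumption.</p>

```python
# Sketch of conjunction and disjunction on opinion vectors (b, d, u).
# The belief components match the text; the disbelief and uncertainty
# components are Jøsang's standard propositional forms (an assumption).

def conjunction(w1, w2):
    """w(R and S): beliefs multiply; disbeliefs accumulate."""
    b1, d1, u1 = w1
    b2, d2, u2 = w2
    return (b1 * b2,
            d1 + d2 - d1 * d2,
            b1 * u2 + u1 * b2 + u1 * u2)

def disjunction(w1, w2):
    """w(R or S): the dual of conjunction."""
    b1, d1, u1 = w1
    b2, d2, u2 = w2
    return (b1 + b2 - b1 * b2,
            d1 * d2,
            d1 * u2 + u1 * d2 + u1 * u2)
```

      <p>Each output component is a simple polynomial in the input components, so combining two opinions costs a constant number of arithmetic operations; no lattice traversal is needed.</p>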
      <p>Note the tradeoff that FBMs make. We are not representing the full complexity of all 2^n − 1 possible combinations required by DS; but for any R, or collection of Rs, we are able to directly model R, ∼R, and their remainder, in a minimal way but with a maximal amount of computational efficiency: the huge advantage of these operators is that they are linear in the components b, d, u of the opinion vectors w. Given that we care about only k such events, then in realistic cases we have reduced the size of our problem space to 2k ≪ 2^n − 2 (that is, to O(k) from order O(2^n)). We have also vastly improved user comprehensibility, since conceptualizing operations on linear vectors is far less challenging than the structure of hypercubic Boolean lattices. Thus logical combinations of complex situations can be represented easily and cheaply, while still representing our “third option”.</p>
      <p>This is shown even more strongly in Fig. 3, now the case for n = 4 basic choices, shown in the 4-dimensional hypercube (Boolean 4-lattice), and displaying the 4 focal sets. If we wished to track all 2^n − 1 = 15 possible choices, then so be it, but consider instead that we were only interested in the k = 3 choices {A}, {A, C}, and {A, B, C}. Then we would only need to track the 3k = 9 pieces of information in the opinion vectors w(A) = ⟨.1, .4, .5⟩, w(AC) = ⟨.3, .4, .3⟩, and w(ABC) = ⟨.6, 0, .4⟩ (note that here we simplify notation so that e.g. ABC = {A, B, C}). As shown, we’ve replaced the need to store and compute on a 4-dimensional hypercube in exchange for three 2-dimensional hypercubes. FBMs are thus ternary, avoiding limited binary reasoning with a third category to represent “complete ignorance”, but also avoiding the full 2^n complexity of the set-based DS.</p>
<p>We next seek to demonstrate the value, feasibility, and
tractability of using FBMs in a data ingest experiment on
the semantic web. The experiment reported on below is
provisional, an initial foray into the basic operation of the FBM
approach, but as specifically applied to web-based analytics.</p>
      <p>In particular, as will be described in more detail below, we
use opinion vectors which are relatively constrained examples,
being unequivocal for basic claims, but always with residual
uncertainty in their aggregation. Further investigation can open
the approach up to range over a wider array of values, seeking
performance and sensitivity analyses.</p>
      <p>A. FBM Problem Setup</p>
      <p>Consider the following decision problem. Three data
sources make claims about certain facts. Alice and Cindy
assert p, but Bob says ∼p. The judge must determine whether
he/she believes p is true. The judge generally believes Alice
and Bob (Bob more than Alice) and thinks that Cindy is lying.</p>
      <p>An FBM setup for this problem is shown in Fig. 4. First, the base claims made by Alice, Bob, and Cindy are unequivocally either true or false (Jøsang calls these “dogmatic”), and thus lack the uncertainty parameter u. We have wA(p) = ⟨1, 0, 0⟩, wB(p) = ⟨0, 1, 0⟩, and wC(p) = ⟨1, 0, 0⟩.</p>
      <p>
        Next, the judge is inclined to believe Alice at about 80%; but it’s not that the remaining 20% disbelieves Alice. The judge is rather uncertain about Alice, not having further grounds to either believe or disbelieve her. We thus have w(A) = ⟨.8, 0, .2⟩. The judge finds Bob convincing at 95%, even more than Alice, so w(B) = ⟨.95, 0, .05⟩. Finally, the judge is quite sure that Cindy is lying, but there is always the residual possibility otherwise, so w(C) = ⟨0, .99, .01⟩.
      </p>
      <p>Fig. 4. [Figure: Alice asserts p with ⟨1, 0, 0⟩, Bob asserts ∼p with ⟨0, 1, 0⟩, and Cindy asserts p with ⟨1, 0, 0⟩; the judge holds “Alice is believable” ⟨0.8, 0, 0.2⟩, “Bob was convincing” ⟨0.95, 0, 0.05⟩, and “Cindy is lying” ⟨0, 0.99, 0.01⟩, and concludes that p seems more false than true: ⟨0.17, 0.79, 0.04⟩.]</p>
      <p>Assume that Alice, Bob, and Cindy actually testify in order, modeling a streaming ingest operation. Initially, we have absolutely no knowledge about p, and thus the only valid choice is the totally uncertain opinion w(p) = ⟨0, 0, 1⟩. As an opinion wX(p) of p by source X arrives, we then update our opinion as w′(p) = w(p) ⊕ (w(X) ⊗ wX(p)), first discounting X’s opinion of p with our opinion of X, and then aggregating with our prior opinion. We are thus building the consensus opinions as more data arrives from more sources, and the only state that needs to be saved is a single opinion for p.</p>
      <p>After Alice, the judge’s opinion of p is ⟨0, 0, 1⟩ ⊕ (⟨0.8, 0, 0.2⟩ ⊗ ⟨1, 0, 0⟩) = ⟨0.8, 0, 0.2⟩.</p>
      <p>Then Bob testifies that p is absolutely false, or alternatively, that ∼p is absolutely true. Now the judge’s opinion of p is ⟨0.8, 0, 0.2⟩ ⊕ (⟨0.95, 0, 0.05⟩ ⊗ ⟨0, 1, 0⟩) = ⟨0.17, 0.79, 0.04⟩.</p>
      <p>Bob has effectively changed the judge’s mind. Finally, Cindy testifies that p is absolutely true, but the judge is nearly certain that she is lying. In the end, the judge’s opinion of p is ⟨.17, .79, .04⟩ ⊕ (⟨0, .99, .01⟩ ⊗ ⟨1, 0, 0⟩) = ⟨.17, .79, .04⟩.</p>
      <p>The judge’s mind did not change at all as a result of Cindy’s testimony. In the end, the judge is inclined to believe that p is false.</p>
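      <p>The streaming updates above can be sketched in a few lines of Python. The discount operator follows the formula given in Section II; the consensus operator is not spelled out above, so this sketch assumes Jøsang’s standard consensus definition.</p>

```python
# Sketch of the judge's streaming update: new opinion = prior consensus
# combined with the source's opinion discounted by our opinion of the source.
# The discount operator follows the formula in Section II; the consensus
# operator is Jøsang's standard definition (an assumption here).

def discount(w_of_source, w_by_source):
    """Discount a source's opinion of S by our opinion of that source."""
    b1, d1, u1 = w_of_source
    b2, d2, u2 = w_by_source
    return (b1 * b2, b1 * d2, d1 + u1 + b1 * u2)

def consensus(w1, w2):
    """Consensus of two opinions about the same proposition."""
    b1, d1, u1 = w1
    b2, d2, u2 = w2
    k = u1 + u2 - u1 * u2  # normalization; assumes not both opinions dogmatic
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            (u1 * u2) / k)

# Totally uncertain prior; then Alice, Bob, and Cindy testify in order.
w_p = (0.0, 0.0, 1.0)
testimony = [
    ((0.8, 0.0, 0.2), (1.0, 0.0, 0.0)),    # judge's opinion of Alice; Alice asserts p
    ((0.95, 0.0, 0.05), (0.0, 1.0, 0.0)),  # judge's opinion of Bob; Bob asserts not-p
    ((0.0, 0.99, 0.01), (1.0, 0.0, 0.0)),  # judge's opinion of Cindy; Cindy asserts p
]
for w_X, wX_p in testimony:
    w_p = consensus(w_p, discount(w_X, wX_p))

print(tuple(round(v, 2) for v in w_p))  # (0.17, 0.79, 0.04)
```

      <p>Only the single running opinion for p is kept between arrivals, which is what makes the update suitable for streaming ingest.</p>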
      <p>B. Billion Triple Challenge</p>
      <p>
        Over the last 14 years, the Resource Description Framework (RDF) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has become a popular knowledge representation with mature research and tools to support it. It also has associated ontology languages such as RDF Schema (RDFS) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and the Web Ontology Language (OWL) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] which allow for declarative semantics that support reasoning tasks like inference and consistency checking. There are many efforts to expose data as RDF (e.g., Facebook [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Data.gov [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], biomedical [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], etc.)², and major companies are employing RDF to allow users to mark up their data (e.g., Open Graph Protocol³, schema.org⁴, Twitter cards⁵).
      </p>
      <p>Table I. [Table: FOAF-related triple counts in BTC 2012: all FOAF-related triples; those with predicate foaf:name; those with predicate foaf:knows.]</p>
      <p>
        Thus there is an abundance of RDF data which is driving
the kinds of data integration challenges we posit FBMs to
be valuable for. Even those which use different ontologies
can be meaningfully unified using a relatively simple “upper
ontology” given a basic knowledge of common ontologies
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. For non-RDF data sources, the heterogeneity problem
can be solved by providing an RDF or SPARQL [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] interface
on data sources (either on the producer or consumer side,
like in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which is a coding task) and by providing an
appropriate ontology to give a unified view of the data (which
is a design task needing sufficiently knowledgeable persons or
good documentation of the data source).
      </p>
      <p>
        We experimented with this streaming FBM method on a
large RDF dataset crawled from the web. The 2012 Billion
Triple Challenge dataset (BTC)⁶ is a set of RDF quads crawled
from the Web for the purposes of challenging competitors to
work at scale. BTC was chosen because it represents one of the
best and largest publicly available RDF datasets, and because
of our own past experience working with previous versions of
it [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>The RDF quads in BTC are RDF triples with an additional component that we will refer to as the “graph name”. For an RDF quad ⟨s, p, o, g⟩ (where g is the graph name), let d = http(g) be the direct URL of the document retrieved over HTTP when following g (e.g., when you put g in your browser). In some cases, d = g. However, http(g) can result in redirection (e.g., HTTP codes 301, 302, 303), in which case d ≠ g. Many graph names can map to the same document URL, and these mappings are also captured in the (broader sense of the) BTC dataset.</p>
      <p>For our experiment, we limited ourselves to quads in BTC that utilized terms from the friend-of-a-friend (FOAF) ontology⁷. The FOAF-related portion contains 164.3M triples overall, including two specific sub-groups (foaf:knows and foaf:name) which we will use below (see Table I).</p>
      <p>²http://www.semantic-web-journal.net/accepted-datasets – last accessed August 8, 2013 – contains an entire list of such datasets.</p>
      <p>³http://ogp.me/ – last accessed August 8, 2013
⁴http://schema.org/ – last accessed August 8, 2013
⁵https://dev.twitter.com/docs/cards – last accessed August 8, 2013
⁶http://km.aifb.kit.edu/projects/btc-2012/ – last accessed August 8, 2013
⁷http://xmlns.com/foaf/spec/ – last accessed August 8, 2013</p>
      <p>FOAF captures information about people, webpages, and their relationships. We considered documents as sources for determining beliefs. Relating to the judge example, we are the judge, and each document is a witness. We assume every document asserts that it is telling the absolute truth for each triple. That is, for each ⟨s, p, o, g⟩ such that d = http(g), d’s belief of ⟨s, p, o⟩ is ⟨1, 0, 0⟩ (absolute belief). As the judge, we must determine how to discount these beliefs, since we are not inclined to believe anything absolutely just by mere testimony.</p>
      <p>
        Hogan et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have already established a notion of
authority for RDF triples based on the sources of the triples, and so we make use of that. It is important to understand that whether an RDF triple is authoritative depends on the relationship between the subject of the RDF triple (that is, s in ⟨s, p, o⟩) and the graph name g and/or document URL d.
      </p>
      <p>The general idea is that any RDF triple whose subject is a term defined by the source is considered authoritative. The foundation for this notion is laid by the concepts of RDF namespaces and Linked Data principles⁸, the scope of which are beyond this particular work.</p>
      <p>The specific rules we used are as follows.
• For any URI u, let nofrag(u) be everything before the fragment (if any fragment exists). Thus nofrag(data:abc) = data:abc, and nofrag(data:abc#def) = data:abc (the “fragment” is everything after and including the first “#” in a URI). ⟨s, p, o, g⟩ is considered authoritative iff nofrag(s) = nofrag(g) or nofrag(s) = nofrag(http(g)).
• We transform ⟨s, p, o, g⟩ RDF quads into quints ⟨s, p, o, d, a⟩ where d = http(g) and a = 1 if ⟨s, p, o, g⟩ is authoritative and a = 0 otherwise, and we consider only unique quints. If a = 1, our belief in d for the assertion of ⟨s, p, o⟩ is ⟨0.9, 0, 0.1⟩ (90% belief), and if a = 0, our belief in d for the assertion of ⟨s, p, o⟩ is ⟨0.01, 0, 0.99⟩ (99% uncertainty). The values are chosen somewhat arbitrarily. Since d’s belief of ⟨s, p, o⟩ is always ⟨1, 0, 0⟩ and X ⊗ ⟨1, 0, 0⟩ = X, our belief in d becomes our belief of d’s assertion of ⟨s, p, o⟩. (That is, unsurprisingly, our belief in the source for a particular triple is the same as our belief in the triple stated by that source.)
• At this point, we have beliefs for every unique ⟨s, p, o, d, a⟩, which we will denote b(⟨s, p, o, d, a⟩), but we wish to form some overall belief for each unique ⟨s, p, o⟩. This is determined using the consensus operator ⊕. For every ⟨s, p, o⟩, our belief is b(⟨s, p, o⟩) = ⊕_{⟨s, p, o, d, a⟩ ∈ BTC} b(⟨s, p, o, d, a⟩).</p>
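      <p>A compact sketch of this per-triple pipeline follows. The quad data and document URLs are illustrative, and the consensus operator again assumes Jøsang’s standard form; the discounting constants are the 0.9/0.1 and 0.01/0.99 values stated above.</p>

```python
# Sketch of the per-triple pipeline above: the nofrag-based authority test,
# the fixed discounting opinions (0.9 belief if authoritative, 0.01 if not),
# and a consensus fold over the unique quints for each triple. The quad data
# and document URLs are illustrative; consensus is Jøsang's standard form.

def nofrag(u):
    """Everything before the fragment, i.e., before the first '#'."""
    return u.split("#", 1)[0]

def authoritative(s, g, http):
    """A quad (s, p, o, g) is authoritative iff nofrag(s) matches
    nofrag(g) or nofrag(http(g))."""
    return nofrag(s) == nofrag(g) or nofrag(s) == nofrag(http(g))

def consensus(w1, w2):
    b1, d1, u1 = w1
    b2, d2, u2 = w2
    k = u1 + u2 - u1 * u2  # assumes not both opinions dogmatic
    return ((b1 * u2 + b2 * u1) / k, (d1 * u2 + d2 * u1) / k, (u1 * u2) / k)

def belief_per_triple(quints):
    """quints: unique (s, p, o, d, a) tuples; returns an opinion per (s, p, o).
    Since each source's own opinion is (1, 0, 0), discounting it reduces
    to our opinion of the source itself."""
    beliefs = {}
    for s, p, o, d, a in quints:
        w = (0.9, 0.0, 0.1) if a == 1 else (0.01, 0.0, 0.99)
        key = (s, p, o)
        beliefs[key] = consensus(beliefs[key], w) if key in beliefs else w
    return beliefs

quints = [("j:jesse", "foaf:knows", "c:cliff", "http://ex.org/jesse", 1),
          ("j:jesse", "foaf:knows", "c:cliff", "http://ex.org/other", 0)]
b, d, u = belief_per_triple(quints)[("j:jesse", "foaf:knows", "c:cliff")]
```

      <p>Here one authoritative and one non-authoritative assertion of the same triple combine into a single opinion whose belief slightly exceeds 0.9.</p>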
      <p>C. Implementation</p>
      <p>Our evaluation was run using simple Unix commands like sort, cut, and uniq; and short, custom Perl scripts. This was simply the easiest path to a preliminary evaluation of FBMs. In principle, though, the same computation is easily parallelizable. Let S be a set of data sources (documents), and let W be a function associating our opinions about data sources in S to those same data sources. Let K (the “knowledge base”) be a set of opinions from the sources in S. Then, generalizing the previous formula for BTC, we form our opinion w(R) of a proposition R as w(R) = ⊕_{wA(R) ∈ K} [W(A) ⊗ wA(R)].</p>
      <p>⁸http://www.w3.org/DesignIssues/LinkedData.html – last accessed August 8, 2013</p>
      <p>To parallelize this computation, simply arbitrarily distribute K to some n processors, that is, K = ⋃_{i=0}^{n−1} K_i. Then each processor can determine its local consensus opinions K′_i as K′_i = { ⊕_{wA(R) ∈ K_i} [W(A) ⊗ wA(R)] : R a proposition appearing in K_i }. Then, in order to derive the global consensus opinions, a simple parallel reduction using the ⊕ operator is possible by virtue of the fact that ⊕ is associative and commutative. For example, each processor i may have a local consensus opinion w_i(R). Then the global consensus opinion of R is ⊕_{i=0}^{n−1} w_i(R), derived using a parallel reduction. Alternatively, for every proposition R, a hash function h can be used to redistribute local consensus opinions among processors so that processor h(R) mod n has all of {w_i(R)}_{i=0}^{n−1} and the derivation of the global opinions can be performed as local operations.</p>
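      <p>As a sanity check on the claim that the associativity and commutativity of ⊕ make the reduction grouping irrelevant, here is a serial Python stand-in for the parallel scheme; the partitioning and opinion values are illustrative, and ⊕ again assumes Jøsang’s standard consensus form.</p>

```python
# Serial stand-in for the parallel reduction described above: each
# "processor" folds its partition of (already discounted) opinions locally,
# then the local results are merged per proposition R. Because the consensus
# operator is associative and commutative, any partitioning of K yields the
# same global opinions. Partitioning and opinion values are illustrative.

def consensus(w1, w2):
    b1, d1, u1 = w1
    b2, d2, u2 = w2
    k = u1 + u2 - u1 * u2  # assumes not both opinions dogmatic
    return ((b1 * u2 + b2 * u1) / k, (d1 * u2 + d2 * u1) / k, (u1 * u2) / k)

def fold(acc, pairs):
    """Fold (R, opinion) pairs into acc with the consensus operator."""
    for R, w in pairs:
        acc[R] = consensus(acc[R], w) if R in acc else w
    return acc

def global_consensus(partitions):
    """partitions: one list of (R, opinion) pairs per processor."""
    local = [fold({}, part) for part in partitions]  # local reductions
    merged = {}
    for loc in local:  # merge step (the parallel reduction, serialized)
        fold(merged, loc.items())
    return merged

# The same three opinions about proposition "R", split two different ways,
# produce the same global consensus opinion.
ops = [("R", (0.9, 0.0, 0.1)), ("R", (0.01, 0.0, 0.99)), ("R", (0.5, 0.2, 0.3))]
w_a = global_consensus([ops[:1], ops[1:]])["R"]
w_b = global_consensus([ops[:2], ops[2:]])["R"]
```

      <p>The hash-redistribution variant differs only in where the merge step runs, not in the arithmetic.</p>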
      <p>D. Results</p>
      <p>The distribution of the consensus beliefs is illustrated in Fig. 5. Each ⟨X, Y⟩ point is the number Y of unique ⟨s, p, o⟩ triples for which our belief is X. The red points represent authoritative triples, and the blue points represent non-authoritative triples. For a closer view of the highest beliefs, refer to Fig. 6.</p>
      <p>Fig. 5. Distribution of consensus beliefs for FOAF triples in BTC 2012. Each ⟨X, Y⟩ point is the number Y of unique ⟨s, p, o⟩ triples for which our belief is X. The red points represent authoritative triples, and the blue points represent non-authoritative triples.</p>
      <p>We note a good distribution over the range of belief values (i.e., there are no significant gaps across the horizontal axis), which suggests that belief as specified herein provides a useful ranking mechanism. Second, there are a large number of triples for which we have very little belief and a large number of triples for which we have high belief, which indicates that belief as specified herein is a useful metric for separating highly believable statements from hardly believable statements (as long as the underlying assumptions of authority hold and are meaningful in reality). Third, our most believed triples are actually non-authoritative, reflecting strong public consensus about these triples/propositions even without an authoritative source.</p>
      <p>Diving deeper into the data, it appears that the overall shape of the charts is caused (at least in part) by the distribution of names, as shown in Fig. 7. It so happens that documents often include the names of people mentioned even if the document is not authoritative for that person. For example, Jesse may have a document that is authoritative for statements about j:jesse, and so the document may say that ⟨j:jesse, foaf:knows, c:cliff⟩ and also that ⟨c:cliff, foaf:name, “Cliff”⟩, even though the document is only authoritative for j:jesse and not c:cliff. We conjecture that popular persons have their names replicated across relatively non-authoritative documents, which accounts for the high belief in some non-authoritative triples.</p>
      <p>If we look at only triples with foaf:knows as a predicate, the disparity in Fig. 8 is quite obvious. Non-authoritative foaf:knows triples are hardly believed, while authoritative foaf:knows triples are highly believed. We conjecture that this is because the publication behavior of foaf:knows triples is opposite to that of foaf:name triples. For example, Jesse’s document may state that ⟨j:jesse, foaf:knows, c:cliff⟩, but it does not state that ⟨c:cliff, foaf:knows, j:jesse⟩. Clearly, such triples of the latter case exist, or else no non-authoritative foaf:knows triples would exist at all, but such triples are uncommon, which leads to low belief.</p>
      <p>Fig. 8. [Figure] Belief distribution for the 17.3M triples with predicate foaf:knows in BTC.</p>
      <p>This work represents the beginning of our investigation, and indeed more is necessary to verify our conjectures and to find more patterns. Regardless, this preliminary work indicates that focused belief measures hold promise and that more investigation is warranted.</p>
      <p>
        The most significant issue is in our use of “dogmatic” base
claims, that is, opinions of the form h1, 0, 0i or h0, 1, 0i,
expressing complete belief or disbelief on the part of the
claimant. In fact, truth claims come in all forms, e.g. “A
believes that p” or “A holds p with 50% probability” or “A
believes that p falls in the range [10, 20]”, and many other
possible forms involving intervals, distributions, statistical
properties, etc. Being able to map these source claims to FBM
opinions is an important next problem for us.
      </p>
      <p>Other future work includes:
• The current experiments depend on a number of constant
assumptions, as shown above. Parameterization of these
constants will support a sensitivity analysis over this
space of inputs to help determine experimental behavior.
• Discovering more complex categorizations of triples on the web than merely “authoritative” and “non-authoritative”, and determining meaningful discounting opinions for these categorizations.
• Taking into account negative assertions (that is, asserting the falsity of a triple). Such is supported by OWL, but it is expressed using multiple triples. Our evaluation herein treated each single triple as a single proposition.
• Implementation of a parallel system for deriving
consensus opinions as described in section III-C.</p>
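      <p>The consensus combination in the last item can be sketched directly from Jøsang's consensus operator for independent, non-dogmatic opinions; because the operator is commutative and associative, the fold below is a candidate for parallel tree reduction. Function names are ours, and the example opinions are illustrative.</p>
      <preformat>
```python
from functools import reduce

# Sketch of Josang's consensus operator for two opinions (b, d, u)
# with b + d + u == 1 and nonzero uncertainty in each input.
def consensus(x, y):
    bx, dx, ux = x
    by, dy, uy = y
    kappa = ux + uy - ux * uy   # normalization factor
    return ((bx * uy + by * ux) / kappa,
            (dx * uy + dy * ux) / kappa,
            (ux * uy) / kappa)

def consensus_all(opinions):
    # associative pairwise fold; amenable to a parallel reduction
    return reduce(consensus, opinions)

# Two weakly corroborating claims sharpen belief, shrink uncertainty:
combined = consensus_all([(0.8, 0.0, 0.2), (0.8, 0.0, 0.2)])
```
      </preformat>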
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Bada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Livingston</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Hunter</surname>
          </string-name>
          .
          <article-title>An ontology representation of biomedical data sources and records</article-title>
          .
          <source>Bio-Ontologies 2011</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jeremy J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          and
          <string-name>
            <given-names>Graham</given-names>
            <surname>Klyne</surname>
          </string-name>
          .
          <article-title>Resource description framework (RDF): Concepts and abstract syntax</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>February 2004</year>
          . http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Brian F.</given-names>
            <surname>Chellas</surname>
          </string-name>
          .
          <article-title>Modal logic</article-title>
          . Cambridge University Press,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Li</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dominic</given-names>
            <surname>DiFranzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alvaro</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James R.</given-names>
            <surname>Michaelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xian</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Deborah L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          .
          <article-title>TWC data-gov corpus: incrementally generating linked government data from data.gov</article-title>
          .
          <source>In Proceedings of the 19th International Conference on the World Wide Web</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Domingos</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Lowd</surname>
          </string-name>
          .
          <article-title>Markov Logic: An Interface Layer for Artificial Intelligence</article-title>
          . Morgan and Claypool,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Michel</given-names>
            <surname>Grabisch</surname>
          </string-name>
          .
          <article-title>Belief functions on lattices</article-title>
          .
          <source>Int. J. Intelligent Systems</source>
          ,
          <volume>24</volume>
          :1:
          <fpage>76</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>W3C OWL Working Group</string-name>
          .
          <article-title>OWL 2 web ontology language document overview</article-title>
          .
          <source>Technical report, W3C</source>
          ,
          <year>December 2012</year>
          . http://www.w3.org/TR/2012/REC-owl2-overview-20121211/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hayes</surname>
          </string-name>
          .
          <article-title>RDF semantics</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>February 2004</year>
          . http://www.w3.org/TR/2004/REC-rdf-mt-20040210/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Aidan</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jeff Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Axel</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>SAOR: template rule optimisations for distributed reasoning over 1 billion linked data triples</article-title>
          .
          <source>In Proceedings of the 9th international semantic web conference</source>
          , pages
          <fpage>337</fpage>
          -
          <lpage>353</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Audun</given-names>
            <surname>Jøsang</surname>
          </string-name>
          .
          <article-title>A logic for uncertain probabilities</article-title>
          .
          <source>Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems</source>
          ,
          <volume>9</volume>
          :3:
          <fpage>279</fpage>
          -
          <lpage>311</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Cliff</given-names>
            <surname>Joslyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bob</given-names>
            <surname>Adolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sinan</given-names>
            <surname>al Saffar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eric</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Haglin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Greg</given-names>
            <surname>Mackey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David</given-names>
            <surname>Mizell</surname>
          </string-name>
          .
          <article-title>High performance descriptive semantic analysis of semantic graph databases</article-title>
          .
          <source>In Proc. Wshop. on High Performance Computing for the Semantic Web (HPCSW 2011), CEUR</source>
          , volume
          <volume>736</volume>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Cliff</given-names>
            <surname>Joslyn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jane</given-names>
            <surname>Booker</surname>
          </string-name>
          .
          <article-title>Generalized information theory for engineering modeling and simulation</article-title>
          . In E Nikolaidis et al., editor,
          <source>Engineering Design Reliability Handbook</source>
          , pages
          <volume>9</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          . CRC Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Nils J.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          .
          <article-title>Probabilistic logic</article-title>
          .
          <source>Artificial intelligence</source>
          ,
          <volume>28</volume>
          (
          <issue>1</issue>
          ):
          <fpage>71</fpage>
          -
          <lpage>87</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Donald</given-names>
            <surname>Nute</surname>
          </string-name>
          .
          <article-title>Defeasible logic</article-title>
          . In Oskar Bartenstein, Ulrich Geske, Markus Hannebauer, and Osamu Yoshie, editors,
          <source>Web Knowledge Management and Decision Support</source>
          , volume
          <volume>2543</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>151</fpage>
          -
          <lpage>169</lpage>
          . Springer Berlin Heidelberg,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Eric</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andy</given-names>
            <surname>Seaborne</surname>
          </string-name>
          .
          <article-title>SPARQL query language for RDF</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>January 2008</year>
          . http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P</given-names>
            <surname>Walley</surname>
          </string-name>
          .
          <article-title>Towards a unified theory of imprecise probabilities</article-title>
          .
          <source>Int. J. Approximate Reasoning</source>
          ,
          <volume>24</volume>
          :
          <fpage>125</fpage>
          -
          <lpage>148</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Jesse</given-names>
            <surname>Weaver</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Tarjan</surname>
          </string-name>
          .
          <article-title>Facebook Linked Data via the Graph API</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Gregory T.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jesse</given-names>
            <surname>Weaver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Medha</given-names>
            <surname>Atre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          .
          <article-title>Scalable reduction of large datasets to interesting subsets</article-title>
          .
          <source>In Billion Triple Challenge, ISWC 2009</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Gregory Todd</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jesse</given-names>
            <surname>Weaver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Medha</given-names>
            <surname>Atre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          .
          <article-title>Scalable reduction of large datasets to interesting subsets</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ):
          <fpage>365</fpage>
          -
          <lpage>373</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>