=Paper=
{{Paper
|id=Vol-1097/STIDS2013_T03
|storemode=property
|title=Focused Belief Measures for Uncertainty Quantification in High Performance Semantic Analysis
|pdfUrl=https://ceur-ws.org/Vol-1097/STIDS2013_T03_JoslynWeaver.pdf
|volume=Vol-1097
|dblpUrl=https://dblp.org/rec/conf/stids/JoslynW13
}}
==Focused Belief Measures for Uncertainty Quantification in High Performance Semantic Analysis==
Focused Belief Measures for
Uncertainty Quantification in
High Performance Semantic Analysis
Cliff Joslyn Jesse Weaver
National Security Directorate Fundamental and Computational Sciences Directorate
Pacific Northwest National Laboratory Pacific Northwest National Laboratory
Seattle, WA 98109 Richland, WA 99354
Email: cliff.joslyn@pnnl.gov Email: jesse.weaver@pnnl.gov
Abstract—In web-scale semantic data analytics there is a great Here we present Focused Belief Measures (FBMs), an adap-
need for methods which aggregate uncertainty claims, on the tation of Jøsang’s subjective logic (SL) [10], as a candidate to
one hand respecting the information provided as accurately as play this role. By modifying DS to focus on specific events
possible, while on the other still being tractable. Traditional
statistical methods are more robust, but only represent distri- in a complex space, FBMs can model logical combinations
butional, additive uncertainty. Generalized information theory of complex beliefs involving imprecision, ambiguity, or total
methods, including fuzzy systems and Dempster-Shafer (DS) ignorance using linear algorithms. This compromise promises
evidence theory, represent multiple forms of uncertainty, but to support the minimal amount of generalized complexity
are computationally and methodologically difficult. We require which may be nonetheless sufficient, but at a maximum of
methods which provide an effective balance between the complete
representation of the full complexity of uncertainty claims in computational efficiency.
their interaction, while satisfying the needs of both computational We begin by introducing FBMs in the context of both SL
complexity and human cognition. Here we build on Jøsang’s and DS. We then demonstrate its utility in a web analytics
subjective logic to posit methods in focused belief measures experiment involving the evaluation of a large RDF graph
(FBMs), where a full DS structure is focused to a single event. The drawn from the 2012 Semantic Web Challenge.
resulting ternary logical structure is posited to be able to capture
the minimally sufficient amount of generalized complexity needed II. F OCUSED B ELIEF M EASURES (FBM S )
at a maximum of computational efficiency. We demonstrate the
efficacy of this approach in a web ingest experiment over the UR methods and formalisms are legion, primarily rooted
2012 Billion Triple dataset from the Semantic Web Challenge. in probability theory, logic, or their combination (e.g. [5]).
For decades, UR researchers have struggled with two com-
I. I NTRODUCTION peting imperatives. On the one hand, traditional UR methods,
Many analytic domains face the problem of determining including probabilistic (statistical) and logical approaches re-
the veracity of claims from multiple sources. The problem quire closed-world assumptions. For probability, we require
can be further complicated by the presence of large numbers knowledge about likelihood distributions over a set of entities
of sources asserting large numbers of propositions over short which are both exhaustive and mutually exclusive in order to
periods of time. Examples include intelligence gathering and guarantee mathematical additivity: summation of all modeled
sensor networks. Such problems are only exacerbated on the probabilities to 1. Representing total uncertainty requires as-
web with the constituent heterogeneity of data, and then again suming a uniform distribution over these choices. Similarly,
especially when brought to web scale. traditional logic represents only the two states for true (A)
While there are various logics for dealing with inconsistency and false (⇠A, here taking ⇠ for negation), according to an
or uncertainty [3], [13], [14], to our knowledge, none have excluded middle axiom.
achieved significant uptake in computational systems for large The large range of GIT methods, including fuzzy systems,
data. Traditional statistical uncertainty representation (UR) DS evidence theory, random sets, imprecise probabilities [16],
models fail to represent complex uncertainty situations requir- and many others, support open world situations by allowing
ing imprecision or other forms of ambiguous judgements. So- some form of third option or remainder between True and
called Generalized Information Theory (GIT) approaches [12] False, or non-additive, imprecise “overlap” between non-
such as fuzzy and Dempster-Shafer (DS) can represent these disjoint options. They do this in different ways, but the basic
complexities, but at the cost of high computational expense. concept is the same, to generalize traditional approaches by
Massive streaming or ingest problems in the semantic web relaxing certain axioms, such as allowing probabilities to
require UR strategies which provide both representation of sum to more or less than 1, or to recognize truth values
ambiguity and computational efficiency. “between” True and False (different kinds of “Maybe”), in
STIDS 2013 Proceedings Page 18
order to represent more complex uncertainty structures other m({A}), and everything in between, including composite
than probabilistic likelihoods. In this way one can represent events like m({A, B}) for “Alice and/or Bob did it”.
inherent vagueness or imprecision of events, or second- or The resultant belief measures
higher-order uncertainties about other uncertainties, reflecting X
the veracity of the information source, the confidence or b : 2⌦ ! [0, 1], b(R) = b(S)
likelihood of claims, the uncertainty about a claim, or other S✓R
open world situations where “something else” we didn’t think on any subset R ✓ ⌦ capture a mixture of likelihood and
about could occur. imprecision, since claims m about subsets R cannot neces-
sarily be disambiguated to knowledge about their constituent
Remainder elements ! 2 R. But considering (the middle of) Fig. 2
compared to Fig. 1, we see that we now need to support the full
A ~A Boolean lattice representing the power set 2⌦ of all subsets.
This comes at a huge computational cost, since we now have
Fig. 1. In this simplest case, generalized information theories support to work with 2n rather than n claims, and moreover their
imprecise choices by always including a “third option” or “remainder”, interaction within the lattice structure.
allowing positive weight to “neither A nor ⇠A.
But Jøsang [10] has noted that if we focus attention to a
particular composite event R ✓ ⌦ (like {A, B}=Alice and/or
These generalized mathematical structures are inherently Bob did it), we can reduce the complexity to just three disjoint
hierarchical, since this “remainder” “stands above” its con- groups of subsets:
stituent choices. Fig. 1 shows the absolute simplest such case, 1) Those (like {A}=Alice did it) completely within R
where “neither A nor ⇠A” is our “third choice” standing above supporting R itself;
A and ⇠A themselves. This remainder can be used to represent 2) Those (like {C}=Carol did it) completely disjoint from
total uncertainty or imprecision when positive weight is given R supporting ⇠ R (now taking ⇠ R = ⌦ \ R for set
to the remainder, but no weight is given to either of the two complement); and
specific choices. 3) The remainder (like {B, C}=Bob and/or Carol did it),
Fig. 1 actually shows a tiny lattice structure, and the providing information contradictory to or ambiguous
resulting methods require lattice-based computations arising with respect to both R and ⇠R.
from non-additivity. For example, classical probability has the These three groups reduce to a single “opinion” vector
condition Pr(A [ B) = Pr(A) + Pr(B) Pr(A \ B), which
is a fully modular function on the subset lattice, allowing w(R) = hb(R), d(R), u(R)i ,
relatively simple calculations. But in non-additive formalisms,
we have Pr(A [ B) ? Pr(A) + Pr(B) Pr(A \ B), which is where in addition to b(R) as the belief of R, we have
sub- or super-modularity [6]. Modularity allows probabilities X
d(R) = b(⇠R) = m(S)
of “bigger” events to be calculated from those of “smaller”, so
S✓⇠R
non-modularity forces a high computational price. In big data,
semantic web environments with massive ingest and streaming as the belief of ⇠ R, that is the disbelief of R1 . Since
input applications, we need methods for representing such b(R), d(R) 2 [0, 1] and b(R) + d(R) 1, this allows us to
hybrid uncertainty, but which are both expressive and tractable. define
Our FBM approach is built on Jøsang’s SL [10], which is in X
turn based on DS. Consider a decision problem, like whether u(R) = m(S) = 1 b(R) d(R)
Alice, Bob, or Carol committed a crime. In probability, we S : ;6=S\R6=S
need P = p(A) + p(B) + p(C) = 1 to hold. If P > 1, to serve elegantly as our generalized remainder, or uncertainty
then we have conflict and have to renormalize; while if P < of R.
1, then we have a remainder, represented as an uncertainty
As so specified, b(R) + d(R) + u(R) = 1. Thus b, d, and
U = 1 P > 0 leftover. In DS theory, we represent general
u exhaust all the options concerning R and ⇠R, but do so
probabilistic uncertainty by giving probabilities m not just to
while including a representation of the “remainder”, u, which
each of these n = 3 disjoint events A, B, C 2 ⌦, but to each
is about “neither R nor ⇠R”. This reflects the fact that while,
of the 2n subsets R ✓ ⌦ of such events. Formally, we have
X 1 We depart from Jøsang in using this formulation for d, rather than his
m : 2⌦ ! [0, 1], m(;) = 0, m(R) = 1. (equivalent) d(R) =
P
S\R6=;,S6✓R m(S). His is both formally more
R✓⌦ complex and conceptually less cogent, since it does not capture the sense
in which b and d represent the overall belief in the set R and its “opposite”.
We identify any R ✓ ⌦ with m(R) > 0 as focal. Since our choice is to cast both b and d as beliefs, just in the set R and
This supports modeling of imprecision and ignorance to- its opposite ⇠ R, and additionally since our formulation does not rely on
anything inherently either “subjective” or “logical”, we choose to identify
gether with likelihood by assigning values to the completely our formulation as focused belief measures rather than Jøsang’s “subjective
imprecise event m(⌦), down to the most precise singletons logic”.
STIDS 2013 Proceedings Page 19
ABC
B .4
A
.3 .4 remainder
.3 u
.1 .4 .2
AB AC BC
.4 AB C=~AB
b d
.2
C A .1 B .2 C
w(AB) = = <.4,.2,.4>
Fig. 2. Proababilities about Alice, Bob, and Carol, and their combinations, need 23 1 = 7 assessments. When focused on {A, B}, we reduce to three.
for any R ✓ ⌦, the two sets R and ⇠R partition ⌦, it is rather • A series discounting operator
the three classes (sets of subsets) D
wA (B) ⌦ wB (S) = bA (B)bB (S),
{S ✓ R}, {S ✓⇠R}, 2⌦ \ ({S ✓ R} [ {S ✓⇠R})
bA (B)dB (S),
E
which partition the power set 2 . It is this third class which
⌦
dA (B) + uA (B) + bA (B)uB (S)
is our remainder.
Consider opinions wA (R), wA (S), and wB (S) as opinions expressing the discounted opinion about a base opinion
from information sources A and B about propositions R and wB (S) in light of another opinion wA (B), which we
S. Also let wA (B) be source A’s opinion of source B. Jøsang take to be A’s opinion about the agent B expressing the
then provides a series of algebraic operators for different opinion wB (S).
combinations, including: Note the tradeoff that FBMs make. We are not representing
• Conjunction the full complexity of all 2n 1 possible combinations required
D by DS; but for any R, or collection of Rs, we are able to
wA (R ^ S) = bA (R)bA (S), directly model R, ⇠ R, and their remainder, and while in
a minimal way, with a maximal amount of computational
dA (R) + dA (R) dA (R)dA (S), efficiency: the huge advantage of these operators is that they
bA (R)uA (R) + uA (R)bA (S)+ are linear in the components b, d, u of the opinion vectors
E
uA (R)uA (S) w. Given that we care about only k such events, then in
realistic cases we have reduced the size of our problem space
and disjunction to 2k ⌧ 2n 2 (that is, to O(k) from order O(2n )). We
D have also vastly improved user comprehensibility, since con-
wA (R _ S) = bA (R) + bA (R) bA (R)bA (S), ceptualizing operations on linear vectors is far less challenging
than the structure of hypercubic Boolean lattices. Thus logical
dA (R)dA (S), combinations of complex situations can be represented easily
dA (R)uA (R) + uA (R)dA (S)+ and cheaply, while still representing our “third option”.
E
This is shown even more strongly in Fig. 3, now the case for
uA (R)uA (S)
n = 4 basic choices, shown in the 4-dimensional hypercube
opinions about two different propositions by the same (Boolean 4-lattice), and displaying the 4 focal sets. If we
source; wished to track all 2n 1 = 15 possible choices, then so
• A parallel consensus operator
be it, but consider instead that we were only interested in the
* k = 3 choices {A}, {A, C}, and {A, B, C}. Then we would
bA u B + bB u A only need to track the 3k = 9 pieces of information in the
wA (R) wB (R) = , opinion vectors
uA + uB uA uB
dA uB + dB uA w(A) = h.1, .4, .5i , w(AC) = h.3, .4, .3i
,
uA + uB uA uB +
w(ABC) = h.6, 0, .4i
uA uB
uA + uB uA uB (note that here we simplify notation so that e.g. ABC =
{A, B, C}). As shown, we’ve replaced the need to store and
expressing the opinion of one proposition R by two compute on a 4-dimensional hypercube in exchange for three
sources A, B; and 2-dimensional hypercubes. FBMs are thus ternary, avoiding
STIDS 2013 Proceedings Page 20
limited binary reasoning with a third category to represent
“complete ignorance”, but also avoiding the full 2n complexity Alice" p"
of the set-based DS.
Alice"is"
believable"
<1,"0,"0>" <0.8,"0,"0.2>"
.4
remainder Judge"
,p" Bob"was"
Bob"
.6 0 convincing"
<0.95,"0,"0.05>"
remainder
ABC D <0,"1,"0>" Cindy"is"
.3 .3 lying"
ABC remainder <0,"0.99,"0.01>"
p"seems"more"
.3 .4 Cindy" p" false"than"true"
.4 <0.8,"0,"0.2>"
AC BD <0.17,"0.79,"0.04>"
.2 AC BD
.5 <0,"0.95,"0.05>"
<1,"0,"0>"
remainder <0,"0,"1>"
A
.1 .1 .4
A BCD
Fig. 4. An FBM problem setup: how to aggregate streaming opinions under
source uncertainty?
Fig. 3. (Left) Example focal structure for n = 4. (Right) Opinion structures
for the three target sets R = A, AC, ABC.
Assume that Alice, Bob, and Cindy actually testify in
order, modeling a streaming ingest operation. Initially, we have
III. DATA I NGEST E XPERIMENT absolutely no knowledge about p, and thus the only valid
We next seek to demonstrate the value, feasability, and choice is the totally uncertain opinion w(p) = h0, 0, 1i. As
tractability of using FBMs in a data ingest experiment on an opinion wX (p) of p by source X arrives, we then update
the semantic web. The experiment reported on below is our opinion as:
provisional, an initial foray into the basic operation of the FBM w0 (p) = w(p) (w(X) ⌦ wX (p)),
approach, but as specifically applied to web-based analytics.
In particular, as will be described in more detail below, we first discounting X’s opinion of p with our opinion of X,
use opinion vectors which are relatively constrained examples, and then aggregating with our prior opinion. We are thus
being unequivocal for basic claims, but always with residual building the consensus opinions as more data arrives from
uncertainty in their aggregation. Further investigation can open more sources, and the only state that needs to be saved is a
the approach up to range over a wider array of values, seeking single opinion for p.
performance and sensitivity analyses. After Alice, the judge’s opinion of p is
A. FBM Problem Setup h0, 0, 1i (h1, 0, 0i ⌦ h0.8, 0, 0.2i) = h0.8, 0, 0.2i.
Consider the following decision problem. Three data Then Bob testifies that p is absolutely false, or alternatively,
sources make claims about certain facts. Alice and Cindy that ¬p is absolutely true. Now the judge’s opinion of p is
assert p, but Bob says ⇠p. The judge must determine whether
he/she believes p is true. The judge generally believes Alice h0.8, 0, 0.2i (h0, 1, 0i ⌦ h0.95, 0, 0.05i) = h0.17, 0.79, 0.04i.
and Bob (Bob more than Alice) and thinks that Cindy is lying. Bob has effectively changed the judge’s mind. Finally, Cindy
An FBM setup for this problem is shown in Fig. 4. First, the testifies that p is absolutely true, but the judge is nearly certain
base claims made by Alice, Bob, and Cindy are unequivocally that she is lying. In the end, the judge’s opinion of p is
either true or fale (Jøsang calls these “dogmatic”), and thus
lack the uncertainty parameter u. We have h.17, .79, .04i (h1, 0, 0i ⌦ h0, .99, .01i) = h.17, .79, .04i.
wA (p) = h1, 0, 0i , wB (p) = h0, 1, 0i , wC (p) = h1, 0, 0i . The judge’s mind did not change at all as a result of Cindy’s
testimony. In the end, the judge is inclined to believe that p
Next, the judge is inclined to believe Alice at about 80%, but is false.
it’s not that her remaining 20% disbelieves Alice, but she is
rather uncertain about Alice, not having further grounds to ei- B. Billion Triple Challenge
ther believe of disbelieve her. We thus have w(A) = h.8, 0, .2i. Over the last 14 years, the Resource Description Framework
The judge finds Bob convincing at 95%, even more than Alice, (RDF) [2] has become a popular knowledge representation
so w(B) = h.95, 0, .05i. Finally the judge is quite sure that with mature research and tools to support it. It also has
Cindy is lying, but there is always the residual possibility associated ontology languages such as RDF Schema (RDFS)
otherwise, so w(C) = h0, .99, .01i. [8] and the Web Ontology Language (OWL) [7] which allow
STIDS 2013 Proceedings Page 21
Triples Number
Any FOAF-related triple 164.3M FOAF captures information about people, webpages, and
With predicate foaf:name 7.8M their relationships. We considered documents as sources for
With predicate foaf:knows 17.3M determining beliefs. Relating to the judge example, we are
TABLE I the judge, and each document is a witness. We assume every
S TATISTICS ABOUT FOAF TRIPLES IN BTC. document asserts that it is telling the absolute truth for each
triple. That is, for each hs, p, o, gi such that d = http(g), d’s
belief of hs, p, oi is h1, 0, 0i (absolute belief). As the judge, we
must determine how to discount these beliefs since we are not
for declarative semantics that support reasoning tasks like inclined to believe anything absolutely just by mere testimony.
inference and consistency checking. There are many efforts Hogan et al. [9] have already established a notion of
to expose data as RDF (e.g., Facebook [17], Data.gov [4], authority for RDF triples based on the sources of the triples,
biomedical [1], etc.)2 , and major companies are employing and so we make use of that. It is important to understand
RDF to allow users to mark up their data (e.g., Open Graph that whether an RDF triple is authoritative depends on the
Protocol3 , schema.org4 , Twitter cards5 ). relationship between the subject of the RDF triple (that is, s
Thus there is an abundance of RDF data which is driving in hs, p, oi) and the graph name g and/or document URL d.
the kinds of data integration challenges we posit FBMs to The general idea is that any RDF triple in which the subject is
be valuable for. Even those which use different ontologies a term defined by the source, such an RDF triple is considered
can be meaningfully unified using a relatively simple “upper authoritative. The foundation for this notion is laid by concepts
ontology” given a basic knowledge of common ontologies of RDF namespaces and Linked Data principles8 , the scope
[19]. For non-RDF data sources, the heterogeneity problem of which are beyond this particular work.
can be solved by providing an RDF or SPARQL [15] interface The specific rules we used are as follows.
on data sources (either on the producer or consumer side, • For any URI u, let nofrag(u) be everything be-
like in [17], which is a coding task) and by providing an fore the fragment (if any fragment exists). Thus
appropriate ontology to give a unified view of the data (which nofrag(data:abc) = data:abc, and nofrag(data:abc#def) =
is a design task needing sufficiently knowledgeable persons or data:abc (the “fragment” is everything after and including
good documentation of the data source). the first “#” in a URI). hs, p, o, gi is considered authorita-
We experimented with this streaming FBM method on a tive iff nofrag(s)=nofrag(g) or nofrag(s)=nofrag(http(g)).
large RDF dataset crawled from the web. The 2012 Billion • We transform hs, p, o, gi RDF quads into quints
Triple Challenge dataset (BTC)6 is a set of RDF quads crawled hs, p, o, d, ai where d = http(g) and a = 1 if hs, p, o, gi is
from the Web for the purposes of challenging competitors to authoritative and a = 0 otherwise, and we consider only
work at scale. BTC was chosen because it represents one of the unique quints. If a = 1, our belief in d for the assertion
best and largest publicly available RDF datasets, and because of hs, p, oi is h0.9, 0, 0.1i (90% belief), and if a = 0, our
of our own past experience working with previous versions of belief in d for the assertion of hs, p, oi is h0.01, 0, 0.99i
it [11], [18]. (99% uncertainty). The values are chosen somewhat
The RDF quads in BTC are RDF triples with an additional arbitrarily. Since d’s belief of hs, p, oi is always h1, 0, 0i
component that we will refer to as the “graph name”. For and X ⌦h1, 0, 0i = X, our belief in d becomes our belief
an RDF quad hs, p, o, gi (where g is the graph name), let of d’s assertion of hs, p, oi. (That is, unsurprisingly, our
d = http(g) be the direct URL of the document retrieved belief in the source for a particular triple is the same as
over HTTP when following g (e.g., when you put g in your our belief in the triple stated by that source.)
browser). In some cases, d = g. However, http(g) can result • At this point, we have beliefs for every unique
in redirection (e.g., HTTP codes 301, 302, 303) in which case hs, p, o, d, ai, which we will denote b(hs, p, o, d, ai), but
d 6= g. Many graph names can map to the same document we wish to form some overall belief for each unique
URL, and these mappings are also captured in the (broader hs, p, oi. This is determined using the consensus operator
sense of the) BTC dataset. . For every hs, p, oi, our belief is
For our experiment, we limited ourselves to quads in BTC M
that utilized terms from the friends-of-a-friend (FOAF) ontol- b(hs, p, oi) = b(hs, p, o, d, ai).
ogy7 . FOAF contains 164.3M overall, including two specific hs,p,o,d,ai2BT C
sub-groups (foaf:knows and foaf:name) which we will C. Implementation
use below (see Table I). Our evaluation was run using simple Unix commands like
2 http://www.semantic-web-journal.net/accepted-datasets – last accessed
sort, cut, and uniq; and short, custom Perl scripts. This
August 8, 2013 – contains an entire list of such datasets. was simply the easiest path to a preliminary evaluation of
3 http://ogp.me/ – last accessed August 8, 2013 FBMs. In principle, though, the same computation is easily
4 http://schema.org/ – last accessed August 8, 2013
parallelizable. Let S be a set of data sources (documents),
5 https://dev.twitter.com/docs/cards – last accessed August 8, 2013
6 http://km.aifb.kit.edu/projects/btc-2012/ – last accessed August 8, 2013 8 http://www.w3.org/DesignIssues/LinkedData.html – last accessed August
7 http://xmlns.com/foaf/spec/ – last accessed August 8, 2013 8, 2013
STIDS 2013 Proceedings Page 22
and let W be a function associating our opinions about Distribu6on"of"Strongest"Consensus"Beliefs"for"FOAF"Triples"in"BTC"2012"
Authorita7ve& Non