=Paper= {{Paper |id=None |storemode=property |title=Scalable Detection of Sentiment-Based Contradictions |pdfUrl=https://ceur-ws.org/Vol-762/paper3.pdf |volume=Vol-762 }} ==Scalable Detection of Sentiment-Based Contradictions== https://ceur-ws.org/Vol-762/paper3.pdf

Scalable Detection of Sentiment-Based Contradictions

Mikalai Tsytsarau, University of Trento, Trento, Italy, tsytsarau@disi.unitn.eu
Themis Palpanas, University of Trento, Trento, Italy, themis@disi.unitn.eu
Kerstin Denecke, L3S Research Center, Hannover, Germany, denecke@L3S.de

ABSTRACT
The analysis of user opinions expressed on the Web is becoming increasingly relevant to a variety of applications. It allows us to track the evolution of opinions or discussions in the blogosphere, or perform product surveys. The aggregation of sentiments and analysis of contradictions is another important application, which becomes effective since we are able to capture the diversity in sentiments on different topics with more precision and on a large scale. However, there is still a need for a scalable way of sentiment aggregation with respect to the time dimension, which preserves enough information to capture contradictions.
In this paper, we focus on the problem of finding sentiment-based contradictions at a large scale. First, we define two types of contradictions, depending on the distributions of opposite sentiments over time. Second, we introduce a novel measure of contradiction based on the mean value and the variance of sentiments among different texts. Third, we propose a scalable method for identifying both types of contradictions at different time scales. We evaluate the performance of our method using synthetic and real-world datasets, as well as a user study. The experiments demonstrate the effectiveness of the proposed method in capturing contradictions in a scalable manner.

1. INTRODUCTION
In recent years, we have witnessed the Internet becoming an open platform, where people can express their opinions and can be heard. There are many services that allow people to publish information and opinions, such as blogs, wikis, forums, social networks and others. They all represent a rich source of opinionated information on different topics, which can be analyzed and exploited in various applications and contexts. Sentiment analysis can be used, for example, to learn about a customer’s attitude to a product or its features, or to reveal people’s reaction to some event. Such problems require a scalable analysis and some form of sentiment aggregation to produce a representative result.
The problem of contradictions, or sentiment diversity on some topic, has been studied in the context of different research areas, having a slightly varying notion in each case. For instance, in Information Retrieval opposite opinions and sentiments introduce noise to the fact-centric search and must be avoided [14]. In contrast, conflicting sentiments are one of the desired targets of product review mining. Recently proposed methods can aggregate opinions expressed in customer reviews and extract a representative summary of sentiments on a feature-by-feature basis; or they can capture and aggregate sentiments on some topic among different texts [8].
Although aggregated sentiments do represent some information on contradiction, this information may be biased. For example, if two opposite sentiment values are averaged, the result may have a neutral polarity. The information about the contradiction is then lost. On the other hand, representative sentiments (which best describe opposite opinions) are likely to capture the meaning of contradiction, but not its level. Therefore, this problem essentially requires a consistent definition and new methods to deal with it.
In this paper, we introduce a framework¹ that defines the concepts of aggregated sentiment, sentiment variance and contradiction with respect to the time dimension, and formulates relevant problems of contradiction discovery. We say that we have a contradiction when there are conflicting opinions for a specific topic, which is a form of sentiment diversity. This kind of contradiction can occur at one specific point of time or throughout a certain time period. Furthermore, a contradiction can occur within one text when an author presents different opinions on the same topic, or across texts when different authors express different opinions on the same topic. We further extend this framework of contradiction detection by focusing on its performance and effectiveness for large-scale datasets. Our method operates on sentence-level sentiments, which are represented on a continuous scale. This allows us to exploit different approaches for sentiment detection, which can be plugged into our framework. The use of mean and variance for contradiction detection allows our method to be fast and to scale linearly in the number of texts, which is an important feature for large-scale analysis. Tests on real datasets, as well as a user study, demonstrate that our approach is able to efficiently and effectively identify contradictions.
The main contributions of this work can be summarized as follows.
● We formally define the problem of contradiction detection, and further describe two variations of the problem, namely, synchronous and asynchronous contradictions.
● We present an approach for contradiction detection, which is based on fine-grained sentiment extraction. Moreover, we describe techniques that enable this approach to scale to very large data collections.
● We experimentally evaluate the proposed approach using several synthetic and real datasets. The results show the effectiveness and scalability of our solution. In addition, we perform a user study that demonstrates the usefulness of the proposed framework.

The remainder of this paper is structured as follows. In Section 2 we discuss the related work, and in Section 3 we formally define the problem. We present our approach for detecting and storing contradictions in Section 4 and Section 5, respectively, and the experimental evaluation in Section 6. We discuss our experiences in Section 7, and conclude in Section 8.

¹ Some preliminary ideas have appeared as a poster [16].

2. RELATED WORK
In the past few years, we have witnessed an increasing research interest in the area of blog analysis and specifically in opinion mining [13]. Contradiction analysis is a rather new research area. In particular, contradictions in opinions, as considered here, have not been addressed before. Harabagiu et al. [6] present a framework for contradiction analysis that exploits linguistic information such as negation or antonymy as well as semantic information, such as types of verbs. De Marneffe et al. [3] introduce a classification of contradictions consisting of seven types that are distinguished by the features that contribute to a contradiction (e.g., antonymy, negation, numeric mismatches). They define contradictions as a situation where ’two sentences are extremely unlikely to be true’, and describe a contradiction detection approach to their textual entailment application [12]. Ennals et al. [5] describe an approach that detects contradicting claims by checking whether some particular claim entails (i.e., has the same sense as) one of those that are known to be disputed. For this purpose, they have aggregated disputed claims from Snopes.com and Politifact.com into a database. Additionally, they populated this database by selecting explicit statements of contradiction or negation from web texts.
The above approaches are based on linguistic analysis and textual entailment. In contrast, our approach is based on statistical principles and intended for large-scale operation, where pairwise comparisons of texts may not be computationally efficient. In addition, we consider a time dimension for contradiction, which allows us to introduce new types such as, for example, change of opinion (asynchronous contradiction). To the best of our knowledge, this problem has not been studied so far.
Problems related to the identification and analysis of contradictions have also been studied in the context of social networks and blogs. A recent work by Liu et al. [10] introduces a system that allows users to compare contrasting opinions of experienced blog users on some topic. In contrast, we take into account the opinions of all web users, regardless of their expertise. Clustering accuracy as an indicator of blogosphere topic convergence was proposed by [17]. By analyzing how accurate clustering is in different time intervals, one can estimate how correlated, or diverse, blog topics are. Such an approach can also be adapted to opinion contradictions, by replacing topic feature vectors by sentiment feature vectors. Our work goes beyond trend analysis by automatically recognizing contradictions regarding some topic within and across documents.
Analysis of product reviews is another opinion mining task that is close to contradiction analysis. A system for mining the reputation of products in the Web is described in [11]. A similar approach is proposed by the Opinion Observer system [9] that focuses on summarizing the strengths and weaknesses of a particular product. Even though the above studies consider both positive and negative opinions, they do not aggregate these two classes. In our approach, we describe an effective way for performing this aggregation, which leads to more insights on the user opinions.
Chen et al. [2] study precisely the problem of conflicting opinions on a corpus of book reviews, which they classify as positive and negative. Their main goal is to identify the most predictive terms for the above classification task, and visualize the results for manual inspection. However, the results are only used to visualize opposite opinions without further aggregation. It is up to the user to visually inspect the results and draw some conclusions. In contrast, we propose a systematic and automated way of performing sentiment aggregation, revealing contradictions, and analyzing the evolution of these contradictions over time.

3. PROBLEM DEFINITION
The problem we want to solve in this paper is the efficient detection of contradicting opinions² (on specific topics).
Usually, a particular source of information covers some general topic T (e.g., health, politics) and has a tendency to publish more texts about one topic than another. Yet, within a text, an author may discuss several topics. When using the term ’text’ we refer either to the entire web document or to its individual sentences. By the term ’sentence’ we mean a particular piece of text expressing an opinion about a certain topic, which cannot be split into smaller parts without breaking its meaning. For each of the topics discussed in some text, we wish to identify the sentiment expressed towards it. In this study, we restrict ourselves to identifying and recording the intensity of these sentiments, which we represent as numbers. In the following, we refer to sentiment polarity simply as sentiment.

DEFINITION 1 (SENTIMENT). The sentiment S with respect to a topic T is a real number in the range [−1, 1] that indicates the polarity of the author’s opinion on T expressed in a text. Negative and positive values represent negative and positive opinions respectively, while the absolute value of sentiment represents the strength of the opinion.

Apart from computing sentiments for individual texts, we also need to compute the polarity on some topic aggregated over multiple texts (that may span different authors, as well as time periods).

DEFINITION 2 (AGGREGATED SENTIMENT). The Aggregated Sentiment $\mu_S$ expressed in a collection of documents D on topic T is defined as the mean value over all individual sentiments assigned in that collection. $\mu_S$ is defined on the same range of [−1, 1] as sentiments and calculated as follows: $\mu_S = \frac{1}{n} \sum_{i=1}^{n} S_i$, where n is the cardinality of D.

By comparing the sentiment values of different collections of texts, contradictions are identified as follows.

DEFINITION 3 (CONTRADICTION). There is a contradiction on a topic, T, between two groups of documents, $D_1, D_2 \subset D$ in a document collection D, where $D_1 \cap D_2 = \emptyset$, when the information conveyed about T is considerably more different between $D_1$ and $D_2$ than within each one of them.

In the above definition, we purposely do not specify exactly what it means for a sentiment value to be very different from another one. We define contradiction on a pairwise basis, where we evaluate the disagreement between two groups of documents in a collection. In this case, the similarity of information within each group serves as a reference point, providing a basic disagreement level. This definition can lead to different implementations, and each one of those will have a slightly different interpretation of the notion of contradiction. We argue that our definition captures the essence of contradictions, without trying to impose any of the specific interpretations. Nevertheless, in Section 4, we propose a specific method for computing contradictions, which incorporates many desirable properties.
When identifying contradictions in a document collection, it is important to also take into account the time at which these documents were published. Let $D_1$ be a group of documents containing some information on topic T, all of which were published within some time interval $t_1$. Assume that $t_1$ is followed by time interval $t_2$, and that the documents published in $t_2$, $D_2$, contain a conflicting piece of information on T. In this case, we have a special type of contradiction, which we call Asynchronous Contradiction, since $D_1$ and $D_2$ correspond to two different time intervals. Following the same line of thought, we say that we have a Synchronous Contradiction when both $D_1$ and $D_2$ correspond to a single time interval, t.
In order to detect contradicting opinions in collections of texts, we first need to determine all the different topics and then calculate the corresponding sentiments.

² For the rest of this document we will use the terms sentiment and opinion interchangeably.

Figure 1: Example of two possible sentiment distributions.

PROBLEM 1 (SINGLE-TOPIC CONTRADICTION DETECTION). For a given time interval $\tau$ and topic T, identify the time regions of a predefined size w where the contradiction level for T exceeds some threshold $\rho$.

The time interval, $\tau$, is user-defined. As we will discuss later, the threshold, $\rho$, can either be user-defined, or automatically determined in an adaptive fashion based on the data under consideration. We can also determine all the topics in a dataset that are involved in contradictions, as follows.

PROBLEM 2 (ALL-TOPICS CONTRADICTION DETECTION). For a given time interval $\tau$, identify the topics T that have a high contradiction level, or a large number of contradicting regions above some threshold.

The latter problem is interesting if we want to consider the popularity of certain web topics. Frequent contradictions may indicate "hot" topics, which attract the interest of the community. Due to space limitations, in this paper we only discuss a solution to the first problem, since a solution to the second one is its direct extension. However, the approach we propose in this work is general, and can lead to solutions for several other variations of the above problem, such as detection of topics with periodically repeating contradictions or with the most frequently alternating Aggregated Sentiment.

4. CONTRADICTION DETECTION
Given the problems described before, we propose a three-step approach to contradiction analysis, which includes:
● Detection of topics for each sentence,
● Detection of sentiments for each sentence-topic pair,
● Analysis of sentiments for each topic across multiple texts.

Steps one and two can be achieved using existing methods, or adaptations of existing methods. We will refer to these steps as ’preprocessing’ and describe them briefly in the following. The focus of this paper is then the contradiction detection approach.

4.1 Preprocessing
For identifying topics per sentence, we apply the Latent Dirichlet Allocation (LDA) algorithm [1], which we extended to work on the sentence level [4]. Sentences are thus treated as input documents for LDA, and each is assigned its several most probable topics.
Then, for each sentence-topic pair we assign a continuous sentiment value in the range [−1, 1] that indicates the polarity of the opinion expressed regarding the topic. For the sentiment assignment step, we use an existing tool for fine-grained opinion analysis [7]. Nevertheless, this tool can be replaced by any other suitable one that calculates continuous sentiment values at the sentence level. Then we average the sentiments over those sentences of a text that have the same topic, to get one sentiment value for each topic in a text.
Based on the analysis described so far, we can now describe our approach for contradiction detection with respect to different topics. In the following paragraphs, we first propose a novel contradiction measure, and then describe two simple approaches aiming at detecting contradictive periods in time.

4.2 Measuring Contradictions
In order to be able to identify contradicting opinions we need to define a measure of contradiction. Assume that we want to look for contradictions in a shifting time window³ w. For a particular topic T, the set of documents D, which we use for calculation, will be restricted to those that were posted within the window w. We denote this set as D(w), and n as its cardinality, $n = |D(w)|$.

³ Without loss of generality, in this work we consider windows of days, weeks, months, and years.

In this setting, a value of aggregated sentiment $\mu_S$ close to zero implies a high level of contradiction because positive and negative sentiments compensate each other. A problem with the above way of calculating topic sentiment arises when there exists a large number of documents with very low sentiment values (neutral documents). In this case, the value of $\mu_S$ will be drawn close to zero, without necessarily reflecting the true situation of the contradiction. Therefore, we suggest to additionally consider the variance of the sentiments along with their mean value. The sentiment variance $\sigma_S^2$ is defined as follows:

$$\sigma_S^2 = \frac{1}{n} \sum_{i=1}^{n} (S_i - \mu_S)^2 \qquad (1)$$

According to the above definition, when there is a large uncertainty about the collective sentiment of a collection of documents on a particular topic, the topic sentiment variance is large as well.
Figure 1 shows two example sentiment distributions. Distribution A, with $\mu_S$ close to zero and a high variance, indicates a very contradictive topic. Distribution B shows a far less contradictive topic, with sentiment mean $\mu_S$ in the positive range and low variance. For example, a group of documents with $\mu_S$ close to zero and a high variance (distribution A in Figure 1) will be very contradictive, and another group with sentiment $\mu_S$ shifted to negative or positive and with low variance is likely to be far less contradictive (distribution B in Figure 1). We note that neither the mean nor the variance can be used independently to identify contradictions. For example, a fairly large variance among sentiments does not lead to a contradiction when only positive or only negative sentiments are present. Moreover, a zero mean value may occur even when all posts are neutral, which once again does not indicate a contradiction. When there is a large number of neutral sentiments in the collection, we have two opposite trends: the average sentiment moves towards zero and the sentiment variance decreases. If these trends compensate each other, the neutral documents will not affect the contradiction value much.
Evidently, we need to combine the mean and the variance of sentiments in a single formula for computing contradictions. The contradiction value C can then be computed as:

$$C = \frac{\sigma_S^2}{(\mu_S)^2} \qquad (2)$$

where $\mu_S$ is squared so that its units are the same as those of $\sigma_S^2$.
This formula captures the intuition that contradiction values should be higher for topics whose sentiment value is close to zero and whose sentiment variance is large. Nevertheless, the contradiction values generated by this formula are unbounded (i.e., they can grow arbitrarily high as $\mu_S$ approaches zero), and do not account for the number of documents n. This latter point is important, because in the extreme where D(w) contains only two documents with opposite values, C will be very high, and will compare unfavorably to the contradiction value of a different set of documents on T with a much higher cardinality.
Incorporating the above observations into the contradiction formula, we propose the following final formula for computing contradiction values:

$$C = \frac{\vartheta \cdot \sigma_S^2}{\vartheta + (\mu_S)^2} \, W \qquad (3)$$

In the denominator, we add a small value, $\vartheta \neq 0$, which limits the level of contradiction when $(\mu_S)^2$ is close to zero. The numerator is multiplied by $\vartheta$ to ensure that contradiction values fall within the interval [0, 1]. Figure 2(c) shows how a contradiction value depends on $\vartheta$ in the denominator. Smaller $\vartheta$ values emphasize contradiction points with $\mu_S$ close to zero, for example changes of opinion. Larger $\vartheta$ values mask this difference, making levels of contradictions more equal. In this study, we set $\vartheta$ to 5% of the expected value of the squared sentiment mean, which was effective for its purpose, exhibiting a stable behavior across datasets, without distorting the final results.

Figure 2: Example of contradiction values computed from a synthetic dataset with two planted contradictions.

W is a weight function aiming to compensate the contradiction value for the varying number of documents that may be involved in the calculation of C. The weight function is defined as:

$$W = \left(1 + \exp\left(\frac{\bar{n} - n}{\beta}\right)\right)^{-1} \qquad (4)$$

where the constant $\bar{n}$ reflects the average number of topic documents in the window, and $\beta$ is a scaling factor. This weight function provides a multiplicative factor in the range [0, 1]. Using W we can effectively limit C when there is a small number of documents, as well as when this same number of documents increases significantly. What W achieves is essentially a normalization of the contradiction values across different sets of documents, allowing them to be meaningfully compared to each other.
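The contradiction measure of Formulas 1, 3 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `weight` and `contradiction` are hypothetical, and the values chosen for $\vartheta$ (`theta`), $\bar{n}$ (`n_avg`) and $\beta$ (`beta`) in the usage lines are arbitrary.

```python
import math
from statistics import fmean, pvariance


def weight(n, n_avg, beta):
    """Formula 4: logistic weight based on the number of documents n.

    n_avg (average number of topic documents per window) and beta
    (scaling factor) are assumed to be supplied by the caller.
    """
    x = (n_avg - n) / beta
    if x > 700:  # guard against overflow in math.exp for very small n
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def contradiction(sentiments, theta, n_avg, beta):
    """Formula 3: C = theta * var / (theta + mean^2) * W."""
    n = len(sentiments)
    if n == 0:
        return 0.0
    mu = fmean(sentiments)            # aggregated sentiment (Definition 2)
    var = pvariance(sentiments, mu)   # population variance, as in Formula 1
    return theta * var / (theta + mu * mu) * weight(n, n_avg, beta)


# Illustrative windows (made-up sentiment values in [-1, 1]).
polarized = [0.8, -0.8] * 10   # opinions split between positive and negative
consensus = [0.8] * 20         # everyone agrees
print(contradiction(polarized, theta=0.05, n_avg=20, beta=5))
print(contradiction(consensus, theta=0.05, n_avg=20, beta=5))
```

In this sketch the polarized window scores higher than the consensus window, which matches the intuition behind Formula 3: a mean near zero combined with a large variance signals a contradiction.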
Figure 2 shows the operation of the proposed contradiction function. To demonstrate this, we generated a time series of sentiments for a period of 8000 time units composed of 8000 normally distributed points, half of which follow a custom trend with dispersion 0.125, while the other half, with dispersion 0.25 and median 0, act as noise. Time stamps of all points followed the Poisson distribution with parameter λ = 1 time units. We have chosen these distributions because they are simple but still resemble the real data. The graph at the top (Figure 2(a)) shows the generated sentiments. The bold line in this graph depicts the custom trend, showing an initial positive sentiment that later changes to negative (at time instance $t_1$), which represents a change of sentiment. There is also a point around time instance $t_2$, where the sentiments are divided between positive and negative, a situation representing a simultaneous contradiction. Using this dataset, we verify the ability of the C function to capture the planted contradictions.
As can be seen in Figure 2(b), $\mu_S$ closely captures the aggregate trend of the raw sentiments. The following two graphs in the figure show the contradiction value, calculated using a sliding window of size 500 and 1000 time units. When we use a window of small size (Figure 2(c)), C correctly identifies the two contradictions at points $t_1$ and $t_2$, where the values of C are the largest. Using a larger window has a smoothing effect on the values of C (Figure 2(d)). Nevertheless, we can still identify long-lasting contradictions: in this case, the largest value of C occurs at time instance $t_1$, corresponding to a change of sentiment that manifests itself across the entire dataset.
Subjective sentences make up a considerably smaller part of a text than objective statements. As a result, neutral sentiments usually shift the aggregate sentiment towards zero, masking contradictions. Our contradiction formula is designed to compensate for such effects by exploiting the sentiment variance. We demonstrate this behavior on another synthetic dataset, shown in Figure 3. The bottom graph shows that the proposed formula can successfully identify the main contradicting regions, both with and without neutral sentiments.

Figure 3: The effect of neutral sentiments on contradiction. (Panels, top to bottom: sentiment values, sentiment mean, sentiment variance, and contradiction value with and without neutral sentiments, plotted over time.)

5. STORING CONTRADICTIONS
So far we have described a technique for processing web documents to extract sentiments on various topics, and subsequently to use this information in order to identify contradictions. But our final goal is to identify contradictions in large collections of documents, which requires scalable methods. To this end, we demonstrated the need to analyze sentiment information on each topic across different time windows. Given this requirement, scalability may be achieved by storing pre-computed values for windows of different sizes. We now turn our attention to the problem of organizing all these data in a way that will allow the efficient detection of contradictions in large collections of data that span very long time intervals.
An important observation is that Formula 3, which calculates the contradiction values, is based on the mean and variance of the topic sentiment. Recall that aggregated sentiment and sentiment variance can be written as follows:

$$\mu_S = \frac{1}{n} \sum_{i=1}^{n} S_i; \qquad \sigma_S^2 = \frac{1}{n} \sum_{i=1}^{n} (S_i - \mu_S)^2 = \frac{1}{n} \sum_{i=1}^{n} S_i^2 - \mu_S^2$$

In the formula above, n is the number of documents published on topic T in a specific time window (see Definition 2).
We now define the first- and second-order moments of the topic sentiment as $M_1 = \sum_{i=1}^{n} S_i$ and $M_2 = \sum_{i=1}^{n} S_i^2$, respectively. Substituting $\mu_S = M_1 / n$ and $\sigma_S^2 = M_2 / n - M_1^2 / n^2$ into Formula 3, we can rewrite it in terms of the sums $M_1$ and $M_2$ as follows:

$$C = \frac{\vartheta \, (n M_2 - M_1^2)}{\vartheta n^2 + M_1^2} \, W \qquad (5)$$

The above form of the contradiction formula gives us additional flexibility, since we can now compute the contradiction of a large time window by composing the corresponding values from the smaller windows contained in the large one. We can therefore build data structures that take advantage of this property.
In the next paragraphs, we describe such a data structure, and we show how it can be used to identify contradictions. We also demonstrate that it can be easily maintained in an incremental fashion when new documents are added to the system.

5.1 TimeTree for Contradictions
The need to analyze contradictions at different time granularities suggests a hierarchical structure for contradiction storage. There are a number of ways to organize contradiction values by time. The first solution is to store a time-tree structure for each topic separately. It scales with the number of topics, and performs well when looking for contradictions on a single topic, but also brings larger update costs, because for each text the storage needs to be traversed as many times as there are topics in that text. It also makes all-topic queries extremely inefficient, because for each topic we need to navigate through a time structure to find the right interval. The second solution, which we propose, is to store contradiction values for different topics under the same time-tree structure.
We introduce the TimeTree for managing the information on sentiments and contradictions. The TimeTree is organized around the sentiment moments, $M_1$ and $M_2$, and a hierarchical segmentation of time, as outlined in Figure 4. In this example, the time windows are organized on days, weeks, months, and years (though other hierarchical time decompositions are applicable as well). Using this kind of structure, we can answer queries on ad hoc time intervals, by dynamically computing the contradiction values based on Formula 5. In the following, we will refer to the levels of the TimeTree as the different granularities of the time decomposition, the root node having granularity 0.
Each node in the TimeTree corresponds to a time window, and summarizes information for all documents whose timestamp is contained in this time window.

Figure 4: Logical representation of the TimeTree.

5.2 Querying for Contradictions
When trying to detect contradictions, we would like to identify those that have a contradiction value above some threshold. The intuition is that these contradictions are going to be more interesting than the rest in the same time interval. An obvious solution in this case is to define some fixed threshold, $\rho$, and only report the contradictions above this threshold. We refer to this solution as fixed threshold. However, by adopting the above solution, we cannot normalize the threshold to better fit the nature of the data within each time window (that may vary over time and across topics).
In order to address this problem, we propose an adaptive threshold technique, which computes a different threshold for each topic and time window as follows. The adaptive threshold $\varrho_w$ for a topic T in time window w is based on the contradiction value $C_{w_p}$ that has been calculated for T in the parent time window of w, $w_p$, and is defined for each time window and topic as $\varrho_w = p \cdot C_{w_p}$, $0 < p < 1$. In our experience with real datasets, p values between 0.5 and 0.7 work well. In this work, we use p = 0.6.
Note that we cannot achieve the same result by using top-k queries (though they can be complementary to our approach). The reason is that the adaptive threshold does not impose a strict limit on the number of contradictions in the result, and can thus report the entire set of interesting contradictions within some time interval.

5.3 Updating the Contradictions
As discussed earlier, the nature of the contradiction function (Formula 5) and the TimeTree nodes allows us to incrementally maintain the TimeTree in the presence of updates. When new collections or individual documents are analyzed, their contribution to the contradiction of the corresponding topics and time windows in the TimeTree can be easily taken into account by updating the set of relevant $\{n, M_1, M_2\}$ values in the nodes of the tree.
In order to reduce update costs, we propose first to accumulate several updates and then submit them in a batch. When new documents arrive, as a preprocessing step, they are aggregated in time windows of the finest granularity of the TimeTree. Then, these aggregated values are used to update the counts and topic sentiment moments of all TimeTree nodes containing the respective time windows.
The update cost for each batch of aggregated documents depends on the depth of the TimeTree, d, and the number of topics, $|T|$ (in the worst case), that participate in the time windows relevant to the update. Thus, the complexity can be expressed as $O(d \cdot |T|)$.

6. EXPERIMENTAL EVALUATION
As mentioned earlier, the contradiction detection problem has not been considered before. Therefore, no annotated data set is available to measure the quality of the proposed approach in terms of accuracy. Nevertheless, we applied the algorithm to real-world data sets and ran several experiments, with settings and results described in this section. The objectives of these experiments are to: (i) analyze the quality of the approach; (ii) study its usefulness from a user perspective; (iii) study the performance of the introduced approach.
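Looking back at Section 5, the composability of Formula 5, the per-node $\{n, M_1, M_2\}$ summaries, the batch update and the adaptive threshold can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions rather than the authors' TimeTree: the names `Moments`, `contradiction`, `adaptive_threshold` and `apply_batch` are hypothetical, each tree node is modelled as a plain dictionary from topic to summary, the weight W is passed in as a number (defaulting to 1), and the factor $\vartheta$ in the numerator is carried over from Formula 3.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Summary kept per topic in a time-window node: count and moments."""
    n: int = 0
    m1: float = 0.0   # M1: sum of sentiments
    m2: float = 0.0   # M2: sum of squared sentiments

    def add(self, s: float) -> None:
        self.n += 1
        self.m1 += s
        self.m2 += s * s

    def merge(self, other: "Moments") -> "Moments":
        # A larger window is summarized by component-wise sums of its parts.
        return Moments(self.n + other.n, self.m1 + other.m1, self.m2 + other.m2)


def contradiction(m: Moments, theta: float, w: float = 1.0) -> float:
    """Formula 5 expressed through n, M1, M2 (w is the value of W)."""
    if m.n == 0:
        return 0.0
    return theta * (m.n * m.m2 - m.m1 ** 2) / (theta * m.n ** 2 + m.m1 ** 2) * w


def adaptive_threshold(c_parent: float, p: float = 0.6) -> float:
    """Threshold for a window, derived from its parent window's C value."""
    return p * c_parent


def apply_batch(path, topic, batch: Moments) -> None:
    """Add one pre-aggregated batch to every node on a leaf-to-root path.

    path is a list of d nodes (dicts), so one topic costs O(d) updates.
    """
    for node in path:
        node[topic] = node.get(topic, Moments()).merge(batch)


# Two daily windows with opposite opinions (made-up values).
day1, day2 = Moments(), Moments()
for s in (0.7, 0.8, 0.6):
    day1.add(s)
for s in (-0.7, -0.8, -0.6):
    day2.add(s)

week = day1.merge(day2)            # parent window composed from its children
theta = 0.05                       # arbitrary illustrative value
c_week = contradiction(week, theta)
rho_day = adaptive_threshold(c_week)
print(contradiction(day1, theta), c_week, rho_day)
```

Here each day is internally consistent, while the merged week has a mean near zero and a large variance, which is the pattern of an asynchronous contradiction; the week's value is obtained without revisiting the individual sentiments.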
6.1 Corpus Description

Our algorithms are applied to a data set of drug reviews collected from the DrugRatingz website⁴, a data set of comments on YouTube videos from L3S [15] and a dataset with comments on postings from Slashdot, provided for the CAW2 workshop⁵.

The first dataset contains 2701 positive, 352 neutral and 1616 negative reviews for 477 drugs. These reviews are provided by people who took a specific drug. They describe their personal experience with the drug, including contra-indications that occurred.

The second dataset contains approximately 6 million comments on YouTube videos, with an average of five hundred comments per video. Unlike texts in review datasets, which usually contain opinions specific to a topic, some of these comments contain information irrelevant to a topic, thus introducing extra noise to sentiment detection.

Our third dataset, Slashdot, is from a popular website for people interested in reading and discussing technology and its ramifications. It publishes short story posts which often incite many readers to comment on them and provoke discussions that may trail for hours or even days. It contains about 140,000 comments under 496 articles, covering the time period from August 2005 to September 2006. Compared to the usually brief comments on YouTube videos, comments from the latter dataset may span several paragraphs and typically contain many objective statements.

6.2 Evaluation of Contradictions

We now apply the introduced contradiction analysis approach to our datasets. In Figure 5, the top graph depicts the raw sentiment values for the topic "internet government control" taken from the Slashdot dataset, for the time interval September 2005 to September 2006. The following graphs show the aggregated sentiment and variance (two middle graphs), and contradiction values (bottom graph) for the above topic and time interval. Contradiction values have been calculated using a time window of ten days. Note that contradiction values are high for the time windows where topic sentiment is around zero and variance is high, which translates to a set of posts with highly diverse sentiments. These situations are not easy to identify with a quick visual inspection of the raw sentiments, aggregated sentiments or sentiment variance.

The analysis shows that in this time interval there is one major contradiction (marked 1 in the bottom graph of Figure 5). This contradiction discusses the pros and cons of a law that would give the government more power in controlling internet traffic, especially personal correspondence. Minor peaks in contradiction level here correspond to the discussion of a possible transfer of jurisdiction and control over top-level domains to the United Nations. The table below shows extracts from several opposing posts that contributed to this contradiction. By taking a closer look at the corresponding weblog posts, we find that the discussion is about restricted internet access and its advantages, while other contradictions contain a general discussion on the possibility of organizing the content by several top-level domains and restricting access to them.

Another example of contradicting posts may be observed in Figure 6, which illustrates conflicting opinions for the topic "Yaz"⁶ for a selected time interval. In this case, there was a disagreement of opinion on the effectiveness and possible side-effects of this drug.

Evidently, all the discovered contradictions correspond to discussions expressing different points of view on the same topic, and having an automated way of identifying them can be very useful.

⁴ http://drugratingz.com
⁵ http://caw2.barcelonamedia.org/
⁶ Yaz is a drug for contraception

[Figure 5 (plots): four stacked panels over date, Sep 2005 to Sep 2006: sentiments value, sentiments mean, sentiments variance, and contradiction value (with the major contradiction marked 1).]

PRO: It would be helpful for restricting the flow of information, which is a double edged sword.
PRO: I suppose we better wrap a firewall around our country and not let those damn foreigners access to our internet.
CONS: And what exactly does a neutral Internet do? It takes away the right of anyone who lays down the wires or installs the access points to control what goes through their network. My point: don’t complain about taking rights away when you advocate to take rights away.
CONS: While it sounds like a decent idea, I’m really all for the whole uncensored and unregulated internet. I really like my internet the way it is.
CONS: Sure, they can ruin Internet inside USA, but the rest of the world couldn’t care less.
CONS: We don’t need the FCC regulating the Internet. Not for "neutrality" or any other excuse someone can think of.

Figure 5: Mean, variance and contradiction values of sentiments for the topic "Internet government control".

6.3 Evaluation of Usefulness

In the following paragraphs we describe a user study which we conducted in order to evaluate the effectiveness and usefulness of our approach for the task of contradiction discovery.

In our usefulness evaluation, we used four datasets corresponding to opinionated posts for four topics extracted from three diverse real datasets (refer to Table 1). For each topic, we selected a varying number of posts, spanning in time from one to almost three years. The shortest list contained 60 posts, and the largest about 480. Moreover, the quality of posts also differed considerably across topics. The drug review datasets contained primarily brief and concise opinions about drugs; Slashdot topics featured large and detailed comments, with an average size of several paragraphs; YouTube comments were, on the contrary, short and often off-topic.

The group of users consisted of eight persons (PhD students at the University of Trento), and the experiment was conducted as follows. Users were asked to detect groups of contradicting posts for each of the topics in the above datasets (and label the positive and negative posts). We provided users with a web application that featured two approaches to help them identify time-intervals with potentially contradicting posts (see Figure 6). The first approach (marked as "stage 1" in the figure), based on the visualization method proposed by Chen et al. [2], displays to users the intensity over time of the positive and negative sentiments expressed in the posts (Figure 6(a)). The second approach (marked as "stage 2" in the figure) is based on the method proposed in this study, and displays to users a graph that marks the time points at which contradictions were automatically detected (Figure 6(b)). Using our tool, the users could see the time intervals that our tool had identified as contradictory, and could therefore focus their exploration in these
regions. Figure 6(d) shows some posts in a time-interval, which have been marked with positive (green) and negative (red) sentiments. These sentiment values are also illustrated in the overall time-line, depicted in Figure 6(c). In order not to favor either of the two approaches, in our experiments we alternated the approach required to be completed first.

[Figure 6 (screenshots): a) Average positive and negative sentiments (Stage 1); b) Contradiction level (Stage 2); c) Sentiments annotated by users (from the log data); d) Texts from a selected time interval (Stages 1 and 2).]

Figure 6: Annotation page for the dataset "Yaz" demonstrating opposite opinions.

For both approaches, we measured the average time, T1 and T2, and the average number of time-intervals examined by the users during the search, N1 and N2, needed to identify a single contradiction. Additionally, we asked users to rate the overall difficulty, D1 and D2, of completing the task when using each one of the two approaches, according to the following scale: 1 - very difficult; 2 - somewhat difficult; 3 - normal; 4 - somewhat easy; 5 - very easy.

The aggregated results (averaged over all the users) of our evaluation are reported in Table 1. We report the improvements⁷ we measured when our approach was used (stage 2), compared to the alternative approach (stage 1), computed as follows: ∆D = D2/D1, ∆T = T2/T1, and ∆N = N2/N1.

Dataset       Topic name    Size  ∆D    ∆T    ∆N    P1    P2    ∆P
Drug Ratingz  Ambien        60    1.50  0.60  0.88  0.70  0.81  1.20
Drug Ratingz  Yaz           300   1.58  0.93  0.78  0.75  0.95  1.32
Slashdot      Int. control  159   1.17  0.89  0.58  0.37  0.63  2.14
YouTube       Zune HD       472   2.07  0.68  0.62  0.36  0.61  2.09
Average                           1.58  0.77  0.72  0.55  0.75  1.69

Table 1: Evaluation results for different topics.

We observe that when users employed our approach in order to detect contradictions, they were able to identify contradictions faster, requiring 23% less time on average (ranging between 7% and 40%). The biggest improvement was for the topic "Ambien"⁸ (∆T = 0.60), which had a few contradicting posts visible using our approach, but otherwise hard to discover. Our approach also led to a reduction by 28% of the time-intervals examined in order to identify contradictions (ranging between 12% and 42%). The largest reductions were observed for the topics "Zune HD" and "Internet Control" (∆N = 0.62 and 0.58, respectively), which contained several posts that did not take a position, or were off topic. The average difficulty ratings were also favorable for our approach, which was consistently marked as more helpful. This difference was most pronounced for the "Zune HD" topic (∆D = 2.07), which involved many posts. In this case, going through the posts was not easy, and our approach allowed users to focus their search and identify the contradicting posts.

Finally, in Table 1 we report an additional measure of usefulness: since both approaches aim at guiding the users to the time-intervals that are most promising for containing contradictions, we computed the percentage, P1 and P2, of the examined time-intervals that led to the identification of a contradiction, as well as the improvement of our approach when compared to the alternative, ∆P = P2/P1. Even though the approach by Chen et al. [2] (stage 1) was not designed with this measure in mind, in the case of our approach, this measure is indicative of its precision since it measures how many of the automatically identified contradictions were real ones (i.e., verified by the users). The results show that our approach was always more successful in suggesting to users time-intervals that contained contradictions, with an overall average success rate of 75%, and as high as 95% (topic "Yaz").

The above results demonstrate that our approach can successfully identify contradictions in an automated way, and quickly guide users to the relevant parts of the data.

⁷ We omit presenting the detailed results for all parameters measured and each approach due to lack of space.
⁸ Ambien is a drug for treating insomnia

6.4 Evaluation of Scalability

We evaluate the scalability of the TimeTree for solving Problems 1 and 2, using a relational database implementation, where information is stored in a single table that contains contradiction values for each topic with respect to time intervals of different granularities. This implementation leads to simple and efficient SQL queries for detecting interesting contradictions. Recall that in the topic contradiction problem (Problem 1) we want to identify the contradictions and corresponding time windows of a single topic within some time interval, while in the all topic contradictions problem (Problem 2) we are interested in doing the same for all topics.

During this study, parameters of the contradiction formula were at their default values as described in Section 4. Changing the formula’s parameters will enlarge or reduce the number of contradictions being detected, but the computational efficiency will be the same. Performance of our approach does not depend on the value of the threshold because we are not storing pre-computed contradiction values, and so the database is unable to apply indices or filtering on this parameter. Fixed and adaptive threshold approaches, however, return slightly different sets of contradictions. The first one returns the largest contradictions themselves, and the second returns contradictions that are greater than p times the values of their respective parent intervals. The value of p was empirically set at 0.6 to return a result set with an average size equal to the one obtained when using a fixed threshold. This allows us to compare the relative performance of both methods.

To test the performance of our solutions, we generated sets of 25 single-topic and all-topics queries (corresponding to the Topic and Time Interval Contradictions problems, respectively), using granularities and topic ids drawn uniformly at random. In these experiments, we used 1,000 topics. We measured the time needed to execute these queries against the database as a function of the time
interval, τ, and the granularity of the time windows (Figure 7). We report results for both the fixed and the adaptive thresholds.

[Figure 7 (plots): four log-scale panels of query time (ms) for the dbms adaptive and dbms fixed methods: a) Query 1: time interval test and b) Query 2: time interval test, against time interval in days (200 to 1600); c) Query 1: granularity test and d) Query 2: granularity test, against granularity in days (1 to 1000).]

Figure 7: Scalability of single-topic and all-topics queries.

The adaptive threshold queries require more time in all cases, since the threshold in this case has to be computed based on the contradiction value of the parent time window, which incurs more computation. This difference is pronounced for the database implementation, because it involves an extra join for obtaining the parent time window.

We observe that both single-topic and all-topics queries (see Figures 7(a-b)) scale linearly with the size of τ. This confirms our analytic results, and is explained by the fact that the queries have to return contradictions for all time windows (of a specific granularity) that are contained in τ. For single-topic queries with fixed threshold, the database is able to use all its indices (i.e., on topic id, time windows, and granularity) to answer the queries, therefore achieving very fast response times.

Figures 7(c-d) depict the time results when we vary the granularity of the time windows specified by the queries. Increasing the granularity translates to larger time windows (i.e., moving up in the time hierarchy) and a smaller number of time windows for the same time interval. Thus, response times decrease.

7. DISCUSSION

The problem considered in this paper is new, in the sense that it considers contradictions on a large scale, while taking time into account (i.e., we consider the timestamps of the texts, as opposed to treating the text collections as sets). An approach that relies upon sentiment information and that exploits data engineering methods to detect such contradictions in texts at a large scale has been introduced and evaluated.

The evaluation of our approach on various datasets showed its ability to discriminate highly contradicting regions, given a sequence of sentiments on some topic. Being scalable and computationally efficient, it can serve as a preliminary step for more sophisticated contradiction analysis, identifying the most interesting points for further processing.

An important feature of our contradiction detection method is its ability to operate on data with neutral sentiments. The contradiction formula we propose shows almost the same performance with or without neutral sentiments, allowing it to incorporate sentiment detection algorithms of different types.

As was mentioned previously, to build the contradiction formula we used such values as mean and variance. We believe that the effectiveness of our approach increases with the growing scale, relying on the fact that the representativeness of statistical metrics also increases when a larger number of samples is involved in the computation. Moreover, tests on the synthetic data showed our formula’s stable behavior in the presence of noise.

Finally, we note that we are aware that the evaluation of our (and related) approaches to contradiction detection is still limited with respect to the precision and recall measures. The main reason for this is the absence of a benchmark dataset, and the difficulty in creating one. We are currently working toward such a dataset, suitable for testing different algorithms in this area.

8. CONCLUSIONS

In this paper, we proposed an approach to detect contradictions in documents, which is the first general and systematic solution to the problem. The experimental evaluation, with synthetic data and three diverse real-world datasets, as well as the user study, demonstrates the applicability and usefulness of the proposed solution.

We are currently working on extending our approach so that it can work in an online mode. This will enable us to continuously monitor opinions in real-time.

9. REFERENCES

[1] D. M. Blei, A. Y. Ng, M. I. Jordan, and J. Lafferty. Latent Dirichlet allocation. JMLR, 3, 2003.
[2] C. Chen, F. Ibekwe-SanJuan, E. SanJuan, and C. Weaver. Visual analysis of conflicting opinions. In IEEE Symposium on Visual Analytics Science and Technology, pages 59–66, 2006.
[3] M. C. de Marneffe, A. N. Rafferty, and C. D. Manning. Finding contradictions in text. In ACL-08: HLT, pages 1039–1047, 2008.
[4] K. Denecke and M. Brosowski. Topic detection in noisy data source. In ICDIM, pages 50–55, 2010.
[5] R. Ennals, B. Trushkowsky, and J. M. Agosta. Highlighting disputed claims on the web. In WWW, pages 341–350, 2010.
[6] S. Harabagiu, A. Hickl, and F. Lacatusu. Negation, contrast and contradiction in text processing. In AAAI, pages 755–762, 2006.
[7] R. Johansson and A. Moschitti. Reranking models in fine-grained opinion analysis. In COLING, pages 519–527. ACL, 2010.
[8] K. Lerman, S. Blair-Goldensohn, and R. McDonald. Sentiment summarization: Evaluating and learning user preferences. In EACL, pages 514–522, 2009.
[9] B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In WWW, pages 342–351. ACM, 2005.
[10] J. Liu, L. Birnbaum, and B. Pardo. Spectrum: Retrieving different points of view from the blogosphere. In ICWSM, pages 114–121, 2009.
[11] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima. Mining product reputations on the web. In KDD, pages 341–349, 2002.
[12] S. Pado, M.-C. de Marneffe, B. MacCartney, A. N. Rafferty, E. Yeh, and C. D. Manning. Deciding entailment and contradiction with stochastic and edit distance-based alignment. In TAC, 2008.
[13] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, 2008.
[14] E. Riloff, J. Wiebe, and W. Phillips. Exploiting subjectivity classification to improve information extraction. In AAAI, pages 1106–1111, 2005.
[15] S. Siersdorfer, S. Chelaru, W. Nejdl, and J. San Pedro. How useful are your comments?: analyzing and predicting YouTube comments and comment ratings. In WWW, pages 891–900. ACM, 2010.
[16] M. Tsytsarau, T. Palpanas, and K. Denecke. Scalable discovery of contradictions on the web. In WWW, pages 1195–1196, 2010.
[17] I. Varlamis, V. Vassalos, and A. Palaios. Monitoring the evolution of interests in the blogosphere. In ICDE Workshops, pages 513–518, 2008.