=Paper= {{Paper |id=None |storemode=property |title=Scalable Detection of Sentiment-Based Contradictions |pdfUrl=https://ceur-ws.org/Vol-762/paper3.pdf |volume=Vol-762 }} ==Scalable Detection of Sentiment-Based Contradictions== https://ceur-ws.org/Vol-762/paper3.pdf
     Scalable Detection of Sentiment-Based Contradictions

                 Mikalai Tsytsarau                          Themis Palpanas                          Kerstin Denecke
                  University of Trento                       University of Trento                   L3S Research Center
                     Trento, Italy                              Trento, Italy                        Hannover, Germany
            tsytsarau@disi.unitn.eu                       themis@disi.unitn.eu                      denecke@L3S.de



ABSTRACT                                                                  tral polarity. The information about the contradiction is then lost.
The analysis of user opinions expressed on the Web is becoming in-        On the other hand, representative sentiments (which best describe
creasingly relevant to a variety of applications. It allows us to track   opposite opinions) are likely to capture the meaning of contradic-
the evolution of opinions or discussions in the blogosphere, or per-      tion, but not its level. Therefore, this problem essentially requires
form product surveys. The aggregation of sentiments and analysis          a consistent definition and new methods to deal with it.
of contradictions is another important application, which becomes         In this paper, we introduce a framework1 that defines the concepts
effective since we are able to capture the diversity in sentiments on     of aggregated sentiment, sentiment variance and contradiction with
different topics with more precision and on a large scale. However,       respect to the time dimension, and formulates relevant problems of
there is still a need for a scalable way of sentiment aggregation with    contradiction discovery. We say that we have a contradiction when
respect to the time dimension, which preserves enough information         there are conflicting opinions for a specific topic, which is a form
to capture contradictions.                                                of sentiment diversity. This kind of contradiction can occur at one
In this paper, we are focusing on the problem of finding sentiment-       specific point of time or throughout a certain time period. Further-
based contradictions at a large scale. First, we define two types         more, a contradiction can occur within one text when an author
of contradictions, depending on the distributions of opposite sen-        presents different opinions on the same topic, or across texts when
timents over time. Second, we introduce a novel measure of con-           different authors express different opinions on the same topic. We
tradiction based on the mean value and the variance of sentiments         further extend this framework of contradiction detection by focus-
among different texts. Third, we propose a scalable method for            ing on its performance and effectiveness for large-scale datasets.
identifying both types of contradictions at different time scales. We     Our method operates on sentence-level sentiments, which are rep-
evaluate the performance of our method using synthetic and real-          resented in a continuous scale. This allows us to exploit different
world datasets, as well as a user-study. The experiments demon-           approaches for sentiment detection, which can be plugged in our
strate the effectiveness of the proposed method in capturing contra-      framework. The use of mean and variance for contradiction de-
dictions in a scalable manner.                                            tection allows our method to be fast and linearly scalable on the
                                                                          number of texts, which is an important feature for large-scale anal-
                                                                          ysis. Tests on real datasets, as well as a user-study, demonstrate
1.    INTRODUCTION                                                        that our approach is able to efficiently and effectively identify con-
During the recent years we have been witnessing the Internet be-          tradictions.
coming an open platform, where people can express their opinions          The main contributions of this work can be summarized as follows.
and can be heard. There are many services that allow people to pub-
                                                                          ● We formally define the problem of contradiction detection, and
lish information and opinions, such as blogs, wikis, forums, social
                                                                            further describe two variations of the problem, namely, synchronous
networks and others. They all represent a rich source of opinion-
                                                                            and asynchronous contradictions.
ated information on different topics, which can be analyzed and
exploited in various applications and contexts. Sentiment analy-          ● We present an approach for contradiction detection, which is
sis can be used, for example, to learn about a customer’s attitude          based on fine-grained sentiment extraction. Moreover, we de-
to a product or its features, or to reveal people’s reaction to some        scribe techniques that enable this approach to scale to very large
event. Such problems require a scalable analysis and some form of           data collections.
sentiments aggregation to produce a representative result.                ● We experimentally evaluate the proposed approach using several
The problem of contradictions, or sentiment diversity on some topic,        synthetic and real datasets. The results show the effectiveness
has been studied in the context of different research areas, having         and scalability of our solution. In addition, we perform a user-
a slightly varying notion in each case. For instance, in Information        study that demonstrates the usefulness of the proposed frame-
Retrieval opposite opinions and sentiments introduce noise to the           work.
fact-centric search and must be avoided [14]. In contrast, conflict-
ing sentiments is one of the desired targets of mining of product         The remainder of this paper is structured as follows. In Section 2
reviews. Recently proposed methods can aggregate opinions ex-             we discuss the related work, and in Section 3 we formally define
pressed in customer reviews and extract a representative summary          the problem. We present our approach for detecting and storing
of sentiments on a feature-by-feature basis; or they can capture and      contradictions in Section 4 and Section 5, respectively, and the ex-
aggregate sentiments on some topic among different texts [8].             perimental evaluation in Section 6. We discuss our experiences in
Although aggregated sentiments do represent some information on           Section 7, and conclude in Section 8.
contradiction, this information may be biased. For example, if two
                                                                          1
opposite sentiment values are averaged, the result may have a neu-            Some preliminary ideas have appeared as a poster [16].
2.    RELATED WORK                                                       3. PROBLEM DEFINITION
In the past few years, we have witnessed an increasing research in-      The problem we want to solve in this paper is the efficient detection
terest in the area of blog analysis and specifically in opinion mining   of contradicting opinions2 (on specific topics).
[13]. Contradiction analysis is a rather new research area. In partic-   Usually, a particular source of information covers some general
ular, contradictions in opinions as considered here, have not been       topic T (e.g., health, politics) and has a tendency to publish more
addressed before. Harabagiu et al. [6] present a framework for con-      texts about one topic than another. Yet, within a text, an author may
tradiction analysis that exploits linguistic information such as nega-   discuss several topics. When using the term ’text’ we refer either to
tion or antonymy as well as semantic information, such as types of       the entire web document or its individual sentences. With the term
verbs. De Marneffe et al. [3] introduce a classification of contra-      sentence we assume a particular piece of text expressing an opin-
dictions consisting of seven types that are distinguished by the fea-    ion about a certain topic, which can not be split into smaller parts
tures that contribute to a contradiction (e.g., antonymy, negation,      without breaking its meaning. For each of the topics discussed in
numeric mismatches). They define contradictions as a situation           some text, we wish to identify the sentiment expressed towards it.
where ’two sentences are extremely unlikely to be true’, and de-         In this study, we restrict ourselves to identifying and recording the
scribe a contradiction detection approach to their textual entailment    intensity of these sentiments, which we represent as numbers. In
application [12]. Ennals et al. [5] describe an approach that detects    the following, we refer to sentiment polarity simply as sentiment.
contradicting claims by checking whether some particular claim
entails (i.e., has the same sense as) one of those that are known to        DEFINITION 1 (SENTIMENT). The sentiment S with respect
be disputed. For this purpose, they have aggregated disputed claims      to a topic T is a real number in the range [−1, 1] that indicates the
from Snopes.com and Politifact.com into a database. Additionally,        polarity of the author’s opinion on T expressed in a text. Nega-
they populated this database by selecting explicit statements of con-    tive and positive values represent negative and positive opinions
tradiction or negation from web texts.                                   respectively, while the absolute value of sentiment represents the
The above approaches are based on linguistic analysis and textual        strength of the opinion.
entailment. In contrast, our approach is based on statistical princi-    Apart from computing sentiments for individual texts, we also need
ples and intended for a large-scale operation, where pairwise com-       to compute the polarity on some topic aggregated over multiple
parisons of texts may not be computationally efficient. In addition,     texts (that may span different authors, as well as time periods).
we are considering a time dimension for contradiction, which allows
us to introduce such new types as, for example, change of                    DEFINITION 2 (AGGREGATED SENTIMENT). The Aggregated
opinion (asynchronous contradiction). To the best of our knowl-          Sentiment µS expressed in a collection of documents D on topic T ,
edge, this problem has not been studied so far.                          is defined as the mean value over all individual sentiments assigned
Problems related to the identification and analysis of contradictions    in that collection. µS is defined on the same range of [−1, 1] as
have also been studied in the context of social networks and blogs.      sentiments and calculated as follows: $\mu_S = \frac{1}{n}\sum_{i=1}^{n} S_i$, where n is
A recent work by Liu et al. [10] introduces a system that allows to      the cardinality of D.
compare contrasting opinions of experienced blog users on some
topic. In contrast, we take into account the opinions of all web         By comparing the sentiment values of different collections of texts,
users, regardless of their expertise. Clustering accuracy as an          contradictions are identified as follows.
indicator of blogosphere topic convergence was proposed by [17].            DEFINITION 3 (CONTRADICTION). There is a contradiction
By analyzing how accurate clustering is in different time intervals,     on a topic, T , between two groups of documents, D1 , D2 ⊂ D in a
one can estimate how correlated, or diverse, blog topics are. Such       document collection D, where D1 ⋂ D2 = ∅, when the information
an approach can also be adapted to opinion contradictions as well,       conveyed about T is considerably more different between D1 and
by replacing topic feature vectors by sentiment feature vectors. Our     D2 than within each one of them.
work goes beyond trend analysis by automatically recognizing
contradictions regarding some topic within and across documents.          In the above definition, we purposely do not specify exactly what it
Analysis of product reviews is another opinion mining task that is       means for a sentiment value to be very different from another one.
close to contradiction analysis. A system for mining the reputation      We define contradiction on a pairwise basis, where we evaluate the
of products in the Web is described in [11]. A similar approach          disagreement between two groups of documents in a collection. In
is proposed by the Opinion Observer system [9] that focuses on           this case, the similarity of information within each group serves as
summarizing the strengths and weaknesses of a particular product.        a reference point, providing a basic disagreement level. This defi-
Even though the above studies consider both positive and nega-           nition can lead to different implementations, and each one of those
tive opinions, they do not aggregate these two classes. In our ap-       will have a slightly different interpretation of the notion of contra-
proach, we describe an effective way for performing this aggrega-        diction. We argue that our definition captures the essence of con-
tion, which leads to more insights on the user opinions.                 tradictions, without trying to impose any of the specific interpre-
Chen et al. [2] study precisely the problem of conflicting opinions      tations. Nevertheless, in Section 4, we propose a specific method
on a corpus of book reviews, which they classify as positive and         for computing contradictions, which incorporates many desirable
negative. Their main goal is to identify the most predictive terms       properties.
for the above classification task, and visualize the results for man-    When identifying contradictions in a document collection, it is im-
ual inspection. However, the results are only used to visualize op-      portant to also take into account the time in which these documents
posite opinions without further aggregation. It is up to the user to     were published. Let D1 be a group of documents containing some
visually inspect the results and draw some conclusions. In con-          information on topic T , and all documents in D1 were published
trast, we propose a systematic and automated way of performing           within some time interval t1 . Assume that t1 is followed by time
sentiment aggregation, revealing contradictions, and analyzing the       interval t2 , and the documents published in t2 , D2 , contain a con-
evolution of these contradictions over time.                             flicting piece of information on T . In this case, we have a special
                                                                         2
                                                                           For the rest of this document we will use the terms sentiment and
                                                                         opinion interchangeably.
type of contradiction, which we call Asynchronous Contradiction,
since D1 and D2 correspond to two different time intervals. Fol-
lowing the same line of thought, we say that we have a Synchronous
Contradiction when both D1 and D2 correspond to a single time
interval, t.
In order to detect contradicting opinions in collections of texts, we
first need to determine all the different topics and then calculate the      Figure 1: Example of two possible sentiment distributions.
corresponding sentiments.
                                                                           4.2 Measuring Contradictions
   PROBLEM 1 (SINGLE-TOPIC CONTRADICTION DETECTION).                       In order to be able to identify contradicting opinions we need to
For a given time interval τ , and topic T , identify the time regions of   define a measure of contradiction. Assume that we want to look
a predefined size w, where a contradiction level for T is exceeding        for contradictions in a shifting time window3 w. For a particular
some threshold ρ.                                                          topic T , the set of documents D, which we use for calculation, will
The time interval, τ , is user-defined. As we will discuss later,          be restricted to those, that were posted within the window w. We
the threshold, ρ, can either be user-defined, or automatically deter-      denote this set as D(w), and n as its cardinality, n = ∣D(w)∣.
mined in an adaptive fashion based on the data under consideration.        In this example, a value of aggregated sentiment µS close to zero
We can also determine all the topics in a dataset that are involved        implies a high level of contradiction because positive and nega-
in contradictions, as follows.                                             tive sentiments compensate each other. A problem with the above
                                                                           way of calculating topic sentiment arises when there exists a large
   PROBLEM 2 (ALL-TOPICS CONTRADICTION DETECTION).                         number of documents with very low sentiment values (neutral documents).
For a given time interval τ, identify topics T, which have high            In this case, the value of µS will be drawn close to zero,
contradiction level, or large number of contradicting regions above        without necessarily reflecting the true situation of the contradiction.
some threshold.                                                            Therefore, we suggest to additionally consider the variance of the
The latter problem is interesting if we want to consider the popularity    sentiments along with their mean value. The sentiment variance $\sigma_S^2$
of certain web topics. Frequent contradictions may indicate                is defined as follows:
"hot" topics, which attract the interest of the community. Due to          $$\sigma_S^2 = \frac{1}{n}\sum_{i=1}^{n}(S_i - \mu_S)^2 \qquad (1)$$
space limitations, in this paper we only discuss a solution to the first
problem, since a solution to the second one is its direct extension.
However, the approach we propose in this work is general, and can          According to the above definition, when there is a large uncertainty
lead to solutions for several other variations of the above problem,       about the collective sentiment of a collection of documents on a
such as detection of topics with periodically repeating contradic-         particular topic, the topic sentiment variance is large as well.
tions or with the most frequently alternating Aggregated Sentiment.        Figure 1 shows two example sentiment distributions. Distribution
                                                                           A with µS close to zero and a high variance indicates a very con-
                                                                           tradictive topic. Distribution B shows a far less contradictive topic
4.    CONTRADICTION DETECTION                                              with sentiment mean µS in the positive range and low variance. For
Given the problems described before, we propose a three step ap-           example, a group of documents with µS close to zero and a high
proach to contradiction analysis, that includes:                           variance (distribution A on the Figure 1) will be very contradictive,
● Detection of topics for each sentence,                                   and another group with sentiment µS shifted to negative or positive
● Detection of sentiments for each sentence-topic pair, and                with low variance is likely to be far less contradictive (distribution
● Analysis of sentiments for topic across multiple texts.                  B on the Figure 1). We note that neither the mean nor the variance
                                                                           can be used independently to identify contradictions. For example,
Steps one and two can be achieved using existing methods, or adap-
                                                                           a fairly large variance among sentiments does not lead to a con-
tations of existing methods. We will refer to these steps as ’prepro-
                                                                           tradiction when only positive or negative sentiments are present.
cessing’ and describe them briefly in the following. The focus of
                                                                           Moreover, a zero mean value may occur even when all posts are
this paper is then the contradiction detection approach.
                                                                           neutral, which once again does not indicate a contradiction. When
4.1 Preprocessing                                                          assuming a large number of neutral sentiments in the collection,
                                                                           we have two opposite trends: the average sentiment moves towards
For identifying topics per sentence, we apply the Latent Dirichlet
                                                                           zero and sentiment variance decreases. If these trends will com-
Allocation (LDA) algorithm [1], which we extended to work on the
                                                                           pensate each other, the neutral documents would not affect the con-
sentence level [4]. So sentences are considered as input documents
                                                                           tradiction value much.
for the LDA and assigned with several most probable topics.
                                                                           Evidently, we need to combine mean and variance of sentiments in
Then, for each sentence-topic pair we assign a continuous senti-
                                                                           a single formula for computing contradictions. Then, the contra-
ment value in the range [-1;1] that indicates a polarity of the opinion
                                                                           diction value C can be computed as:
expressed regarding the topic. For the sentiment assignment step,
we use an existing tool for fine-grained opinion analysis [7]. Nevertheless,
this tool can be replaced by any other suitable one that                   $$C = \frac{\sigma_S^2}{(\mu_S)^2} \qquad (2)$$
calculates continuous sentiment values at a sentence level. Then
we average sentiments over text’s sentences having the same topic,
to get one sentiment value for each topic in a text.                       where µS is squared so that its units are the same as those of $\sigma_S^2$.
Based on the analysis described so far, we can now describe our ap-        This formula captures the intuition that contradiction values should
proach for contradiction detection with respect to different topics.       be higher for topics whose sentiment value is close to zero, and
In the following paragraphs, we first propose a novel contradiction        sentiment variance is large. Nevertheless, the contradiction values
measure, and then describe two simple approaches aiming at de-             3
                                                                             Without loss of generality, in this work we consider windows of
tecting contradictive periods in time.                                     days, weeks, months, and years.
generated by this formula are unbounded (i.e., they can grow arbitrarily
high as µS approaches zero), and do not account for the
number of documents n. This latter point is important, because in
the extreme case where D(w) contains only two documents with opposite
values, C will be very high, and will compare unfavorably
to the contradiction value of a different set of documents on T with a
much higher cardinality.
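The two weaknesses of Formula 2 can be seen in a small numeric sketch. The sentiment values below are hypothetical and chosen only for illustration; the function name `naive_contradiction` is not from the paper.

```python
from statistics import fmean, pvariance


def naive_contradiction(sentiments):
    """Formula 2: population variance divided by the squared mean."""
    mu = fmean(sentiments)
    return pvariance(sentiments, mu) / (mu ** 2)


# Hypothetical window with only two, nearly opposite, documents.
two_docs = [0.9, -0.8]
# Hypothetical window with eight documents of mixed polarity.
many_docs = [0.9, -0.8, 0.7, -0.6, 0.8, -0.3, 0.5, -0.4]

c_two = naive_contradiction(two_docs)
c_many = naive_contradiction(many_docs)

# The two-document window scores far higher than the larger one,
# because the mean is close to zero and n plays no role in Formula 2.
print(c_two, c_many)
```

In this sketch the value is also undefined when the mean is exactly zero, which is the unbounded case described above.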
Incorporating the observations made above into the contradiction
formula, we propose the following final formula for computing
contradiction values:
$$C = \frac{\vartheta \cdot \sigma_S^2}{\vartheta + (\mu_S)^2}\, W \qquad (3)$$
In the denominator, we add a small value, ϑ ≠ 0, which allows us to
limit the level of contradiction when $(\mu_S)^2$ is close to zero. The
numerator is multiplied by ϑ to ensure that contradiction values
fall within the interval [0; 1]. Figure 2(c) shows how a contra-
diction value depends on ϑ in the denominator. Smaller ϑ values
emphasize contradiction points with µS close to zero, for example
changes of opinion. Larger ϑ values mask this difference, making
levels of contradictions more equal. In this study, we used a value
of ϑ set at 5% of the expected value of squared sentiment mean,
which was effective for its purpose, exhibiting a stable behavior
across datasets, without distorting the final results.                    Figure 2: Example of contradiction values computed
W is a weight function aiming to compensate the contradiction             from a synthetic dataset with two planted contradictions.
value for the varying number of documents that may be involved
in the calculation of C. The weight function is defined as:

$$W = \left(1 + \exp\left(\frac{\bar{n} - n}{\beta}\right)\right)^{-1} \qquad (4)$$

where the constant $\bar{n}$ reflects the average number of topic documents
in the window, and β is a scaling factor. This weight function
provides a multiplicative factor in the range [0; 1]. Using W
we can effectively limit C when there is a minor number of documents,
as well as when this same number of documents increases
significantly. What W achieves is essentially a normalization of
the contradiction values across different sets of documents, allowing
them to be meaningfully compared to each other.
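Formulas 3 and 4 can be combined in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the weight has the logistic form with the average document count minus n in the exponent, and the values of `THETA`, `N_AVG`, and `BETA` as well as the sentiment lists are hypothetical.

```python
import math
from statistics import fmean, pvariance


def weight(n, n_avg, beta):
    """Formula 4: logistic weight in (0, 1) that grows with the document count n."""
    return 1.0 / (1.0 + math.exp((n_avg - n) / beta))


def contradiction(sentiments, theta, n_avg, beta):
    """Formula 3: theta * variance / (theta + mean^2), scaled by the weight W."""
    mu = fmean(sentiments)
    var = pvariance(sentiments, mu)
    return theta * var / (theta + mu ** 2) * weight(len(sentiments), n_avg, beta)


# Hypothetical parameters, chosen only for illustration.
THETA, N_AVG, BETA = 0.05, 8, 2.0

mixed = [0.9, -0.8, 0.7, -0.6, 0.8, -0.3, 0.5, -0.4]   # opposing opinions
agreeing = [0.9, 0.8, 0.7, 0.6, 0.8, 0.3, 0.5, 0.4]    # only positive opinions
two_docs = [0.9, -0.8]                                 # opposing, but very few documents

c_mixed = contradiction(mixed, THETA, N_AVG, BETA)
c_agreeing = contradiction(agreeing, THETA, N_AVG, BETA)
c_two = contradiction(two_docs, THETA, N_AVG, BETA)
print(c_mixed, c_two, c_agreeing)
```

With these inputs the mixed window receives the highest value, the two-document window is damped by W, and the agreeing window stays close to zero.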
Figure 2 shows the operation of the proposed contradiction function.
To demonstrate this, we generated a time series of sentiments
for a period of 8000 time units composed of 8000 normally distributed
points, half of which follow a custom trend with dispersion
0.125 and another half with dispersion 0.25 and median 0 is              [Figure 3 panels, top to bottom: sentiments value; sentiments mean; sentiments variance; contradiction value (with / without neutral); x-axis: time]
acting like noise. Time stamps of all points followed the Poisson                               Figure 3: The effect of neutral sentiments on contradiction.
distribution with parameter λ = 1 time units. We have chosen these
distributions because they are simple but still resemble the real data.   sponding to a change of sentiment that manifests itself across the
The graph at the top (Figure 2(a)) shows generated sentiments. The        entire dataset.
bold line in this graph depicts the custom trend, showing an initial      Subjective sentences take a considerably small part in the text when
positive sentiment that later changes to negative (at time instance       compared to objective statements. So neutral sentiments usually
t1 ), which represents a change of sentiment. There is also a point       shift the aggregate sentiment towards zero, masking contradictions.
around time instance t2 , where the sentiments are divided between        Our contradiction formula is designed to compensate such effects
positive and negative, a situation representing a simultaneous            by exploiting the sentiment variance. We demonstrate such behavior
contradiction. Using this dataset, we verify the ability of the C function   on another synthetic dataset shown in Figure 3. The bottom graph
to capture the planted contradictions.                                    shows that the proposed formula can successfully identify the main
As can be seen in Figure 2(b), µS closely captures the aggregate          contradicting regions, both with or without neutral sentiments.
trend of the raw sentiments. The following two graphs in the figure
show the contradiction value, calculated using a sliding window of
size 500 and 1000 time units. When we use a window of small size          5. STORING CONTRADICTIONS
(Figure 2(c)), C correctly identifies the two contradictions at points    So far we have described a technique for processing web docu-
t1 and t2 , where the values of C are the largest. Using a larger         ments to extract sentiments on various topics, and subsequently to
window has a smoothing effect in the values of C (Figure 2(d)).           use this information in order to identify contradictions. But our
Nevertheless, we can still identify long-lasting contradictions: In       final goal is to identify contradictions in large collections of documents,
this case, the largest value of C occurs at time instance t1 , corre-     which requires scalable methods. To this end, we demonstrated
the need to analyze sentiment information on each topic
across different time windows. Assuming this requirement, scal-
ability may be achieved by storing pre-computed values for win-
dows of different size. We now turn our attention to the problem of
organizing all these data in a way that will allow the efficient de-
tection of contradictions in large collections of data that span very
long time intervals.
An important observation is that Formula 3, which calculates the
contradiction values, is based on the mean and variance of the topic
sentiment. Recall that aggregated sentiment and sentiment                        Figure 4: Logical representation of the TimeTree.
variance can be written as follows:
                                                                          5.2 Querying for Contradictions
    $\mu_S = \frac{1}{n}\sum_{i=1}^{n} S_i ; \qquad \sigma_S^2 = \frac{1}{n}\sum_{i=1}^{n}(S_i - \mu_S)^2 = \frac{1}{n}\sum_{i=1}^{n} S_i^2 - \mu_S^2$
                                                                          When trying to detect contradictions, we would like to identify
                                                                          those that have a contradiction value above some threshold. The
                                                                          intuition is that these contradictions are going to be more interest-
In the formula above, n is the number of documents published on           ing than the rest in the same time interval. An obvious solution
topic T in a specific time window (see Definition 2).                     in this case is to define some fixed threshold, ρ, and only report
We now define the first- and second-order moments of the topic            the contradictions above this threshold. We refer to this solution as
sentiment as $M_1 = \sum_{i=1}^{n} S_i$ and $M_2 = \sum_{i=1}^{n} S_i^2$, respectively. Based
                                                                          fixed threshold. However, by adopting the above solution, we can-
on the above discussion, and using the sums M1 and M2 , we can            not normalize the threshold to better fit the nature of the data within
rewrite Formula 3 as follows:                                             each time window (that may vary over time and across topics).
                        $C = \frac{n M_2 - M_1^2}{\vartheta n^2 + M_1^2} \, W$          (5)
                                                                          In order to address this problem, we propose an adaptive threshold
                                                                          technique, which computes a different threshold for each topic and
                                                                          time window as follows. The adaptive threshold $\varrho_w$ for a topic T
The above form of the contradiction values formula gives us ad-           in time window w is based on the contradiction value $C_{w_p}$ that has
ditional flexibility, since we can now compute the contradiction of       been calculated for T in the parent time window of w, $w_p$, and is
a large time window by composing the corresponding values from            defined for each time window and topic as $\varrho_w = p \cdot C_{w_p}$, $0 < p < 1$.
the smaller windows contained in the large one. We can therefore          In our experience with real datasets, p values between 0.5 and 0.7
build data structures that take advantage of this property.               work well. In this work, we use p = 0.6.
In the next paragraphs, we describe such a data structure, and we         Note that we cannot achieve the same result by using top-k queries
show how it can be used to identify contradictions. We also demon-        (though, they can be complementary to our approach). The reason
strate that it can be easily maintained in an incremental fashion         is that adaptive threshold does not impose a strict limit on the num-
when new documents are added in the system.                               ber of contradictions in the result, and can thus report the entire set
                                                                          of interesting contradictions within some time interval.

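The difference between the fixed and the adaptive threshold of Section 5.2 can be sketched as follows. The function names and the sample contradiction values are hypothetical; only the rule $\varrho_w = p \cdot C_{w_p}$ and the setting p = 0.6 are taken from the text.

```python
def fixed_threshold(child_values, rho):
    """Report the windows whose contradiction value exceeds a global rho."""
    return [w for w, c in child_values.items() if c > rho]


def adaptive_threshold(child_values, parent_value, p=0.6):
    """Report the windows whose value exceeds p times the parent's value."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    rho_w = p * parent_value  # threshold derived from the parent window
    return [w for w, c in child_values.items() if c > rho_w]


# Hypothetical contradiction values of three weekly windows inside one month.
weeks = {"week 1": 0.004, "week 2": 0.020, "week 3": 0.011}
month_value = 0.015  # hypothetical contradiction value of the parent window

print(fixed_threshold(weeks, rho=0.05))        # a global threshold set too high
print(adaptive_threshold(weeks, month_value))  # threshold scales with the parent
```

With these sample values the fixed threshold reports nothing, whereas the adaptive threshold reports the two weeks that stand out relative to their month. Neither variant caps the size of the result, which is the distinction from top-k queries noted above.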
5.1 TimeTree for Contradictions                                           5.3 Updating the Contradictions
The need to analyze contradictions at different time granularities        As discussed earlier, the nature of the contradiction function (For-
suggests a hierarchical structure for contradiction storage. There are a  mula 5) and the TimeTree nodes allows us to incrementally main-
number of ways to organize contradiction values by time. The first        tain the TimeTree in the presence of updates. When new collec-
solution is to store a time-tree structure for each topic separately.     tions or individual documents are analyzed, their contribution to
It scales well with the number of topics, and has                         the contradiction of the corresponding topics and time windows in
good performance when looking for contradictions on a single              the TimeTree can be easily taken into account by updating the set
topic, but also brings larger update costs, because for each text the     of relevant {n, M1 , M2 } values in the nodes of the tree.
storage needs to be parsed as many times as there are topics in that      In order to reduce update costs, we propose first to accumulate sev-
text. Also it makes all-topic queries extremely inefficient, because      eral updates and then submit them in a batch. When new documents
for each topic we need to navigate through a time structure to find       arrive, as a preprocessing step, they are aggregated in time windows
the right interval. The second solution that we propose is to store       of the finest granularity of the TimeTree. Then, these aggregated
contradiction values for different topics under the same time-tree        values are used to update the counts and topic sentiment moments
structure.                                                                of all TimeTree nodes containing respective time windows.
We introduce the TimeTree for managing the information on sen-            The update cost for each batch of aggregated documents depends
timents and contradictions. The TimeTree is organized around the          on the depth of the TimeTree, d, and the number of topics, |T| (in
sentiment moments, M1 and M2 , and a hierarchical segmentation            the worst case), that participate in the time windows relevant to the
of time, as outlined in Figure 4. In this example, the time windows       update. Thus, the complexity can be expressed as O(d · |T|).
are organized on days, weeks, months, and years (though, other hi-
erarchical time decompositions are applicable as well). Using this        6. EXPERIMENTAL EVALUATION
kind of structure, we can answer queries on adhoc time intervals,         As mentioned earlier, the contradiction detection problem has not
by dynamically computing the contradiction values based on For-           been considered before. Therefore, no annotated data set is avail-
mula 5. In the following, we will refer to the levels of the TimeTree     able to measure the quality of the proposed approach in terms of
as the different granularities of the time decomposition, the root        accuracy. Nevertheless, we applied the algorithm to real-world data sets
node having granularity 0.                                                and ran several experiments with settings and results described in
Each node in the TimeTree corresponds to a time window, and sum-          this section. The objectives of these experiments are to: Analyze
marizes information for all documents, whose timestamp is con-            the quality of the approach; Study its usefulness from a user per-
tained in this time window.                                               spective; Study the performance of the introduced approach.
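The TimeTree organization and the batch update of Section 5.3 can be sketched as a small in-memory structure. This is an illustration under stated assumptions rather than the implementation evaluated in this paper: the class and method names are hypothetical, the hierarchy is simplified to root, year, month and day (weeks are omitted because they do not nest cleanly inside months), and ϑ and W are set to placeholder values.

```python
from collections import defaultdict
from datetime import date


class TimeTree:
    """Toy TimeTree: each (window, topic) node stores the triple [n, M1, M2]."""

    def __init__(self):
        self.nodes = defaultdict(lambda: [0, 0.0, 0.0])

    @staticmethod
    def windows(day):
        """Path from the root (granularity 0) down to the finest window."""
        return [(), (day.year,), (day.year, day.month),
                (day.year, day.month, day.day)]

    def update_batch(self, docs):
        """docs: iterable of (topic, date, sentiment) triples."""
        # Step 1: aggregate the batch at the finest granularity.
        batch = defaultdict(lambda: [0, 0.0, 0.0])
        for topic, day, s in docs:
            acc = batch[(day, topic)]
            acc[0] += 1
            acc[1] += s
            acc[2] += s * s
        # Step 2: add each aggregate to every node on its root-to-leaf path,
        # i.e. d node updates per (finest window, topic) pair.
        for (day, topic), (n, m1, m2) in batch.items():
            for window in self.windows(day):
                node = self.nodes[(window, topic)]
                node[0] += n
                node[1] += m1
                node[2] += m2

    def contradiction(self, window, topic, theta=1.0, weight=1.0):
        """Formula 5 on a stored node; theta and weight are placeholders."""
        n, m1, m2 = self.nodes.get((window, topic), (0, 0.0, 0.0))
        if n == 0:
            return 0.0
        return (n * m2 - m1 * m1) / (theta * n * n + m1 * m1) * weight


tree = TimeTree()
tree.update_batch([
    ("yaz", date(2006, 1, 5), 1.0),
    ("yaz", date(2006, 1, 5), -1.0),   # opposing opinion on the same day
    ("yaz", date(2006, 2, 1), 1.0),
])
print(tree.contradiction((2006, 1, 5), "yaz"))  # day with opposing posts
print(tree.contradiction((2006, 2), "yaz"))     # month with a single post
print(tree.contradiction((2006,), "yaz"))       # whole year
```

Keeping all topics under one time hierarchy, as proposed above, means a query first locates the time window and then reads the triples of the topics of interest stored there.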
6.1 Corpus Description
Our algorithms are applied to a data set of drug reviews collected from the DrugRatingz website4,
a data set of comments on YouTube videos from L3S [15], and a data set of comments on postings
from Slashdot, provided for the CAW2 workshop5.
The first dataset contains 2701 positive, 352 neutral and 1616 negative reviews for 477 drugs.
These reviews are provided by persons that took a specific drug. They describe their personal
experience with the drug, including contra-indications that occurred.
The second dataset contains approximately 6 million comments on YouTube videos, with an average
of five hundred comments per video. Unlike texts in review datasets, which usually contain
opinions specific to a topic, some of these comments contain information irrelevant to the topic,
thus introducing extra noise to sentiment detection.
Our third dataset, Slashdot, is from a popular website for people interested in reading about and
discussing technology and its ramifications. It publishes short story posts, which often incite
many readers to comment on them and provoke discussions that may trail for hours or even days.
It contains about 140,000 comments under 496 articles, covering the time period from August 2005
to September 2006. Compared to the usually brief comments on YouTube videos, Slashdot comments
may span several paragraphs and typically contain many objective statements.

[Figure 5 plot: four stacked panels (sentiments value, sentiments mean, sentiments variance,
contradiction value) over date, Sep 05 to Sep 06; the main contradiction peak is marked 1.]
PRO: It would be helpful for restricting the flow of information, which is a double edged sword.
PRO: I suppose we better wrap a firewall around our country and not let those damn foreigners access to our internet.
CONS: And what exactly does a neutral Internet do? It takes away the right of anyone who lays down the wires or installs the access points to control what goes through their network. My point: don’t complain about taking rights away when you advocate to take rights away.
CONS: While it sounds like a decent idea, I’m really all for the whole uncensored and unregulated internet. I really like my internet the way it is.
CONS: Sure, they can ruin Internet inside USA, but the rest of the world couldn’t care less.
CONS: We don’t need the FCC regulating the Internet. Not for "neutrality" or any other excuse someone can think of.

6.2 Evaluation of Contradictions
We now apply the introduced contradiction analysis approach to
our datasets. In Figure 5, the top graph depicts the raw sentiment
values for the topic "internet government control" taken from the
Slashdot dataset, for the time interval September 2005 to Septem-        Figure 5: Mean, variance and contradiction values of senti-
ber 2006. The following graphs show the aggregated sentiment             ments for the topic "Internet government control".
and variance (two middle graphs), and contradiction values (bot-
tom graph) for the above topic and time interval. Contradiction
values have been calculated using a time window of ten days. Note        6.3 Evaluation of Usefulness
that contradiction values are high for the time windows where topic      In the following paragraphs we describe a user study which we
sentiment is around zero and variance is high, which translates to       conducted in order to evaluate the effectiveness and usefulness of
a set of posts with highly diverse sentiments. These situations are      our approach for the task of contradiction discovery.
not easy to identify either with a quick visual inspection of the raw    In our usefulness evaluation, we used four datasets corresponding
sentiments, aggregated sentiments or sentiment variance.                 to opinionated posts for four topics extracted from three diverse
The analysis shows that in this time interval there is one major con-    real datasets (refer to Table 1). For each topic, we selected a vary-
tradiction (marked 1 in the bottom graph of Figure 5). This contra-      ing number of posts, spanning in time from one to almost three
diction discusses the pros and cons of a law that would give the gov-    years. The shortest list contained 60 posts, and the largest about
ernment more power in controlling the internet traffic, especially       480. Moreover, the quality of posts for topics also differed a lot.
personal correspondence. Minor peaks in contradiction level here         The drug review datasets contained primarily brief and concise
correspond to the discussion of a possible transfer of jurisdiction      opinions about drugs; Slashdot topics featured large and detailed
and control over top-level domains to the United Nations. The table      comments, with an average size of several paragraphs; YouTube
below shows extracts from several opposing posts that contributed        comments were, on the contrary, short and often off-topic.
to this contradiction. By taking a closer look at the corresponding      The group of users consisted of eight persons (PhD students at the
weblog posts, we find out that the discussion is about restricted in-    University of Trento), and the experiment was conducted as fol-
ternet access and its advantages, while other contradictions contain     lows. Users were asked to detect groups of contradicting posts
a general discussion on the possibility of organizing the content by     for each of the topics in the above datasets (and label the posi-
several top-level domains and restricting access to them.                tive and negative posts). We provided users with a web application
Another example of contradicting posts may be observed in Fig-           that featured two approaches to help them identify time-intervals
ure 6, which illustrates conflicting opinions for the topic "Yaz"6 for   with potentially contradicting posts (see Figure 6): The first ap-
a selected time interval. In this case, there was an opinion disagree-   proach (marked as "stage 1" in the figure), based on the visualiza-
ment on the effectiveness and possible side-effects of this drug.        tion method proposed by Chen et al. [2], displays to users the in-
Evidently, all the discovered contradictions correspond to discus-       tensity over time of the positive and negative sentiments expressed
sions expressing different points of view on the same topic, and         in the posts (Figure 6(a)). The second approach (marked as "stage
having an automated way of identifying them can be very useful.          2" in the figure) is based on the method proposed in this study, and
displays to users a graph that marks the time points at which contradictions were automatically
detected (Figure 6(b)). Using our tool, the users could see the time intervals that our tool had
identified as contradictory, and could therefore focus their exploration in these regions.
Figure 6(d) shows some posts in a time-interval, which have been marked with positive (green) and
negative (red) sentiments. These sentiment values are also illustrated in the overall time-line,
depicted in Figure 6(c). In order not to favor any of the two approaches, in our experiments we
alternated the approach required to be completed first.

[Figure 6 panels: a) Average positive and negative sentiments (Stage 1); b) Contradiction level
(Stage 2); c) Sentiments annotated by users (from the log data); d) Texts from selected time
intervals (Stages 1 and 2)]
Figure 6: Annotation page for the dataset "Yaz" demonstrating opposite opinions.

For both approaches, we measured the average time, T1 and T2, and the average number of
time-intervals examined by the users during the search, N1 and N2, needed to identify a single
contradiction. Additionally, we asked users to rate the overall difficulty, D1 and D2, of
completing the task when using each one of the two approaches, according to the following scale:
1 - very difficult; 2 - somewhat difficult; 3 - normal; 4 - somewhat easy; 5 - very easy.
The aggregated results (averaged over all the users) of our evaluation are reported in Table 1.
We report the improvements7 we measured when our approach was used (stage 2), compared to the
alternative approach (stage 1), computed as follows: ∆D = D2/D1, ∆T = T2/T1, and ∆N = N2/N1.

Dataset       Topic name     Size   ∆D     ∆T     ∆N     P1     P2     ∆P
DrugRatingz   Ambien         60     1.50   0.60   0.88   0.70   0.81   1.20
DrugRatingz   Yaz            300    1.58   0.93   0.78   0.75   0.95   1.32
Slashdot      Int. control   159    1.17   0.89   0.58   0.37   0.63   2.14
YouTube       Zune HD        472    2.07   0.68   0.62   0.36   0.61   2.09
Average                             1.58   0.77   0.72   0.55   0.75   1.69
        Table 1: Evaluation results for different topics.

We observe that when users employed our approach in order to detect contradictions, they were
able to identify contradictions faster, requiring 23% less time on average (ranging between 7%
and 40%). The biggest improvement was for the topic "Ambien"8 (∆T = 0.60), which had a few
contradicting posts visible using our approach, but otherwise hard to discover. Our approach also
led to a reduction by 28% of the time-intervals examined in order to identify contradictions
(ranging between 12% and 42%). The largest reductions were observed for the topics "Zune HD" and
"Internet Control" (∆N = 0.62 and 0.58, respectively), which contained several posts that did not
take a position, or were off topic. The average difficulty ratings were also favorable for our
approach, which was consistently marked as more helpful. This difference was most pronounced for
the "Zune HD" topic (∆D = 2.07), which involved many posts. In this case, going through the posts
was not easy, and our approach allowed users to focus their search and identify the contradicting
posts.
Finally, in Table 1 we report an additional measure of usefulness: since both approaches aim at
guiding the users to the time-intervals that are most promising for containing contradictions, we
computed the percentage, P1 and P2, of the examined time-intervals that led to the identification
of a contradiction, as well as the improvement of our approach when compared to the alternative,
∆P = P2/P1. Even though the approach by Chen et al. [2] (stage 1) was not designed with this
measure in mind, in the case of our approach, this measure is indicative of its precision since it
measures how many of the automatically identified contradictions were real ones (i.e., verified by
the users). The results show that our approach was always more successful in suggesting to users
time-intervals that contained contradictions, with an overall average success rate of 75%, and as
high as 95% (topic "Yaz").
The above results demonstrate that our approach can successfully identify contradictions in an
automated way, and quickly guide users to the relevant parts of the data.

4 http://drugratingz.com
5 http://caw2.barcelonamedia.org/
6 Yaz is a drug for contraception
7 We omit presenting the detailed results for all parameters measured and each approach due to lack of space.
8 Ambien is a drug for treating insomnia

6.4 Evaluation of Scalability
We evaluate the scalability of the TimeTree for solving Problems 1 and 2, using a relational
database implementation, where information is stored in a single table that contains contradiction
values for each topic with respect to time intervals of different granularities. This
implementation leads to simple and efficient SQL queries for detecting interesting contradictions.
Recall that in the topic contradiction problem (Problem 1) we want to identify the contradictions
and corresponding time windows of a single topic within some time interval, while in the all topic
contradictions problem (Problem 2) we are interested in doing the same for all topics.
During this study, parameters of the contradiction formula were at their default values as
described in Section 4. Changing the formula’s parameters will enlarge or reduce the number of
contradictions being detected, but the computational efficiency will be the same. Performance of
our approach does not depend on the value of the threshold because we are not storing pre-computed
contradiction values, and so the database is unable to apply indices or filtering on this
parameter. Fixed and adaptive threshold approaches, however, return slightly different sets of
contradictions. The first returns the largest contradictions themselves, and the second returns
contradictions that are greater than p times the values of their respective parent intervals. The
value of p was empirically set at 0.6 to return a result set with an average size equal to the one
obtained when using a fixed threshold. This allows us to compare the relative performance of both
methods.
To test the performance of our solutions, we generated sets of 25 single-topic and all-topics
queries (corresponding to the Topic and Time Interval Contradictions problems, respectively),
using granularities and topic ids drawn uniformly at random. In these experiments, we used 1,000
topics. We measured the time needed to execute these queries against the database as a function of
the time
[Figure 7 plots: four panels (Query 1: time interval test; Query 2: time interval test; Query 1:
granularity test; Query 2: granularity test); y-axis: time, ms (logarithmic); x-axis: time
interval, days (200 to 1600) or granularity, days (1 to 1000); series: dbms adaptive, dbms fixed.]
Figure 7: Scalability of single-topic and all-topics queries.

or without neutral sentiments, allowing it to incorporate sentiment
detection algorithms of different types.
As was mentioned previously, to build the contradiction formula we used such values as mean and
variance. We believe that the effectiveness of our approach increases with the growing scale,
relying on the fact that representativeness of statistical metrics also increases when a larger
number of samples is involved in computation. Moreover, tests on the synthetic data showed our
formula’s stable behavior in the presence of noise.
Finally, we note that we are aware that the evaluation of our (and related) approach to
contradiction detection is still limited with respect to the precision and recall measures. The
main reason for this is the absence of a benchmark dataset, and the difficulty in creating one. We
are currently working toward such a dataset, suitable for testing different algorithms in this
area.

8. CONCLUSIONS
In this paper, we proposed an approach to detect contradictions in documents, which is the first
general and systematic solution to the problem. The experimental evaluation, with synthetic data
and three diverse real-world datasets, as well as the user study, demonstrates the applicability
and usefulness of the proposed solution.
We are currently working on extending our approach so that it can
 interval, τ , and the granularity of the time windows (Figure 7). We
                                                                                                                                          work in an online mode. This will enable us to continuously moni-
 report results for both the fixed and the adaptive thresholds.
                                                                                                                                          tor opinions in real-time.
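As a rough illustration of the fixed-threshold and adaptive-threshold query variants measured in Figure 7, the following sketch uses SQLite through Python's standard `sqlite3` module. The `time_window` table, its columns (including `parent_id`), the sample rows, and the rule "a window qualifies when its contradiction exceeds `factor` times the contradiction of its parent window" are all assumptions made for this example. They are not the storage layout or the threshold definition used in the paper's implementation. The only point the sketch is meant to show is that the adaptive variant needs a self-join to reach the parent time window, while the fixed variant filters on indexed columns alone.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with a hypothetical window table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE time_window (
               id INTEGER PRIMARY KEY,
               topic_id INTEGER,
               granularity INTEGER,   -- window length in days (assumed unit)
               start_day INTEGER,
               contradiction REAL,
               parent_id INTEGER      -- enclosing coarser window (assumed)
           )"""
    )
    conn.execute(
        "CREATE INDEX idx_win ON time_window(topic_id, granularity, start_day)"
    )
    rows = [
        # id, topic, granularity, start_day, contradiction, parent_id
        (1, 1, 10, 0, 0.30, None),
        (2, 1, 1, 0, 0.10, 1),
        (3, 1, 1, 1, 0.50, 1),
        (4, 1, 1, 2, 0.35, 1),
        (5, 2, 10, 0, 0.80, None),
        (6, 2, 1, 0, 0.90, 5),
        (7, 2, 1, 1, 0.40, 5),
    ]
    conn.executemany("INSERT INTO time_window VALUES (?,?,?,?,?,?)", rows)
    return conn

# Fixed threshold: a plain filter over one topic, granularity and interval.
FIXED_SQL = """
SELECT w.start_day, w.contradiction
FROM time_window AS w
WHERE w.topic_id = ? AND w.granularity = ?
  AND w.start_day >= ? AND w.start_day < ?
  AND w.contradiction > ?
ORDER BY w.start_day
"""

# Adaptive threshold (assumed rule): compare against the parent window,
# which requires an extra self-join.
ADAPTIVE_SQL = """
SELECT w.start_day, w.contradiction
FROM time_window AS w
JOIN time_window AS p ON p.id = w.parent_id
WHERE w.topic_id = ? AND w.granularity = ?
  AND w.start_day >= ? AND w.start_day < ?
  AND w.contradiction > ? * p.contradiction
ORDER BY w.start_day
"""

def fixed_threshold_query(conn, topic_id, granularity, t_start, t_end, threshold):
    args = (topic_id, granularity, t_start, t_end, threshold)
    return conn.execute(FIXED_SQL, args).fetchall()

def adaptive_threshold_query(conn, topic_id, granularity, t_start, t_end, factor):
    args = (topic_id, granularity, t_start, t_end, factor)
    return conn.execute(ADAPTIVE_SQL, args).fetchall()

conn = build_demo_db()
# Single-topic query: topic 1, daily windows, interval [0, 3).
fixed = fixed_threshold_query(conn, 1, 1, 0, 3, 0.4)
adaptive = adaptive_threshold_query(conn, 1, 1, 0, 3, 1.0)
print(fixed)     # [(1, 0.5)]
print(adaptive)  # [(1, 0.5), (2, 0.35)]
```

Both queries examine one row per time window of the requested granularity inside the interval, so in this toy setup the work grows with the interval length and shrinks as the granularity gets coarser.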
The adaptive threshold queries require more time in all cases, since the threshold has to be computed from the contradiction value of the parent time window, which incurs additional computation. This difference is pronounced for the database implementation, because it involves an extra join to obtain the parent time window.

We observe that both single-topic and all-topics queries (see Figures 7(a-b)) scale linearly with the size of τ. This confirms our analytic results, and is explained by the fact that the queries have to return contradictions for all time windows (of a specific granularity) that are contained in τ. For single-topic queries with a fixed threshold, the database is able to use all its indices (i.e., on topic id, time windows, and granularity) to answer the queries, thereby achieving very fast response times.

Figures 7(c-d) depict the response times when we vary the granularity of the time windows specified by the queries. Increasing the granularity translates to larger time windows (i.e., moving up in the time hierarchy) and a smaller number of time windows for the same time interval. Thus, response times decrease.

9. REFERENCES
[1] D. M. Blei, A. Y. Ng, M. I. Jordan, and J. Lafferty. Latent Dirichlet allocation. JMLR, 3, 2003.
[2] C. Chen, F. Ibekwe-SanJuan, E. SanJuan, and C. Weaver. Visual analysis of conflicting opinions. In IEEE Symposium on Visual Analytics Science and Technology, pages 59–66, 2006.
[3] M. C. de Marneffe, A. N. Rafferty, and C. D. Manning. Finding contradictions in text. In ACL-08: HLT, pages 1039–1047, 2008.
[4] K. Denecke and M. Brosowski. Topic detection in noisy data source. In ICDIM, pages 50–55, 2010.
[5] R. Ennals, B. Trushkowsky, and J. M. Agosta. Highlighting disputed claims on the web. In WWW, pages 341–350, 2010.
[6] S. Harabagiu, A. Hickl, and F. Lacatusu. Negation, contrast and contradiction in text processing. In AAAI, pages 755–762, 2006.
[7] R. Johansson and A. Moschitti. Reranking models in fine-grained opinion analysis. In COLING, pages 519–527. ACL, 2010.
[8] K. Lerman, S. Blair-Goldensohn, and R. McDonald. Sentiment summarization: Evaluating and learning user preferences. In EACL, pages 514–522, 2009.
[9] B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In WWW, pages 342–351. ACM, 2005.
[10] J. Liu, L. Birnbaum, and B. Pardo. Spectrum: Retrieving different points of view from the blogosphere. In ICWSM, pages 114–121, 2009.
[11] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima. Mining product reputations on the web. In KDD, pages 341–349, 2002.
[12] S. Pado, M.-C. de Marneffe, B. MacCartney, A. N. Rafferty, E. Yeh, and C. D. Manning. Deciding entailment and contradiction with stochastic and edit distance-based alignment. In TAC, 2008.
[13] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, 2008.
[14] E. Riloff, J. Wiebe, and W. Phillips. Exploiting subjectivity classification to improve information extraction. In AAAI, pages 1106–1111, 2005.
[15] S. Siersdorfer, S. Chelaru, W. Nejdl, and J. San Pedro. How useful are your comments?: analyzing and predicting YouTube comments and comment ratings. In WWW, pages 891–900. ACM, 2010.
[16] M. Tsytsarau, T. Palpanas, and K. Denecke. Scalable discovery of contradictions on the web. In WWW, pages 1195–1196, 2010.
[17] I. Varlamis, V. Vassalos, and A. Palaios. Monitoring the evolution of interests in the blogosphere. In ICDE Workshops, pages 513–518, 2008.

7. DISCUSSION
The problem considered in this paper is new, in the sense that it considers contradictions on a large scale while taking time into account (i.e., we consider the timestamps of the texts, as opposed to treating the text collections as sets). We have introduced and evaluated an approach that relies on sentiment information and exploits data engineering methods to detect such contradictions in texts at a large scale.

The evaluation of our approach on various datasets demonstrated its ability to discriminate highly contradicting regions, given a sequence of sentiments on some topic. Being scalable and computationally efficient, it can serve as a preliminary step for more sophisticated contradiction analysis, identifying the most interesting points for further processing.

An important feature of our contradiction detection method is its ability to operate on data with neutral sentiments. The contradiction formula we propose shows almost the same performance with