        Squabble: an efficient, scalable controversy classifier

                      Shiri Dori-Hacohen, Elinor Brondwine, and Jeremy Gollehon
                                               AuCoDe
                                     {firstname}@controversies.info



                       Abstract

    We introduce Squabble, an efficient, scalable API for classifying controversial text such as news articles. Squabble is designed and implemented in Python for commercial purposes with industry best practices, which can be followed by other researchers aiming to commercialize their innovations. We demonstrate multiple orders of magnitude speedup compared to prior work, while retaining effectiveness.

Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: A. Aker, D. Albakour, A. Barrón-Cedeño, S. Dori-Hacohen, M. Martinez, J. Stray, S. Tippmann (eds.): Proceedings of the NewsIR'19 Workshop at SIGIR, Paris, France, 25-July-2019, published at http://ceur-ws.org

1    Introduction & Prior Work

The last few years have seen growing interest in computational analysis of controversies (cf. [GDFMGM17, MZDC14]). Recent work demonstrated a clear link between controversies and disinformation, showing that polarizing topics are more vulnerable to fake news and proposing controversy as a feature in the prediction and classification of mis- and disinformation [VQSZ19], which highlights the importance of classifying controversy. Others explored the connection between controversy and sentiment, finding that they are related but not interchangeable constructs [KLF18, MZDC14]. Recent work also generated unsupervised explanations of what makes a topic controversial [KA19]. Social good and commercial applications of detecting controversy include crisis management, public relations, and defense. Consider, for example, that recent mass shootings and terrorist attacks were preceded by social media activity referencing highly controversial matters¹.

   ¹ https://www.npr.org/2019/03/15/703911997/the-role-social-media-plays-in-mass-shootings

   Controversy Language Models. Jang et al. described a framework for detecting controversy probabilistically [JFDHA16] and introduced a novel approach based on controversy language models (CLM). CLM evaluates whether a document better matches a controversy vs. a non-controversy (or background) language model, relying on the following comparison: log P(D|C) − log P(D|NC) > α. Here, P(D|C) and P(D|NC) represent the probabilities that a document was generated from a controversial or a non-controversial unigram language model, respectively. CLM can be estimated by constructing a collection of controversial documents; here, we refer to one of the construction approaches, the Wikipedia Controversy Feature (WCF) [JFDHA16], which uses the top K Wikipedia articles with the highest controversy scores. Once trained, the classifier no longer relies on Wikipedia for its success; rather, the language model trained from Wikipedia can be applied to any text, whether or not it discusses topics covered in Wikipedia.

2    The Squabble API

As is common in research labs, the original CLM code in Java was built with much attention paid to effectiveness, but little to efficiency and organization. As an early-stage startup, we wanted to give our research team the ability to iterate on their models quickly on large data sets without waiting days for answers, a situation that had slowed our team down significantly. In order to bring CLM out of the research lab into a commercially viable product that could handle large volumes of text from different genres, we built a production-ready system in Python, dubbed "Squabble". We will describe two main elements relevant for efficiency: the technical infrastructure and a redesign of the CLM approach.

   Technical infrastructure. We created a system that can ingest, filter, and store raw data from a wide variety of sources such as news and social media. As an initial testbed and to stress-test our infrastructure, we used a 10-year history of the Twitter Gardenhose collection, a random 1% sample of all tweets from 2008-2018 [OBRS10]. We created a multi-threaded Python
program storing data in a PostgreSQL database hosted on the Amazon Web Services RDS system. We added a component that extracts both the tweet text and any externally linked article text when a link was included in the tweet. We kept the filtering stage simple yet flexible, accepting as input a text file with a list of keywords or hashtags of interest. Tweets containing any of the keywords or tags are included in the database.

    Figure 1: The Squabble API architecture. See Figure 2 for a zoomed-in version of the "LM Generation" process.

    Figure 2: Procedure for generating Language Models.

Table 1: Before, After & Success Metrics for Squabble. Infrastructure speeds reported on a per-core basis on a server. Controversy scoring reported on a dual-core laptop.

                     Infrastructure              Controversy Scoring
    Before           100 tweets/sec/core        7 requests/sec
    Success metric   1,000 tweets/sec/core      500 requests/sec
    After            100,000 tweets/sec/core    700 requests/sec
    Speedup          1000x                      100x

Table 2: Dataset from prior work [DHA15] with key statistics.

                         # Docs (%)       Terms: Mean   Terms: Std
    Controversial        78 (25.75%)      828.43        1159.78
    Non-controversial    225 (74.25%)     367.1         564.11
    All                  303 (100%)       485.86        787.28
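To make the CLM comparison from Section 1 concrete, the following is a minimal, self-contained sketch of scoring a document against controversial and background unigram language models. The training documents, vocabulary handling, add-one smoothing, and out-of-vocabulary policy here are illustrative assumptions for exposition, not Squabble's actual implementation:

```python
import math
from collections import Counter

def train_unigram_lm(docs, vocab):
    """Estimate a unigram language model with add-one (Laplace) smoothing."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def clm_score(doc, lm_controversy, lm_background):
    """Return log P(D|C) - log P(D|NC) under the two unigram models."""
    score = 0.0
    for w in doc.split():
        if w in lm_controversy:  # this sketch simply skips out-of-vocabulary terms
            score += math.log(lm_controversy[w]) - math.log(lm_background[w])
    return score

# Toy training collections standing in for WCF-derived controversial
# documents and background text.
controversial_docs = ["abortion debate protest dispute", "gun control dispute protest"]
background_docs = ["weather sunny calm forecast", "recipe soup calm dinner"]
vocab = set(" ".join(controversial_docs + background_docs).split())
lm_c = train_unigram_lm(controversial_docs, vocab)
lm_nc = train_unigram_lm(background_docs, vocab)

# A document is labeled controversial when the log-likelihood ratio
# exceeds the threshold alpha.
alpha = 0.0
doc = "protest over gun control dispute"
is_controversial = clm_score(doc, lm_c, lm_nc) > alpha
```

In practice the threshold α is tuned on held-out data; the sketch fixes it at zero purely for illustration.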
   CLM revisited. We reimplemented and refined the controversy detection algorithm described in Section 1; its system architecture is presented in Figure 1. We used established Python packages such as NLTK [LB02] and Scikit-learn [PVG+11], and created a research testbed in order to evaluate the Squabble API. We describe evaluation details in Section 4.

   Squabble accepts data as a text stream via SQL queries or CSV files. In pilot efforts, prospective customers sent data via large CSV files, which we were able to ingest and run through Squabble rapidly. Prior to creating Squabble, such pilots would take days to run, slowing development down. Like our data processing code, the Squabble code can likewise scale via multi-threading. In addition, Squabble can be applied in a wide variety of verticals, such as finance, defense, and public relations. We constructed Squabble explicitly to allow for that possibility. As an early-stage startup, this also gives us the flexibility to pivot easily should the need arise, without expensive retooling of the technology.

3    Efficiency improvements

Prior to commencing this project, we set internal success metrics for efficiency (see Table 1) that we estimated would allow us to successfully process customer requests and internal research at scale for the foreseeable future. Processing a massive data set into a structured database, on a budget, turned out to be the biggest core technological hurdle in the infrastructure portion of our system. PostgreSQL's concurrency behaviour could not handle the volume of data being sent. Once the underlying issue was resolved, data storage speed immediately increased by multiple orders of magnitude, not only meeting our success metric but also exceeding our most ambitious projections, as seen in Table 1. Since this code was structured for scalability, we can add multi-threading or multi-processing with no additional effort. As seen in Table 1, we easily met our success metrics and achieved orders of magnitude speedup.

4    Evaluation

We leverage the dataset introduced by Dori-Hacohen and Allan [DHA15], which consists of judgments for 303 webpages² from the ClueWeb09 collection³ and is presented in Table 2. The evaluation set is imbalanced, in the sense that it contains more non-controversial (225) than controversial (78) documents. Therefore, we focused on balanced accuracy as our metric of choice against several baselines such as sentiment and Naive Bayes, and we also report other metrics for completeness' sake (see Table 3).

   ² http://ciir.cs.umass.edu/downloads
   ³ http://lemurproject.org/clueweb09/
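Balanced accuracy is the mean of per-class recall, which makes it robust to the class imbalance above. The following sketch, using toy labels rather than our evaluation data, shows why we prefer it to plain accuracy: a trivial baseline that labels everything non-controversial scores high plain accuracy on an imbalanced set but only 0.5 balanced accuracy, matching the behaviour of the "None controversial" row in Table 3:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: each class contributes equally regardless of size."""
    classes = set(y_true)
    recalls = []
    for c in classes:
        relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(1 for t, p in relevant if p == c) / len(relevant))
    return sum(recalls) / len(classes)

# Toy imbalanced labels: 2 controversial (1), 8 non-controversial (0).
y_true = [1] * 2 + [0] * 8
trivial = [0] * 10  # a "none controversial" baseline

plain_acc = sum(t == p for t, p in zip(y_true, trivial)) / len(y_true)  # 0.8
bal_acc = balanced_accuracy(y_true, trivial)                            # 0.5
```

Plain accuracy rewards the trivial baseline simply because the majority class dominates; balanced accuracy does not.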
Our results with the WCF approach were in line with prior work [JFDHA16], demonstrating reproducibility from the original paper⁴, and also showing that effectiveness was not sacrificed for the sake of efficiency.

   ⁴ Details omitted for space considerations.

Table 3: Classification scores for Squabble compared to several baselines. Squabble outperforms in all metrics evaluated (other than recall, which a trivial baseline accomplishes by definition).

                          B. Acc.   R       Acc.    P       F1
    Squabble score        0.876     0.955   0.835   0.600   0.737
    Sentiment             0.476     0.909   0.253   0.233   0.370
    Random                0.545     0.727   0.451   0.267   0.390
    MultinomialNB         0.816     0.864   0.791   0.543   0.667
    All controversial     0.500     1.000   0.242   0.242   0.389
    None controversial    0.500     0.000   0.758   NaN     NaN

5    Conclusion

We presented Squabble, a robust, commercially viable API for controversy classification, which is efficient and scalable. Squabble can be applied in a vertical-agnostic manner. By reimplementing the controversy language model [JFDHA16] in Python using industry best practices, we increased its efficiency by orders of magnitude without sacrificing effectiveness. Efficiency and scalability position Squabble to be used in commercial settings for a wide variety of applications. Its modularity and research testbed allow for extensibility and improvement in the future, as more effective methods of classifying controversy are discovered.

   Limitations & Future Work. The dataset from prior work is somewhat limited [DHA15]; document length is inconsistent between controversial and non-controversial documents (see Table 2). It is possible that CLM's effectiveness benefits from a bias toward longer documents. Foley showed that AUC on this dataset has effectively been maximized relative to its inter-annotator agreement [Fol18]. In future work, we hope to create new ground truth data sets for controversy, and encourage other researchers to do the same. Future work will scale our architecture to run concurrently on multiple servers and speed up controversy scoring further. More work is needed to understand the connection between controversy, mis-, and disinformation.

   Acknowledgements. This material is based upon work supported by the National Science Foundation under Grant No. 1819477. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

[DHA15]       Shiri Dori-Hacohen and James Allan. Automated controversy detection on the web. In European Conference on Information Retrieval, pages 423-434. Springer, 2015.

[Fol18]       John Foley. Explainable agreement through simulation for tasks with subjective labels. arXiv preprint arXiv:1806.05004, 2018.

[GDFMGM17]    Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. Reducing controversy by connecting opposing views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 81-90. ACM, 2017.

[JFDHA16]     Myungha Jang, John Foley, Shiri Dori-Hacohen, and James Allan. Probabilistic approaches to controversy detection. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 2069-2072. ACM, 2016.

[KA19]        Youngwoo Kim and James Allan. Unsupervised explainable controversy detection from online news. In European Conference on Information Retrieval, pages 836-843. Springer, 2019.

[KLF18]       Kateryna Kaplun, Christopher Leberknight, and Anna Feldman. Controversy and sentiment: An exploratory study. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence, page 37. ACM, 2018.

[LB02]        Edward Loper and Steven Bird. NLTK: The Natural Language Toolkit. arXiv preprint cs/0205028, 2002.

[MZDC14]      Yelena Mejova, Amy X. Zhang, Nicholas Diakopoulos, and Carlos Castillo. Controversy and sentiment in online news. arXiv preprint arXiv:1409.8152, 2014.

[OBRS10]      Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. From tweets to polls: Linking text sentiment to public opinion time series. In Fourth International AAAI Conference on Weblogs and Social Media, 2010.

[PVG+11]      Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825-2830, 2011.

[VQSZ19]      Michela Del Vicario, Walter Quattrociocchi, Antonio Scala, and Fabiana Zollo. Polarization and fake news: Early warning of potential misinformation targets. ACM Trans. Web, 13(2):10:1-10:22, March 2019.