=Paper= {{Paper |id=Vol-2322/BigVis_3 |storemode=property |title=A User Centric Visual Analytics Framework for News Discussions |pdfUrl=https://ceur-ws.org/Vol-2322/BigVis_3.pdf |volume=Vol-2322 |authors=Jakob Smedegaard Andersen |dblpUrl=https://dblp.org/rec/conf/edbt/Andersen19 }} ==A User Centric Visual Analytics Framework for News Discussions== https://ceur-ws.org/Vol-2322/BigVis_3.pdf
A User Centric Visual Analytics Framework for News Discussions
                                                     Jakob Smedegaard Andersen
                                                      Department of Computer Science
                                                              HAW Hamburg
                                                            Hamburg, Germany
                                                     jakob.andersen@haw-hamburg.de

ABSTRACT
Visual Analytics has attracted a lot of attention for its ability to support exploratory knowledge discovery in large data sets. In this work-in-progress paper, we develop a Visual Analytics framework for user comments in the domain of online journalism. First, we examine how journalists' needs can be mapped to an interactive visual interface to make sense of user comments. We further investigate how different classes of Machine Learning algorithms, such as supervised and unsupervised learning, can be integrated into Visual Analytics to enable a more user centric analysis. Due to the variety of Machine Learning approaches, we expect that different forms of integration will be needed. Our goal is to place the domain experts (e.g. journalists) in the loop to improve analytical reasoning.

1 INTRODUCTION
As data accessibility increases, analytical methods and techniques to handle the data become more important. Visual Analytics (VA) is a novel approach to gain insights from heterogeneous and unstructured data [3, 9]. The basic concept of VA is to combine the processing capabilities of machines with the human ability of pattern detection to overcome the flaws of purely analytical or purely visual approaches. Interactive Visualisation (IV) is used to bridge these parts together and to enable a more user centric discourse with the data. The goal of VA is to provide tools to effectively gain knowledge out of data for better decision-making.

In order to create such tools for big data scenarios, a precise understanding of the coupling of Machine Learning (ML) and IV is required. As little work has been done on how to modify and steer ML methods through interactive interfaces, we focus on user centric possibilities to adapt ML algorithms during the analysis.

In this paper, we investigate how different types of ML, such as supervised and unsupervised learning, as well as specific methods like clustering, classification, regression and dimension reduction, can be integrated in VA. Differences are expected because of the diversity of these approaches. We use our findings to create a VA framework for user comments in the domain of online journalism.

First, we briefly introduce the VA approach, the concept of "Human in the Loop" and the domain of making sense of news discussions. Secondly, we present our VA framework for user comments. Hereafter, we argue in section 3.2 for a "Human in the Loop" approach in VA in order to improve our framework. Then, we outline questions for our upcoming research. Finally, we describe related work and end with our conclusion.

© 2019 Copyright held by the author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on CEUR-WS.org.

2 PRELIMINARIES
2.1 Visual Analytics
This section briefly introduces the central concepts of VA.

VA is defined as "the science of analytical reasoning facilitated by interactive visual interfaces" [2]. By combining methods from IV with ML and other automated techniques, VA seeks to improve the process of knowledge discovery from complex structured and unstructured data [9]. The approach is gaining importance due to its ability to integrate human knowledge into computational data processing. VA enables explorative data analytics in large scale data scenarios. Its subject is the effective acquisition, expansion and generation of knowledge to finally make better decisions. Within VA the user takes an active role, as he or she steers and supervises the analysis. The interaction becomes the crucial part in which the user communicates his or her knowledge. Several approaches have emerged that effectively combine the strengths of human cognition and machine processing [5, 8, 10]. However, further research regarding the user centric coupling of ML and interactive interfaces is needed.

2.2 The Human in the Loop Paradigm
In the early stages of ML the overall question was "how to construct computer programs that automatically improve with experience" [12]. However, fully automated ML (aML) is not applicable to all real world scenarios. Purely automatic approaches presuppose a good understanding of the problem to achieve beneficial results. They are unsuitable for ill-defined or a priori undefined questions, or when the needed training data is not available. VA addresses tasks which are explorative in nature [3]. A more seamless approach is needed which fits into the existing interactive process.

A ML approach which can utilise domain knowledge is described by the phrase "Human in the Loop" (HitL). HitL is a special case of interactive ML (iML) and can be defined as algorithms that can optimise their learning behaviour through interaction with humans [7]. The human is directly integrated into the train, tune and test phases of the algorithm to obtain higher quality results. Although ML is a central component of VA, the integration of HitL has received little investigation.

The primary advantage of HitL is the ability to reach inside the model's black box. The approach enables the advanced use of human knowledge and expertise inside a continuous feedback loop. User interaction empowers model steering and is not limited to model selection and parameterisation. However, the HitL approach raises new challenges in getting the most out of the participation of humans. Novel approaches are needed to make this interplay more intuitive and beneficial.
A typical use case for HitL is supervised learning. In this setting, the goal is to learn a mapping between X and Y given a set of training pairs (x_i, y_i), where x_i ∈ X are called examples and y_i ∈ Y are labels for i ∈ {1, ..., n}. A typical supervised HitL approach is to increase n, the number of example pairs, to gain more accuracy by enhancing the ground truth. However, HitL can also be applied to unsupervised learning settings. Here, only a set of n examples X = {x_1, ..., x_n} is given, and the goal is to find interesting structures in the examples X. By applying HitL, the problem can be transformed into a semi-supervised learning problem if a user provides l labels for some examples in X. The examples are divided into X_l := {x_1, ..., x_l}, for which the labels Y_l := {y_1, ..., y_l} are given, and X_u := {x_{l+1}, ..., x_n} as the set with no labels. In this case, the training pairs can be used to dynamically guide the computation by constraints or additional information. See Chapelle et al. [1] for details.
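The semi-supervised transformation above can be illustrated in code. The sketch below is a minimal, assumption-laden example and not part of the framework: a toy self-training loop that spreads the l user-provided labels over the unlabeled set X_u by repeatedly labelling the unlabeled example closest to a class centroid. The 2-D feature points and the pro/contra labels are invented for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def self_train(X_l, Y_l, X_u, rounds=100):
    """Turn the unlabeled set X_u into labeled data: in each round,
    label the unlabeled example nearest to any class centroid and move
    it into the labeled pool (X_l, Y_l)."""
    X_l, Y_l, X_u = list(X_l), list(Y_l), list(X_u)
    for _ in range(rounds):
        if not X_u:
            break
        labels = sorted(set(Y_l))
        cents = {c: centroid([x for x, y in zip(X_l, Y_l) if y == c])
                 for c in labels}
        # the most "confident" candidate: smallest distance to a centroid
        best = min(X_u, key=lambda x: min(dist(x, cents[c]) for c in labels))
        X_u.remove(best)
        X_l.append(best)
        Y_l.append(min(labels, key=lambda c: dist(best, cents[c])))
    return X_l, Y_l

# l = 2 user-provided labels, n - l = 4 unlabeled comments (as 2-D features)
X_l = [(0.0, 0.0), (10.0, 10.0)]
Y_l = ["contra", "pro"]
X_u = [(0.5, 0.2), (9.5, 9.8), (0.1, 0.9), (10.2, 9.9)]
X2, Y2 = self_train(X_l, Y_l, X_u)
```

Here each x_i stands in for a comment's feature vector; in a HitL setting the two seed labels would come from the journalist, and each further interaction could add more pairs to X_l before the loop is re-run.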

2.3 User Feedback in News Discussions
This section introduces and motivates the domain of analysing user feedback in news discussions. User feedback refers to various forms of user participation regarding journalistic content. The focus lies on textual comments from multiple channels like news webpages, emails and social media.

There is a high demand for gaining insights from user comments, as they are a valuable source of information. They can contain useful aspects like feedback, criticism, new perspectives and expertise. Furthermore, comments mirror personal opinions which are normally hard to capture [13]. On the downside, comments can include insults, hustles and advertisements which can negatively affect the overall quality of the discussion.

Studies show that user comments support the daily work of journalists and editors [16]. This includes obtaining new ideas for further articles, additional facts and direct feedback for improving the journalists' work. Thus, observing user comments clearly has a positive value. However, their heterogeneity and large volume raise a number of challenges, such as moderation overhead costs and keeping an overview of the current state of the discussion [11].

Consequently, newsrooms are faced with an increasing demand for computer supported ways to analyse, aggregate and visualise user comments. Unfortunately, there is a lack of analytical tools to provide high quality comments that can be leveraged for journalistic purposes [11].

From a data science point of view, user comments are documents with heterogeneous information contexts and several connections between each other. They consist of multiple attributes like a commenter identification, the related article, a timestamp, a title, a ranking from other readers and several annotations from manual or automated classifications like sentiment analysis and swearword detection.

3 MAKING SENSE OF USER COMMENTS
As the manual analysis of user comments is resource consuming and impractical with a rising volume, velocity and variety of user comments, various researchers focus on approaches to detect patterns automatically [6, 19]. In order to enable better analytical reasoning, we constructed a VA framework for annotated user comments.

3.1 Visual Analytics Framework
We have developed a fully functional VA framework for user comments. Our research builds upon the findings of Loosen et al. [11], who propose requirements for an analytics tool in the field of user comments which covers the needs of journalists. The aim of our work is to develop and further examine these findings in a VA tool. The primary question is how to map the analytical requirements into a suitable combination of visualisations and interactions that fulfils journalists' needs. As our data collection, we use a set of pre-annotated user comments. Our framework consists of the following analytical dimensions; see Loosen et al. [11] for further details.

• Article Selection: The occurrence of comments over the time/progress of a discussion. The user can select samples of articles from which the comments are analysed.
• Topics and Addressees: What is discussed, and who is mentioned and directly addressed in a sample of user comments.
• Discussion and Argumentation: The direction of a discussion and the arguments raised over time. The user can analyse the development of pro- and contra-arguments towards a certain question or topic over time.
• Quality: Metrics to quantify the quality of the user comments and offer a condensed overview.
• Selected User Comments: A close-read function for selected user comments.

Each dimension is implemented as a separate view within a web application. Views supply a set of visualisations and interactions to enable and support an analytical discourse with the user. The views are coordinated and implement different interaction strategies like selection, filtering, focus+context and linking+brushing. Every view is an optional part of the analysis. The user decides dynamically which views he or she wants to use. The layout of the views is based on a "filter flow" metaphor. The views possess a specified ordering in which they are placed on the screen. Changes inside a view are only forwarded to subsequent views. The user is able to filter in arbitrary order and there are no top-down restrictions.

Figure 1: The Article Selection view.

Figure 1 shows the Article Selection view. The upper part of the figure depicts the distribution of comments over time, whereas the lower part offers several selection options. Comments can be selected individually or grouped by sections, topics and authors.

Figure 2: The Discussion and Argumentation view.

The Discussion and Argumentation view is depicted in figure 2. After the user has selected a stance or topic, the distribution of the for and against comments is shown in a line graph. Related arguments are listed in the bottom part. Furthermore, selected comments are plotted according to their sentiment and user ratings. The grey boxes relate to filtering operations; only comments within them are considered.

Figure 3: The Quality view.

Figure 3 illustrates the Quality view. User comments are represented by a set of different indicators. In the upper part, user comments are categorised along the dimensions article reference, compliance and originality. For further analysis, each of the selected comments is then depicted as a polyline inside parallel coordinates, using indicators such as length, sentiment, or number of references.

Another central part of our investigations is the evaluation of our prototype. We conducted a quantitative usability study at the end of the first implementation cycle to assess the overall system and to counteract any weak points. Seven participants were observed while solving real world problems with our prototype. In addition, they answered a questionnaire about the usability. The findings are used for improvements and further requirements. The evaluation also revealed that the participants unconditionally trust the results. This is not surprising, as the tool does not show the accuracy of the analytical results.

3.2 From Interactive Filtering to Human in the Loop
In our first prototype, the processing capabilities of VA are not exhausted. One drawback of our prototype is the lack of interactive model adaptation: it does not integrate the users' expertise and experience into the analysis. The current interaction takes place as a sequence of selections, overviews and filter operations. The integration of ML is limited to pre-processing.

To enable HitL and interactive model steering, and thus exploit the possibilities of VA, we extend our prototype with interactive ML components. We have to develop specific use cases that include the HitL approach and can also be used in a more general form across domains.

The question arises why it could be beneficial to place the journalist in the loop. First, our framework is based on several classification models that provide annotations for interactive visualisation. Since these algorithms are based on simplified models of reality, misclassifications are to be expected. The misclassification likelihood can be quantified by relative error frequencies. By using HitL, the user can interactively correct spotted misclassifications and initiate new training cycles to improve the model's accuracy, which affects the visual correctness.

Secondly, HitL opens up new possibilities for individualisation. It is difficult to satisfy the desire for customisable queries through a predefined set of filter operations. HitL makes it possible to dynamically select data that is of special interest. The user can be offered opportunities to create and apply generic classification models at runtime.

Furthermore, many ML algorithms work on a restricted information basis: there is only a limited amount of data available, only a sample is used for computation, or the algorithms are incorrectly configured. This leads to different understandings of similarities or weights between human expectations and computational results. HitL makes it possible to correct these deviations through the experience of the users and thus to improve the results.
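The correction-and-retrain cycle described above can be sketched as follows. This is a minimal illustration under invented assumptions (a toy keyword classifier and made-up pro/contra labels), not the framework's actual implementation: the user marks a misclassified comment, the correction is folded back into the training data, and the model is retrained.

```python
from collections import Counter, defaultdict

class KeywordClassifier:
    """Toy comment classifier: predicts the label whose training
    vocabulary overlaps most with the comment's words."""
    def __init__(self):
        self.vocab = defaultdict(Counter)  # label -> word frequencies

    def train(self, pairs):
        for text, label in pairs:
            self.vocab[label].update(text.lower().split())

    def predict(self, text):
        words = text.lower().split()
        scores = {lab: sum(cnt[w] for w in words)
                  for lab, cnt in self.vocab.items()}
        return max(scores, key=scores.get)

def hitl_training_cycle(model, corrections):
    """One human-in-the-loop cycle: fold user corrections
    (comment -> true label) back into the model as new training pairs."""
    model.train(list(corrections.items()))

model = KeywordClassifier()
model.train([("great article thanks", "pro"),
             ("bad sloppy reporting", "contra")])

# "utter nonsense" shares no words with the training data, so its
# prediction is unreliable; the user spots this and corrects it.
hitl_training_cycle(model, {"utter nonsense": "contra"})
```

After the cycle, the corrected comment is classified as "contra", and previously unseen comments that share its vocabulary (e.g. "such nonsense") now inherit the correction as well, which is the accuracy gain the HitL argument above relies on.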
4 NEXT STEPS
For our next steps to integrate different ML methods in VA, the following questions arise:

RQ1: To what extent can different ML methods be adapted within an iterative process?

A taxonomy for user centric adaptations of ML methods will be developed to guide the development of VA applications. The integration of HitL aspects is carried out with regard to the differences between supervised and unsupervised approaches. In addition, specific methods like classification, regression, clustering and dimension reduction will be considered.

RQ2: How can model adaptations and responses be translated and mapped into a visual metaphor?

It is a difficult task to steer computational models to match expectations. There is a gap between computational processing on the machine side and cognition on the human side that can lead to hard usability problems [17]. The user has to translate his or her mental model into numeric variables to steer the computation. The challenge is to provide a visual layer which abstracts away the computational perspective, as pretty much every journalist (or end user) does not want to deal with ML details.

RQ3: How can the uncertainties of ML methods be visually communicated and utilised in a constructive manner?

Uncertainty is created and communicated over the complete VA process [15]. We will focus on uncertainties created by ML algorithms. Several quality measures exist that quantify the accuracy of ML algorithms. We want to discover how these uncertainties can be communicated in an appropriate manner to create awareness. Furthermore, we want to utilise the uncertainty in combination with the HitL approach to reduce misclassifications.

In order to answer these questions, we examine the integration of ML and IV from a user centric point of view. To support the suitability of our findings, we develop and evaluate several prototypes in the domain of news discussions. A goal is to cover different ML approaches. We will carry out quantitative usability tests with representative users to spot and document strengths and weaknesses. The observations will take place inside our usability lab, which provides eye-trackers, cameras, screen capture and key logging.

5 RELATED WORK
In [18, 20] the authors present VA tools to make sense of text collections. In contrast to our approach, they do not place the human in the loop and rely heavily on pre-processing. Our modelling of the human interaction with ML is similar to [14], but they do not distinguish between ML strategies like supervised and unsupervised learning. Additionally, [4] is related to our work, as they come up with a novel principle for analytical interactions called semantic interaction. We will build upon these findings to provide interactions that derive from the user's analytic process.

6 CONCLUSION
We have developed a VA tool for user comments in online journalism and have outlined next steps for a user centric integration of ML in our prototype. The next steps cover how different ML methods like supervised and unsupervised learning can be adapted within an iterative process, how these adaptations can be visually translated and mapped to a visual metaphor, and how occurring uncertainties can be communicated and constructively used. A broader understanding of the coupling of ML and IV is necessary to fully exploit the strengths of VA.

ACKNOWLEDGMENTS
The paper was supported by BWFG Hamburg within the "Forum 4.0" project as part of the ahoi.digital funding line.

REFERENCES
[1] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. 2009. Semi-supervised learning. MIT Press.
[2] Kristin A. Cook and James J. Thomas. 2005. Illuminating the path: The research and development agenda for visual analytics. IEEE Computer Society.
[3] Geoffrey Ellis and Florian Mansmann. 2010. Mastering the information age: Solving problems with visual analytics. Taylor & Francis Group.
[4] Alex Endert, Patrick Fiaux, and Chris North. 2012. Semantic interaction for visual text analytics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 473–482.
[5] Alex Endert, William Ribarsky, Cagatay Turkay, BL William Wong, Ian Nabney, I Díaz Blanco, and Fabrice Rossi. 2017. The state of the art in integrating machine learning into visual analytics. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 458–486.
[6] Marlo Häring, Wiebke Loosen, and Walid Maalej. 2018. Who is addressed in this comment?: Automatically classifying meta-comments in news comments. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 67.
[7] Andreas Holzinger. 2016. Interactive machine learning for health informatics: When do we need the human-in-the-loop? Brain Informatics 3, 2 (2016), 119–131.
[8] Liu Jiang, Shixia Liu, and Changjian Chen. 2018. Recent research advances on interactive machine learning. Journal of Visualization (2018), 1–17.
[9] Daniel A. Keim, Florian Mansmann, Jörn Schneidewind, Jim Thomas, and Hartmut Ziegler. 2008. Visual analytics: Scope and challenges. In Visual Data Mining: Theory, Techniques and Tools for Visual Analytics. Springer, 76–90.
[10] Shixia Liu, Xiting Wang, Mengchen Liu, and Jun Zhu. 2017. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics 1, 1 (2017), 48–56.
[11] Wiebke Loosen, Marlo Häring, Zijad Kurtanović, Lisa Merten, Julius Reimer, Lies van Roessel, and Walid Maalej. 2018. Making sense of user comments: Identifying journalists' requirements for a comment analysis framework. SCM Studies in Communication and Media 6, 4 (2018), 333–364.
[12] Tom M. Mitchell. 1997. Machine learning. McGraw-Hill.
[13] Deokgun Park, Simranjit Sachar, Nicholas Diakopoulos, and Niklas Elmqvist. 2016. Supporting comment moderators in identifying high quality online news comments. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, 1114–1125.
[14] Dominik Sacha, Michael Sedlmair, Leishi Zhang, John A. Lee, Jaakko Peltonen, Daniel Weiskopf, Stephen C. North, and Daniel A. Keim. 2017. What you see is what you can change: Human-centered machine learning by interactive visualization. Neurocomputing 268 (2017), 164–175.
[15] Dominik Sacha, Hansi Senaratne, Bum C. Kwon, Geoffrey Ellis, and Daniel A. Keim. 2016. The role of uncertainty, awareness, and trust in visual analytics. IEEE Transactions on Visualization and Computer Graphics 22, 1 (2016), 240–249.
[16] Arthur D. Santana. 2011. Online readers' comments represent new opinion pipeline. Newspaper Research Journal 32, 3 (2011), 66–81.
[17] Jessica Z. Self, Radha K. Vinayagam, J. T. Fry, and Chris North. 2016. Bridging the gap between user intention and model parameters for human-in-the-loop data analytics. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA '16). ACM, New York, NY, USA, Article 3, 6 pages.
[18] John Stasko, Carsten Görg, and Zhicheng Liu. 2008. Jigsaw: Supporting investigative analysis through interactive visualization. Information Visualization 7, 2 (2008), 118–132.
[19] Gregor Wiedemann, Eugen Ruppert, Raghav Jindal, and Chris Biemann. 2018. Transfer learning from LDA to BiLSTM-CNN for offensive language detection in Twitter. Austrian Academy of Sciences, Vienna, September 21, 2018 (2018), 85–94.
[20] Yi Yang, Quanming Yao, and Huamin Qu. 2017. VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling. Visual Informatics 1, 1 (2017), 40–47.