=Paper=
{{Paper
|id=Vol-2322/BigVis_3
|storemode=property
|title=A User Centric Visual Analytics Framework for News Discussions
|pdfUrl=https://ceur-ws.org/Vol-2322/BigVis_3.pdf
|volume=Vol-2322
|authors=Jakob Smedegaard Andersen
|dblpUrl=https://dblp.org/rec/conf/edbt/Andersen19
}}
==A User Centric Visual Analytics Framework for News Discussions==
Jakob Smedegaard Andersen
Department of Computer Science, HAW Hamburg, Hamburg, Germany
jakob.andersen@haw-hamburg.de

ABSTRACT

Visual Analytics has achieved a lot of attention for its abilities to support exploratory knowledge discovery in large data sets. In this work-in-progress paper, we develop a Visual Analytics framework for user comments in the domain of online journalism. First, we examine how journalists' needs can be mapped to a visual interactive interface to make sense of user comments. We further investigate how different classes of Machine Learning algorithms, such as supervised and unsupervised learning, can be integrated into Visual Analytics to enable a more user centric analysis. Due to the variety of Machine Learning approaches, we expect that different forms of integration will be needed. Our goal is to place the domain experts (e.g. journalists) in the loop to improve analytical reasoning.

© 2019 Copyright held by the author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on CEUR-WS.org.
1 INTRODUCTION

As data accessibility increases, analytical methods and techniques to handle the data become more important. Visual Analytics (VA) is a novel approach to gain insights from heterogeneous and unstructured data [3, 9]. The basic concept of VA is to combine the processing capabilities of machines with the human ability of pattern detection, in order to overcome the flaws of purely analytical or purely visual approaches. Interactive Visualisation (IV) is used to bridge these parts together and to enable a more user centric discourse with the data. The goal of VA is to provide tools to effectively gain knowledge out of data for better decision-making.

In order to create such tools for big data scenarios, a precise understanding of the coupling of Machine Learning (ML) and IV is required. As little work has been done on how to modify and steer ML methods through interactive interfaces, we focus on user centric possibilities to adapt ML algorithms during the analysis.

In this paper, we investigate how different types of ML, such as supervised and unsupervised learning, as well as specific methods like clustering, classification, regression and dimension reduction, can be integrated into VA. Differences are expected because of the diversity of these approaches. We use our findings to create a VA framework for user comments in the domain of online journalism.

First, we briefly introduce the VA approach, the concept of "Human in the Loop" and the domain of making sense of news discussions. Secondly, we present our VA framework for user comments. Hereafter, we argue in section 3.2 for a "Human in the Loop" approach in VA in order to improve our framework. Then, we outline questions for our upcoming research. Finally, we describe related work and conclude.
2 PRELIMINARIES

2.1 Visual Analytics

This section briefly introduces the central concepts of VA. VA is defined as "the science of analytical reasoning facilitated by interactive visual interfaces" [2]. By combining methods from IV with ML and other automated techniques, VA seeks to improve the process of knowledge discovery from complex structured and unstructured data [9]. This approach is gaining more and more importance due to its ability to integrate human knowledge into computational data processing. VA enables explorative data analytics in large scale data scenarios. Its subject is the effective acquisition, expansion and generation of knowledge in order to finally make better decisions. Within VA the user takes an active role, as he or she steers and supervises the analysis. Interaction becomes the crucial part through which the user communicates his or her knowledge. Several approaches have emerged that effectively combine the strengths of human cognition and machine processing [5, 8, 10]. However, further research regarding the user centric coupling of ML and interactive interfaces is needed.

2.2 The Human in the Loop Paradigm

In the early stages of ML the overall question was "how to construct computer programs that automatically improve with experience" [12]. However, fully automated ML (aML) is not applicable to all real world scenarios. Purely automatic approaches presuppose a good understanding of the problem to achieve beneficial results. They are unsuitable for ill-defined or a priori undefined questions, or when the needed training data is not available. VA addresses tasks which are explorative in nature [3]. A more seamless approach is needed which fits into the existing interactive process.

A ML approach which can utilise domain knowledge is described by the phrase "Human in the Loop" (HitL). HitL is a special case of interactive ML (iML) and can be defined as algorithms that can optimize their learning behaviour through interaction with humans [7]. The human is directly integrated into the train, tune and test phases of the algorithm to obtain higher quality results. Although ML is a central component of VA, the integration of HitL has not been investigated much.

The primary advantage of HitL is the ability to reach inside the model's black box. The approach enables the advanced use of human knowledge and expertise inside a continuous feedback loop. User interaction empowers model steering and is not limited to model selection and parameterization. However, the HitL approach raises new challenges in getting the most out of the participation of humans. Novel approaches are needed to make this interplay more intuitive and beneficial. A typical use case of HitL is supervised learning.
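The feedback loop just described can be sketched in a few lines: a model is trained, its predictions are shown to a human, and corrections flow back as additional training pairs. The tiny nearest-centroid "model", the one-dimensional toy data and the simulated oracle below are illustrative stand-ins, not part of the framework.

```python
# Minimal HitL sketch: train, let a human correct predictions, retrain.
# The nearest-centroid model and the oracle are hypothetical stand-ins.

def train(pairs):
    """Fit one centroid per label from (example, label) training pairs."""
    sums, counts = {}, {}
    for x, y in pairs:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Assign x to the label of the nearest centroid."""
    return min(model, key=lambda y: abs(model[y] - x))

def hitl_cycle(pairs, unlabeled, oracle):
    """One HitL iteration: predict, collect human corrections, retrain."""
    model = train(pairs)
    corrections = [(x, oracle(x)) for x in unlabeled
                   if predict(model, x) != oracle(x)]  # human spots errors
    return train(pairs + corrections)

# Toy setting: the true boundary is at 5, but the skewed seed data
# misclassifies examples near the boundary until the human intervenes.
seed = [(0.0, "neg"), (6.0, "pos")]
oracle = lambda x: "neg" if x < 5 else "pos"
model = hitl_cycle(seed, [3.5, 4.0], oracle)
```

After one cycle, the corrected pairs pull the "neg" centroid towards the boundary, so previously misclassified examples are labelled correctly.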
In this setting, the goal is to learn a mapping between X and Y given a set of training pairs (x_i, y_i), where the x_i ∈ X are called examples and the y_i ∈ Y are labels, for i ∈ {1, ..., n}. A typical supervised HitL approach is to increase the number of example pairs n to gain more accuracy by enhancing the ground truth. However, HitL can also be applied in unsupervised learning settings. Here, only a set of n examples X = {x_1, ..., x_n} is given, and the goal is to find interesting structures in X. By applying HitL, the problem can be transformed into a semi-supervised learning problem if a user provides l labels for some examples in X. The examples are divided into X_l := {x_1, ..., x_l}, for which the labels Y_l := {y_1, ..., y_l} are given, and X_u := {x_{l+1}, ..., x_n} as the set with no labels. In this case, the training pairs can be used to dynamically guide the computation by constraints or additional information. See Chapelle et al. [1] for details.
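In code, this transformation is simply a split of X according to the labels the user has supplied. The sketch below mirrors the notation above; the comment snippets and labels are invented for illustration.

```python
# Sketch of the semi-supervised transformation: an unlabeled example
# set X is split, once a user provides l labels, into a labeled part
# (X_l, Y_l) and an unlabeled rest X_u. Data is illustrative.

X = ["great article", "spam link", "good point", "buy now", "thanks"]

def to_semi_supervised(X, user_labels):
    """Split X according to labels {index: label} supplied by the user."""
    X_l = [x for i, x in enumerate(X) if i in user_labels]
    Y_l = [user_labels[i] for i, _ in enumerate(X) if i in user_labels]
    X_u = [x for i, x in enumerate(X) if i not in user_labels]
    return X_l, Y_l, X_u

# The user labels two examples (l = 2); n - l = 3 remain unlabeled.
X_l, Y_l, X_u = to_semi_supervised(X, {1: "spam", 3: "spam"})
```

The labeled pairs (X_l, Y_l) can then act as the constraints or additional information that guide the computation.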
2.3 User Feedback in News Discussions

This section introduces and motivates the domain of analysing user feedback in news discussions. User feedback refers to various forms of user participation regarding journalistic content. The focus lies on textual comments from multiple channels like news webpages, emails and social media.

There is a high demand for gaining insights from user comments, as they are a valuable source of information. They can contain useful aspects like feedback, criticism, new perspectives and expertise. Furthermore, comments mirror personal opinions which are normally hard to capture [13]. On the downside, comments can include insults, hostility and advertisements, which can negatively affect the overall quality of the discussion.

Studies show that user comments support the daily work of journalists and editors [16]. This includes obtaining new ideas for further articles, additional facts and direct feedback for improving the journalists' work. Thus, observing user comments clearly has a positive value. However, their heterogeneity and large volume raise a number of challenges, such as moderation overhead costs and keeping an overview of the current state of a discussion [11]. Consequently, newsrooms are faced with an increasing demand for computer supported ways to analyse, aggregate and visualize user comments. Unfortunately, there is a lack of analytical tools to surface high quality comments that can be leveraged for journalistic purposes [11].

From a data science point of view, user comments are documents with heterogeneous information contexts and several connections between each other. They consist of multiple attributes like a commenter identification, the related article, a timestamp, a title, a ranking from other readers, and several annotations from manual or automated classifications like sentiment analysis and swearword detection.
3 MAKING SENSE OF USER COMMENTS

As the manual analysis of user comments is resource consuming and impractical with a rising volume, velocity and variety of comments, various researchers focus on approaches to detect patterns automatically [6, 19]. In order to enable better analytical reasoning, we constructed a VA framework for annotated user comments.

3.1 Visual Analytics Framework

We have developed a fully functional VA framework for user comments that covers the needs of journalists. Our research builds upon the findings of Loosen et al. [11]. They propose requirements for an analytics tool in the field of user comments which covers journalists' needs. The aim of our work is to develop and further examine these findings in a VA tool. The primary question is how to map the analytical requirements into a suitable combination of visualisations and interactions to fulfil journalists' needs. As our data collection, we use a set of pre-annotated user comments. Our framework consists of the following analytical dimensions; see Loosen et al. [11] for further details.

• Article Selection — The occurrence of comments over time and the progress of a discussion. The user can select samples of articles from which the comments are analysed.
• Topics and Addressees — What is discussed, and who is mentioned and directly addressed in a sample of user comments.
• Discussion and Argumentation — The direction of a discussion and the arguments raised over time. The user can analyse the development of pro- and contra-arguments towards a certain question or topic over time.
• Quality — Metrics to quantify the quality of the user comments and to offer a condensed overview.
• Selected User Comments — A close-read function for selected user comments.

Each dimension is implemented as a separate view within a web application. Views supply a set of visualisations and interactions to enable and support an analytical discourse with the user. The views are coordinated and implement different interaction strategies like selections, filtering, focus+context and linking+brushing. Every view is an optional part of the analysis; the user dynamically decides which views he or she wants to use. The layout of the views is based on a "filter flow" metaphor: the views possess a specified ordering in which they are placed on the screen, and changes inside a view are only forwarded to subsequent views. The user is able to filter in arbitrary order and there are no top-down restrictions.
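The filter-flow behaviour (changes in one view re-filter only the subsequent views) can be sketched as an ordered chain of predicates. The view names and comment attributes below are simplified stand-ins for the actual framework.

```python
# Sketch of the "filter flow" metaphor: views form an ordered chain,
# and a change in one view re-filters only the views after it.
# View names and comment attributes are simplified stand-ins.

comments = [
    {"id": 1, "article": "A", "topic": "politics", "quality": 0.9},
    {"id": 2, "article": "A", "topic": "sports",   "quality": 0.2},
    {"id": 3, "article": "B", "topic": "politics", "quality": 0.7},
]

# Ordered chain of (view name, filter predicate) pairs.
views = [
    ("Article Selection", lambda c: c["article"] == "A"),
    ("Topics",            lambda c: c["topic"] == "politics"),
    ("Quality",           lambda c: c["quality"] >= 0.5),
]

def flow(data, views, changed_at=0):
    """Apply the predicates of views[changed_at:] to the incoming data;
    views before changed_at keep their previous output, so changes
    never propagate upstream."""
    for _name, predicate in views[changed_at:]:
        data = [c for c in data if predicate(c)]
    return data

selected = flow(comments, views)
```

Changing, say, the Topics view would call `flow(upstream_output, views, changed_at=1)`, leaving the Article Selection output untouched.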
Figure 1: The Article Selection view.

Figure 1 shows the Article Selection view. The upper part of the figure depicts the distribution of comments over time, whereas the lower part offers several selection options. Comments can be selected individually or grouped by sections, topics and authors.

Figure 2: The Discussion and Argumentation view.

The Discussion and Argumentation view is depicted in figure 2. After the user selects a stance or topic, the distribution of the for and against comments is shown in a line graph. Related arguments are listed in the bottom part. Furthermore, selected comments are plotted according to their sentiment and user ratings. The grey boxes relate to filtering operations; only comments within them are considered.

Figure 3: The Quality view.

Figure 3 illustrates the Quality view. User comments are represented as a set of different indicators. In the upper part, user comments are categorised along the dimensions article reference, compliance and originality. For further analysis, the selected comments are then depicted as polylines inside parallel coordinates, using indicators such as length, sentiment or the number of references.
Another central part of our investigations is the evaluation of our prototype. We conducted a quantitative usability study at the end of the first implementation cycle to assess the overall system and to counteract any weak points. Seven participants were observed while solving real world problems with our prototype. In addition, they answered a questionnaire about the usability. The findings are used for improvements and further requirements. The evaluation also revealed that the participants unconditionally trust the results. This is not surprising, as the tool does not show the accuracy of the analytical results.

3.2 From Interactive Filtering to Human in the Loop

In our first prototype the processing capabilities of VA are not exhausted. One drawback of our prototype is the lack of interactive model adaptation: it does not integrate the users' expertise and experience into the analysis. The current interaction takes place as a sequence of selections, overviews and filter operations, and the integration of ML is limited to pre-processing.

To enable HitL and interactive model steering, and thus exploit the possibilities of VA, we extend our prototype with interactive ML components. We have to develop specific use cases that include the HitL approach and can also be used in a more general form across domains.

The question arises why it could be beneficial to place the journalist in the loop. First, our framework is based on several classification models that provide annotations for interactive visualisation. Since these algorithms are based on simplified models of reality, misclassifications are to be expected. The misclassification likelihood can be quantified by relative error frequencies. By using HitL, the user can interactively correct spotted misclassifications and initiate new training cycles to improve the model's accuracy, which affects the visual correctness.

Secondly, HitL opens up new possibilities of individualisation. It is difficult to satisfy the desire for customizable queries through a predefined set of filter operations. HitL makes it possible to dynamically select data that are of special interest. The user can be offered opportunities to create and apply generic classification models at runtime.

Furthermore, many ML algorithms work on a restricted information basis: only a limited amount of data is available, only a sample is used for computation, or the algorithms are incorrectly configured. This leads to diverging notions of similarity or weighting between human expectations and computational results. Through the experience of the users, HitL makes it possible to correct these deviations and thus to improve the results.
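Quantifying the misclassification likelihood by relative error frequencies, as described above, could look like the following sketch. The stance labels and the retraining threshold are illustrative assumptions, not values from the framework.

```python
# Sketch: estimate an annotation model's misclassification likelihood
# as the relative error frequency over user-corrected samples, and
# use it to decide whether a new training cycle is warranted.
# Labels and threshold are illustrative assumptions.

def relative_error_frequency(predicted, corrected):
    """Fraction of predictions the user had to correct."""
    errors = sum(p != c for p, c in zip(predicted, corrected))
    return errors / len(predicted)

def needs_retraining(predicted, corrected, threshold=0.2):
    """Trigger a new training cycle once the error rate exceeds the threshold."""
    return relative_error_frequency(predicted, corrected) > threshold

predicted = ["pro", "pro", "contra", "pro", "contra"]
corrected = ["pro", "contra", "contra", "contra", "contra"]
rate = relative_error_frequency(predicted, corrected)  # 2 of 5 corrected
```

The same frequency can be surfaced in the interface so that users see how trustworthy an annotation currently is.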
4 NEXT STEPS

For our next steps to integrate different ML methods into VA, the following questions arise:

RQ1: To what extent can different ML methods be adapted within an iterative process?

A taxonomy for user centric adaptations of ML methods will be developed to guide the development of VA applications. The integration of HitL aspects is carried out with regard to the differences between supervised and unsupervised approaches. In addition, specific methods like classification, regression, clustering and dimension reduction will be considered.

RQ2: How can model adaptations and responses be translated and mapped into a visual metaphor?

It is a difficult task to steer computational models to match expectations. There is a gap between computational processing on the machine side and cognition on the human side that can lead to hard usability problems [17]. The user has to translate his or her mental model into numeric variables to steer the computation. The challenge is to provide a visual layer which abstracts away the computational perspective, as hardly any journalist (or end user) wants to deal with ML details.
RQ3: How can the uncertainties of ML methods be visually communicated and utilised in a constructive manner?

Uncertainty is created and communicated over the complete VA process [15]. We will focus on uncertainties created by ML algorithms. There exist several quality measures which quantify the accuracy of ML algorithms. We want to discover how these uncertainties can be communicated in an appropriate manner to create awareness. Furthermore, we want to utilize the uncertainty in combination with the HitL approach to reduce misclassifications.
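One simple way to create such awareness is to translate a numeric quality measure into a coarse qualitative band that a visualisation can encode, e.g. via colour or opacity. This is a sketch under our own assumptions; the band boundaries and the annotation accuracies are invented for illustration.

```python
# Sketch: map a quality measure (e.g. estimated accuracy of an ML
# annotation) to a qualitative uncertainty band for display.
# Band boundaries and accuracies are illustrative assumptions.

def uncertainty_band(accuracy):
    """Map an accuracy estimate in [0, 1] to a coarse awareness label."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if accuracy >= 0.9:
        return "reliable"
    if accuracy >= 0.7:
        return "uncertain"
    return "unreliable"

# Each annotated dimension is shown together with its band.
annotations = {"sentiment": 0.93, "swearword": 0.78, "stance": 0.55}
bands = {name: uncertainty_band(acc) for name, acc in annotations.items()}
```

Attaching such bands to every annotation would also counteract the unconditional trust observed in our usability study.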
In order to answer these questions, we examine the integration of ML and IV from a user centric point of view. To support the suitability of our findings, we develop and evaluate several prototypes in the domain of news discussions. A goal is to cover different ML approaches. We will carry out quantitative usability tests with representative users to spot and document strengths and weaknesses. The observations will take place inside our usability lab, which is equipped with eye-trackers, cameras, screen capture and key loggers.

5 RELATED WORK

In [18, 20] the authors present VA tools to make sense of text collections. In contrast to our approach, they do not place the human in the loop and rely heavily on pre-processing. Our modelling of the human interactions with ML is similar to [14], but they do not distinguish between various ML strategies like supervised and unsupervised learning. Additionally, [4] is related to our work, as they come up with a novel principle for analytical interactions called semantic interaction. We will build upon these findings to provide interactions that derive from the user's analytic process.
6 CONCLUSION

We have developed a VA tool for user comments in online journalism and have outlined next steps for a user centric integration of ML into our prototype. These next steps cover how different ML methods, such as supervised and unsupervised learning, can be adapted within an iterative process, how these adaptations can be visually translated and mapped to a visual metaphor, and how the uncertainties that arise can be communicated and constructively used. A broader understanding of the coupling of ML and IV is necessary to fully exploit the strengths of VA.

ACKNOWLEDGMENTS

The paper was supported by BWFG Hamburg within the "Forum 4.0" project as part of the ahoi.digital funding line.

REFERENCES

[1] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. 2009. Semi-Supervised Learning. MIT Press.
[2] Kristin A. Cook and James J. Thomas. 2005. Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE Computer Society.
[3] Geoffrey Ellis and Florian Mansmann. 2010. Mastering the Information Age: Solving Problems with Visual Analytics. Taylor & Francis Group.
[4] Alex Endert, Patrick Fiaux, and Chris North. 2012. Semantic interaction for visual text analytics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 473–482.
[5] Alex Endert, William Ribarsky, Cagatay Turkay, BL William Wong, Ian Nabney, I. Díaz Blanco, and Fabrice Rossi. 2017. The state of the art in integrating machine learning into visual analytics. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 458–486.
[6] Marlo Häring, Wiebke Loosen, and Walid Maalej. 2018. Who is addressed in this comment?: Automatically classifying meta-comments in news comments. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 67.
[7] Andreas Holzinger. 2016. Interactive machine learning for health informatics: When do we need the human-in-the-loop? Brain Informatics 3, 2 (2016), 119–131.
[8] Liu Jiang, Shixia Liu, and Changjian Chen. 2018. Recent research advances on interactive machine learning. Journal of Visualization (2018), 1–17.
[9] Daniel A. Keim, Florian Mansmann, Jörn Schneidewind, Jim Thomas, and Hartmut Ziegler. 2008. Visual analytics: Scope and challenges. In Visual Data Mining: Theory, Techniques and Tools for Visual Analytics. Springer, 76–90.
[10] Shixia Liu, Xiting Wang, Mengchen Liu, and Jun Zhu. 2017. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics 1, 1 (2017), 48–56.
[11] Wiebke Loosen, Marlo Häring, Zijad Kurtanović, Lisa Merten, Julius Reimer, Lies van Roessel, and Walid Maalej. 2018. Making sense of user comments: Identifying journalists' requirements for a comment analysis framework. SCM Studies in Communication and Media 6, 4 (2018), 333–364.
[12] Tom M. Mitchell. 1997. Machine Learning. McGraw-Hill.
[13] Deokgun Park, Simranjit Sachar, Nicholas Diakopoulos, and Niklas Elmqvist. 2016. Supporting comment moderators in identifying high quality online news comments. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, 1114–1125.
[14] Dominik Sacha, Michael Sedlmair, Leishi Zhang, John A. Lee, Jaakko Peltonen, Daniel Weiskopf, Stephen C. North, and Daniel A. Keim. 2017. What you see is what you can change: Human-centered machine learning by interactive visualization. Neurocomputing 268 (2017), 164–175.
[15] Dominik Sacha, Hansi Senaratne, Bum C. Kwon, Geoffrey Ellis, and Daniel A. Keim. 2016. The role of uncertainty, awareness, and trust in visual analytics. IEEE Transactions on Visualization and Computer Graphics 22, 1 (2016), 240–249.
[16] Arthur D. Santana. 2011. Online readers' comments represent new opinion pipeline. Newspaper Research Journal 32, 3 (2011), 66–81.
[17] Jessica Z. Self, Radha K. Vinayagam, J. T. Fry, and Chris North. 2016. Bridging the gap between user intention and model parameters for human-in-the-loop data analytics. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA '16). ACM, Article 3, 6 pages.
[18] John Stasko, Carsten Görg, and Zhicheng Liu. 2008. Jigsaw: Supporting investigative analysis through interactive visualization. Information Visualization 7, 2 (2008), 118–132.
[19] Gregor Wiedemann, Eugen Ruppert, Raghav Jindal, and Chris Biemann. 2018. Transfer learning from LDA to BiLSTM-CNN for offensive language detection in Twitter. Austrian Academy of Sciences, Vienna, September 21, 2018 (2018), 85–94.
[20] Yi Yang, Quanming Yao, and Huamin Qu. 2017. VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling. Visual Informatics 1, 1 (2017), 40–47.