Shaping the Information Nutrition Label

Tim Gollub¹  Martin Potthast²  Benno Stein¹
¹ Bauhaus-Universität Weimar  ² Leipzig University
tim.gollub@uni-weimar.de  martin.potthast@uni-leipzig.de  benno.stein@uni-weimar.de

Copyright © 2018 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
In: D. Albakour, D. Corney, J. Gonzalo, M. Martinez, B. Poblete, A. Vlachos (eds.): Proceedings of the NewsIR'18 Workshop at ECIR, Grenoble, France, 26-March-2018, published at http://ceur-ws.org

Abstract

We take up the idea of a nutrition facts label for online documents: the Information Nutrition Label. Such a label has the potential to increase the readers' ability to make an informed decision before the "consumption" of a news article or some other published online document. The basic ideas along with the dimensions (manifest, measurable text qualities, etc.) of such a label were proposed in [FGG+ 17]. The paper at hand focuses on the problem of an intuitive, unambiguous, and intelligible label presentation. For this purpose we (1) categorize the originally proposed information nutrition dimensions and (2) interpret them in terms of well-known physical quantities which we believe to be intuitively understandable for the general public. To give an impression of our ideas, a visual representation as well as the results of a preliminary crowdsourcing study are presented.

[Figure 1 shows the label with the example values 17 min, 32 °C, 31 %, 64 dB, and class B.]
Figure 1: Visual representation of the proposed label. Time, temperature, transparency, volume, and credibility are taken as quantities to describe the nutrition facts of a document.

1 Introduction

The World Wide Web is a great source for news. However, relying on online news does not come without difficulties, both for the individual and for society as a whole [BMA15]. With the web's sheer endless stream of news on virtually any topic, readers easily get trapped in filter bubbles and may become disconnected from important public discourses. Social networks stimulate the formation of echo chambers where groups of like-minded people share hyperpartisan news while being unaware of alternative arguments and opinions. Furthermore, some news articles are spread not only for the purpose of informing people but come with a commercial or a political incentive. Publishers are not loath to use exaggerated or misleading claims and promises in teasers or headlines in order to catch readers' attention (clickbaiting), making it hard to assess the (trust-)worthiness of an article ahead of reading.

To improve this situation, the authors of [FGG+ 17] propose a so-called "information nutrition label" for online news. Like its food counterpart, the label is supposed to help people make more informed decisions about which news items to consume.

Starting from a set of nine information nutrition dimensions that have been proposed in the original work (see Section 2), we now further shape these ideas towards fewer categories as well as an intuitively understandable representation of the underlying, often complex text analysis results (see Section 3). The proposed label is based on a categorization of the nine information nutrition dimensions into the five categories I Effort, II Kairos, III Logos, IV Pathos, and V Ethos. The interpretation of these categories is meant to be simplified by associating them with well-known quantities from physics (or finance in the case of V); Figure 1 shows a possible implementation of the idea. That is, a text is first analyzed regarding the nine information nutrition dimensions, whose resulting values are then combined and rescaled to match the interpretation of the five categories. In this regard, the value ranges that are employed for the proposed categories were determined with a small crowdsourcing study (see Section 4).
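The "combine and rescale" step described above can be illustrated with a minimal sketch. The paper does not specify a computational model, so everything operational here is an assumption: each dimension score is taken to be normalized to [0, 1] and oriented so that a higher score means more of the category's quantity, dimensions are combined by an unweighted mean, and rescaling is linear. The grouping, ranges, and units follow Table 1; the intermediate rating classes and all function names are hypothetical.

```python
from statistics import mean

# Grouping of dimensions into categories, following Table 1.
CATEGORIES = {
    "Effort": ["readability", "technicality", "verbosity"],
    "Kairos": ["topicality", "virality"],
    "Logos": ["factuality", "verifiability"],
    "Pathos": ["emotion", "opinion", "controversy"],
    "Ethos": ["authority", "credibility", "trust"],
}

# Numeric value ranges and units from Table 1.
RANGES = {
    "Effort": (0.0, 120.0, "min"),
    "Kairos": (0.0, 100.0, "°C"),
    "Logos": (0.0, 100.0, "%"),
    "Pathos": (0.0, 120.0, "dB"),
}

# ASSUMPTION: Table 1 only gives the end points A+ and D; the intermediate
# classes below are hypothetical, ordered from worst to best.
RATING_CLASSES = ["D", "C", "B", "A", "A+"]


def category_scores(dimension_scores):
    """Combine normalized dimension scores (each in [0, 1]) per category.

    ASSUMPTION: unweighted mean; the paper leaves the combination open.
    """
    return {
        category: mean(dimension_scores[d] for d in dims)
        for category, dims in CATEGORIES.items()
    }


def rescale(category, score):
    """Map a normalized category score in [0, 1] to its label value."""
    if category == "Ethos":
        # Select one of the ordered rating classes; 1.0 maps to the best one.
        index = min(int(score * len(RATING_CLASSES)), len(RATING_CLASSES) - 1)
        return RATING_CLASSES[index]
    low, high, unit = RANGES[category]
    return f"{low + score * (high - low):.0f} {unit}"


def nutrition_label(dimension_scores):
    """Compute the five label values from the 13 dimension scores."""
    scores = category_scores(dimension_scores)
    return {category: rescale(category, s) for category, s in scores.items()}
```

Calling `nutrition_label` with a dictionary that maps each of the 13 dimension names to a score in [0, 1] returns one display value per category. A weighted combination or a non-linear scale (sound pressure in dB is logarithmic, for instance) may well be more appropriate; the sketch only fixes the data flow.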
Table 1: Categorization of the information nutrition dimensions (Column 1) into five categories (Column 2). Columns 3 and 4 show the related physical/financial quantities and the proposed value ranges, respectively. Column 5 states prototypical user questions that the respective category addresses.

Dimension                               Category    Quantity             Range         Addressed User Question
Readability, Technicality, Verbosity*   I Effort    Time                 0 - 120 min   Does time allow the reading?
Topicality, Virality                    II Kairos   Temperature          0 - 100 °C    Do others care?
Factuality, Verifiability*              III Logos   Transparency         0 - 100 %     How professional is the writing?
Emotion, Opinion, Controversy           IV Pathos   Sound pressure       0 - 120 dB    Is the article subjective?
Authority, Credibility, Trust           V Ethos     Credit rating class  A+ ... D      How reliable is the source?

2 Related Work

The idea of computing an information nutrition label is an outcome of the Dagstuhl Seminar "User Generated Content in Social Media", held in July 2017¹, and has recently been published as a SIGIR Forum article [FGG+ 17]. In their proposal, the authors suggest nine dimensions as attributes for the information nutrition label, with one dimension, "authority / credibility / trust", actually combining three dimensions into one. Table 1 lists these dimensions in the left column; for a detailed description as well as for a related discussion we refer to the SIGIR Forum article. Note that Table 1 introduces two additional information nutrition dimensions, namely "verbosity" and "verifiability", which we see as complementary to the originally proposed ones. While "verbosity" refers to the length of an article, "verifiability" refers to the extent to which an article provides pointers to resources that help to verify the claims made [HVER15].

We presume that a label that displays the nine original dimensions (along with their respective statistical measurement units) will receive attention mainly from experts. In order to open the results of an intricate document analysis to the general public (cf. the motivation for a traffic light system to simplify the food nutrition label [foo12]), we ask whether a simpler, yet equally informative and hence preferable label can be derived by merging those dimensions that make pragmatically similar statements.

3 Categorizing Information Dimensions

When studying the original nutrition dimensions it becomes apparent that some of them, like topicality and virality, or emotion, opinion, and controversy, are similar from a pragmatic point of view. For example, the topicality and the virality of an article both represent temporal and sociological phenomena, and both may be used to answer a question like "How much do others care about the article?". Because of correlations like these, we presume that the nutrition dimensions can be subsumed into categories without a significant pragmatic loss of information.

The second column of Table 1 shows our proposal for such a categorization of the altogether 13 dimensions into five groups. The labels for the categories II to V have been chosen in accordance with Aristotle's modes of persuasion [AK07], including the less well-known concept of "kairos", which stands for the "right, critical, or opportune moment".² Column 3 shows our proposal for a category interpretation in physical terms (financial in the case of V), along with sensible value ranges in Column 4. Column 5 states a prototypical user question that is addressed by the respective category. In the following, each category is discussed in detail.

Category I, effort, groups all dimensions that affect the time a reader has to allot to comprehend an article. Besides verbosity, which is already used by media websites to provide an estimated article reading time (e.g., by Medium³), the dimensions readability and technicality fall into this category. To express effort, we consider time in minutes as an intuitive choice. The effort category allows readers to check whether they have enough time to read an article and to identify articles of a specific depth.

Category II, kairos, groups all dimensions that pertain to the trendiness, momentum, or hotness of an article or a topic, i.e., topicality and virality. The velocity graph [Pet13] on the media website Mashable⁴ can be counted as an existing attempt to provide this category. As a quantity to express kairos, we consider the temperature in the range of 0 to 100 °C as intuitive. The kairos category can bring articles to readers' attention which would otherwise be "out of their bubble".

Category III, logos, groups all dimensions that capture how well an author supports her claims with evidence, i.e., factuality and verifiability. As a quantity to express logos, we consider transparency in the range from 0 to 100 % as intuitive. The logos category can help readers to assess the journalistic quality of an article up front.

Category IV, pathos, groups all dimensions that are related to subjectivity and discrepancies, i.e., emotion, opinion, and controversy. As a quantity to express pathos, we consider volume, measured as sound pressure, as intuitive. The pathos category can help make readers aware that communities sharing alternative arguments or opinions likely exist.

Category V, ethos, finally groups all dimensions related to the credibility of an author or publisher, i.e., authority, credibility, and trust. As a quantity to express ethos, we consider credit ratings in the range from A+ to D, as used in finance,⁵ as an adequate choice. The ethos category can help readers assess the risk of becoming misinformed or, alternatively, the potential of learning about non-mainstream viewpoints.

4 Discussion

We see three advantages when using the proposed categories as attributes for the envisaged information nutrition label instead of the original dimensions. First, the reduced number of attributes makes the label both easier to present and easier to digest in practical settings. Second, by resorting to well-known quantities for the categories, readers can intuitively interpret the label without the need for detailed instructions. Third, the chosen quantities allow for the design of a non-textual visualization of the nutrition label. On the other hand, the potential concerns should not be overlooked: first, the categorization may not be as lossless as anticipated, such that the five categories convey much less helpful information than the original dimensions do. Second, the quantities (or their visualizations) may lead to false intuitions about the document they belong to.

As a very first step towards clarifying some of these concerns, we have designed a mainly non-textual representation for our label (see Figure 1), which allows for a visual comparison with the tabular label presented in the SIGIR Forum article. In Figure 1, each category is visualized by a rounded rectangle featuring a category symbol and an article-specific category value. For the latter, both the absolute value and its relative position in the value range are depicted. To make the values in the depicted label more sensible, we asked a crowd of 42 workers to read and then annotate the news article used as an example in the SIGIR Forum article in the light of our categories.⁶ As the final value for the label we took the mean of all 42 annotations. For the sake of completeness, a violin plot showing the distribution of all annotations is shown in Figure 2. The violin plot indicates that the article takes little effort to comprehend and that it is no longer very hot (kairos). In terms of transparency and sound pressure (logos and pathos), no clear consensus is reached, while the publisher is clearly not top rated in terms of credibility (ethos).

[Figure 2: x-axis categories Effort, Kairos, Logos, Pathos, Ethos; y-axis from 0.0 to 1.0.]
Figure 2: Violin plot for the annotations made by 42 Amazon Mechanical Turk workers for a news article from breitbart.com. The value ranges of all nutrition categories on the y-axis are scaled to the interval [0, 1].

For future work, we plan to present a computational model for the information nutrition label and to further investigate the correlation of its constituent parts with human intuition.

References

[AK07] Aristotle and G. A. Kennedy. On Rhetoric: A Theory of Civic Discourse. Oxford Press, 2007.
[BMA15] E. Bakshy, S. Messing, and L. A. Adamic. Exposure to ideologically diverse news and opinion on facebook. Science, 348(6239):1130–1132, 2015.
[FGG+ 17] N. Fuhr, A. Giachanou, G. Grefenstette, I. Gurevych, A. Hanselowski, K. Jarvelin, R. Jones, Y. Liu, J. Mothe, W. Nejdl, I. Peters, and B. Stein. An Information Nutritional Label for Online Documents. SIGIR Forum, 51(3):44–66, 2017.
[foo12] foodwatch. Research supports traffic light colours. https://www.foodwatch.org/more-information/research-supports-traffic-light-colours/, 2012. Accessed: 2018-02-02.
[HVER15] R. H. Harder, A. J. Velasco, M. S. Evans, and D. N. Rockmore. Measuring verifiability in online information. CoRR, abs/1509.05631, 2015.
[Pet13] Robyn Peterson. Mashable launches google glass viral prediction app. https://mashable.com/2013/05/14/mashable-launches-velocity-for-google-glass/, 2013. Accessed: 2018-03-17.

¹ http://www.dagstuhl.de/17301
² https://en.wikipedia.org/wiki/Kairos#In_classical_rhetoric
³ https://medium.com
⁴ https://mashable.com
⁵ https://en.wikipedia.org/wiki/Bond_credit_rating#Credit_rating_tiers
⁶ http://www.breitbart.com/big-government/2017/07/25/trumps-attack-on-sessions-over-clinton-prosecution-highlights-his-own-weak-stance/