=Paper=
{{Paper
|id=Vol-2173/paper11
|storemode=property
|title=Towards Crowdsourcing Clickbait Labels for YouTube Videos
|pdfUrl=https://ceur-ws.org/Vol-2173/paper11.pdf
|volume=Vol-2173
|authors=Jiani Qu,Anny Marleen Hißbach,Tim Gollub,Martin Potthast
|dblpUrl=https://dblp.org/rec/conf/hcomp/QuHGP18
}}
==Towards Crowdsourcing Clickbait Labels for YouTube Videos==
Jiani Qu, Anny Marleen Hißbach, Tim Gollub, Martin Potthast
Bauhaus-Universität Weimar and Leipzig University
@uni-weimar.de and martin.potthast@uni-leipzig.de

===Abstract===
Clickbait is increasingly used by publishers on social media platforms to spark their users' natural curiosity and to elicit clicks on their content. Every click earns them display advertisement revenue. Social media users who are tricked into clicking may experience a sense of disappointment or agitation, and social media operators have been observing growing amounts of clickbait on their platforms. As the largest video-sharing platform on the web, YouTube, too, suffers from clickbait. Many users and YouTubers alike have complained about this development. In this paper, we lay the foundation for crowdsourcing the first YouTube clickbait corpus by (1) augmenting the YouTube 8M dataset with meta data to obtain a large-scale base population of videos, and by (2) studying the task design suitable to manual clickbait identification.

Figure 1: Videos advertised using clickbait teasers.

===Introduction===
Clickbait is a marketing instrument employed by many publishers on social media that entices and manipulates users to click on a certain link by using eye-catching teaser content, exaggerated descriptions, omission of key information, or even outright deception, irrespective of whether users are actually interested in the content's topic or not. This usually serves the purpose of maximizing the revenue generated through display advertisement on the content's page. At the same time, it induces a frustrating user experience both on the social media platform and on the publisher's page. In recent years, clickbait has been on the rise, threatening to clog up the social media channel just as spam almost did for email, and causing quality content to be buried. News publishers are considered a primary source of clickbait, which is usually in direct violation of journalistic codes of ethics (Potthast et al. 2016). However, the problem is increasingly observed also on entertainment platforms such as YouTube, due to the considerable amount of advertisement revenue earned by YouTubers (i.e., professional video uploaders) through views on their videos. Many well-known YouTubers have expressed their concerns about this situation: such a market environment is basically a race to the bottom, where people are more or less forced to employ clickbait to keep their content from being lost among all the catchy titles.

Clickbait became a focus of computer science research only recently, two of the earliest contributions having been made by Chakraborty et al. (2016) and Potthast et al. (2016). Both rely on a rich set of hand-crafted features to detect clickbait headlines and clickbait tweets, respectively. Wei and Wan (2017) and Biyani, Tsioutsiouliklis, and Blackmer (2016) attempted to categorize clickbait news headlines, where the former distinguishes only ambiguous and misleading ones, and the latter distinguishes eight categories. Agrawal (2016) and Anand, Chakraborty, and Park (2016) employed deep neural networks for clickbait detection, reporting higher precision than the aforementioned studies. Moreover, for the Clickbait Challenge 2017, Potthast et al. (2018) introduced a new large-scale benchmark corpus on which twelve new approaches have been evaluated, which almost exclusively employ deep learning to various degrees of effectiveness [1]. All of the aforementioned studies focused on the news domain.

While news clickbait appears to be rather well understood, this is not the case for entertainment clickbait. Figure 1 shows examples we encountered in our annotation study. While closely resembling the most extreme forms of clickbait encountered in the news domain, it is still unclear how entertainment clickbait differs from news clickbait. To render clickbait detection on YouTube feasible, however, we first have to understand how it can be reliably identified by a human, so that a valid clickbait dataset can be constructed for training. An important prerequisite for this purpose is an extensive population of YouTube videos containing both clickbait and non-clickbait videos from which to sample. This paper contributes by laying the foundation for the construction of a YouTube clickbait corpus: (1) the YouTube 8M dataset is augmented with meta information not originally found in it, yielding a base population of YouTube videos, and (2) we conduct a preliminary study on annotating YouTube videos with regard to clickbait as a first step toward crowdsourcing clickbait annotations. In what follows, we detail the corpus construction and report on the results of our preliminary annotation study.

Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.
[1] http://www.clickbait-challenge.org/
===Corpus Construction: Base Population===
An important prerequisite for the construction of a valid corpus is to draw a representative sample of documents from the underlying population. YouTube offers little to no help in this regard, since neither its web front end nor its APIs allow for enumerating all videos available, nor for tapping into the stream of videos uploaded every day. If not for the recently released YouTube 8M dataset (Abu-El-Haija et al. 2016), which has been constructed by researchers working at YouTube, we would be left with no choice but to crawl YouTube ourselves. Below, we briefly review the construction and original purpose of the YouTube 8M dataset, describe our efforts to augment the dataset with the meta data necessary for clickbait detection (rendering the dataset also useful for tackling other research questions), and give a brief overview of the corpus statistics. Altogether, the resulting corpus compiles (if available) the meta data, comments, thumbnails, and subtitles ("captions") of 6,192,353 videos in a unified format, which we make available to other researchers on request [2]. Table 1 gives an overview of the corpus.

Table 1: Overview of the augmented YouTube 8M corpus. Text lengths are measured in words, counts are numbers of videos; for tags, the sum total of unique tags is also given.
<pre>
Data item      Count        min     mean      max         stdev
Videos         6,192,353
  length (s)                120     229.6     500         107.8
  views                     1,000   60,552.4  >2 billion  803.7
Thumbnails     6,192,353
Titles         6,192,353
  length                    0       7.1       44          0.0
Descriptions   5,917,215
  length                    0       45.9      2355        0.0
Tags           5,738,782
  count        72,700,304   0       12.7      146         0.0
Comments       4,732,577    0       41.7      946,863     0.4
  length                    0       13.9      190,080     0.0
Captions       1,662,459    1       1.4       48          0.0
  length                    0       472.4     499,650     0.3
</pre>

YouTube 8M is a large benchmark for multi-label video classification; it compiles about 500,000 hours of videos annotated with visually recognizable entities. It is available for download in the form of precomputed frame-based feature representations, allowing for the development of classification technology, but comes without any other meta data about the videos. Given that there are currently no other datasets of significant size for YouTube which have been drawn in a sensible manner from the population of all YouTube videos, this corpus provides the best alternative available to study YouTube-related tasks. As a consequence, its sampling criteria also apply to our corpus: a video has a length between 120 and 500 seconds, it corresponds to one of the 10,000 visual entities of the dataset, and it has more than 1,000 views. The videos we use in our corpus are extracted from the publicly available partitions "Train" and "Validate", which contain 90% of the dataset's videos.

Augmented YouTube 8M. We used the YouTube Data API [3] to crawl a variety of meta data for the videos of YouTube 8M. The first point of interest was the "video resource," which comprises data about the video, such as the video's title, description, uploader name, tags, view count, and more. Also included in the meta data is whether comments have been left for the video. If so, we downloaded them as well, including information about their authors, likes, dislikes, and responses. There is no property which specifies a video's language, since this information is not mandatory when uploading a video. Also, the API provides only information about the available captions, but not the captions themselves. Only the uploader of a video is given access to its captions via the API; we extracted them using youtube-dl [4]. For each video, all manually created captions were downloaded, and (if available) auto-generated captions in the "default" language and English. The "default" auto-generated caption gives perhaps the only hint at a video's original language. Finally, we downloaded all thumbnails used to advertise a video, which are not available via the API, but only via a canonical URL. Our corpus provides the possibility to recreate the way a video is presented on YouTube (meta data and thumbnail), what its actual content is ((sub)titles and descriptions), and how its viewers reacted (comments), forming the basis for studying the requirements of clickbait annotation.

[2] A public download link is not available for licensing reasons.
[3] https://developers.google.com/youtube/v3/docs/
[4] https://rg3.github.io/youtube-dl/
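To illustrate the kind of crawling described above, the following is a minimal sketch in Python of how the individual data items could be collected for a single video. It is not the crawler used for the corpus; the use of the requests library, the API key placeholder, the hqdefault.jpg thumbnail variant, and the particular youtube-dl flags are assumptions for illustration only.
<pre>
import subprocess
import requests

API = "https://www.googleapis.com/youtube/v3"
API_KEY = "YOUR_API_KEY"  # assumption: caller supplies their own API key

def fetch_video_resource(video_id):
    # "Video resource": title, description, uploader (channel) name, tags, view count, ...
    params = {"part": "snippet,statistics,contentDetails", "id": video_id, "key": API_KEY}
    return requests.get(API + "/videos", params=params).json()

def fetch_comment_threads(video_id):
    # Top-level comments with authors, like counts, and replies (first result page only).
    params = {"part": "snippet,replies", "videoId": video_id,
              "maxResults": 100, "textFormat": "plainText", "key": API_KEY}
    return requests.get(API + "/commentThreads", params=params).json()

def fetch_thumbnail(video_id, path):
    # Thumbnails are not served by the Data API; use the canonical image URL instead.
    url = "https://i.ytimg.com/vi/%s/hqdefault.jpg" % video_id
    with open(path, "wb") as f:
        f.write(requests.get(url).content)

def fetch_captions(video_id):
    # The API only lists available captions; youtube-dl downloads the caption text itself.
    subprocess.run(["youtube-dl", "--skip-download", "--write-sub",
                    "--write-auto-sub", "--all-subs",
                    "https://www.youtube.com/watch?v=" + video_id], check=True)
</pre>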
===Annotation Study===
Answers to the questions "What are characteristics of clickbait on YouTube?" and "Can it be systematically, yet manually, identified?" are important prerequisites to crowdsourcing clickbait annotations. We conducted a controlled annotation study by manually examining a total of 109 YouTube videos. The study was carried out in three stages by two reviewers.

Video review procedure. The review of each video was done according to a rigorous plan: each video was independently reviewed twice to reduce bias with regard to clickbait classification. The reviewed video properties are title, description, thumbnail, the video itself, and its comments. Also, likes and dislikes were noted and their ratio was calculated. Observations and a clickbait classification were written down in a structured lab notebook after reviewing the video teaser (title, thumbnail, and the first 123 characters of the description) as displayed on YouTube's web front end, and after watching the video (when views, likes, and comments become visible for the first time). Finally, as an exercise to better understand clickbait on YouTube, each video judged as clickbait was given a non-clickbait title and a short, teaser-fitting description hinting at its actual content, and vice versa. In Stages 2 and 3, clickbait was no longer judged on a binary scale, but on a scale from 0 (no clickbait) to 5 (strong clickbait).

Stage 1: random sampling. 24 random videos from the corpus were reviewed. The classifications of both reviewers agreed with each other, albeit only 1 video was judged to be clickbait for using excessive exclamation marks in the title and having a poor match between title and content. The graded clickbait score for each video was similar (i.e., always within 1 point of each other). Given the small sample size, we can only say that the prevalence of clickbait on YouTube seems to be low, which led us to adjust our sampling procedure.

Stage 2: stratified sampling. Dividing the view count distribution of the videos into 4 parts (0-2180, 2181-4720, 4721-14936, 14937-max, where the first three intervals' upper bounds are medians), we randomly chose 10 videos per part, presuming a correlation between the number of clicks a video receives and its clickbaitiness. Again, the results showed no differences between reviewers, and the determination of clickbaitiness was mostly the same. Among the 40 videos, only one was considered debatable clickbait, in the group with the fewest views (0-2180): the title was misleading and its correspondence with the video's content was weak. However, the title and description seemed to be auto-generated by YouTube itself, which, to the best of our knowledge, happens for direct uploads from video cameras. Finally, we decided to hunt down clickbait.
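A minimal sketch of the Stage 2 stratification, assuming the corpus is held as a list of records with a "views" field; the field name and the use of Python's statistics and random modules are illustrative, not the study's actual implementation.
<pre>
import random
from statistics import median

def view_count_cuts(view_counts):
    # Inner boundaries: the overall median plus the medians of the lower and upper
    # halves, yielding four strata (roughly [2180, 4720, 14936] in our study).
    counts = sorted(view_counts)
    mid = len(counts) // 2
    return [median(counts[:mid]), median(counts), median(counts[mid:])]

def stratified_sample(videos, per_stratum=10, seed=0):
    # Draw `per_stratum` random videos from each of the four view-count strata.
    cuts = view_count_cuts([v["views"] for v in videos])
    bounds = [0] + cuts + [float("inf")]
    random.seed(seed)
    sample = []
    for lo, hi in zip(bounds, bounds[1:]):
        stratum = [v for v in videos if lo < v["views"] <= hi]
        sample.extend(random.sample(stratum, min(per_stratum, len(stratum))))
    return sample
</pre>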
Stage 3: targeted selection. Obviously, targeted selection causes reviewer bias. Still, since unbiased sampling failed with respect to our goal of determining the difficulty of manual clickbait classification on YouTube, we were left with no alternative. In this stage, we also did not review each video twice, but both reviewers worked in close collaboration: 25 "obvious" non-clickbait videos and 20 "obvious" clickbait videos [5] were handpicked for review based on their teasers. Video selection was done based on the reviewers' prior experience and by browsing videos. Both reviewers agreed in their discussions on the non-clickbait videos, but opinions differed on two "clickbait" videos, where their content turned out to be highly relevant to the teaser, though the titles were considered too brief on their own.

[5] 5 more videos were deleted on YouTube and had to be omitted.

Observations. Recall that clickbait pertains to how a video is advertised, whereas the video's actual content (quality) is not in question. This is why our video review took special note of the video teaser. The title and thumbnail are comparable to Twitter teaser messages: characteristics such as excessive use of capitalization and punctuation, other highlighting and the use of certain words like 'this' and 'unthinkable', emotional writing, and deliberate ambiguities or omission of information also appear in most clickbait we observed. The short video descriptions are a unique addition, and they fall into three categories: (1) blank, or the same as/a paraphrase of the title; (2) additional information about the content; and (3) encouragement for channel subscription. The thumbnail images very often contain extra textual information, especially in the case of user-defined custom thumbnails. Many of the clickbait videos employ thumbnails with brightly colored extra text, extreme facial expressions or emojis, and unnatural, surprising, or suggestive pictures. The aforementioned characteristics can be utilized for detection.

However, the teaser information alone does not suffice to detect clickbait videos. Of the 87 videos labeled as non-clickbait, 9 have titles that meet the aforementioned criteria (excessive capitalization or punctuation), with 5 medium-to-severe cases. In spite of that, these teasers have a strong relation to the video's content and hence were not considered clickbait. This may result in a high false-positive rate when crowdsourcing annotations based on teasers alone.

Providing the video and user reactions like comments as well helps to resolve ambiguous cases, at the cost of a significantly higher workload. Watching, say, the first minute of a video suffices to resolve all of the aforementioned false positive cases. Reviewing comments, however, was not so useful: the types of comments we identified include discussions about the video's topic, questions or remarks directed to the uploader, feedback on the video itself, and random thoughts. Reviewing all comments on a video is feasible only for less than, say, 20 comments, but selecting comments for review, e.g., based on a sentiment analysis, will prove difficult: content-related expressions of anger, e.g., against an evil antagonist in a short film, are intermingled with meta discussions about the video (quality). A better approach may be selecting keywords or phrases (e.g., when a commenter explicitly says 'clickbait') from the comments, as sketched below. Other video properties were unhelpful, which may be due to sample size.
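A minimal sketch of such keyword-based comment selection; only the word 'clickbait' itself is taken from the study, while the additional cues and the limit of five comments are illustrative assumptions.
<pre>
# Assumed cue list; 'clickbait' is the only cue mentioned in the study, the rest is illustrative.
CLICKBAIT_CUES = ["clickbait", "click bait", "misleading title"]

def select_comments(comments, cues=CLICKBAIT_CUES, limit=5):
    # Prefer comments that explicitly complain about the teaser over full sentiment analysis.
    hits = [c for c in comments if any(cue in c.lower() for cue in cues)]
    return hits[:limit]
</pre>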
Altogether, we will proceed with crowdsourcing based on a two-stage process: in the first stage, video teasers will be reviewed, which can be done at a glance and therefore at reasonable cost; in the second stage, the clickbait identified will be reviewed more in depth to ensure a low false positive rate, by asking the workers to watch the first 30 to 60 seconds of a video plus some extracts from the comments.

As an aside, we discovered a kind of clickbait special to video platforms: "Staybait" refers to videos that foretell or promise something exciting in the video to build up suspense, but fail to deliver, or barely mention it later. The two examples of staybait we found were both from vlogs (personal experience documentation in video form).

===Conclusion===
Automatic clickbait detection on YouTube is still far out of reach, since a reliable training corpus needs to be constructed. Doing so requires access to the base population of videos, which we represent using the YouTube 8M corpus, video meta data, which we crawl for that corpus, and a reliable annotation procedure, which we developed as part of this study. As we do not expect to annotate the entire YouTube 8M corpus, what is still missing is a reliable way of sampling from YouTube 8M so that the sampling strategy is unbiased, yet yields a non-trivial amount of clickbait videos as part of a reasonably-sized sample of, say, 100,000 videos. For Twitter, Potthast et al. (2018) solved this problem by looking only at the top news publishers. Given the high diversity of successful YouTube channels, however, it is questionable whether the same strategy can be applied here as well.
===References===
Abu-El-Haija, S.; Kothari, N.; Lee, J.; Natsev, P.; Toderici, G.; Varadarajan, B.; and Vijayanarasimhan, S. 2016. YouTube-8M: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.
Agrawal, A. 2016. Clickbait detection using deep learning. In 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), 268–272.
Anand, A.; Chakraborty, T.; and Park, N. 2016. We used neural networks to detect clickbaits: You won't believe what happened next! CoRR abs/1612.01340.
Biyani, P.; Tsioutsiouliklis, K.; and Blackmer, J. 2016. "8 amazing secrets for getting more clicks": Detecting clickbaits in news streams using article informality. In AAAI Conference on Artificial Intelligence.
Chakraborty, A.; Paranjape, B.; Kakarla, S.; and Ganguly, N. 2016. Stop clickbait: Detecting and preventing clickbaits in online news media. CoRR abs/1610.09786.
Potthast, M.; Köpsel, S.; Stein, B.; and Hagen, M. 2016. Clickbait detection. In Proceedings of the 38th European Conference on Information Retrieval (ECIR 16), 810–817. Springer.
Potthast, M.; Gollub, T.; Komlossy, K.; Schuster, S.; Wiegmann, M.; Garces, E.; Hagen, M.; and Stein, B. 2018. Crowdsourcing a large corpus of clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 18).
Wei, W., and Wan, X. 2017. Learning to identify ambiguous and misleading news headlines. CoRR abs/1705.06031.