=Paper=
{{Paper
|id=Vol-2173/paper11
|storemode=property
|title=Towards Crowdsourcing Clickbait Labels for YouTube Videos
|pdfUrl=https://ceur-ws.org/Vol-2173/paper11.pdf
|volume=Vol-2173
|authors=Jiani Qu,Anny Marleen Hißbach,Tim Gollub,Martin Potthast
|dblpUrl=https://dblp.org/rec/conf/hcomp/QuHGP18
}}
==Towards Crowdsourcing Clickbait Labels for YouTube Videos==
Jiani Qu, Anny Marleen Hißbach, Tim Gollub, Martin Potthast
Bauhaus-Universität Weimar and Leipzig University
@uni-weimar.de and martin.potthast@uni-leipzig.de

===Abstract===
Clickbait is increasingly used by publishers on social media platforms to spark their users' natural curiosity and to elicit clicks on their content. Every click earns them display advertisement revenue. Social media users who are tricked into clicking may experience a sense of disappointment or agitation, and social media operators have been observing growing amounts of clickbait on their platforms. As the largest video-sharing platform on the web, YouTube, too, suffers from clickbait. Many users and YouTubers alike have complained about this development. In this paper, we lay the foundation for crowdsourcing the first YouTube clickbait corpus by (1) augmenting the YouTube 8M dataset with meta data to obtain a large-scale base population of videos, and by (2) studying the task design suitable to manual clickbait identification.

Figure 1: Videos advertised using clickbait teasers.

===Introduction===
Clickbait is a marketing instrument employed by many publishers on social media that entices and manipulates users to click on a certain link by using eye-catching teaser content, exaggerated descriptions, omission of key information, or even outright deception, irrespective of whether users are actually interested in the content's topic or not. This usually serves the purpose of maximizing the revenue generated through display advertisement on the content's page. At the same time, it induces a frustrating user experience both on the social media platform and on the publisher's page. In recent years, clickbait has been on the rise, threatening to clog up the social media channel just as spam almost did for email, and causing quality content to be buried. News publishers are considered a primary source of clickbait, which is usually in direct violation of journalistic codes of ethics (Potthast et al. 2016). However, the problem is increasingly observed also on entertainment platforms such as YouTube, due to the considerable amount of advertisement revenue earned by YouTubers (i.e., professional video uploaders) through views on their videos. Many well-known YouTubers have expressed their concerns about this situation: such a market environment is basically a race to the bottom, where people are more or less forced to employ clickbait to keep their content from being lost among all the catchy titles.

Clickbait became a focus of computer science research only recently, two of the earliest contributions having been made by Chakraborty et al. (2016) and Potthast et al. (2016). Both rely on a rich set of hand-crafted features to detect clickbait headlines and clickbait tweets, respectively. Wei and Wan (2017) and Biyani, Tsioutsiouliklis, and Blackmer (2016) attempted to categorize clickbait news headlines, where the former distinguishes only ambiguous and misleading ones, and the latter distinguishes eight categories. Agrawal (2016) and Anand, Chakraborty, and Park (2016) employed deep neural networks for clickbait detection, reporting higher precision than the aforementioned studies. Moreover, for the Clickbait Challenge 2017, Potthast et al. (2018) introduced a new large-scale benchmark corpus on which twelve new approaches have been evaluated, which almost exclusively employ deep learning to various degrees of effectiveness [1]. All of the aforementioned studies focused on the news domain.

While news clickbait appears to be rather well understood, this is not the case for entertainment clickbait. Figure 1 shows examples we encountered in our annotation study. While closely resembling the most extreme forms of clickbait encountered in the news domain, it is still unclear how entertainment clickbait differs from news clickbait. To render clickbait detection on YouTube feasible, however, we first have to understand how it can be reliably identified by a human, so that a valid clickbait dataset can be constructed for training. An important prerequisite for this purpose is an extensive population of YouTube videos containing both clickbait and non-clickbait videos from which to sample. This paper contributes by laying the foundation for the construction of a YouTube clickbait corpus: (1) the YouTube 8M dataset is augmented with meta information not originally found in it, yielding a base population of YouTube videos, and (2) we conduct a preliminary study on annotating YouTube videos with regard to clickbait as a first step toward crowdsourcing clickbait annotations. In what follows, we detail the corpus construction and report on the results of our preliminary annotation study.

Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.
[1] http://www.clickbait-challenge.org/
===Corpus Construction: Base Population===
An important prerequisite for the construction of a valid corpus is to draw a representative sample of documents from the underlying population. YouTube offers little to no help in this regard, since neither its web front end nor its APIs allow for enumerating all videos available, nor for tapping into the stream of videos uploaded every day. If not for the recently released YouTube 8M dataset (Abu-El-Haija et al. 2016), which has been constructed by researchers working at YouTube, we would be left with no choice but to crawl YouTube ourselves. Below, we briefly review the construction and original purpose of the YouTube 8M dataset, describe our efforts to augment the dataset with the meta data necessary for clickbait detection (rendering the dataset also useful for tackling other research questions), and give a brief overview of the corpus statistics. Altogether, the resulting corpus compiles (if available) the meta data, comments, thumbnails, and subtitles ("captions") of 6,192,353 videos in a unified format, which we make available to other researchers on request [2]. Table 1 gives an overview of the corpus.

Table 1: Overview of the augmented YouTube 8M corpus. Text lengths are measured in words, counts are numbers of videos; for tags, the sum total of unique tags is also given.
<pre>
Data item      Count        min     mean      max         stdev
Videos         6,192,353
  length (s)                120     229.6     500         107.8
  views                     1,000   60,552.4  >2 billion  803.7
Thumbnails     6,192,353
Titles         6,192,353
  length                    0       7.1       44          0.0
Descriptions   5,917,215
  length                    0       45.9      2355        0.0
Tags           5,738,782
  count        72,700,304   0       12.7      146         0.0
Comments       4,732,577    0       41.7      946,863     0.4
  length                    0       13.9      190,080     0.0
Captions       1,662,459    1       1.4       48          0.0
  length                    0       472.4     499,650     0.3
</pre>

YouTube 8M is a large benchmark for multi-label video classification; it compiles about 500,000 hours of videos annotated with visually recognizable entities. It is available for download in the form of precomputed frame-based feature representations, allowing for the development of classification technology, but comes without any other meta data about the videos. Given that there are currently no other datasets of significant size for YouTube which have been drawn in a sensible manner from the population of all YouTube videos, this corpus provides the best alternative available to study YouTube-related tasks. As a consequence, its sampling criteria also apply to our corpus: a video has a length between 120 and 500 seconds, it corresponds to one of the 10,000 visual entities of the dataset, and it has more than 1,000 views. The videos we use in our corpus are extracted from the publicly available partitions "Train" and "Validate", which contain 90% of the dataset's videos.

Augmented YouTube 8M. We used the YouTube Data API [3] to crawl a variety of meta data for the videos of YouTube 8M. The first point of interest was the "video resource," which comprises data about the video, such as the video's title, description, uploader name, tags, view count, and more. Also included in the meta data is whether comments have been left for the video. If so, we downloaded them as well, including information about their authors, likes, dislikes, and responses. There is no property which specifies a video's language, since this information is not mandatory when uploading a video. Also, the API provides only information about the available captions, but not the captions themselves. Only the uploader of a video is given access to its captions via the API; we extracted them using youtube-dl [4]. For each video, all manually created captions were downloaded, and (if available) auto-generated captions in the "default" language and English. The "default" auto-generated caption gives perhaps the only hint at a video's original language. Finally, we downloaded all thumbnails used to advertise a video, which are not available via the API, but only via a canonical URL. Our corpus provides the possibility to recreate the way a video is presented on YouTube (meta data and thumbnail), what its actual content is ((sub)titles and descriptions), and how its viewers reacted (comments), forming the basis for studying the requirements of clickbait annotation.

[2] A public download link is not available for licensing reasons.
[3] https://developers.google.com/youtube/v3/docs/
[4] https://rg3.github.io/youtube-dl/
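To illustrate the kind of crawling described above, the following is a minimal sketch in Python of how the individual data items could be collected for a single video. It is not the crawler used for the corpus; the use of the requests library, the API key placeholder, the hqdefault.jpg thumbnail variant, and the particular youtube-dl flags are assumptions for illustration only.
<pre>
import subprocess
import requests

API = "https://www.googleapis.com/youtube/v3"
API_KEY = "YOUR_API_KEY"  # assumption: caller supplies their own API key

def fetch_video_resource(video_id):
    # "Video resource": title, description, uploader (channel) name, tags, view count, ...
    params = {"part": "snippet,statistics,contentDetails", "id": video_id, "key": API_KEY}
    return requests.get(API + "/videos", params=params).json()

def fetch_comment_threads(video_id):
    # Top-level comments with authors, like counts, and replies (first result page only).
    params = {"part": "snippet,replies", "videoId": video_id,
              "maxResults": 100, "textFormat": "plainText", "key": API_KEY}
    return requests.get(API + "/commentThreads", params=params).json()

def fetch_thumbnail(video_id, path):
    # Thumbnails are not served by the Data API; use the canonical image URL instead.
    url = "https://i.ytimg.com/vi/%s/hqdefault.jpg" % video_id
    with open(path, "wb") as f:
        f.write(requests.get(url).content)

def fetch_captions(video_id):
    # The API only lists available captions; youtube-dl downloads the caption text itself.
    subprocess.run(["youtube-dl", "--skip-download", "--write-sub",
                    "--write-auto-sub", "--all-subs",
                    "https://www.youtube.com/watch?v=" + video_id], check=True)
</pre>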
===Annotation Study===
Answers to the questions "What are characteristics of clickbait on YouTube?" and "Can it be systematically, yet manually, identified?" are important prerequisites to crowdsourcing clickbait annotations. We conducted a controlled annotation study by manually examining a total of 109 YouTube videos. The study was carried out in three stages by two reviewers.

Video review procedure. The review of each video was done according to a rigorous plan: each video was independently reviewed twice to reduce bias with regard to clickbait classification. The reviewed video properties are title, description, thumbnail, the video itself, and its comments. Also, likes and dislikes were noted and their ratio was calculated. Observations and a clickbait classification were written down in a structured lab notebook after reviewing the video teaser (title, thumbnail, and the first 123 characters of the description) as displayed on YouTube's web front end, and after watching the video (when views, likes, and comments become visible for the first time). Finally, as an exercise to better understand clickbait on YouTube, each video judged as clickbait was given a non-clickbait title and a short, teaser-fitting description hinting at its actual content, and vice versa. In Stages 2 and 3, clickbait was no longer judged on a binary scale, but on a scale from 0 (no clickbait) to 5 (strong clickbait).

Stage 1: random sampling. 24 random videos from the corpus were reviewed. The classifications of both reviewers agreed with each other, albeit only 1 video was judged to be clickbait for using excessive exclamation marks in the title and having a poor match between title and content. The graded clickbait score for each video was similar (i.e., always within 1 point of each other). Given the small sample size, we can only say that the prevalence of clickbait on YouTube seems to be low, which led us to adjust our sampling procedure.

Stage 2: stratified sampling. Dividing the view count distribution of the videos into 4 parts (0-2180, 2181-4720, 4721-14936, 14937-max, where the first three intervals' upper bounds are medians), we randomly chose 10 videos per part, presuming a correlation between the number of clicks a video receives and its clickbaitiness. Again, the results showed no differences between reviewers, and the determination of clickbaitiness was mostly the same. Among the 40 videos, only one was considered debatable clickbait, in the group with the fewest views (0-2180): the title was misleading and its correspondence with the video's content was weak. However, the title and description seemed to be auto-generated by YouTube itself, which, to the best of our knowledge, happens for direct uploads from video cameras. Finally, we decided to hunt down clickbait.
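A minimal sketch of the Stage 2 stratification, assuming the corpus is held as a list of records with a "views" field; the field name and the use of Python's statistics and random modules are illustrative, not the study's actual implementation.
<pre>
import random
from statistics import median

def view_count_cuts(view_counts):
    # Inner boundaries: the overall median plus the medians of the lower and upper
    # halves, yielding four strata (roughly [2180, 4720, 14936] in our study).
    counts = sorted(view_counts)
    mid = len(counts) // 2
    return [median(counts[:mid]), median(counts), median(counts[mid:])]

def stratified_sample(videos, per_stratum=10, seed=0):
    # Draw `per_stratum` random videos from each of the four view-count strata.
    cuts = view_count_cuts([v["views"] for v in videos])
    bounds = [0] + cuts + [float("inf")]
    random.seed(seed)
    sample = []
    for lo, hi in zip(bounds, bounds[1:]):
        stratum = [v for v in videos if lo < v["views"] <= hi]
        sample.extend(random.sample(stratum, min(per_stratum, len(stratum))))
    return sample
</pre>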
Stage 3: targeted selection. Obviously, targeted selection causes reviewer bias. Still, since unbiased sampling failed with respect to our goal of determining the difficulty of manual clickbait classification on YouTube, we were left with no alternative. In this stage, we also did not review each video twice, but both reviewers worked in close collaboration: 25 "obvious" non-clickbait videos and 20 "obvious" clickbait videos [5] were handpicked for review based on their teasers. Video selection was done based on the reviewers' prior experience and by browsing videos. Both reviewers agreed in their discussions on the non-clickbait videos, but opinions differed on two "clickbait" videos, where their content turned out to be highly relevant to the teaser, though the titles were considered too brief on their own.

[5] 5 more videos were deleted on YouTube and had to be omitted.

Observations. Recall that clickbait pertains to how a video is advertised, whereas the video's actual content (quality) is not in question. This is why our video review took special note of the video teaser. The title and thumbnail are comparable to Twitter teaser messages: characteristics such as excessive use of capitalization and punctuation, other highlighting and the use of certain words like 'this' and 'unthinkable', emotional writing, and deliberate ambiguities or omission of information also appear in most clickbait we observed. The short video descriptions are a unique addition, and they fall into three categories: (1) blank, or the same as/a paraphrase of the title; (2) additional information about the content; and (3) encouragement for channel subscription. The thumbnail images very often contain extra textual information, especially in the case of user-defined custom thumbnails. Many of the clickbait videos employ thumbnails with brightly colored extra text, extreme facial expressions or emojis, and unnatural, surprising, or suggestive pictures. The aforementioned characteristics can be utilized for detection.

However, the teaser information alone does not suffice to detect clickbait videos. Of the 87 videos labeled as non-clickbait, 9 have titles that meet the aforementioned criteria (excessive capitalization or punctuation), with 5 medium-to-severe cases. In spite of that, these teasers have a strong relation to the video's content and hence were not considered clickbait. This may result in a high false-positive rate when crowdsourcing annotations based on teasers alone.

Providing the video and user reactions like comments as well helps to resolve ambiguous cases, at the cost of a significantly higher workload. Watching, say, the first minute of a video suffices to resolve all of the aforementioned false positive cases. Reviewing comments, however, was not so useful: the types of comments we identified include discussions about the video's topic, questions or remarks directed to the uploader, feedback on the video itself, and random thoughts. Reviewing all comments on a video is feasible only for less than, say, 20 comments, but selecting comments for review, e.g., based on a sentiment analysis, will prove difficult: content-related expressions of anger, e.g., against an evil antagonist in a short film, are intermingled with meta discussions about the video (quality). A better approach may be selecting keywords or phrases (e.g., when a commenter explicitly says 'clickbait') from the comments, as sketched below. Other video properties were unhelpful, which may be due to sample size.
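A minimal sketch of such keyword-based comment selection; only the word 'clickbait' itself is taken from the study, while the additional cues and the limit of five comments are illustrative assumptions.
<pre>
# Assumed cue list; 'clickbait' is the only cue mentioned in the study, the rest is illustrative.
CLICKBAIT_CUES = ["clickbait", "click bait", "misleading title"]

def select_comments(comments, cues=CLICKBAIT_CUES, limit=5):
    # Prefer comments that explicitly complain about the teaser over full sentiment analysis.
    hits = [c for c in comments if any(cue in c.lower() for cue in cues)]
    return hits[:limit]
</pre>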
Altogether, we will proceed with crowdsourcing based on a two-stage process: in the first stage, video teasers will be reviewed, which can be done at a glance and therefore at reasonable cost; in the second stage, the clickbait identified will be reviewed more in depth to ensure a low false positive rate, by asking the workers to watch the first 30 to 60 seconds of a video plus some extracts from the comments.

As an aside, we discovered a kind of clickbait special to video platforms: "Staybait" refers to videos that foretell or promise something exciting in the video to build up suspense, but fail to deliver, or barely mention it later. The two examples of staybait we found were both from vlogs (personal experience documentation in video form).

===Conclusion===
Automatic clickbait detection on YouTube is still far out of reach, since a reliable training corpus needs to be constructed. Doing so requires access to the base population of videos, which we represent using the YouTube 8M corpus, video meta data, which we crawl for that corpus, and a reliable annotation procedure, which we developed as part of this study. As we do not expect to annotate the entire YouTube 8M corpus, what is still missing is a reliable way of sampling from YouTube 8M so that the sampling strategy is unbiased, yet yields a non-trivial amount of clickbait videos as part of a reasonably-sized sample of, say, 100,000 videos. For Twitter, Potthast et al. (2018) solved this problem by looking only at the top news publishers. Given the high diversity of successful YouTube channels, however, it is questionable whether the same strategy can be applied here as well.
===References===
Abu-El-Haija, S.; Kothari, N.; Lee, J.; Natsev, P.; Toderici, G.; Varadarajan, B.; and Vijayanarasimhan, S. 2016. YouTube-8M: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.
Agrawal, A. 2016. Clickbait detection using deep learning. In 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), 268–272.
Anand, A.; Chakraborty, T.; and Park, N. 2016. We used neural networks to detect clickbaits: You won't believe what happened next! CoRR abs/1612.01340.
Biyani, P.; Tsioutsiouliklis, K.; and Blackmer, J. 2016. "8 amazing secrets for getting more clicks": Detecting clickbaits in news streams using article informality. In AAAI Conference on Artificial Intelligence.
Chakraborty, A.; Paranjape, B.; Kakarla, S.; and Ganguly, N. 2016. Stop clickbait: Detecting and preventing clickbaits in online news media. CoRR abs/1610.09786.
Potthast, M.; Köpsel, S.; Stein, B.; and Hagen, M. 2016. Clickbait detection. In Proceedings of the 38th European Conference on Information Retrieval (ECIR 16), 810–817. Springer.
Potthast, M.; Gollub, T.; Komlossy, K.; Schuster, S.; Wiegmann, M.; Garces, E.; Hagen, M.; and Stein, B. 2018. Crowdsourcing a large corpus of clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 18).
Wei, W., and Wan, X. 2017. Learning to identify ambiguous and misleading news headlines. CoRR abs/1705.06031.