<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring the Indian Political YouTube Landscape: A Multimodal Multi-Task Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adwita Arora</string-name>
          <email>adwita.ug20@nsut.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naman Dhingra</string-name>
          <email>naman.dhingra@nsut.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Divya Chaudhary</string-name>
          <email>d.chaudhary@northeastern.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Gorton</string-name>
          <email>i.gorton@northeastern.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijendra Kumar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Netaji Subhas University of Technology</institution>
          ,
          <addr-line>New Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Northeastern University</institution>
          ,
          <addr-line>Boston, MA 02115</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Social media profoundly influences all facets of our lives, including politics. Political parties, politicians, and media outlets have strategically cultivated their social media presence to engage with the public. However, with the advent of freely available Internet services in India, there has been a rising proliferation of independent content creators on YouTube, with many getting millions of views per video. In this study, we present a novel multimodal dataset of videos from 20 independent and influential content creators, annotated for five socially and politically relevant labels - Humour/Satire, Opposition/Criticism, Support/Advocacy, Informational/Analysis, and Opinion - with high inter-annotator agreement (Cohen's Kappa between 0.820 and 0.956). We consider three modalities in our dataset - textual (title and description of the video), visual (thumbnail) and audio (MFCC coefficients and additional spectral and temporal features). We also perform preliminary classification on our dataset using an early fusion multimodal model, combining audio, visual and textual modalities, which performs better than other unimodal and bimodal approaches, yielding a Macro-F1 score of 0.8742 and a ROC-AUC score of 0.769. By introducing this novel dataset, we aim to stimulate further investigation within the domains of opinion dissemination across social networks and the analysis of multimodal content, especially within the Indian context.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimodal Analysis</kwd>
        <kwd>Political Analysis</kwd>
        <kwd>Social Media Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>India, being the world’s largest democracy, has a rich and vibrant political and social history.
Politics is an essential aspect of the lives of Indians, making it one of the country’s most
deliberated and debated topics. With the advent of social media, people are voicing their
opinions and concerns in a manner that has never been more convenient. This could be through
tweets on Twitter, consuming or creating content on YouTube, or posting on Facebook, among
many other such avenues. Political parties and politicians also maintain social media profiles to
connect and engage with citizens.</p>
      <p>With the popularity of YouTube in India and the importance of politics in Indian society,
analyzing the content put out by Indian creators on politics, as well as the response of the
audience to it becomes essential. This research aims to analyze politically and socially relevant
videos uploaded by independent content creators on YouTube. For this study, we manually
chose 20 prominent and diverse YouTubers regularly making content on the politics and society
of India. We selected the 20 most viewed relevant videos of each YouTuber to form our dataset.</p>
      <p>Videos uploaded to YouTube rarely confine themselves to a single topic, given their detailed nature.
Especially with political videos, creators can employ different communication techniques that
strategically capture the audience’s attention and convey their point. Videos can differ
in the stance taken by the creator as well as in the justification provided for it. This
makes it important to model the understanding of these videos on a multi-task basis, where a
single video can have multiple labels.</p>
      <p>Therefore, we annotate each video for five categories - the presence or absence of
humour/satire/irony, support or opposition of any political entity, and whether the video is a
fact-based analysis or a personal opinion. Each of these categories is independent of the others,
and the presence of one category does not affect another. We draw insights on the data collected
using topic modelling on comments and keyphrase analysis on titles as well as text extracted
from the thumbnails.</p>
      <p>The final portion of this paper is devoted to a multi-task, multimodal classification of the
dataset. Each video can be represented as a combination of three modalities - audio, visual and
textual. After extracting relevant features from them, we employ an early fusion classification
model on these modalities.</p>
      <p>
        YouTube has been a source of ample classification and analysis tasks in the past due to the
massive volume of public data available on various topics. Apart from the videos, comments
also act as a rich source for analysis as they serve as responses from the audience, both of which
can be studied individually or in unison. Kang et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] analyzed the Mukbang-related content
on YouTube along with news and observed how behaviours like overeating are linked directly
to the video’s popularity, while Papadamou et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] analyzed the Incel community and the
abundance of toxic and misogynistic comments on these forums.
      </p>
      <p>
        Works like [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are focused on the analysis of YouTube comments for exposure of
children to inappropriate content and transphobic/homophobic content identification,
respectively. Latorre and Amores [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] presented a topic modelling analysis of xenophobic and racist
comments in Spanish directed at migrants and refugees.
      </p>
      <p>
        A single video comprises many modalities, and several methods of classification and analysis
have been developed to handle these modalities. Yousaf and Nawaz [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a novel
EfficientNet-BiLSTM approach for detecting inappropriate content in animated cartoon videos
targeted at children. They extracted video descriptors using EfficientNet, a pre-trained CNN
model, which were then fed to a BiLSTM to learn representations. They showcased how deep
learning methods perform better than traditional machine learning approaches. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
proposed techniques that deal with internet memes, which refer to the image + textual modality.
Chauhan et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] extended the M2H2 dataset by adding parallel English translations and annotating
each entry for emotion and sentiment classes. They also proposed a multimodal multitask
classification baseline using a context transformer with sentiment and emotion embeddings.
They showed that combining all three modalities led to the best results.
      </p>
      <p>
        The creation of a well-annotated dataset is the backbone of any systematic research. Shahi
[10] presented a semi-automated annotation framework for multilingual, multimodal social
media data. Expertly annotated multimodal and multilabel datasets have also been proposed on
diverse subjects. Chauhan et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Christ et al. [11] proposed datasets on humour detection
where M2H2 was annotated for numerous occurrences from a well-known Hindi TV show,
and Passau-SFCH was annotated for humour along the sentiment (Positive or Negative) and
direction (towards self or towards others) dimensions, respectively. Gupta et al. [12] presented
3MASSIV, a dataset of about 50,000 expertly annotated multilingual short videos from a sharing
platform called Moj. Other works like Khan et al. [13] present Vyaktitv - a multimodal dataset
consisting of participants’ audio and visual recordings and their Hinglish transcriptions for
personality detection.
      </p>
      <p>To the best of our knowledge, no existing multimodal dataset has been annotated for
five distinct socio-political labels such as ours.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>Details of the YouTubers selected for this study are given in Table 1.</p>
      <p>To build the dataset, we first prepared a list of popular Indian YouTubers creating content
on socially and politically relevant topics. We manually selected 20 YouTubers, each with a
subscriber count of over 330,000, as of February 2023. We then selected 20 of the most viewed
videos from each of these YouTubers, after manually removing videos that were either
irrelevant to politics or shorter than 5 minutes or longer than 25 minutes, to ensure
consistency across all videos. We then used the YouTube API to
collect the title of the video, the description of the video, the number of likes received and view
count as of February 2023, and the thumbnail for each of the 400 videos.1 Details of the dataset
are given in Table 1. The audio of each video was downloaded as an .mp4 file using the pyTube
Python library2 and later processed using the Librosa library3.
1Videos uploaded to YouTube fall under its "fair use" guidelines, a legal doctrine under which
copyright-protected material may, in certain circumstances, be used without permission from the copyright holder.
In the United States and India, works of research may be considered fair use if done fairly (https://support.google.
com/youtube/answer/9783148?hl=en).
2https://github.com/pytube/pytube
3https://github.com/librosa/librosa</p>
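      <p>The duration-based filtering step described above can be sketched as follows. This is a minimal illustration; the dictionary schema (the "duration_s" field) and the filter_videos helper are hypothetical, not the actual collection pipeline.</p>

```python
def filter_videos(videos, min_s=5 * 60, max_s=25 * 60):
    """Keep only videos whose duration (in seconds) lies in [min_s, max_s],
    mirroring the 5-25 minute window used to build the dataset."""
    return [v for v in videos if min_s <= v["duration_s"] <= max_s]

# Hypothetical metadata records for three candidate videos.
videos = [
    {"title": "short clip", "duration_s": 120},      # too short, dropped
    {"title": "analysis video", "duration_s": 900},  # 15 min, kept
    {"title": "long stream", "duration_s": 3600},    # too long, dropped
]
kept = filter_videos(videos)
```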
      <p>The 20 YouTubers included in the dataset (Table 1): Abhisar Sharma, The Jaipur Dialogues,
String, Kumar Shyam, Soch by Mohak Mangal, Sushant Sinha, The Deshbhakt, Kroordarshan,
Punya Prasun Bajpai, Sarthak Goswami, Sakshi Joshi, Open Letter, Harsh Vardhan Tripathi,
Dhruv Rathee, AKTK, Being Honest, Ajit Anjum, The Manish Thakur Show, The Sham Sharma
Show and DO Politics; Table 1 also summarises the overall dataset.</p>
      <sec id="sec-2-1">
        <title>2.1. Annotations</title>
        <p>In recent years, the use of NLP and ML techniques to study politics has drawn more and more
interest from the research community. Our goal is to promote significant research improvements
across computational and socio-political areas, specifically in the Indian context, using this
multi-task framework and recognising the multimodal complexity of the data.</p>
        <p>We employed two undergraduate students with a strong grip on both Hindi and English to
annotate the videos. We considered five tasks for this annotation process - Humour/Satire,
Opposition/Criticism, Support/Advocacy, Informational/Analysis, and Opinion.</p>
        <p>We have based our task definitions around a political entity, i.e. content directed at or in reference
to one. We define a political entity as a politician, a political party, or the supporters of a political
party, and a political event as any event linked to recent socio-political issues. Each of
the videos was then annotated based on the following task definitions, which were given to
both annotators along with an example. Each video was annotated individually for each
task. Examples for each task are given in Figure 1.</p>
        <p>• Task 1: Humour/Satire Over the past few years, researchers have become increasingly
interested in the topic of humour and satire detection on its own. When viewed from a
political angle, humour detection is a potent indicator that can be used to determine how
the general public feels about a situation or an entity. Politicians and content providers
alike can use it as a tactic to interact with and draw the public’s attention. More nuanced
issues like misinformation and manipulation can also be masked as humour, which needs
to be addressed. For a video to fall under this category, there must be at least one mention
of a joke, meme, caricature or satirical/sarcastic comment directed at a political entity
or with regards to a political event, either uttered by the creator, visible in the video or
present in the title or description of the video.
• Task 2: Opposition/Criticism In this category, the aim is to detect any explicit
opposition or criticism of a political entity or the actions of the political entity with regard
to a political event. This could take place either as an utterance, demonstrated visually
in the video or textually in the title or description of the video. A common example
is the criticism of "Godi Media", or pro-government media houses, for not focusing on
important issues [14]. Analysis of opposing or critical material is an important factor that
sheds light on public opinion, highlighting the degree of disagreement and ideological
diferences. Issues receiving the most opposition and criticism tend to be those that are
most relevant to the public.
• Task 3: Support/Advocacy Similar to the Opposition/Criticism category, the aim here
is to detect any explicit support or advocacy of a political entity or the actions of the
political entity with regard to a political event. This could take place either as an utterance,
demonstrated visually in the video or textually in the title or description of the video. For
example, the creator could endorse policies introduced by the incumbent politicians. We
expect that studying Task 2 and Task 3 in unison also offers useful insights with respect
to public opinion fluctuation towards entities and events as well as bias detection.
• Task 4: Informational/Analysis If the nature of the video is informational or is an
analysis of a relevant political entity or event, where the YouTuber explains the events to
the audience using suitable sources (news articles, scholarly publications or government
documents), the video will be informational in nature.
• Task 5: Opinion A video will fall under this category if the YouTuber voices their
personal opinion on any political entity or event, with or without justification. Detecting
when content is an opinion piece vs. factual information can be used in the downstream
modelling task of misinformation detection. This task also has uses in detecting deviation
of public opinion from reality as well as in combating confirmation bias.</p>
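        <p>Since the five tasks are annotated independently, each video's annotation can be viewed as a five-dimensional multi-hot label vector. A minimal sketch (the task keys and the to_label_vector helper are illustrative, not from the paper):</p>

```python
# Order of the five independent annotation tasks described above.
TASKS = ["humour_satire", "opposition_criticism",
         "support_advocacy", "informational_analysis", "opinion"]

def to_label_vector(annotation):
    """Map a dict of task -> bool to an ordered multi-hot 0/1 vector.
    Tasks absent from the dict default to 0 (label not present)."""
    return [int(annotation.get(t, False)) for t in TASKS]

# A satirical opinion video would carry two positive labels at once.
vec = to_label_vector({"humour_satire": True, "opinion": True})
```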
        <p>To measure the quality of our annotations, we chose Cohen's Kappa statistic [15],
a measure of inter-annotator agreement (IAA). The scores obtained for each label are
given in Table 2 and show high agreement for each of the labels.</p>
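        <p>Cohen's Kappa compares the observed agreement po against the agreement expected by chance pe, as kappa = (po - pe) / (1 - pe). A small pure-Python sketch for two annotators' binary labels (illustrative; the study's scores were computed over the full annotation set):</p>

```python
def cohens_kappa(a, b):
    """Cohen's Kappa for two annotators' label sequences a and b."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators labelled alike.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (po - pe) / (1 - pe)

# Toy example: two annotators disagree on one of five videos.
a = [1, 1, 0, 0, 1]
b = [1, 1, 0, 0, 0]
kappa = cohens_kappa(a, b)
```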
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Political videos uploaded to YouTube form interesting inputs to both analysis and
classification studies. For our research, we have chosen to focus on both of these major tasks:
1. Content Analysis By performing a thorough analysis of the different metadata features
extracted using the YouTube Data API, we want to understand both the kind of content
political YouTubers choose to put out and the response of the audience to it.
2. Classification We also provide different experiments on the multitask, multimodal
classification of videos into the five labels described above.</p>
      <sec id="sec-3-1">
        <title>3.1. Content Analysis</title>
        <p>Each feature collected from a video offers significant insight into its content. For
example, the title and the thumbnail are often designed to captivate the audience’s attention,
since they are the first features spotted. This has led content creators to use "clickbait"
to mislead viewers, prompting numerous studies on clickbait detection [16, 17, 18]. On the
other hand, comments act as an outlet for the audience’s reaction to the video. Other statistical
features, such as the number of likes and view count, indicate the acceptance and virality of the
videos, respectively. We used topic modelling and keyword extraction to analyze these features.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Topic Modelling</title>
          <p>An unsupervised method for identifying the most meaningful topics from a given corpus of
text is called topic modelling. BERTopic [19], Top2Vec [20] and Latent Dirichlet Allocation
(LDA) [21] are popular topic modelling methods. We have used BERTopic for our study, an
approach that uses a class-based implementation of the TF-IDF method to cluster embeddings
obtained from pre-trained transformers to produce pertinent topics. BERTopic was especially
favoured for its multilingual support [22]. To perform topic modelling on the comments, we
first extracted the 25 most "relevant" comments along with their timestamps from each of the
400 videos, resulting in a corpus of 10,000 comments. BERTopic provides a convenient function to
visualize the topics generated over time. Figure 3 depicts this visualization. One of the biggest spikes was
found in topic 11 in January 2021, alluding to the farmers’ protest on Republic Day, which
led to a massive nationwide debate4. Other topics include religion ("hindu", "muslim"),
political parties ("congress", "bjp") and specific events ("farmer", "election").
4https://en.wikipedia.org/wiki/2021_Indian_farmers%27_Republic_Day_protest</p>
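          <p>The class-based TF-IDF weighting at the heart of BERTopic can be illustrated in simplified form, weighting a term t in class (topic) c as tf(t, c) · log(1 + A / f(t)), where A is the average word count per class and f(t) is the term's total frequency. This toy sketch is an illustration of the formula, not the BERTopic library itself:</p>

```python
import math
from collections import Counter

def c_tf_idf(class_docs):
    """Simplified class-based TF-IDF in the spirit of BERTopic:
    weight(t, c) = tf(t, c) * log(1 + A / f(t)), where A is the average
    word count per class and f(t) the term frequency across all classes."""
    counts = {c: Counter(doc.split()) for c, doc in class_docs.items()}
    total = Counter()  # term frequencies pooled over every class
    for cnt in counts.values():
        total.update(cnt)
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)
    return {c: {t: tf * math.log(1 + avg_words / total[t])
                for t, tf in cnt.items()}
            for c, cnt in counts.items()}

# Two toy "classes" of concatenated comments.
weights = c_tf_idf({"a": "farmer protest protest", "b": "election result"})
```

Terms that are frequent within one class but rare overall receive the highest weights, which is how pertinent topic words are surfaced.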
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Keyword Analysis</title>
          <p>Top 20 keywords (Table 4): ’modi’, ’godi’, ’news’, ’media’, ’bjp’, ’week’, ’episode’,
’noise’, ’india’, ’top’, ’show’, ’kumar’, ’adani’, ’views’, ’badi*’, ’rahul’, ’explained’,
’BJP*’, ’gandhi’, ’narendra’.</p>
          <p>Extraction of ’keywords’ or ’keyphrases’ from a document is a method for the succinct
representation of its content. Keyword extraction is widely used in research areas like opinion
mining, information retrieval systems, document clustering, and other NLP tasks [23]. Many
popular approaches like YAKE! [24, 25], RAKE [26] and KeyBERT [27] are used for keyword
extraction. YAKE!, being a keyword extraction method based on statistical text features, is
domain-independent and language-independent, thus being the ideal choice for our study.
YAKE! returns a list of keywords along with a relevance score; the lower the score, the more
relevant the keyword is to the document. To perform keyword extraction using YAKE!, we
concatenated the title string and the text extracted from the thumbnail using the EasyOCR
Python library5. The concatenated strings were preprocessed by converting them to lowercase
and removing Hindi and English stopwords using the NLTK Python library6. We found the
names of some YouTubers occurring in some of the strings, which we removed, to keep them
from appearing as keywords. Table 4 shows the top 20 keywords extracted from the strings,
along with their relevance scores.
5https://github.com/JaidedAI/EasyOCR
6https://www.nltk.org</p>
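          <p>The preprocessing pipeline described above (concatenation, lowercasing, stopword and creator-name removal) can be sketched as follows. The tiny stopword set here is illustrative only; the study used NLTK's full Hindi and English stopword lists:</p>

```python
import re

# Toy stopword set standing in for NLTK's Hindi + English stopword lists.
STOPWORDS = {"the", "is", "of", "ka", "ki", "hai"}

def preprocess(title, thumbnail_text, creator_names=()):
    """Concatenate the title with the OCR'd thumbnail text, lowercase it,
    tokenize, and drop stopwords and creator names so that channel names
    do not surface as keywords."""
    text = f"{title} {thumbnail_text}".lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    drop = STOPWORDS | {n.lower() for n in creator_names}
    return " ".join(t for t in tokens if t not in drop)

clean = preprocess("The Truth of Godi Media", "EXPOSED!",
                   creator_names=("Dhruv",))
```

The cleaned string is what would then be handed to YAKE! for keyword extraction.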
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Classification</title>
        <p>For a given video, three modalities were extracted, namely audio (A), text (T) and image (I).
Given these modalities A, T and I, our task is to predict the binary values for each of the five
labels.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Feature Extraction</title>
          <p>1. Text The text modality for each video is a combination of its title and description. We first
concatenate the title of the video with its description. We preprocess the concatenated
string to remove whitespace, punctuation and URLs. In our dataset, the text strings
were either in English, Hindi, Romanized Hindi or a combination of them. To extract
features from text in these languages, we utilised MuRIL [28]. MuRIL, or Multilingual
Representations for Indian Languages, is a language model built for 16 Indian languages
and English. MuRIL has been shown to outperform pre-existing multilingual models,
such as mBERT [29], on many NLP tasks for Indian languages. For each text string t ∈ T,
the obtained feature embedding is an ℝ^(1×d) vector. The text embeddings for the entire
dataset form an ℝ^(N×d) matrix, where d is 768.
2. Image The image modality for each video is the thumbnail. We use ConvNext [30], a
purely convolutional vision processing model, to extract the embedding for each
thumbnail. We use the ConvNext-T model pre-trained on the ImageNet-1k dataset. For each
thumbnail i ∈ I, ConvNext returns an ℝ^(1×d×7×7) tensor. The image embeddings for the entire
dataset form an ℝ^(N×d×7×7) tensor, which was reduced to an ℝ^(N×d) matrix using a max pooling
and a flatten layer, where d is 768.
3. Audio To represent the audio features of each video we extract the following features
using the Librosa library:
• Mel Frequency Cepstral Coefficients (MFCC)7 are among the most frequently
used audio characteristics. This feature is produced by applying a cosine
transform to the logarithm of the power spectrum, mapped onto the mel scale
as evenly spaced frequency bands. The mel scale is based on the characteristics of
the human auditory system, which is better able to discern between sounds
represented on this scale.
• Chroma STFT is produced using a Fast Fourier Transform (FFT) on the audio and
a series of filters to transform the power spectrum into a chromatic scale.
• Spectral Centroid is taken from each frame of a magnitude spectrogram after
being normalized.
• Spectral Bandwidth is the width of the audio signal’s power spectrum as measured
at a specific level below the peak frequency.
• Rolloff represents the frequency below which a certain percentage (85% by default)
of the signal’s total spectral energy is contained for each frame.
• Zero Crossing Rate is a measurement of the audio signal’s frequency content that
counts how many times per second the signal crosses the zero axis.
7https://librosa.org/doc/latest/generated/librosa.feature.mfcc.html#librosa.feature.mfcc</p>
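          <p>Two of these features can be sketched in NumPy, assuming simplified frame-free definitions rather than Librosa's exact framed implementations:</p>

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of consecutive-sample pairs where the signal changes sign
    (a simplified, frame-free version of the zero crossing rate)."""
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

def spectral_centroid(x, sr):
    """Magnitude-weighted mean frequency of the signal's spectrum."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# Sanity check on a pure 440 Hz tone sampled at 8 kHz for one second:
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
zcr = zero_crossing_rate(tone)          # roughly 2 * 440 / 8000
centroid = spectral_centroid(tone, sr)  # close to 440 Hz
```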
          <p>Modality combinations and their representations (Table 5):
Text only - MuRIL; Image only - ConvNeXT; Audio only - MFCC*;
Text + Image - MuRIL + ConvNeXT; Text + Audio - MuRIL + MFCC*;
Audio + Image - MFCC* + ConvNeXT; Text + Audio + Image - MuRIL + MFCC* + ConvNeXT.</p>
          <p>The collected features for audio thus consist of 20 MFCC values along with 5
values comprising the mean Chroma STFT, Spectral Centroid, Spectral Bandwidth, Rolloff
and Zero Crossing Rate. For each audio a ∈ A, the obtained feature vector is ℝ^(1×d). The
overall audio features thus form an ℝ^(N×d) matrix, where d is 25.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>Our multimodal classification is an early fusion model: we concatenate the embeddings
obtained from each of the three modalities and feed them to a fully connected neural network
of three layers. We used an 80/20 train-test split. The loss function minimized
was the Binary Cross Entropy loss (BCELoss). We also perform ablation studies on different
combinations of these modalities, namely unimodal (T, I, A) and bimodal (T+I, T+A, A+I).
The metrics, computed using macro-averaging, are provided in Table 5.</p>
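      <p>The early fusion model can be sketched as a forward pass through a three-layer fully connected network over the concatenated embeddings. The hidden sizes and random weights here are illustrative assumptions (training with BCE loss and backpropagation is omitted); only the input dimensions and the five sigmoid outputs come from the paper:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Embedding dimensions from the paper: text 768, image 768, audio 25;
# one sigmoid output per label for the five binary tasks.
D_TEXT, D_IMAGE, D_AUDIO, N_LABELS = 768, 768, 25, 5
D_IN = D_TEXT + D_IMAGE + D_AUDIO
H1, H2 = 256, 64  # hidden sizes are illustrative, not from the paper

# Randomly initialised weights for a forward-pass sketch only.
W1, b1 = rng.normal(0, 0.02, (D_IN, H1)), np.zeros(H1)
W2, b2 = rng.normal(0, 0.02, (H1, H2)), np.zeros(H2)
W3, b3 = rng.normal(0, 0.02, (H2, N_LABELS)), np.zeros(N_LABELS)

def early_fusion_forward(text_emb, image_emb, audio_emb):
    """Early fusion: concatenate the three modality embeddings, then run
    the three-layer MLP; returns five per-label probabilities."""
    x = np.concatenate([text_emb, image_emb, audio_emb], axis=-1)
    h = relu(x @ W1 + b1)
    h = relu(h @ W2 + b2)
    return sigmoid(h @ W3 + b3)

probs = early_fusion_forward(rng.normal(size=D_TEXT),
                             rng.normal(size=D_IMAGE),
                             rng.normal(size=D_AUDIO))
```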
      <p>Among all the unimodal and bimodal models, we note that the presence of image-based
features yields the highest Macro-F1 and ROC-AUC scores. This observation finds support
in the practice of YouTubers who strategically design thumbnails and titles not only to offer a
glimpse of the video’s content but also to effectively draw the audience in.</p>
      <p>The early fusion tri-modal model (T+A+I) performs the best out of all combinations of
modalities, with a Macro-F1 score of 0.8742 and a ROC-AUC score of 0.769.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The consumption of content on social media sites, like YouTube, has grown manifold over the
last decade. This has led to an exponential rise in the creation of short-form and long-form
content on various topics, ranging from comedic videos to documentaries. In this study, we
analyzed one such content creation topic: political videos uploaded by independent Indian
content creators. We annotated around 400 videos collected from YouTube for different socially
and politically relevant labels. We performed a content analysis on our annotated dataset using
BERTopic for topic modelling and YAKE! for keyword extraction. We also applied an early fusion
multimodal model on features extracted using state-of-the-art backbone representations,
namely MuRIL for text, ConvNeXT for images, and MFCC, ZCR, Spectral Bandwidth, Chroma
STFT and Spectral Rolloff for audio. Our classification model yielded a Macro-F1 score of 0.8742.
Compared to other unimodal and bimodal models, the early fusion model yielded significantly
better results.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>Future work in several key areas can raise the calibre and broaden the scope of this research.
Here are some directions we want to pursue:
1. Experimentation with other fusion models In this paper, we used an early fusion
model, combining modalities before classification. However, there are alternative fusion
techniques that warrant exploration, such as late fusion models, where each modality is
processed independently before being integrated with the others; attention-based
fusion models, where the importance of different modalities is assessed with respect
to the task at hand; and ensemble models, which combine the strengths of multiple
prediction models to improve results.
2. Audio feature extraction The features extracted for audio in this study are numerical
metrics that regrettably fail to capture the nuances of speech, especially code-mixed
Hindi-English speech, which is a predominant mode of communication in India.
Experimenting with other audio feature extraction methods, for example, using transcripts to
capture semantic meaning, mel-frequency spectrograms to capture phonetic variation, or
transformer-based models distinguished for their contextual understanding, can
offer more sophisticated results.
3. Extending the dataset We chose to annotate the collected data for five tasks for the purposes
of this study. However, the methods of classification and analysis can be extended to
include even more relevant labels that cover more NLP and discourse analysis tasks.
This includes the detection of hate speech towards marginalised communities veiled as
opinions, misinformation and fake news detection, or the spread and polarization of public
opinions over time.
4. Multilingual and cross-regional support While our selection procedure primarily
focused on YouTube channels that offered content in Hindi or English, it is important to
recognise that a more inclusive approach is necessary for a thorough representation of
India’s political environment. To adequately capture the complex and varied political
narratives that arise across the nation’s various linguistic and cultural realms,
region-specific content must be included.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This material is based upon work supported by the Google Cloud Research Credits program
with the award EDU Credit wilsonjessica 273571576.</p>
      <p>[10] G. K. Shahi, AMUSED: An Annotation Framework of Multi-modal Social Media Data, 2021.
URL: http://arxiv.org/abs/2010.00502, arXiv:2010.00502 [cs].
[11] L. Christ, S. Amiriparian, A. Kathan, N. Müller, A. König, B. W. Schuller, Multimodal
Prediction of Spontaneous Humour: A Novel Dataset and First Results, 2022. URL:
http://arxiv.org/abs/2209.14272, arXiv:2209.14272 [cs, eess].
[12] V. Gupta, T. Mittal, P. Mathur, V. Mishra, M. Maheshwari, A. Bera, D. Mukherjee,
D. Manocha, 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social
Media Short Videos, 2022. URL: http://arxiv.org/abs/2203.14456, arXiv:2203.14456 [cs].
[13] S. N. Khan, M. Leekha, J. Shukla, R. R. Shah, Vyaktitv: A Multimodal Peer-to-Peer Hindi
Conversations based Dataset for Personality Assessment, 2020. URL:
http://arxiv.org/abs/2008.13769, arXiv:2008.13769 [cs].
[14] M. Choubey, Citizen journalism raises hope amid corona virus threats in India, Jamshedpur
Res Rev ii (xxxxxi) (2020) 43–49.
[15] M. L. McHugh, Interrater reliability: the kappa statistic, Biochemia Medica 22 (2012)
276–282.
[16] S. Zannettou, S. Chatzis, K. Papadamou, M. Sirivianos, The good, the bad and the bait:
Detecting and characterizing clickbait on YouTube, in: 2018 IEEE Security and Privacy
Workshops (SPW), IEEE, 2018, pp. 63–69.
[17] L. Shang, D. Y. Zhang, M. Wang, S. Lai, D. Wang, Towards reliable online clickbait video
detection: A content-agnostic approach, Knowledge-Based Systems 182 (2019) 104851.
[18] R. Gothankar, F. D. Troia, M. Stamp, Clickbait detection for YouTube videos, in: Artificial
Intelligence for Cybersecurity, Springer, 2022, pp. 261–284.
[19] M. Grootendorst, BERTopic: Neural topic modeling with a class-based TF-IDF procedure,
2022. URL: http://arxiv.org/abs/2203.05794, arXiv:2203.05794 [cs].
[20] D. Angelov, Top2Vec: Distributed Representations of Topics, 2020. URL:
http://arxiv.org/abs/2008.09470. doi:10.48550/arXiv.2008.09470, arXiv:2008.09470 [cs, stat].
[21] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003)
993–1022.
[22] R. Egger, J. Yu, A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic
to Demystify Twitter Posts, Frontiers in Sociology 7 (2022) 886498. URL:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9120935/. doi:10.3389/fsoc.2022.886498.
[23] S. Beliga, Keyword extraction: a review of methods and approaches, University of Rijeka,
Department of Informatics, Rijeka 1 (2014).
[24] R. Campos, V. Mangaravite, A. Pasquali, A. M. Jorge, C. Nunes, A. Jatowt, YAKE!
Collection-Independent Automatic Keyword Extractor, in: G. Pasi, B. Piwowarski,
L. Azzopardi, A. Hanbury (Eds.), Advances in Information Retrieval, Lecture Notes
in Computer Science, Springer International Publishing, Cham, 2018, pp. 806–810.
doi:10.1007/978-3-319-76941-7_80.
[25] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, A. Jatowt, YAKE! Keyword
extraction from single documents using multiple local features, Information Sciences 509
(2020) 257–289. URL: https://www.sciencedirect.com/science/article/pii/S0020025519308588.
doi:10.1016/j.ins.2019.09.013.
[26] S. Rose, D. Engel, N. Cramer, W. Cowley, Automatic Keyword Extraction from Individual
Documents, in: M. W. Berry, J. Kogan (Eds.), Text Mining, John Wiley &amp; Sons, Ltd,
Chichester, UK, 2010, pp. 1–20. URL:
https://onlinelibrary.wiley.com/doi/10.1002/9780470689646.ch1. doi:10.1002/9780470689646.ch1.
[27] M. Grootendorst, KeyBERT: Minimal keyword extraction with BERT, 2020. URL:
https://doi.org/10.5281/zenodo.4461265. doi:10.5281/zenodo.4461265.
[28] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal,
R. T. Nagipogu, S. Dave, S. Gupta, S. C. B. Gali, V. Subramanian, P. Talukdar, MuRIL:
Multilingual Representations for Indian Languages, 2021. URL:
http://arxiv.org/abs/2103.10730. doi:10.48550/arXiv.2103.10730, arXiv:2103.10730 [cs].
[29] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, 2019. URL: http://arxiv.org/abs/1810.04805.
doi:10.48550/arXiv.1810.04805, arXiv:1810.04805 [cs].
[30] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A ConvNet for the 2020s, 2022.
URL: http://arxiv.org/abs/2201.03545. doi:10.48550/arXiv.2201.03545, arXiv:2201.03545 [cs].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. H.</given-names>
            <surname>Yun</surname>
          </string-name>
          ,
          <article-title>The popularity of eating broadcast: Content analysis of “mukbang” YouTube videos, media coverage, and the health impact of “mukbang” on public</article-title>
          ,
          <source>Health Informatics Journal</source>
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>2237</fpage>
          -
          <lpage>2248</lpage>
          . URL: http://journals.sagepub.com/doi/10.1177/1460458220901360. doi:10.1177/1460458220901360.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Papadamou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zannettou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blackburn</surname>
          </string-name>
          , E. De Cristofaro, G. Stringhini, M. Sirivianos,
          <article-title>“How over is it?” Understanding the Incel Community on YouTube</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>5</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          . URL: https://dl.acm.org/doi/10.1145/3479556. doi:10.1145/3479556.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alshamrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abusnaina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abuhamad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mohaisen</surname>
          </string-name>
          ,
          <article-title>Hate, Obscenity, and Insults: Measuring the Exposure of Children to Inappropriate Comments in YouTube</article-title>
          , in:
          <source>Companion Proceedings of the Web Conference</source>
          <year>2021</year>
          , WWW '21, Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>508</fpage>
          -
          <lpage>515</lpage>
          . URL: https://doi.org/10.1145/3442442.3452314. doi:10.1145/3442442.3452314.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thenmozhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thangasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallathambi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments</article-title>
          ,
          <year>2021</year>
          . URL: http://arxiv.org/abs/2109.00227, arXiv:2109.00227 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Latorre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Amores</surname>
          </string-name>
          ,
          <article-title>Topic modelling of racist and xenophobic YouTube comments. Analyzing hate speech against migrants and refugees spread through YouTube in Spanish</article-title>
          ,
          <source>in: Ninth International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM'21)</source>
          , TEEM'21, Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>456</fpage>
          -
          <lpage>460</lpage>
          . URL: https://doi.org/10.1145/3486011.3486494. doi:10.1145/3486011.3486494.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yousaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nawaz</surname>
          </string-name>
          ,
          <article-title>A Deep Learning-Based Approach for Inappropriate Content Detection and Classification of YouTube Videos</article-title>
          ,
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>16283</fpage>
          -
          <lpage>16298</lpage>
          . doi:10.1109/ACCESS.2022.3147519.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          , A. Zubiaga,
          <article-title>NUAA-QMUL at SemEval-2020 Task 8: Utilizing BERT and DenseNet for Internet Meme Emotion Analysis</article-title>
          ,
          <source>in: Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          , International Committee for Computational Linguistics, Barcelona (online)
          ,
          <year>2020</year>
          , pp.
          <fpage>901</fpage>
          -
          <lpage>907</lpage>
          . URL: https://aclanthology.org/2020.semeval-1.114. doi:10.18653/v1/2020.semeval-1.114.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Maity</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <article-title>A Multitask Framework for Sentiment, Emotion and Sarcasm aware Cyberbullying Detection from Multi-modal Code-Mixed Memes</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '22, Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , pp.
          <fpage>1739</fpage>
          -
          <lpage>1749</lpage>
          . URL: https://doi.org/10.1145/3477495.3531925. doi:10.1145/3477495.3531925.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Morency</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          ,
          <article-title>M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations</article-title>
          ,
          <source>in: Proceedings of the 2021 International Conference on Multimodal Interaction</source>
          , ICMI '21, Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>773</fpage>
          -
          <lpage>777</lpage>
          . URL: https://doi.org/10.1145/3462244.3479959. doi:10.1145/3462244.3479959.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>