DANKMEMES @ EVALITA 2020: The Memeing of Life: Memes, Multimodality and Politics

Martina Miliani (1,2), Giulia Giorgi (3), Ilir Rama (3), Guido Anselmi (3), Gianluca E. Lebani (4)

1 University for Foreigners of Siena
2 CoLing Lab, Department of Philology, Literature, and Linguistics, University of Pisa
3 Department of Social and Political Sciences, University of Milan
4 Department of Linguistics and Comparative Cultural Studies, Ca' Foscari University of Venice

martina.miliani@fileli.unipi.it, giulia.giorgi@unito.it, ilir.rama@unimi.it, guido.anselmi@unimi.it, gianluca.lebani@unive.it

Abstract

DANKMEMES is a shared task proposed for the 2020 EVALITA campaign, focusing on the automatic classification of Internet memes. Providing a corpus of 2,361 memes on the 2019 Italian Government Crisis, DANKMEMES features three tasks: A) Meme Detection, B) Hate Speech Identification, and C) Event Clustering. Overall, 5 groups took part in the first task, 2 in the second and 1 in the third. The best system was proposed by the UniTor group and achieved an F1 score of 0.8501 for Task A, 0.8235 for Task B and 0.2657 for Task C. In this report, we describe how the task was set up, we report the system results and we discuss them.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Internet memes are understood as "pieces of culture, typically jokes, which gain influence through online transmission" (Davison, 2012). Specifically, a meme is a multimodal artefact manipulated by users, who merge intertextual elements to convey an ironic message. Featuring a visual format that includes images, texts or a combination of them, memes combine references to current events or relatable situations with pop-cultural references to music, comics and movies (Ross and Rivers, 2017).

The pervasiveness of meme production and circulation across different platforms increases the necessity to handle massive quantities of visual data (Tanaka et al., 2014) by leveraging automated approaches. Efforts in this direction have focused on the generation of memes (Peirson V and Tolunay, 2018; Gonçalo Oliveira et al., 2016) and on automated sentiment analysis (French, 2017), while stressing the need for a multimodal approach able to contextually consider both visual and textual information (Sharma et al., 2020; Smitha et al., 2018).

As manual labelling becomes unfeasible on a large scale, scholars require tools able to classify the huge amount of memetic content continuously produced on the web. The main goal of our shared task is to evaluate a range of technologies that can be used to automatize the process of meme recognition and sorting with an acceptable degree of reliability.
2 Task Description

The DANKMEMES task, presented at the 2020 EVALITA campaign (Basile et al., 2020), encompasses three subtasks, aimed at detecting memes (Task A), detecting hate speech in memes (Task B) and clustering memes according to events (Task C). Participants could decide to take part in one or more of these tasks, with the only recommendation that Task A functions as the compulsory preliminary step for the other two tasks.

Task A: Meme Detection. The lack of consensus around what defines a meme (Shifman, 2013) led to different definitions, focusing on circulation (Davison, 2012; Dawkins, 2016), formal features (Milner, 2016), or content (Gal et al., 2016; Knobel and Lankshear, 2007). For this dataset, manual coding focused both on formal aspects (such as layout, multimodality and manipulation) and on content, e.g. ironic intent (Giorgi and Rama, 2019); the exponential increase in visual production, however, warrants an automated approach, which might be able to further tap into stable and generalizable aspects of memes, considering form, content and circulation. Given the dataset minus the variable strictly related to memetic status, participants must provide a binary classification, distinguishing memes (1) from non-memes (0).

Task B: Hate Speech Identification. Hate speech has become a relevant issue for social media platforms. Even though the automatic classification of posts may lead to censorship of non-offensive content (Gillespie, 2018), the use of machine learning techniques has become more and more crucial, since manual filtering is a very time-consuming task for the annotators (Zampieri et al., 2019b). Recent studies have also shown that multimodal analysis is fundamental in such a task (Sabat et al., 2019). In this direction, SemEval 2020 proposed the "Memotion Analysis" task to classify sarcastic, humorous, and offensive memes (Sharma et al., 2020). This kind of analysis assumes a specific relevance when applied to political content: memes about political topics are a powerful tool of political criticism (Plevriti, 2014). For these reasons, the proposed task aims at detecting memes with offensive content. Following the definition of Zampieri et al. (2019a), an offensive meme contains any form of profanity or a targeted offense, veiled or direct, such as insults, threats, profane language or swear words. Thus, the second task consists in a binary classification, where systems have to predict whether a meme is offensive (1) or not (0).

Task C: Event Clustering. Social media react to the real world by commenting in real time on mediatised events, in a way that disrupts traditional usage patterns (Al Nashmi, 2018). The ability to understand which events are represented and how, then, becomes relevant in the context of a hyperproductive Internet. The goal of the third subtask is to cluster a set of memes, which may or may not be related to the 2019 Italian government crisis, into five event categories (see Table 1). Participants' goal is to apply supervised techniques to cluster the memes, so that memes pinpointing the same events are classified in the same cluster.

Label | Description
0 | Residual category
1 | Beginning of the government crisis
2 | Conte's speech and beginning of consultations
3 | Conte is called to form a new government
4 | 5SM holds a vote on the platform Rousseau

Table 1: Categories for Task C: Event Clustering.

3 Dataset

3.1 Composition of the dataset

The DANKMEMES dataset comprises 2,361 images (for each subtask a specific dataset was provided), automatically extracted from Instagram through a Python script targeting the hashtag related to the Italian government crisis ("#crisidigoverno"). The corpus also includes 367 offensive political memes unrelated to the government crisis, aimed at augmenting and balancing the dataset for Task B.
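The collection script itself is not part of the release. Purely as an illustration, a comparable hashtag-driven download could be sketched with the instaloader library; the library choice, options and target folder below are our assumptions, not the authors' actual code:

```python
# A minimal sketch of hashtag-driven collection from Instagram; the
# original script is not published, so the library (instaloader) and
# all options here are illustrative assumptions.
import instaloader

loader = instaloader.Instaloader(download_videos=False, save_metadata=False)
hashtag = instaloader.Hashtag.from_name(loader.context, "crisidigoverno")

engagement = {}
for post in hashtag.get_posts():
    # "Engagement" in the dataset is the number of comments and likes.
    engagement[post.shortcode] = post.likes + post.comments
    loader.download_post(post, target="crisidigoverno")
```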
3.2 Annotation of the dataset

For each image of the dataset we provide the name of the .jpg image file, the date of publication and the engagement, i.e. the number of comments and likes of the post. The dataset also includes image embeddings. The vector representations are computed employing ResNet (He et al., 2016), a state-of-the-art model for image recognition based on Deep Residual Learning. Providing such image representations allows the participants to approach these multimodal tasks focusing primarily on their NLP aspects (Kiela and Bottou, 2014).
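The paper does not spell out the extraction procedure; a common recipe, sketched below with torchvision under the assumption that the pooled activations of the penultimate layer of a ResNet-50 are used as the embedding, looks as follows (the file name is a placeholder):

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load a ResNet-50 pre-trained on ImageNet and drop the final
# classification layer, keeping the 2048-d pooled features.
resnet = models.resnet50(pretrained=True)
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("1.jpg").convert("RGB")  # placeholder file name
with torch.no_grad():
    embedding = resnet(preprocess(image).unsqueeze(0)).squeeze(0)
print(embedding.shape)  # torch.Size([2048])
```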
The annotation process involved two Italian native speakers, who study memes at an academic level, and focused on detecting and labelling 7 relevant categories:

• Macro status: refers to meme layouts and their relation to diffused, conventionalised formats called macros. The category has 0 and 1 as labels, where the value 1 represents well-known memetic frames, characters and layouts (e.g. Pepe the Frog). The identification of macros relied both on external sources (e.g. the website "Know Your Meme") and on the annotators' literacy on memes.

• Picture manipulation: entails the degree of visual modification of the images. Non-manipulated images or low-impact changes (e.g. the addition of a text or a logo) are labeled 0. Heavily manipulated, impactful changes (e.g. images edited to include political actors) are labeled 1.

• Visual actors: the political actors (i.e. politicians, parties' logos) portrayed visually, regardless of whether they were edited into the picture or appeared in the original image.

• Text: the textual content of the image, extracted through optical character recognition (OCR) using Google's Tesseract-OCR Engine and further manually corrected.

• Meme: binary feature, where 0 represents non-meme images and 1 meme images. This is the target label for Task A.

• Hate Speech: binary feature only for memes. It differentiates memes with offensive language (1) from non-offensive memes (0). This is the target label for Task B.

• Event: a feature only for meme images, categorizing them according to 4 events (described in Table 1), plus a residual category labeled as 0. This is the target label for Task C.

Figure 1: Two examples from the dataset for Meme Detection: the image at the top is a meme, whereas the image at the bottom is not a meme.

The final inter-annotator agreement (IAA) has been calculated by two of the authors on a subset of the dataset through Krippendorff's alpha (Krippendorff, 2018). Four features have been considered: Macro status (α = 0.755), Picture manipulation (α = 0.930), Hate Speech (α = 0.741) and Meme (α = 0.884). Other features were either objective (i.e. visual and textual actors) or inferred from external data (i.e. events).
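Agreement values such as those reported above can be reproduced with the krippendorff Python package; the sketch below uses invented labels for two annotators, purely to show the call:

```python
import krippendorff
import numpy as np

# Each row holds one annotator's labels for the same items; the
# binary labels below are invented, purely for illustration.
annotator_1 = [1, 0, 1, 1, 0, 1, 0, 0]
annotator_2 = [1, 0, 1, 0, 0, 1, 0, 1]
reliability_data = np.array([annotator_1, annotator_2])

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```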
Participants were allowed to use external resources, lexicons or independently annotated data. Accordingly, although we provided ResNet image embeddings, participants could make use of any other image representation.

3.3 Training and Test Data

The initial dataset was split into three datasets, one for each task, structured as follows:

Dataset for Meme Detection (Task A). The whole dataset counts 2,000 images, half memes and half not (see Figure 1 for an example). We split the dataset into training and test sets, in a proportion of 80-20% of items. Table 2 shows the format of the training dataset. The test dataset has been provided without gold labels, i.e. without the "Meme" attribute.

File | Engagement | Date | Manip. | Visual | Text | Meme
1.jpg | 21,053 | 22/08/19 | 1 | Conte | aiuto | 0
56.jpg | 114 | 22/08/19 | 0 | Salvini | alle solite | 1

Table 2: An excerpt from the dataset for Task A, Meme Detection.

Dataset for Hate Speech Identification (Task B). The whole dataset counts 1,000 memes (see Figure 2 for an example). We split the dataset into training and test sets, in a proportion of 80-20% of items. Table 3 shows the format of the training dataset. The test dataset has been provided without the gold label "Hate Speech" for testing purposes.

File | Engagement | Manip. | Visual | Text | Hate Speech
62.jpg | 21,053 | 1 | Conte | aiuto | 0
114.jpg | 12,572 | 1 | Salvini | merdman | 1

Table 3: An excerpt from the dataset for Task B, Hate Speech Identification.

Figure 2: Two examples from the dataset for Hate Speech Identification: the meme at the top is classified as hate speech content, whereas the meme at the bottom is not.

Dataset for Event Clustering (Task C). The whole dataset counts 1,000 memes (see Figure 3 for an example). We split the dataset into training and test sets, in a proportion of 80-20% of items. Table 4 shows the format of the training set. The test set has been provided without gold labels (i.e. without the "Event" attribute) for testing purposes.

File | Engagement | Date | Macro | Manip. | Visual | Text | Event
43.jpg | 21,053 | 22/08/19 | 1 | 1 | Conte | aiuto | 1
23.jpg | 114 | 22/08/19 | 1 | 0 | Salvini | alle solite | 0
114.jpg | 12,572 | 25/08/19 | 0 | 1 | Salvini | merdman | 2

Table 4: An excerpt from the dataset for Task C, Event Clustering.

Figure 3: Examples of memes from the dataset for the Event Clustering task. Each meme refers to an event: (a) Beginning of the government crisis; (b) Conte's speech and beginning of consultations; (c) Conte is called to form a new government; (d) 5SM holds a vote on the platform Rousseau.

3.4 Data release

Both the training and the test sets were released on our website and protected with a password. As described in Section 3.3, the development data consisted of three distinct datasets, one for each task. The participants could download a distinct folder for each task, which contained:

• A UTF-8 encoded comma-separated ".csv" file with 800 items (1,600 for Task A), containing the metadata described in Section 3.3;

• A folder containing the images in .jpg format;

• A .csv file containing the relative image embeddings.

As for the test data, we released three folders whose structure is similar to that of the training sets. Each folder for the test sets contains:

• A UTF-8 encoded comma-separated ".csv" file with 200 items (400 for Task A), which features the same metadata as the corresponding training set minus the gold label (i.e. "Meme" for Task A, "Hate Speech" for Task B and "Event" for Task C);

• A folder containing the images in .jpg format;

• A .csv file containing the relative image embeddings.

All material was released for non-commercial research purposes only, under a Creative Commons license (BY-NC-ND 4.0). Any use for statistical, propagandistic or advertising purposes of any kind is prohibited. It is not possible to modify, alter or enrich the data provided for the purposes of redistribution.
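Given the release format just described, a participant's first step might look like the following sketch; the file names inside each folder are assumptions, since the paper only describes the folders' content:

```python
import pandas as pd

# File names are assumptions: the paper specifies a metadata CSV and an
# embeddings CSV inside each task folder, but not what they are called.
metadata = pd.read_csv("taskA/train.csv", encoding="utf-8")
embeddings = pd.read_csv("taskA/embeddings.csv")

# Join metadata and ResNet embeddings on the image file name
# (assuming both files share the "File" column of Tables 2-4).
train = metadata.merge(embeddings, on="File")

y = train["Meme"]                         # Task A gold label
X = train.drop(columns=["File", "Meme"])  # metadata + embedding features
print(X.shape, y.value_counts().to_dict())
```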
4 Evaluation Measures

For all tasks, the models have been evaluated with Precision, Recall, and F1 scores, defined as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 × (Precision × Recall) / (Precision + Recall)

where TP are true positives, and FN and FP are false negatives and false positives, respectively. We computed Precision, Recall, and F1 for Task A and Task B considering only the positive class. As for Task C, which is a multiclass classification task, we computed the performance for each class and then calculated the macro-average over all classes.

Different baselines were used for the different tasks:

Task A: Meme Detection. The baseline is given by the performance of a random classifier, which labels 50% of images as memes.

Task B: Hate Speech Identification. The baseline is given by the performance of a classifier labeling a meme as offensive when the meme text contains at least a swear word.¹

¹ The list of swear words was downloaded from https://www.freewebheaders.com/italian-bad-words-list-and-swear-words/ (last access: 2nd November 2020).

Task C: Event Clustering. The baseline is given by the performance of a classifier labeling every meme as belonging to the most numerous class (i.e. the residual one).
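The three baselines and the scoring protocol described above can be sketched as follows; the data below are toy stand-ins, and the actual swear-word lexicon from footnote 1 is not reproduced here:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(42)

# Toy stand-ins: the real lexicon (footnote 1) and the gold labels
# are not reproduced here; these values are purely illustrative.
texts = ["conte aiuto", "salvini merdman", "alle solite"]
lexicon = {"merdman"}              # placeholder, NOT the actual lexicon
y_true_b = np.array([0, 1, 0])     # invented Task B gold labels
y_true_c = np.array([1, 2, 0])     # invented Task C gold labels

def baseline_task_a(n_images):
    """Task A: random classifier labelling 50% of images as memes."""
    return rng.integers(0, 2, size=n_images)

def baseline_task_b(texts, lexicon):
    """Task B: offensive iff the meme text contains a swear word."""
    return np.array([int(any(w in t.lower() for w in lexicon))
                     for t in texts])

def baseline_task_c(n_memes):
    """Task C: always predict the most numerous (residual) class 0."""
    return np.zeros(n_memes, dtype=int)

pred_a = baseline_task_a(400)  # e.g. the size of the Task A test set

# Tasks A and B are scored on the positive class only.
p, r, f1, _ = precision_recall_fscore_support(
    y_true_b, baseline_task_b(texts, lexicon),
    average="binary", pos_label=1)
print(f"Task B baseline: P={p:.3f} R={r:.3f} F1={f1:.3f}")

# Task C is scored per class and then macro-averaged.
p_c, r_c, f1_c, _ = precision_recall_fscore_support(
    y_true_c, baseline_task_c(len(y_true_c)),
    average="macro", zero_division=0)
print(f"Task C baseline: P={p_c:.3f} R={r_c:.3f} F1={f1_c:.3f}")
```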
5 Participants and Results

In total, 16 teams registered for DANKMEMES, and five of them participated in at least one of the tasks: DankMemesTeam (DMT) (Setpal and Sarti, 2020), Keila, UPB (Vlad et al., 2020), SNK (Fiorucci, 2020), and UniTor (Breazzano et al., 2020). All of the 5 teams participated in Task A, while 2 teams participated in Task B and 1 in Task C. Participants could submit up to two runs per task: all of the teams did so consistently across tasks, with the exception of one team submitting a single run in Task A. This amounts to 9 runs for Task A, 4 for Task B and 2 for Task C, as detailed in Table 5.

Team Name | Affiliation | Task
DMT | RN Podar School | A
Keila | Dipartimento di Matematica e Informatica di Perugia | A
UniTor | Università degli Studi di Roma "Tor Vergata" | A, B, C
UPB | University Politehnica of Bucharest | A, B
SNK | ETI3 | A

Table 5: Participants along with their affiliations and the tasks they participated in.

Task A: Meme Detection. Task A consisted in differentiating between a meme and a non-meme. Five teams presented a total of 9 runs, as detailed in Table 6. The best scores have been achieved by the UniTor team, with an F1 measure of 0.8501 (with a Precision score of 0.8522 and a Recall measure of 0.848). The SNK and UPB teams followed closely, but all teams consistently showed a drastic improvement over the baseline.

Team | Run | Precision | Recall | F1
UniTor | 2 | 0.8522 | 0.848 | 0.8501
SNK | 1 | 0.8515 | 0.8431 | 0.8473
UPB | 2 | 0.8543 | 0.8333 | 0.8437
UniTor | 1 | 0.839 | 0.8431 | 0.8411
SNK | 2 | 0.8317 | 0.848 | 0.8398
UPB | 1 | 0.861 | 0.7892 | 0.8235
DMT | 1 | 0.8249 | 0.7157 | 0.7664
Keila | 1 | 0.8121 | 0.6569 | 0.7263
Keila | 2 | 0.7389 | 0.652 | 0.6927
baseline | 1 | 0.525 | 0.5147 | 0.5198

Table 6: Results of Task A.

Task B: Hate Speech Identification. Task B consisted in identifying whether a meme is offensive or not. As detailed in Table 7, 2 teams participated in this task, for a total of 4 runs (2 each). The best scores were achieved by the UniTor team for the F1 measure, at 0.8235, and the Recall score, at 0.8667, while the UPB team scored the best Precision measure among participants, at 0.8056. The scores improve over the baseline consistently across teams as far as Recall and F1 are concerned, while the baseline's Precision was not reached by any participant.

Team | Run | Precision | Recall | F1
UniTor | 2 | 0.7845 | 0.8667 | 0.8235
UniTor | 1 | 0.7686 | 0.8857 | 0.823
UPB | 1 | 0.8056 | 0.8286 | 0.8169
UPB | 2 | 0.8333 | 0.7143 | 0.7692
baseline | 1 | 0.8958 | 0.4095 | 0.5621

Table 7: Results of Task B.

Task C: Event Clustering. Task C consisted in clustering memes into 5 events using supervised classification. As seen in Table 8, a single team participated, with 2 runs: the best score is therefore that of the UniTor team, with an F1 score of 0.2657.

Team | Run | Precision | Recall | F1
UniTor | 1 | 0.2683 | 0.2851 | 0.2657
UniTor | 2 | 0.2096 | 0.2548 | 0.2183
baseline | 1 | 0.096 | 0.2 | 0.1297

Table 8: Results of Task C.

6 Discussion

We compare the participating systems according to the following main dimensions: classification framework, exploitation of available features, multimodality of the adopted approaches, exploitation of further annotated data, and use of external resources. Since this is the first task about memes within the EVALITA campaign, we could not compare the obtained results with those achieved in any previous edition. A task about memes, Memotion, has been organized under SemEval 2020 (Sharma et al., 2020). However, the Memotion subtasks (Sentiment Classification, Humor Classification, and Scales of Semantic Classes) are quite different from those presented in DANKMEMES, and the results are hardly comparable.

System architecture. All the runs submitted to DANKMEMES leverage neural networks, including very simple but equally efficient architectures. Multi-Layer Perceptrons (MLP) were adopted by UniTor and SNK, ranked first and second in the Meme Detection task, respectively. UPB adopted a Vocabulary Graph Convolutional Network (VGCN) combined with BERT contextual embeddings for text analysis. This team employed this architectural design within a Multi-Task Learning (MTL) framework, based on two main neural network components, one for text and the other for image analysis: the outputs of these two elements were concatenated and used to feed a dense layer. The system of DMT is composed of three 8-layer feed-forward networks, each taking as input a different image vector representation. Finally, Keila exploited Convolutional Neural Networks (CNN) in each of the submitted runs.
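No participant code is reproduced in this report; the sketch below only illustrates the recurring late-fusion pattern just described, i.e. concatenating a precomputed image embedding with a text embedding and feeding the result to a small dense classifier. All dimensions are arbitrary choices of ours, not any team's settings:

```python
import torch
import torch.nn as nn

class LateFusionMLP(nn.Module):
    """Illustrative late-fusion classifier: concatenate a precomputed
    image embedding with a sentence embedding, then classify with a
    small MLP. Dimensions are arbitrary, not any team's settings."""

    def __init__(self, img_dim=2048, txt_dim=768, hidden=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, 1),   # one logit: meme vs. non-meme
        )

    def forward(self, img_emb, txt_emb):
        fused = torch.cat([img_emb, txt_emb], dim=-1)
        return self.classifier(fused).squeeze(-1)

model = LateFusionMLP()
img = torch.randn(4, 2048)   # e.g. ResNet-50 features
txt = torch.randn(4, 768)    # e.g. BERT sentence embeddings
probs = torch.sigmoid(model(img, txt))
```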
External resources. All the presented models employed external resources to feed their neural architectures with image and text representations. The text contained in the images was encoded using different flavours of word embeddings. Most of the participants exploited one of the available BERT contextual embedding models for the Italian language (AlBERTo, UmBERTo, or GilBERTo). However, with its first run, SNK achieved the second position in the Meme Detection task using the pre-trained FastText embeddings for the Italian language. Similarly, Keila adopted a pre-trained Word2Vec model for the Italian language, though achieving lower results. As for the visual channel, the DANKMEMES datasets provided a state-of-the-art representation of images, obtained with the ResNet50 architecture. Most of the participants also experimented with other image vector representations: DMT used three different image vectors (AlexNet, ResNet, and DenseNet), while UniTor and UPB examined several models, among which EfficientNet, VGG-16, YOLOv4, ResNet50, and ResNet152. UniTor chose EfficientNet for their final models, while UPB based their systems on ResNet50 and ResNet152.

Multimodality. The exploitation of both images and text turned out to be fundamental for the task of Meme Detection. Since memes adhere to specific visual conventions, participants tried to exploit visual data at their best. The first run of UniTor relied only on an image classifier, whereas DMT exploited the information resulting from three different image classification models, then combined it with word embeddings. Nevertheless, the best results were obtained by combining text and image information. In its second run, UniTor concatenated the image representation returned by their first model with pre-trained contextual word embeddings fine-tuned on DANKMEMES data. Similarly, SNK and UPB leveraged both textual and image data. Keila was the only participant who did not combine text and image information in any of the submitted runs. As for the second task, the first UniTor run relied only on textual data and was only slightly outperformed by their second run. As observed by the team, in the Hate Speech Identification task textual data heavily impact the classification results. Finally, UPB combined both image and textual data for this task.

Data Augmentation. Several participants chose to adopt a data augmentation technique. UniTor successfully manipulated the provided images by horizontally mirroring them. On the contrary, DMT at first created nine versions of each image, editing brightness, rotation, and zoom, but then dropped them due to the overfitting caused by the unmodified metadata associated with each image. Keila augmented textual data by first translating the image texts into English and then back into Italian. Regarding the second task, on Hate Speech Identification, UniTor trained the UmBERTo embeddings for a few epochs on a dataset made available within the Hate Speech Detection (HaSpeeDe) task (Bosco et al., 2018) before training them on the DANKMEMES dataset.

Exploited features. SNK encoded and concatenated in a single vector the picture manipulation, visual, and engagement features, along with the sentence and image representations of each meme. Keila employed the engagement and manipulation features as well. DMT normalized engagement and represented dates as the count of days from a selected reference date. Along with the other provided data, temporal features were also exploited by UPB, through the computation of complementary sine and cosine distances, in order to preserve the cyclic characteristics of days and months (see the sketch below). Finally, UniTor relied only on visual and textual information.
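UPB's cyclical treatment of dates corresponds to the standard sine/cosine encoding, sketched below; this is an illustration of the general technique, not the team's actual code:

```python
import numpy as np
import pandas as pd

# Sketch of a standard sine/cosine encoding of cyclic time features,
# in the spirit of UPB's temporal features; not the team's code.
dates = pd.to_datetime(["22/08/19", "25/08/19"], format="%d/%m/%y")
day = dates.day.to_numpy()
month = dates.month.to_numpy()

# Map each value onto a circle so that, e.g., day 31 and day 1
# end up close together in feature space.
day_sin, day_cos = np.sin(2 * np.pi * day / 31), np.cos(2 * np.pi * day / 31)
month_sin = np.sin(2 * np.pi * month / 12)
month_cos = np.cos(2 * np.pi * month / 12)

features = np.column_stack([day_sin, day_cos, month_sin, month_cos])
print(features.round(3))
```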
Event Clustering. The goal of this task was to assign each meme to the event it refers to. Only UniTor participated in this task, modeling it as a classification problem in two distinct runs. The first model fed the MLP classifier with only the textual data representation provided by the Transformer architecture. In the second run, the team mapped the original classification problem, which counted five different labels (each corresponding to an event), onto a binary classification one: after pairing each meme with each event, a pair was labeled as positive if the association was correct, and negative otherwise. However, this run did not surpass the first one, whose outcome doubled the provided baseline.

7 Final Remarks

This paper describes a task for the detection and analysis of memes in the Italian language. DANKMEMES is the first task of this kind in the EVALITA campaign. Although memes are widespread on the Web, it is still hard to define them precisely. However, DANKMEMES highlighted the fundamental role of multimodality in meme detection, namely the combined use of texts and images for their classification. Therefore, we could say that memes share peculiar linguistic features, besides conventional layouts. Future work will focus on the extension of the dataset, which showed some limitations, especially for its reduced size and for the unbalanced representation of some events. This is due to the difficulty of meme collection, especially when filtered in relation to a specific event (e.g., the 2019 Italian government crisis).

References

Eisa Al Nashmi. 2018. From selfies to media events: How Instagram users interrupted their routines after the Charlie Hebdo shootings. Digital Journalism, 6(1):98–117.

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Cristina Bosco, Felice Dell'Orletta, Fabio Poletto, Manuela Sanguinetti, and Maurizio Tesconi. 2018. Overview of the EVALITA 2018 hate speech detection task. In EVALITA 2018 – Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, pages 1–9.

Claudia Breazzano, Edoardo Rubino, Danilo Croce, and Roberto Basili. 2020. UniTor @ DANKMEMES: Combining convolutional models and transformer-based architectures for accurate meme management. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Patrick Davison. 2012. The language of internet memes. The Social Media Reader, pages 120–134.

Richard Dawkins. 2016. The Selfish Gene. Oxford University Press.

Stefano Fiorucci. 2020. SNK @ DANKMEMES: Leveraging pretrained embeddings for multimodal meme detection. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Jean French. 2017. Image-based memes as sentiment predictors. In 2017 International Conference on Information Society (i-Society).

Noam Gal, Limor Shifman, and Zohar Kampf. 2016. "It gets better": Internet memes and the construction of collective identity. New Media & Society, 18(8):1698–1714.

Tarleton Gillespie. 2018. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press.

Giulia Giorgi and Ilir Rama. 2019. "One does not simply meme". Framing the 2019 Italian government crisis through memes. In La comunicazione politica nell'ecosistema dei media digitali. Convegno dell'Associazione Italiana di Comunicazione Politica (ASSOCOMPOL).

Hugo Gonçalo Oliveira, Diogo Costa, and Alexandre Pinto. 2016. One does not simply produce funny memes! – Explorations on the automatic generation of internet humor. In Proceedings of the Seventh International Conference on Computational Creativity (ICCC 2016).

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778.

Douwe Kiela and Léon Bottou. 2014. Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 36–45.

Michele Knobel and Colin Lankshear. 2007. Online memes, affinities, and cultural production. A New Literacies Sampler, 29:199–227.

Klaus Krippendorff. 2018. Content Analysis: An Introduction to Its Methodology. Sage Publications.

Ryan M. Milner. 2016. The World Made Meme: Public Conversations and Participatory Media. MIT Press.

Abel L. Peirson V and E. Meltem Tolunay. 2018. Dank learning: Generating memes using deep neural networks. CoRR, abs/1806.04510.

Vasiliki Plevriti. 2014. Satirical user-generated memes as an effective source of political criticism, extending debate and enhancing civic engagement.

Andrew S. Ross and Damian J. Rivers. 2017. Digital cultures of political participation: Internet memes and the discursive delegitimization of the 2016 US presidential candidates. Discourse, Context & Media, 16:1–11.

Benet Oriol Sabat, Cristian Canton Ferrer, and Xavier Giro-i-Nieto. 2019. Hate speech in pixels: Detection of offensive memes towards automatic moderation. arXiv preprint arXiv:1910.02334.

Jinen Setpal and Gabriele Sarti. 2020. DankMemesTeam @ DANKMEMES: Archimede: A new model architecture for meme detection. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Chhavi Sharma, Deepesh Bhageria, William Scott, Srinivas PYKL, Amitava Das, Tanmoy Chakraborty, Viswanath Pulabaigari, and Bjorn Gamback. 2020. SemEval-2020 Task 8: Memotion analysis – the visuo-lingual metaphor! arXiv preprint arXiv:2008.03781.

Limor Shifman. 2013. Memes in a digital world: Reconciling with a conceptual troublemaker. Journal of Computer-Mediated Communication, 18(3):362–377.

E. S. Smitha, Selvaraju Sendhilkumar, and G. S. Mahalaksmi. 2018. Meme classification using textual and visual features. In Computational Vision and Bio Inspired Computing, pages 1015–1031.

Emi Tanaka, Timothy Bailey, and Uri Keich. 2014. Improving MEME via a two-tiered significance analysis. Bioinformatics, 30:1965–1973.

George-Alexandru Vlad, George-Eduard Zaharia, Dumitru-Clementin Cercel, and Mihai Dascalu. 2020. UPB @ DANKMEMES: Italian memes analysis – Employing visual models and graph convolutional networks for meme identification and hate speech detection. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019a. Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1415–1420.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019b. SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 75–86.