=Paper= {{Paper |id=Vol-2769/30 |storemode=property |title=The CREENDER Tool for Creating Multimodal Datasets of Images and Comments |pdfUrl=https://ceur-ws.org/Vol-2769/paper_30.pdf |volume=Vol-2769 |authors=Alessio Palmero Aprosio,Stefano Menini,Sara Tonelli |dblpUrl=https://dblp.org/rec/conf/clic-it/AprosioMT20 }} ==The CREENDER Tool for Creating Multimodal Datasets of Images and Comments== https://ceur-ws.org/Vol-2769/paper_30.pdf
                            The CREENDER Tool for Creating
                       Multimodal Datasets of Images and Comments

    Alessio Palmero Aprosio                   Stefano Menini                        Sara Tonelli
    Fondazione Bruno Kessler              Fondazione Bruno Kessler            Fondazione Bruno Kessler
          Trento, Italy                         Trento, Italy                       Trento, Italy
      aprosio@fbk.eu                         menini@fbk.eu                     satonelli@fbk.eu


Abstract

English. While text-only datasets are widely produced and used for research purposes,
limitations set by image-based social media platforms like Instagram make it difficult
for researchers to experiment with multimodal data. We therefore developed CREENDER, an
annotation tool to create multimodal datasets in which images are associated with semantic
tags and comments, which we make freely available under the Apache 2.0 license. The
software has been extensively tested with school classes, allowing us to improve the tool
and add useful features not planned in the first development phase.[1]

Italiano. Mentre i dataset testuali sono ampiamente creati e usati per scopi di ricerca,
le limitazioni imposte dai social media basati sulle immagini (come Instagram) rendono
difficile per i ricercatori sperimentare con dati multimodali. Abbiamo quindi sviluppato
CREENDER, un tool di annotazione per la creazione di dataset multimodali in cui immagini
vengono associate a etichette semantiche e commenti, e che abbiamo reso disponibile
gratuitamente con la licenza Apache 2.0. Il software è stato testato in un laboratorio con
alcune classi scolastiche, permettendoci di ottimizzare alcune procedure e di aggiungere
feature non previste nella prima release.

[1] Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).

1   Introduction

In recent years, the NLP community has started to focus on the challenges of combining
vision and language technologies, proposing approaches towards multimodal data processing
(Belz et al., 2016; Belz et al., 2017). This has led to an increasing need for multimodal
datasets with high-quality information to be used for training and evaluating the
developed systems. While several datasets have been created by downloading real online
data and often adding textual annotation to it (see for example the Flickr dataset[2]),
this poses privacy and copyright issues, since downloading and using pictures posted
online without the author's consent is often forbidden by social network privacy policies.
Instagram terms of use, for example, explicitly forbid collecting information in an
automated way without express permission from the platform.[3]

[2] https://yahooresearch.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images
[3] See, for example, https://help.instagram.com/581066165581870.

In order to address this issue, we present CREENDER, a novel annotation tool to create
multimodal datasets of images and comments. With this tool it is possible to simulate a
scenario where different users access the platform and are shown different pictures,
having the possibility to leave a comment and associate a semantic tag with the image. The
same pictures can be shown to different users, allowing a comparison of their comments and
online behaviour.

CREENDER can be used in contexts where simulated scenarios are the only solution to
collect datasets of interest. One typical example, which we detail in Section 4, is the
analysis of the online behaviour of teenagers and young adults, a task that poses relevant
privacy issues since underage users are targeted. Giving the possibility to comment on
images in an Instagram-like setting without giving any personal information to register is
indeed of paramount importance, and can be easily achieved with the tool presented in this
paper.
Given its flexibility, CREENDER can however be used for any task where images need to be
tagged and/or commented, and multiple annotations of the same image should preferably be
collected.

2   Related Work

Several tools have been developed to annotate images with different types of information.
Most of them are designed to be run only on a desktop computer and are meant to select
parts of the picture to assign a semantic tag or a description, so that the resulting
corpora can be used to train or evaluate image recognition or captioning software. In this
scenario, users often need to be trained to use the annotation tool, which requires some
time that is usually not available in specific settings like schools (Russell et al.,
2008). Other tools for image annotation or captioning are web-based, like CREENDER, but
the software is not available for download and must be used as a service. This paradigm
can lead to privacy issues, as the data are not stored locally or on an owned server
(Chapman et al., 2012). This could be problematic when the pictures to be annotated are
copyright-protected or when users involved in the data collection do not want to, or
cannot, create an account with personal information. Finally, some software is not
distributed as open source, and could suddenly become unavailable or unusable when it is
no longer maintained (Halaschek-Wiener et al., 2005; Hughes et al., 2018).

Regarding the datasets, Mogadala et al. (2019) focus on prominent tasks that integrate
language and vision by discussing their problem formulations, methods, existing datasets,
and evaluation measures, comparing the results obtained with different state-of-the-art
methods. Ethical and legal issues on the use of pictures and texts taken from social
networks are also relevant, as discussed in Lyons (2020), Prabhu and Birhane (2020) and
Fiesler and Proferes (2018). Our tool has been developed specifically to address these
issues as well, preserving the privacy of users and avoiding the collection of real data.

3   Annotation Tool

The CREENDER tool can be accessed both via browser and on mobile phones, so that users can
use it even if no computer connected to the Internet is available. The web interface is
multi-language: English, French and Italian are already included, while other language
files can be added as needed. The interface language can be assigned at user level,
meaning that the interface for users on the same instance can be configured in different
languages.

Once the tool is installed on a server, a super user is created, who can access the
administration interface, where the projects are managed, using the password chosen during
installation (see Figure 2).

For each project, on the configuration side, a set of photos (or a set of external links
to images on the web) needs to be given to the tool. Then, one can set the number of users
and the number of annotations that are required for each photo. Finally, the system
assigns the photos to the users and creates the login information for them. Social login
is also supported (only Google for now), so that there is no need to distribute usernames
and passwords: the administrator chooses a five-digit code and gives it to every
annotator, who can then log in using the code and their social account.

Given a picture, the system can be set to perform three actions in sequence or in
isolation, as needed by the task: i) the picture can be skipped by the user, so that no
annotation is stored and the next one is displayed; ii) the user can insert free text
associated with the image, which can be used to write a caption, comment on the picture,
list the contained objects, etc.; iii) one or more pre-defined categories can be assigned
to the picture. Categories can range from specific ones related to the portrayed objects
(e.g. male, female, animals, etc.) to more abstract ones, such as the emotions provoked by
looking at the picture.

In the configuration screen, the administrator can edit the prompted questions and the
possible answers, so that the tool can be used for a variety of different tasks.

Using the administration web interface, it is also possible to monitor the task with
information about the number of annotations that each user has performed. This makes it
possible to check whether some users experience difficulties in the annotation, or whether
some annotators are anomalously fast (for example by skipping too many images). Once the
annotation session is closed, the administrator can download the resulting corpus
containing the images and the associated information. The export is available in three
formats: SQL database, CSV, and JSON.
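
The monitoring step described above can be illustrated with a short sketch. It assumes a
JSON export shaped as a list of records, one per stored annotation, each with a `user`
field. The paper does not document the export schema, so the field names, the sample data
and the median-based thresholds below are all hypothetical.

```python
import json
from collections import Counter
from statistics import median


def annotations_per_user(records):
    """Count stored annotations per user ("user" is an assumed field name)."""
    return Counter(r["user"] for r in records)


def flag_outliers(counts, low=0.5, high=2.0):
    """Flag users whose count is below low*median or above high*median.

    The thresholds are arbitrary illustration values, not taken from the paper.
    """
    if not counts:
        return {}
    mid = median(counts.values())
    flags = {}
    for user, n in counts.items():
        if n < low * mid:
            flags[user] = "low"
        elif n > high * mid:
            flags[user] = "high"
    return flags


# Hypothetical export: one record per stored annotation.
sample_export = json.dumps([
    {"user": user, "image": f"img{i}.jpg", "comment": "..."}
    for user, n in [("u1", 4), ("u2", 4), ("u3", 1), ("u4", 10)]
    for i in range(n)
])

records = json.loads(sample_export)
counts = annotations_per_user(records)
flags = flag_outliers(counts)
print(dict(counts))  # {'u1': 4, 'u2': 4, 'u3': 1, 'u4': 10}
print(flags)         # {'u3': 'low', 'u4': 'high'}
```

Since a skipped picture stores no annotation, a user who skips most images would show up
here with a low count rather than a high one; how an outlier should be interpreted depends
on the task.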
     Figure 1: CREENDER interface configured for the collection of potentially offensive comments
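
Section 3 states that the administrator sets the number of users and the number of
annotations required for each photo, after which the system assigns the photos to the
users. The paper does not describe the assignment strategy. The following is a minimal
sketch of one possible approach (round-robin over a shuffled photo list), not necessarily
the one CREENDER implements; the function name and parameters are hypothetical.

```python
import random


def assign_photos(photos, users, annotations_per_photo, seed=0):
    """Give each photo to `annotations_per_photo` distinct users, balancing load.

    Photos are shuffled, then annotation slots are dealt to users in cyclic
    order. Consecutive slots of one photo go to consecutive users, so they are
    distinct as long as annotations_per_photo <= len(users).
    """
    if annotations_per_photo > len(users):
        raise ValueError("need at least as many users as annotations per photo")
    rng = random.Random(seed)
    order = list(photos)
    rng.shuffle(order)
    assignment = {user: [] for user in users}
    slot = 0
    for photo in order:
        for _ in range(annotations_per_photo):
            assignment[users[slot % len(users)]].append(photo)
            slot += 1
    return assignment


photos = [f"p{i}" for i in range(10)]
users = ["a", "b", "c", "d"]
result = assign_photos(photos, users, annotations_per_photo=3)
for user, assigned in result.items():
    print(user, len(assigned))
```

With 10 photos, 4 users and 3 annotations per photo there are 30 slots, so the per-user
load differs by at most one photo. A fixed seed keeps the assignment reproducible.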


Figure 2: The administration interface to define the number of users and the images per user

4   Use Case: Creation of Offensive Posts

The CREENDER tool was used to collect abusive comments associated with images, simulating
a setting like Instagram in which pictures and text together build an interaction which
may become offensive. The data collection was carried out in several classes of Italian
teenagers aged between 15 and 18, in the framework of a collaboration with schools aimed
at increasing awareness of social media and cyberbullying phenomena (Menini et al., 2019).
The data collection was embedded in a larger process that required two to three meetings
with each class, one per week, each time involving two social scientists, two
computational linguists and at least two teachers. During these meetings several
activities were carried out with students, including simulating a WhatsApp conversation
around a given plot as described in Sprugnoli et al. (2018), commenting on existing social
media posts, and annotating images as described in this paper.

Overall, 95 students were involved in the annotation. The sessions were organised so that
different school classes annotated the same set of images, in order to collect multiple
annotations on the same pictures. The pictures were retrieved from online sources and then
manually checked by the researchers involved in the study to remove pornographic content.
In the preparatory phase, the filtered pictures were uploaded to the CREENDER image
folder. Then, a login and password were created for each student to be involved in the
data collection and printed on paper, so that they could be given to each student before
an annotation session without the possibility to associate login information with the
students' identity. CREENDER was configured to first take a random picture from the image
folder, and display it to the user with a prompt asking “If you saw this picture on
Instagram, would you make fun of the user who posted it?”. If the user selects “No”, then
the system picks another image randomly and the same question is asked. If the user clicks
on “Yes”, a second screen opens where the user is asked to specify the reason why the
image would trigger
such a reaction by selecting one of the following categories: “Body”, “Clothing”, “Pose”,
“Facial expression”, “Location”, “Activity” and “Other”. The user should also write the
textual comment s/he would post below the picture. After that, the next picture is
displayed, and so on. Two screenshots of the interface configured for this specific task
are displayed in Figure 1.

At the end of the activities with schools, all collected data were exported. The final
corpus includes 17,912 images, 1,018 of which have at least one associated comment, as
well as a trigger category (e.g. facial expression, pose) and the category of the
subject/s (female, male, mixed or nobody). The number of annotations for each picture
varies between 1 and 4. A more detailed description of the corpus is reported in Menini et
al. (2021).

The use of CREENDER allowed a seamless and very fast data collection, without the need to
send images to each student, to exchange or merge files, or to install specific
applications. On the other hand, the data collection with students, who used the online
platform in classes while researchers were physically present and could check the flow of
the interaction, was useful to improve the tool. Some bug fixes and small improvements
were indeed implemented after the first sessions. For example, a small delay (2 seconds)
was added after the image is displayed to the user and before the Yes/No buttons appear,
so that users are more likely to look at the picture before deciding to skip it or not.

5   Release

The software is distributed as an open source package[4] and is released under the Apache
license (version 2.0). The API (backend) is written in PHP and relies on a MySQL database.
The web interface (frontend) is developed in HTML/CSS/JS using the Bootstrap and VueJS
frameworks.

[4] https://github.com/dhfbk/creender

The interface is responsive, so that one can use it from any device that can open web
pages (desktop computers, smartphones, tablets).

6   Conclusions

In this work we present a methodology and a tool, CREENDER, to create multimodal datasets.
In this framework, participants in online annotation sessions can write comments on
images, assign pre-defined categories or simply skip an image. The tool is freely
available with an interface in three languages, and makes it easy to set up annotation
sessions with multiple users.

CREENDER has been extensively tested during activities with schools around the topic of
cyberbullying, involving 95 Italian high-school students. The tool is particularly
suitable for this kind of setting, where privacy issues are of paramount importance and
the involvement of underage people requires that personal information is not shared.

In the future, we plan to continue the annotation of images related to cyberbullying,
creating and comparing subsets of pictures related to different topics (e.g. religious
symbols, political parties, football teams). From an implementation point of view, we will
extend the analytics panel, adding for example scripts for computing inter-annotator
agreement.

Acknowledgments

Part of this work has been funded by the KID ACTIONS REC-AG project (n. 101005518) on
“Kick-off preventIng and responDing to children and AdolesCenT cyberbullyIng through
innovative mOnitoring and educatioNal technologieS”. In addition, the authors want to
thank all the students and teachers who participated in the experimentation.

References

Anya Belz, Erkut Erdem, Krystian Mikolajczyk, and Katerina Pastra, editors. 2016.
  Proceedings of the 5th Workshop on Vision and Language, Berlin, Germany, August.
  Association for Computational Linguistics.

Anya Belz, Erkut Erdem, Katerina Pastra, and Krystian Mikolajczyk, editors. 2017.
  Proceedings of the Sixth Workshop on Vision and Language, Valencia, Spain, April.
  Association for Computational Linguistics.

Brian E Chapman, Mona Wong, Claudiu Farcas, and Patrick Reynolds. 2012. Annio: a web-based
  tool for annotating medical images with ontologies. In 2012 IEEE Second International
  Conference on Healthcare Informatics, Imaging and Systems Biology, pages 147–147. IEEE.

Casey Fiesler and Nicholas Proferes. 2018. “Participant” perceptions of Twitter research
  ethics. Social Media + Society, 4(1):2056305118763366.

Christian Halaschek-Wiener, Jennifer Golbeck, Andrew Schain, Michael Grove, Bijan Parsia,
  and Jim Hendler. 2005. Photostuff - an image annotation tool for the semantic web. In
  Proceedings of the 4th international semantic web conference, pages 6–10. Citeseer.

Alex J Hughes, Joseph D Mornin, Sujoy K Biswas, Lauren E Beck, David P Bauer, Arjun Raj,
  Simone Bianco, and Zev J Gartner. 2018. Quanti.us: a tool for rapid, flexible,
  crowd-based annotation of images. Nature methods, 15(8):587–590.

Michael J Lyons. 2020. Excavating “Excavating AI”: The elephant in the gallery. arXiv
  preprint arXiv:2009.01215.

Stefano Menini, Giovanni Moretti, Michele Corazza, Elena Cabrio, Sara Tonelli, and Serena
  Villata. 2019. A system to monitor cyberbullying based on message classification and
  social network analysis. In Proceedings of the Third Workshop on Abusive Language
  Online, pages 105–110.

Stefano Menini, Alessio Palmero Aprosio, and Sara Tonelli. 2021. A multimodal dataset of
  images and text to study abusive language. In 7th Italian Conference on Computational
  Linguistics, CLiC-it 2020.

Aditya Mogadala, Marimuthu Kalimuthu, and Dietrich Klakow. 2019. Trends in integration of
  vision and language research: A survey of tasks, datasets, and methods. arXiv preprint
  arXiv:1907.09358.

Vinay Uday Prabhu and Abeba Birhane. 2020. Large image datasets: A pyrrhic win for
  computer vision?

Bryan C Russell, Antonio Torralba, Kevin P Murphy, and William T Freeman. 2008. LabelMe: a
  database and web-based tool for image annotation. International journal of computer
  vision, 77(1-3):157–173.

Rachele Sprugnoli, Stefano Menini, Sara Tonelli, Filippo Oncini, and Enrico Piras. 2018.
  Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying. In Proceedings of the 2nd
  Workshop on Abusive Language Online (ALW2), pages 51–59. Association for Computational
  Linguistics.