=Paper=
{{Paper
|id=Vol-2769/30
|storemode=property
|title=The CREENDER Tool for Creating Multimodal Datasets of Images and Comments
|pdfUrl=https://ceur-ws.org/Vol-2769/paper_30.pdf
|volume=Vol-2769
|authors=Alessio Palmero Aprosio,Stefano Menini,Sara Tonelli
|dblpUrl=https://dblp.org/rec/conf/clic-it/AprosioMT20
}}
==The CREENDER Tool for Creating Multimodal Datasets of Images and Comments==
Alessio Palmero Aprosio, Stefano Menini, Sara Tonelli
Fondazione Bruno Kessler, Trento, Italy
aprosio@fbk.eu, menini@fbk.eu, satonelli@fbk.eu
Abstract

English. While text-only datasets are widely produced and used for research purposes, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal data. We therefore developed CREENDER, an annotation tool to create multimodal datasets with images associated with semantic tags and comments, which we make freely available under the Apache 2.0 license. The software has been extensively tested with school classes, allowing us to improve the tool and add useful features not planned in the first development phase.

Italiano. Mentre i dataset testuali sono ampiamente creati e usati per scopi di ricerca, le limitazioni imposte dai social media basati sulle immagini (come Instagram) rendono difficile per i ricercatori sperimentare con dati multimodali. Abbiamo quindi sviluppato CREENDER, un tool di annotazione per la creazione di dataset multimodali in cui immagini vengono associate a etichette semantiche e commenti, e che abbiamo reso disponibile gratuitamente con la licenza Apache 2.0. Il software è stato testato in un laboratorio con alcune classi scolastiche, permettendoci di ottimizzare alcune procedure e di aggiungere feature non previste nella prima release.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

In recent years, the NLP community has started to focus on the challenges of combining vision and language technologies, proposing approaches towards multimodal data processing (Belz et al., 2016; Belz et al., 2017). This has led to an increasing need for multimodal datasets with high-quality information to be used for training and evaluating the developed systems. While several datasets have been created by downloading and often adding textual annotation to real online data (see for example the Flickr dataset, https://yahooresearch.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images), this poses privacy and copyright issues, since downloading and using pictures posted online without the author’s consent is often forbidden by social network privacy policies. Instagram's terms of use, for example, explicitly forbid collecting information in an automated way without express permission from the platform (see, for example, https://help.instagram.com/581066165581870).

In order to address this issue, we present CREENDER, a novel annotation tool to create multimodal datasets of images and comments. With this tool it is possible to simulate a scenario where different users access the platform and are shown different pictures, with the possibility to leave a comment and associate a semantic tag with the image. The same pictures can be shown to different users, allowing a comparison of their comments and online behaviour.

CREENDER can be used in contexts where simulated scenarios are the only solution to collect datasets of interest. One typical example, which we detail in Section 4, is the analysis of the online behaviour of teenagers and young adults, a task that poses relevant privacy issues since underage users are targeted. Giving the possibility to comment on images in an Instagram-like setting without giving any personal information to register is indeed of paramount importance, and can be easily achieved with the tool presented in this paper.
Given its flexibility, CREENDER can however be used for any task where images need to be tagged and/or commented on, and where multiple annotations of the same image should preferably be collected.

2 Related Work

Several tools have been developed to annotate images with different types of information. Most of them are designed to be run only on a desktop computer and are meant to select parts of the picture to assign a semantic tag or a description, so that the resulting corpora can be used to train or evaluate image recognition or captioning software. In this scenario, users often need to be trained to use the annotation tool, which requires some time that is usually not available in specific settings like schools (Russell et al., 2008). Other tools for image annotation or captioning are web-based, like CREENDER, but the software is not available for download and must be used as a service. This paradigm can lead to privacy issues, as the data are not stored locally or on an owned server (Chapman et al., 2012). This could be problematic when the pictures to be annotated are copyright-protected or when users involved in the data collection do not want to or cannot create an account with personal information. Finally, some software is not distributed as open source, and could suddenly become unavailable or unusable when it is no longer maintained (Halaschek-Wiener et al., 2005; Hughes et al., 2018).

Regarding the datasets, Mogadala et al. (2019) focus on prominent tasks that integrate language and vision by discussing their problem formulations, methods, existing datasets, and evaluation measures, comparing the results obtained with different state-of-the-art methods. Ethical and legal issues on the use of pictures and texts taken from social networks are also relevant, as discussed in (Lyons, 2020; Prabhu and Birhane, 2020; Fiesler and Proferes, 2018). Our tool has been developed specifically to address this kind of issue as well, preserving the privacy of users and avoiding the collection of real data.

3 Annotation Tool

The CREENDER tool can be accessed both via browser and mobile phone, so that users can use it even if no computer connected to the Internet is available. The web interface is multi-language: English, French and Italian are already included, while other language files can be added as needed. The interface language can be assigned at user level, meaning that the interface for users on the same instance can be configured in different languages.

Once the tool is installed on a server, a super user is created, who can access the administration interface where the projects are managed with the password chosen during installation (see Figure 2).

For each project, on the configuration side, a set of photos (or a set of external links to images on the web) needs to be given to the tool. Then, one can set the number of users and the number of annotations that are required for each photo. Finally, the system assigns the photos to the users and creates the login information for them. Social login is also supported (only Google for now), so that there is no need to distribute usernames and passwords: the administrator chooses a five-digit code and gives it to every annotator, who can then log in using the code and his/her social account.

Given a picture, the system can be set to perform three actions in sequence or in isolation, as needed by the task: i) the picture can be skipped by the user, so that no annotation is stored and the next one is displayed; ii) the user can insert free text associated with the image, which can be used to write a caption, comment on the picture, list the contained objects, etc.; iii) one or more pre-defined categories can be assigned to the picture. Categories can range from specific ones related to the portrayed objects (e.g. male, female, animals, etc.) to more abstract ones, like for example the emotions provoked by looking at the picture.

In the configuration screen, the administrator can edit the prompted questions and the possible answers, so that the tool can be used for a variety of different tasks.

Using the administration web interface, it is also possible to monitor the task with information about the number of annotations that each user has performed. This makes it possible to check whether some users experience difficulties in the annotation, or if some annotators are anomalously fast (for example by skipping too many images). Once the annotation session is closed, the administrator can download the resulting corpus containing the images and the associated information. The export is available in three formats: SQL database, CSV, and JSON.
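
The assignment step described in this section (a set of photos, a number of users, and a required number of annotations per photo) can be illustrated with a short sketch. The paper does not specify how CREENDER distributes photos internally, so the round-robin strategy below is only an assumption, and the function name `assign_photos` and its parameters are hypothetical and not part of the tool's API.

```python
import itertools
import random


def assign_photos(photos, n_users, annotations_per_photo, seed=0):
    """Assign every photo to `annotations_per_photo` distinct users.

    Hypothetical round-robin strategy (not taken from CREENDER): users are
    visited cyclically, so per-user workloads differ by at most one photo.
    """
    if annotations_per_photo > n_users:
        raise ValueError("annotations_per_photo cannot exceed n_users")
    rng = random.Random(seed)
    shuffled = list(photos)
    rng.shuffle(shuffled)  # avoid giving every user the same photo order
    assignment = {user: [] for user in range(n_users)}
    user_cycle = itertools.cycle(range(n_users))
    for photo in shuffled:
        # Consecutive users in the cycle are distinct as long as
        # annotations_per_photo <= n_users, so no user gets a photo twice.
        for _ in range(annotations_per_photo):
            assignment[next(user_cycle)].append(photo)
    return assignment


if __name__ == "__main__":
    photos = [f"img_{i:03d}.jpg" for i in range(6)]
    plan = assign_photos(photos, n_users=4, annotations_per_photo=2)
    for user, user_photos in plan.items():
        print(user, user_photos)
```

With six photos, four users and two annotations per photo, each user in this sketch receives three photos and every photo is seen by exactly two different users, which is the property needed to compare comments on the same picture.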
Figure 1: CREENDER interface configured for the collection of potentially offensive comments

4 Use Case: Creation of Offensive Posts

The CREENDER tool was used to collect abusive comments associated with images, simulating a setting like Instagram in which pictures and text together build an interaction which may become offensive. The data collection was carried out in several classes of Italian teenagers aged between 15 and 18, in the framework of a collaboration with schools aimed at increasing awareness of social media and cyberbullying phenomena (Menini et al., 2019). The data collection was embedded in a larger process that required two to three meetings with each class, one per week, each time involving two social scientists, two computational linguists and at least two teachers. During these meetings several activities were carried out with students, including simulating a WhatsApp conversation around a given plot as described in (Sprugnoli et al., 2018), commenting on existing social media posts, and annotating images as described in this paper.

Overall, 95 students were involved in the annotation. The sessions were organised so that different school classes annotated the same set of images, in order to collect multiple annotations on the same pictures. The pictures were retrieved from online sources and then manually checked by the researchers involved in the study to remove pornographic content. In the preparatory phase, the filtered pictures were uploaded to the CREENDER image folder. Then, a login and password were created for each student to be involved in the data collection and printed on paper, so that they could be given to each student before an annotation session without the possibility to associate login information with the students’ identity. CREENDER was configured to first take a random picture from the image folder, and display it to the user with a prompt asking “If you saw this picture on Instagram, would you make fun of the user who posted it?”. If the user selects “No”, then the system picks another image randomly and the same question is asked. If the user clicks on “Yes”, a second screen opens where the user is asked to specify the reason why the image would trigger
such reaction by selecting one of the following categories: “Body”, “Clothing”, “Pose”, “Facial expression”, “Location”, “Activity” and “Other”. Two screenshots of the interface are displayed in Figure 1. The user should also write the textual comment s/he would post below the picture. After that, the next picture is displayed, and so on. A screenshot of the tool configured for this specific task is displayed in Figure 1.

Figure 2: The administration interface to define the number of users and the images per user

At the end of the activities with schools, all collected data were exported. The final corpus includes 17,912 images, 1,018 of which have at least one associated comment, as well as a trigger category (e.g. facial expression, pose) and the category of the subject/s (female, male, mixed or nobody). The number of annotations for each picture varies between 1 and 4. A more detailed description of the corpus is reported in (Menini et al., 2021).

The use of CREENDER allowed a seamless and very fast data collection, without the need to send images to each student, to exchange or merge files and to install specific applications. On the other hand, the data collection with students, who used the online platform in classes while researchers were physically present and could check the flow of the interaction, was useful to improve the tool. Some bug fixes and small improvements were indeed implemented after the first sessions. For example, a small delay (2 seconds) was added after the image is displayed to the user and before the Yes/No buttons appear, so that users are more likely to look at the picture before deciding to skip it or not.

5 Release

The software is distributed as an open source package (https://github.com/dhfbk/creender) and is released under the Apache license (version 2.0). The API (backend) is written in PHP and relies on a MySQL database. The web interface (frontend) is developed using the HTML/CSS/JS paradigm with the Bootstrap and VueJS frameworks.

The interface is responsive, so that one can use it from any device that can open web pages (desktop computers, smartphones, tablets).

6 Conclusions

In this work we present a methodology and a tool, CREENDER, to create multimodal datasets. In this framework, participants in online annotation sessions can write comments on images, assign pre-defined categories or simply skip an image. The tool is freely available with an interface in three languages, and makes it easy to set up annotation sessions with multiple users.

CREENDER has been extensively tested during activities with schools around the topic of cyberbullying, involving 95 Italian high-school students. The tool is particularly suitable for this kind of setting, where privacy issues are of paramount importance and the involvement of underage people requires that personal information is not shared.

In the future, we plan to continue the annotation of images related to cyberbullying, creating and comparing subsets of pictures related to different topics (e.g. religious symbols, political parties, football teams). From an implementation point of view, we will extend the analytics panel, adding for example scripts for computing inter-annotator agreement.

Acknowledgments

Part of this work has been funded by the KID ACTIONS REC-AG project (n. 101005518) on “Kick-off preventIng and responDing to children and AdolesCenT cyberbullyIng through innovative mOnitoring and educatioNal technologieS”. In addition, the authors want to thank all the students and teachers who participated in the experimentation.

References

Anya Belz, Erkut Erdem, Krystian Mikolajczyk, and Katerina Pastra, editors. 2016. Proceedings of the 5th Workshop on Vision and Language, Berlin, Germany, August. Association for Computational Linguistics.

Anya Belz, Erkut Erdem, Katerina Pastra, and Krystian Mikolajczyk, editors. 2017. Proceedings of the Sixth Workshop on Vision and Language, Valencia, Spain, April. Association for Computational Linguistics.

Brian E Chapman, Mona Wong, Claudiu Farcas, and Patrick Reynolds. 2012. Annio: a web-based tool for annotating medical images with ontologies. In 2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology, pages 147–147. IEEE.

Casey Fiesler and Nicholas Proferes. 2018. “participant” perceptions of twitter research ethics. Social Media + Society, 4(1):2056305118763366.

Christian Halaschek-Wiener, Jennifer Golbeck, Andrew Schain, Michael Grove, Bijan Parsia, and Jim Hendler. 2005. Photostuff-an image annotation tool for the semantic web. In Proceedings of the 4th international semantic web conference, pages 6–10. Citeseer.

Alex J Hughes, Joseph D Mornin, Sujoy K Biswas, Lauren E Beck, David P Bauer, Arjun Raj, Simone Bianco, and Zev J Gartner. 2018. Quanti.us: a tool for rapid, flexible, crowd-based annotation of images. Nature methods, 15(8):587–590.

Michael J Lyons. 2020. Excavating “excavating ai”: The elephant in the gallery. arXiv preprint arXiv:2009.01215.

Stefano Menini, Giovanni Moretti, Michele Corazza, Elena Cabrio, Sara Tonelli, and Serena Villata. 2019. A system to monitor cyberbullying based on message classification and social network analysis. In Proceedings of the Third Workshop on Abusive Language Online, pages 105–110.

Stefano Menini, Alessio Palmero Aprosio, and Sara Tonelli. 2021. A multimodal dataset of images and text to study abusive language. In 7th Italian Conference on Computational Linguistics, CLiC-it 2020.

Aditya Mogadala, Marimuthu Kalimuthu, and Dietrich Klakow. 2019. Trends in integration of vision and language research: A survey of tasks, datasets, and methods. arXiv preprint arXiv:1907.09358.

Vinay Uday Prabhu and Abeba Birhane. 2020. Large image datasets: A pyrrhic win for computer vision?

Bryan C Russell, Antonio Torralba, Kevin P Murphy, and William T Freeman. 2008. LabelMe: a database and web-based tool for image annotation. International journal of computer vision, 77(1-3):157–173.

Rachele Sprugnoli, Stefano Menini, Sara Tonelli, Filippo Oncini, and Enrico Piras. 2018. Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 51–59. Association for Computational Linguistics.