<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Multimodal Visual Sentiment Analysis Framework Enhanced With Feature Pyramid Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniele Galletti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Ponzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuele Russo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Systems Analysis and Computer Science, Italian National Research Council</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Neuroimaging Laboratory, IRCCS Santa Lucia Foundation</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>55</fpage>
      <lpage>63</lpage>
      <abstract>
<p>Visual Sentiment Analysis aims to understand how images affect people in terms of evoked emotions. This paper presents a complete pipeline for comparing users' emotional responses to images, enabling the analysis of potential discrepancies between machine-inferred and subjective affective states. The proposed framework consists of three main stages. The first stage employs a Convolutional Neural Network (CNN) enhanced with Feature Pyramid Network (FPN) layers to extract multi-scale visual features. Experimental results show that incorporating three additional FPN layers improves performance while introducing only a negligible increase in model complexity. In the second stage, a multimodal approach is adopted, where visual features are integrated with textual features derived from captions generated by an Image Captioning model. This fusion enriches the emotional context by combining visual and linguistic cues. In the final stage, a grounding mechanism is applied to align and merge sentiments from the different modalities into a unified representation. The algorithm's output is then compared with the sentiment expressed by the user, enabling an analysis of the divergence between machine-inferred and human-perceived emotions.</p>
      </abstract>
      <kwd-group>
        <kwd>Visual Sentiment Analysis</kwd>
        <kwd>Feature Pyramid Network</kwd>
        <kwd>Multimodal Evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Sentiment Analysis is a well-known field in machine learning. The goal of sentiment analysis is to measure how certain topics affect people. The outcomes of such studies are very important: having a picture of the common opinion influences political, economic, and social aspects of an entire population [<xref ref-type="bibr" rid="ref2 ref3">1, 2</xref>]. Despite its wide use on text corpora and the huge availability of data coming from social platforms, sentiment analysis is still far from achieving consistently good reliability. The lack of context and the differences between languages and cultures create, in fact, very significant barriers which make sentiment classification a difficult task. Visual Sentiment Analysis (VSA) [<xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">3, 4, 5, 6</xref>] was born as an additional instrument to understand people's sentiment. It emerged in the last decade, gaining traction with the increasing use of images to express opinions on social media platforms. Images offer an additional channel capable of expressing much more information than text [<xref ref-type="bibr" rid="ref8">7</xref>]. Images convey both semantic elements (e.g., objects, scenes) and emotional nuances, offering a richer medium than text. For this reason, social media platforms became very popular and VSA, consequently, started to grow. In this work, we present a multimodal sentiment extraction pipeline. This pipeline aims to give a framework to assess how an image is classified in terms of evoked sentiment. The pipeline is built around three main stages. In the first stage, visual features are extracted from the image using an artificial neural network. In the second stage, a neural captioning model generates a description of the image, and in the third stage, the features from the first and the second stage are mixed into a common representation. The CNN we use in the first stage is a novel architecture that integrates FPN layers into a CNN [<xref ref-type="bibr" rid="ref9">8</xref>]. This model aims to extract meaningful features at different scales, having the benefits of a CNN for object detection and also exploiting low-level features, which have proven to be useful for sentiment classification [<xref ref-type="bibr" rid="ref10">9</xref>]. The model achieves better results when compared to its predecessor [<xref ref-type="bibr" rid="ref11 ref4">3, 10</xref>] and to more classical modeling techniques [11, 12, 13, 14]. In the second step, a textual description, coming from the Image Captioning model recently presented by Wang et al. [15], is added to the features extracted in the first step. The description offers an unbiased representation unaffected by the source of the data. In the last step of the pipeline, a grounding technique is used to merge the features coming from visual and textual data. Textual features are converted into a sentiment distribution using the Emotion Sensor dataset [16]. Visual features, which live in a different domain of emotions, are similarly converted into the same representation by using an association between the labels of the two representations. This is possible since the labels in every representation used in this work are meaningful in terms of sentiment content.
      </p>
      <p>The result is then presented to the user. The user's feedback, in the form of an audio file, is converted to text using the Speech Recognition API [17], and its sentiment is extracted using the same technique as in the third step. This result is then presented to the user alongside the algorithm's result.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Datasets</title>
    </sec>
    <sec id="sec-3">
      <title>2. Related Works</title>
      <p>
        Research in Visual Sentiment Analysis has evolved significantly over the past decade, intersecting computer vision, affective computing, and multimodal learning. One of the first papers in the VSA field was presented in 2010 [18]. The authors performed positive/negative classification using SIFT features extracted from images, mixed with the textual metadata associated with each image. Text-to-sentiment conversion was done using SentiWordNet [19], which was published the same year. The SentiWordNet corpus associates synsets with sentiment polarity. In 2013, Borth et al. [20] created a visual ontology in which sentiments in an image are represented by ANPs (adjective-noun pairs). In 2014, Chen et al. [<xref ref-type="bibr" rid="ref4">3</xref>] presented DeepSentiBank, a CNN finetuned on the Flickr dataset which classifies images into a 1553-dimensional ANP vector. This vector constitutes a meaningful mid-level representation, also exploited in this work. More recently, concerning new CNN structures, Rao et al. [<xref ref-type="bibr" rid="ref12">21</xref>] used an FPN-based Faster R-CNN to extract the regions of interest (RoIs) in which sentiment is contained. Other region-based works on VSA were also presented [<xref ref-type="bibr" rid="ref13">22</xref>]. Concerning recent studies, the literature has moved towards multimodal feature extraction. In 2016, Katsurai and Satoh [<xref ref-type="bibr" rid="ref14">23</xref>] used both hand-crafted features (SIFT and GIST) and text sentiment analysis on image metadata to predict sentiment polarity. In 2018, Ortis et al. [<xref ref-type="bibr" rid="ref15">24</xref>] used multimodal classification with visual features, metadata sentiment, and objective caption extraction, with captions converted to text. Corchs et al. [<xref ref-type="bibr" rid="ref16">25</xref>] presented a method that combines visual and textual features by employing an ensemble learning approach. In particular, the authors classified emotions by combining 5 state-of-the-art classifiers trained on visual and textual data. In recent studies, artificial intelligence systems have been successfully applied in real-life environments to assess and react to emotional states, as shown in psychoeducational robotics frameworks (Ponzi et al., 2021 [<xref ref-type="bibr" rid="ref17">26</xref>]). Additionally, some recent approaches leverage eye-tracking data to infer user attention and emotional engagement with visual stimuli. These methods offer a complementary channel to multimodal sentiment analysis by correlating gaze patterns with affective responses [<xref ref-type="bibr" rid="ref18 ref19 ref20 ref21">27, 28, 29, 30</xref>].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>In this project we used three different datasets. The sentiment extraction pipeline uses these datasets at different steps.</p>
      <sec id="sec-3-1">
        <title>Flickr Dataset</title>
        <p>The first dataset used, the Flickr Dataset with CC, was created by Borth et al. [20]. Images were automatically crawled from Flickr and filtered by their metadata, resulting in 487 256 weakly annotated samples. This dataset is one of the first and most widely used datasets created for VSA tasks. Each of the 1553 classes is an Adjective-Noun Pair (ANP), a mid-level representation for sentiment classification. To build this dataset, the authors crawled Flickr images and extracted the textual tags associated with each sample. The most significant tags were then grouped and transformed into a set of adjective-noun pairs. Such a pair, called an ANP, represents a more emotionally charged concept than nouns and adjectives by themselves. Despite its wide use, this dataset presents some limitations. It is weakly annotated (categorized automatically from metadata posted by users on social networks) and thus subject to bias. The dataset is also highly unbalanced: the number of samples per class varies widely, from 23 to 1402. We used this dataset to finetune neural network models trained on object detection tasks. Further details are presented in the Implementation and Results sections.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Emotion Dataset</title>
        <p>
          The second dataset, published in 2016 [
          <xref ref-type="bibr" rid="ref22">31</xref>
          ] and available on Github, provides 23 308 images
manually annotated using the 8 basic emotions presented
by Mikels et al. [
          <xref ref-type="bibr" rid="ref23">32</xref>
          ]. The team started from more than 3 million weakly labeled images; they filtered and annotated the images by designing a task in which a group of people was asked to answer simple questions. From the results, they built the largest manually annotated dataset up to that time. As a motivation for the work, they discussed the predominance, in existing datasets, of images associated with the Fear and Sadness emotions (Figure 2). This predominance can result in unbalanced classes, which can prevent an algorithm from working correctly. The Emotion Dataset offers a better benchmark than the Flickr one, since it is more reliably labeled, less biased, and less unbalanced. An example of images grouped by Mikels emotions is shown in Figure 1. In this work, the Emotion Dataset is used to finetune the neural network models by adding a layer that maps the ANP representation (from the Flickr dataset) to the Mikels emotions.
        </p>
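        <p>A minimal sketch of such a mapping head, written as a hypothetical PyTorch reconstruction: the paper only states that an added layer maps the 1553 ANP scores to the 8 Mikels emotions, so the linear head and the stand-in backbone below are our assumptions, not the authors' code.</p>
        <preformat>import torch
import torch.nn as nn

# Hypothetical sketch of the finetuning head described above: a single
# linear layer mapping the 1553-dim ANP output to the 8 Mikels emotions.
class ANPToMikels(nn.Module):
    def __init__(self, anp_model):
        super().__init__()
        self.anp_model = anp_model      # backbone finetuned on Flickr ANPs
        self.head = nn.Linear(1553, 8)  # ANP scores -> Mikels emotion logits

    def forward(self, x):
        return self.head(self.anp_model(x))

# Stand-in backbone, only to make the sketch runnable end to end.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1553))
model = ANPToMikels(backbone)
logits = model(torch.randn(2, 3, 224, 224))  # shape: (2, 8)</preformat>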
      </sec>
      <sec id="sec-3-2">
        <title>Emotion Sensor Dataset The third dataset used is the Full Emotion Sensor dataset [16]. This dataset associates</title>
        <p>
          the most used 23 730 words coming from the internet to a
          distribution over 7 emotions. The dataset, whose preview is at the current time no longer available [16], was created by collecting thousands of sentences from blogs and online posts. The authors then labeled the sentences, both manually and automatically, with 7 emotions and used a naive Bayes classifier to assign distributions to words. The 7 emotions correspond to an extended version of the 6 Ekman basic emotions [33], with a neutral emotion added for the case of an equal distribution over the other 6. This dataset, made for NLP tasks, is used in this work to convert different representations of sentiment into a common one. The outcome of the algorithm is a distribution over these 7 emotions.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Emotion Representation</title>
      <p>There are several ways to represent a sentiment. Different psychological studies have led to different ways of representing human feelings in terms of basic emotions. In order to create and categorize data under a set of classes, both psychological studies and data analyses were performed. The most popular model in the literature is Plutchik's Wheel of Emotions [34]. This model defines 8 basic emotions with 3 valences each, resulting in 24 total classes. In this work, we used three different representations. The first, used in the Flickr dataset with CC [20], is the ANP representation. It consists of pairs of adjectives and nouns which are meaningful in terms of their emotional content. The second representation was introduced by Mikels et al. [<xref ref-type="bibr" rid="ref23">32</xref>] and used in the Emotion dataset [<xref ref-type="bibr" rid="ref22">31</xref>]. It defines 8 classes of emotions resulting from an analysis of the IAPS dataset. The third method was presented by Ekman et al. [33]. They found 6 basic emotions by categorizing the facial expressions of individuals subjected to a test which involved 10 different cultures. This representation was used in the Emotion Sensor dataset [16] with one additional neutral sentiment.</p>
      <p>In this work, we tackle the problem of having different emotion representations by using a grounding technique that transforms all representations into one. This technique assumes that there exists an association among the different sentiment spaces, since all the representations cover the same emotional content. The common representational model is chosen to be the extended Ekman representation used in the Emotion Sensor dataset. Using this dataset, we convert the other two representations into a distribution over the 7 basic emotions.</p>
      <p>The conversion between Mikels' representation and Ekman's was performed using Mikels' labels. The labels are directly mapped into a distribution by the Sensor dataset. Some labels are common to both representations; thus, the output distribution presents a strong predominance of that emotion (an example is shown in Figure 3). Some other labels can cause problems connected to their distribution. The Sensor dataset can in fact present some non-coherent distributions due to the poor quality of the data and the nature of the dataset. This is reflected in the conversion, as shown in Figure 4.</p>
      <p>The ANP representation is converted into a distribution over the 7 emotions using the same technique. Each word of the pair corresponds to one distribution; the output is the sum of the two distributions.</p>
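      <p>The two conversions can be summarized in a short sketch. The lexicon excerpt below is invented for illustration (the real Emotion Sensor dataset provides one 7-emotion distribution per word), and renormalizing the ANP sum is our assumption to keep the output a distribution; the paper only states the sum.</p>
      <preformat>import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]

# Invented excerpt of the Emotion Sensor lexicon: word -> distribution.
emotion_sensor = {
    "happy":   np.array([0.01, 0.01, 0.02, 0.90, 0.02, 0.03, 0.01]),
    "dog":     np.array([0.05, 0.03, 0.10, 0.55, 0.07, 0.10, 0.10]),
    "sadness": np.array([0.05, 0.03, 0.07, 0.02, 0.80, 0.01, 0.02]),
}

def mikels_to_ekman(label):
    """A Mikels label is itself a sentiment-bearing word: look it up directly."""
    return emotion_sensor[label]

def anp_to_ekman(adjective, noun):
    """Sum the per-word distributions of the ANP, then renormalize."""
    combined = emotion_sensor[adjective] + emotion_sensor[noun]
    return combined / combined.sum()

print(dict(zip(EMOTIONS, anp_to_ekman("happy", "dog").round(3))))</preformat>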
    </sec>
    <sec id="sec-4">
      <title>5. Models</title>
      <sec id="sec-4-1">
        <title>5.1. Visual Sentiment Extraction</title>
        <sec id="sec-4-1-1">
          <title>In this work, we use a Convolutional Neural Network</title>
        <p>In this work, we use a Convolutional Neural Network to extract visual features from an image. The proposed CNN is a modification of a popular architecture for object detection [<xref ref-type="bibr" rid="ref9">8</xref>]. We trained the architecture and tested it on the Flickr dataset, as done by Chen et al. [<xref ref-type="bibr" rid="ref4">3</xref>]. In addition, we created a new architecture by introducing 3 Feature Pyramid layers. These layers extract low-level features.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Feature Pyramid Network Model</title>
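        <p>A minimal sketch of one way to realize this design, as our reconstruction under stated assumptions: the tap points at AlexNet's three max-pool outputs, the 1x1 lateral convolutions, the 256-channel branch width, and fusion by concatenation are all assumptions, not the authors' exact architecture.</p>
        <preformat>import torch
import torch.nn as nn
from torchvision.models import alexnet

class AlexNetFPN(nn.Module):
    """AlexNet backbone with 3 FPN-style lateral branches (a sketch)."""

    def __init__(self, num_classes=1553):
        super().__init__()
        self.features = alexnet(weights=None).features  # load pretrained weights in practice
        # Assumed tap points: outputs of the three max-pool stages.
        self.taps = {2: 64, 5: 192, 12: 256}  # layer index -> channels
        self.laterals = nn.ModuleDict({
            str(i): nn.Sequential(
                nn.Conv2d(c, 256, kernel_size=1),  # 1x1 lateral convolution
                nn.AdaptiveAvgPool2d(1),           # global average pooling
            )
            for i, c in self.taps.items()
        })
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Linear(256 * 6 * 6 + 3 * 256, num_classes)

    def forward(self, x):
        pyramid = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.taps:  # collect multi-scale, low-level features
                pyramid.append(self.laterals[str(i)](x).flatten(1))
        x = self.avgpool(x).flatten(1)
        return self.classifier(torch.cat([x] + pyramid, dim=1))

logits = AlexNetFPN()(torch.randn(2, 3, 224, 224))  # shape: (2, 1553)</preformat>
        <p>The three lateral branches add only three 1x1 convolutions on top of the backbone, which is consistent with the negligible increase in model complexity reported in the abstract.</p>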
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Data Bias</title>
      <sec id="sec-5-1">
        <title>Data bias is a problem still present in Sentiment Analysis. It is connected to the diferent cultures, languages, and contexts in which diferent people live. Most of the</title>
        <p>
          datasets for VSA are, in fact, crawled from the internet similarities between images. 1357 faulty images were
and automatically annotated from metadata. This way found in the dataset, which in total remained with 21 951
of proceeding can disadvantage the algorithm’s perfor- samples. The Emotion Sensor dataset presented some
mance, since the images can be wrongly labeled. Train- lack of words useful in order to convert ANPs to
sentiing a model by providing more input channels has been ment distributions. These words, when converted, are
shown to be an efective way of tackling the bias problem replaced by their synonyms, provided by [39] and [40]
[
          <xref ref-type="bibr" rid="ref10">9</xref>
          ]. Despite this, there is no manually annotated dataset English dictionaries. The synonyms, manually annotated,
that provides both image and text channels. Text is in- were organized in a file. The user’s input is provided in
stead available in large, weakly labeled datasets crawled audio file format.
from the internet.
        </p>
      <p>Some works have tried to solve the bias problem by extracting objective features from the data. These features do not come from the same source as the training data: they are generated by another Machine Learning algorithm, and the final features combine the results of different joint algorithms. This approach has recently been shown to be very effective [<xref ref-type="bibr" rid="ref16">25</xref>], [<xref ref-type="bibr" rid="ref15">24</xref>]. In this work, we adopt an approach similar to the one used by Ortis et al. [<xref ref-type="bibr" rid="ref15">24</xref>]. We used an Image Captioning model to generate an objective description of the image. We then convert the caption to a sentiment distribution using the Emotion Sensor dataset [16]. The image captioning model used was recently presented by Wang et al. [15]. Once the caption is generated, relevant keywords are extracted for sentiment mapping. To filter keywords inside the phrase, we removed the English stopwords provided by the nltk corpus [36] and used the nltk POS tagger [37] and WordNet [38] to lemmatize words when a correspondence is not found in the Emotion Sensor dataset.</p>
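      <p>The keyword-filtering step could look like the following sketch. The lexicon stand-in is invented for illustration, and the resource names are assumptions (recent nltk versions may name some resources differently); this is not the authors' exact code.</p>
      <preformat>import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer

# One-time downloads of the required nltk resources.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")

# Hypothetical stand-in for the Emotion Sensor lookup table:
# word -> distribution over the 7 extended-Ekman emotions.
emotion_sensor = {"dog": [0.1, 0.0, 0.0, 0.05, 0.6, 0.05, 0.2]}

def penn_to_wordnet(tag):
    """Map Penn Treebank POS tags to WordNet POS constants."""
    if tag.startswith("J"):
        return wordnet.ADJ
    if tag.startswith("V"):
        return wordnet.VERB
    if tag.startswith("R"):
        return wordnet.ADV
    return wordnet.NOUN

def caption_keywords(caption):
    """Extract sentiment-bearing keywords from a generated caption."""
    lemmatizer = WordNetLemmatizer()
    stop = set(stopwords.words("english"))
    tokens = [t.lower() for t in nltk.word_tokenize(caption) if t.isalpha()]
    keywords = []
    for word, tag in nltk.pos_tag(tokens):
        if word in stop:
            continue
        if word in emotion_sensor:          # direct hit in the lexicon
            keywords.append(word)
        else:                               # fall back to the lemma
            lemma = lemmatizer.lemmatize(word, penn_to_wordnet(tag))
            if lemma in emotion_sensor:
                keywords.append(lemma)
    return keywords

print(caption_keywords("Two dogs are running on the beach"))  # ['dog']</preformat>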
      <p>As we will show in the Results section, the extraction of a neutral description is effective, but it is of little use without a good (and unbiased) conversion into a sentiment distribution.</p>
    </sec>
    <sec id="sec-7">
      <title>7. User's Input</title>
      <p>The user's input represents the second input to the system. The audio is converted into text using the Speech Recognition API for Python [17] and converted into a sentiment distribution using the Emotion Sensor dataset [16]. This distribution is then presented to the user along with the result from the pipeline.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Implementation Details</title>
      <p>The captioning model was used in inference mode; it was not used at training time due to speed limitations. The Flickr Dataset with CC [20] was resized before feeding the algorithm, since it was originally 60 GB large, unfeasible to use in the settings described above. The resized dimension is 9 GB. Concerning the Emotion Dataset, a filtering step was adopted since some images presented placeholders indicating their unavailability. We removed them by using a hashing comparison which measures the similarity between images: 1357 faulty images were found, leaving the dataset with 21 951 samples in total. The Emotion Sensor dataset lacked some of the words needed to convert ANPs into sentiment distributions. Such words are replaced, at conversion time, by their synonyms, provided by the English dictionaries [39] and [40]. The synonyms, manually annotated, were organized in a file. The user's input is provided in audio file format.</p>
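      <p>The placeholder-removal step could look like the following sketch. The file layout and the use of an MD5 digest (which assumes byte-identical placeholders) are our assumptions, since the paper does not specify the hash used.</p>
      <preformat>import hashlib
from pathlib import Path

def file_digest(path):
    """Digest of the raw file bytes; identical placeholders share one digest."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

# Hypothetical reference copy of the "image unavailable" placeholder.
placeholder_hash = file_digest("placeholder_unavailable.jpg")

dataset_dir = Path("emotion_dataset")  # hypothetical dataset layout
faulty = [p for p in dataset_dir.glob("**/*.jpg")
          if file_digest(p) == placeholder_hash]
for p in faulty:
    p.unlink()  # drop the faulty sample
print(f"Removed {len(faulty)} placeholder images")</preformat>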
    </sec>
    <sec id="sec-9">
      <title>9. Results</title>
      <p>The results shown here are relative to the benchmarks computed on the datasets presented above.</p>
      <p>The first result is relative to the ANP classification task using the Flickr dataset. The dataset was split before training into 3 subsets: a training, an evaluation, and a test set. Since the dataset is very unbalanced, we created the test set such that a minimum number of samples remained in the training set. In this way, classes with few samples are guaranteed to have at least a certain number of images in the training set; this minimum number was chosen to be 14. The two models involved are DeepSentiBank and the FPN model; both share the same backbone (AlexNet) pre-trained on the ImageNet task. The metrics used to evaluate the models are top-1, top-3, and top-10 accuracy, the same used in the DeepSentiBank paper [<xref ref-type="bibr" rid="ref4">3</xref>]. The training was done using a Stochastic Gradient Descent optimizer with the learning rate set to 1e-3, weight decay to 5e-4, and momentum to 0.9. The learning rate was shrunk by a factor of 10 every 20 epochs. The batch size was 16 samples. Both models were trained for 40 epochs. Table 1 shows the best performance achieved by each model. As shown in Table 1, the FPN model achieves about +1% better performance on the three metrics with respect to the DeepSentiBank model. The low-level features extracted by the FPN layers contribute additional, complementary information that improves classification performance. The second result consists of the evaluation of the FPN model and the DeepSentiBank model on the Emotion Dataset. Both models were trained with a Stochastic Gradient Descent optimizer with a learning rate of 1e-3 and a batch size of 16, and trained for 20 epochs. Model weights were initialized from the training on the Flickr dataset. The results presented in Table 9 are measured on test data. The results here confirm the previous statement about the FPN model. In this case, using a more balanced and less biased dataset, the FPN reaches almost a +3% higher F1 score than the DeepSentiBank model, confirming its potential in VSA tasks. The FPN model trained on the last layer only reaches a score comparable to the base finetuned DeepSentiBank model. This outcome is likely due to the fact that, by training the last layer only, the network cannot fully adapt its intermediate features to the new task.</p>
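      <p>The reported training configuration corresponds, for instance, to the following PyTorch setup. This is a sketch with a dummy loader and a plain AlexNet as a stand-in; the real runs used the Flickr and Emotion datasets and the models described above.</p>
      <preformat>import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import alexnet

model = alexnet(num_classes=1553)  # stand-in for the DeepSentiBank/FPN models
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

# Dummy data, only to make the sketch self-contained; batch size 16 as in the paper.
loader = DataLoader(TensorDataset(torch.randn(32, 3, 224, 224),
                                  torch.randint(0, 1553, (32,))), batch_size=16)

for epoch in range(40):  # 40 epochs on the ANP task
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # learning rate shrunk by 10x every 20 epochs</preformat>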
      <sec id="sec-9-1">
        <title>9.1. Emotion Conversion Results</title>
        <p>In this section we present some examples concerning the conversion of the ANP and Mikels representations to the Ekman representation. The content of this section gives additional material which justifies the results above. Some examples of wrong ANP conversions are shown in Figure 5.</p>
        <p>As depicted in the figures, conversions may produce outlier values. We can see that the 'fluffy hair' ANP is associated with Fear as the predominant sentiment. This happens because the Emotion Sensor dataset presents 'fluffy' as a fearful word. The same happens for 'illegal war', which turns out to be a happy ANP according to the Emotion Sensor dataset. The presence of such outliers in the Emotion Sensor dataset can cause wrong sentiment classifications.</p>
        <p>The Mikels conversion is less affected by this kind of outlier, having fewer classes. The 8 classes are almost all classified in a balanced way. The only class which is clearly not classified correctly is the 'amusement' class. As shown in the Emotion Representation section, the Emotion Sensor dataset in fact associates the word 'amusement' with a distribution which is not correct.</p>
        <p>The conversion issue also affects the captioning model, but no NLP evaluation test was done in this project.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>10. Conclusions</title>
      <p>In this work we presented a pipeline that aims to provide a systematic evaluation of multimodal automated sentiment inference from visual data. The full sentiment pipeline uses textual, visual, and audio information in order to present a final result to the user. In this paper we focused our attention on the Visual Sentiment task, while leaving the other aspects to already developed algorithms.</p>
      <p>We have demonstrated the effectiveness of FPN layers in the VSA task. Thanks to these layers, the network gains an even larger advantage on the Emotion dataset [<xref ref-type="bibr" rid="ref22">31</xref>]. The FPN model has shown its improvement even though it only adds simple branches to the main backbone. The model used in this work is an older state-of-the-art network; it was chosen to allow a direct comparison with the original paper in which ANPs were introduced. Many SOTA CNNs could be used for the same task. Future works could verify the performance gain obtained by introducing FPN layers into these novel structures as well.</p>
      <p>This work attempts to address the multiple-representation problem of sentiment by using a simple conversion technique. We leveraged the Emotion Sensor dataset to extract a distribution associated with each word. The technique presented inaccuracies due to the limited reliability of the dataset used. Further development could rely on more structured datasets, so that the performance of the last step can be improved.</p>
      <p>Another promising direction for future work is Sentiment Analysis on audio data. Future developments of this kind can bring important improvements to the pipeline stages.</p>
      <p>The VSA problem is still far from being solved. Exploiting multimodality is the key to reaching further results. We have seen, though, that with the current settings, data bias and the unavailability of a unique representation can make VSA, as well as Sentiment Analysis in general, a very difficult task.</p>
    </sec>
    <sec id="sec-11">
      <title>11. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly in order to: grammar and spelling check, paraphrase and reword. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          content.
          <source>advanced drone control, Drones</source>
          <volume>9</volume>
          (
          <year>2025</year>
          ). doi:
          <volume>10</volume>
          . 3390/drones9020109. [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zimatore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Serantoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Gallotta</surname>
          </string-name>
          , References L.
          <string-name>
            <surname>Guidetti</surname>
          </string-name>
          , G. Maulucci,
          <string-name>
            <surname>M. De Spirito</surname>
          </string-name>
          ,
          <article-title>Automatic detection of aerobic threshold through re-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Oyebode</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Orji</surname>
          </string-name>
          , Social Media and
          <article-title>Sentiment currence quantification analysis of heart rate time Analysis:</article-title>
          <source>The Nigeria Presidential Election</source>
          <year>2019</year>
          , series,
          <source>International Journal of Environmental Rein: 2019 IEEE 10th Annual Information Technology, search and Public Health</source>
          <volume>20</volume>
          (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .3390/ Electronics and Mobile Communication Conference (IEMCON),
          <year>2019</year>
          , pp.
          <fpage>0140</fpage>
          -
          <lpage>0146</lpage>
          . ISSN:
          <fpage>2644</fpage>
          -
          <lpage>3163</lpage>
          . [12] iMj.eCr.
          <year>pGh2al0lo0t3t1a</year>
          ,
          <year>9G98</year>
          ..
          <string-name>
            <surname>Zimatore</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Falcioni</surname>
          </string-name>
          , S. Migli-
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gallotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          , L. Ioc- accio, M. Lanza,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Biino</surname>
          </string-name>
          , M. Giuriato, chi, D. Nardi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <surname>Unsupervised pose es- M. Bellafiore</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Palma</surname>
          </string-name>
          , et al.,
          <article-title>Influence of geographtimation by means of an innovative vision trans- ical area and living setting on children's weight former</article-title>
          ,
          <source>in: Lecture Notes in Computer Science status, motor coordination, and physical activity</source>
          ,
          <source>(including subseries Lecture Notes in Artificial In- Frontiers in pediatrics 9</source>
          (
          <year>2022</year>
          )
          <article-title>794284</article-title>
          .
          <source>telligence and Lecture Notes in Bioinformatics)</source>
          , vol- [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zimatore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cavagnaro</surname>
          </string-name>
          ,
          <source>Recurrence analume 13589 LNAI</source>
          ,
          <year>2023</year>
          , p.
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          . doi:
          <volume>10</volume>
          .1007/ ysis of otoacoustic emissions,
          <source>Understanding 978-3-031-23480-4</source>
          _
          <fpage>1</fpage>
          .
          <string-name>
            <given-names>Complex</given-names>
            <surname>Systems</surname>
          </string-name>
          (
          <year>2015</year>
          )
          <fpage>253</fpage>
          -
          <lpage>278</lpage>
          . doi:
          <volume>10</volume>
          .1007/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          , DeepSentiBank: Visual Sentiment Concept Classifica- [
          <volume>14</volume>
          ]
          <fpage>9M7</fpage>
          .
          <fpage>8C</fpage>
          -.
          <source>3G-a3ll1o9tt-a0</source>
          ,
          <fpage>V71</fpage>
          .
          <fpage>B5o5n</fpage>
          -a8v_
          <fpage>o8lo</fpage>
          .ntà, G. Zimatore,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Iaztion with Deep Convolutional Neural Networks, zoni</article-title>
          , L. Guidetti,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baldari</surname>
          </string-name>
          ,
          <source>Efects of open (racket) Technical Report arXiv:1410.8586</source>
          , arXiv,
          <year>2014</year>
          .
          <article-title>and closed (running) skill sports practice on chilArXiv:1410.8586 [cs] type: article. dren's attentional performance</article-title>
          , Open Sports Sci-
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          , W. Guettala,
          <source>ences Journal 13</source>
          (
          <year>2020</year>
          )
          <fpage>105</fpage>
          -
          <lpage>113</lpage>
          . doi:
          <volume>10</volume>
          .2174/ C. Napoli,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <article-title>Enhancing sentiment analysis on seed-iv dataset with vision transformers: A [15] 1P8</article-title>
          .
          <year>7W5a3n9g9</year>
          ,
          <fpage>XA02</fpage>
          .
          <year>0Y1a3n0g1</year>
          ,
          <year>0R1</year>
          .0M5.en, J.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>comparative study</article-title>
          ,
          <source>in: Proceedings of the 2023 11th J</source>
          . Ma,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>OFA</given-names>
            : Unifyinternational conference on information technol- ing
            <surname>Architectures</surname>
          </string-name>
          , Tasks, and
          <article-title>Modalities Through ogy: IoT and smart</article-title>
          city,
          <year>2023</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>246</lpage>
          .
          <article-title>a Simple Sequence-to-</article-title>
          <string-name>
            <surname>Sequence Learning</surname>
          </string-name>
          Frame-
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Randieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pollina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puglisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , Smart work,
          <source>Technical Report arXiv:2202.03052</source>
          , arXiv, glove
          <article-title>: A cost-efective and intuitive interface for 2022</article-title>
          . ArXiv:
          <volume>2202</volume>
          .03052 [
          <article-title>cs] type: article. advanced drone control</article-title>
          ,
          <source>Drones</source>
          <volume>9</volume>
          (
          <year>2025</year>
          ). doi:10. [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bil</surname>
          </string-name>
          ,
          <source>Full Emotions Sensor Dataset Containing Top 3390/drones9020109. 23 730 English Words Classified Statistically Into 7</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iacobelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. Napoli,</surname>
          </string-name>
          <article-title>A machine learning Basic Emotions, 2022. based real-time application for engagement detec</article-title>
          - [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          (Uberi),
          <article-title>SpeechRecognition: Library for tion</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume performing speech recognition,
          <source>with support for 3695</source>
          ,
          <year>2023</year>
          , p.
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          .
          <article-title>several engines and APIs, online and ofline</article-title>
          .,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puglisi</surname>
          </string-name>
          , S. Russo, [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Siersdorfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Minack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Deng</surname>
          </string-name>
          , J. Hare, AnalyzI. E. Tibermacine,
          <article-title>Exploiting robots as healthcare ing and predicting sentiment of images on the social resources for epidemics management and support web, in: Proceedings of the 18th ACM international caregivers</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , vol- conference on Multimedia, MM '10,
          <string-name>
            <surname>Association</surname>
            <given-names>for</given-names>
          </string-name>
          <source>ume 3686</source>
          ,
          <year>2024</year>
          , p.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . Computing Machinery, New York, NY, USA,
          <year>2010</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , ImageNet pp.
          <fpage>715</fpage>
          -
          <lpage>718</lpage>
          .
          <article-title>Classification with Deep Convolutional Neural Net-</article-title>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Esuli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          , SENTIWORDNET: A Pubworks,
          <source>in: Advances in Neural Information Process- licly Available Lexical Resource for Opinion Mining Systems</source>
          , volume
          <volume>25</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc., ing,
          <source>in: Proceedings of the Fifth International 2012. Conference on Language Resources and Evaluation</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A com-</article-title>
          (
          <source>LREC'06)</source>
          ,
          <article-title>European Language Resources Associaparative study of machine learning approaches for tion (ELRA), Genoa</article-title>
          , Italy,
          <year>2006</year>
          .
          <article-title>autism detection in children from imaging data</article-title>
          , in: [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-F.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <source>LargeCEUR Workshop Proceedings</source>
          , volume
          <volume>3398</volume>
          ,
          <year>2022</year>
          ,
          <article-title>scale visual sentiment ontology and detectors</article-title>
          using p.
          <fpage>9</fpage>
          -
          <lpage>15</lpage>
          .
          <article-title>adjective noun pairs</article-title>
          ,
          <source>in: Proceedings of the 21st</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Randieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pollina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puglisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , Smart ACM international conference on Multimedia,
          <article-title>MM glove: A cost-efective and intuitive interface for '13, Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <year>2013</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , M. Xu, Multi-level [33]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. V.</given-names>
            <surname>Friesen</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. O'Sullivan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>Region-based Convolutional Neural Network for I. Diacoyanni-</article-title>
          <string-name>
            <surname>Tarlatzis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Heider</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>W. A.</given-names>
          </string-name>
          <string-name>
            <surname>Image Emotion</surname>
            <given-names>Classification</given-names>
          </string-name>
          , Neurocomputing LeCompte, T. Pitcairn,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Ricci-Bitti</surname>
          </string-name>
          ,
          <source>Universals</source>
          <volume>333</volume>
          (
          <year>2019</year>
          ).
          <article-title>and cultural diferences in the judgments of facial</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>She</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          , M.-M. Cheng, P. L. Rosin, expressions of emotion,
          <source>J Pers Soc Psychol</source>
          <volume>53</volume>
          (
          <year>1987</year>
          ) L.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
          <source>Visual Sentiment Prediction Based on 712-717. Automatic Discovery of Afective Regions</source>
          , IEEE [34]
          <string-name>
            <given-names>R.</given-names>
            <surname>Plutchik</surname>
          </string-name>
          , Chapter 1 - A
          <source>GENERAL PSYCHOTransactions on Multimedia</source>
          <volume>20</volume>
          (
          <year>2018</year>
          )
          <fpage>2513</fpage>
          -
          <lpage>2525</lpage>
          .
          <source>EVOLUTIONARY THEORY OF EMOTION</source>
          , in: Conference Name: IEEE Transactions
          <string-name>
            <surname>on Multime- R. Plutchik</surname>
          </string-name>
          , H. Kellerman (Eds.), Theories of Emodia. tion, Academic Press,
          <year>1980</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Katsurai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satoh</surname>
          </string-name>
          , Image sentiment analysis [35]
          <string-name>
            <surname>T.-Y. Lin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Harihausing latent correlations among visual, textual, and ran, S. Belongie, Feature Pyramid Networks for Obsentiment views</article-title>
          ,
          <year>2016</year>
          . ject Detection,
          <source>Technical Report arXiv:1612</source>
          .03144,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , Remote eye movement arXiv,
          <year>2017</year>
          . ArXiv:
          <volume>1612</volume>
          .03144 [
          <article-title>cs] type: article. desensitization and reprocessing treatment of long</article-title>
          - [36] NLTK ::
          <source>Natural Language Toolkit</source>
          ,
          <year>2022</year>
          . covid- and
          <string-name>
            <surname>post-</surname>
            covid-related traumatic disorders: [37]
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Loria</surname>
          </string-name>
          , textblob-aptagger,
          <year>2022</year>
          .
          <article-title>Original-date: An innovative approach</article-title>
          ,
          <source>Brain Sciences</source>
          <volume>14</volume>
          (
          <year>2024</year>
          ).
          <year>2013</year>
          -
          <volume>09</volume>
          -18T20:
          <fpage>03</fpage>
          :40Z. doi:
          <volume>10</volume>
          .3390/brainsci14121212. [38]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>WordNet: a lexical database for English,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Corchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparini</surname>
          </string-name>
          , Ensemble learn-
          <source>Commun. ACM</source>
          <volume>38</volume>
          (
          <year>1995</year>
          )
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          .
          <article-title>ing on visual and textual data for social image emo- [39] Oxford Learner's Dictionaries | Find definitions, tion classification</article-title>
          ,
          <source>Int. J. Mach. Learn. &amp; Cyber</source>
          . 10 translations, and grammar explanations at Oxford (
          <year>2019</year>
          )
          <fpage>2057</fpage>
          -
          <lpage>2070</lpage>
          .
          <source>Learner's Dictionaries</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , W. Agata, [
          <volume>40</volume>
          ]
          <article-title>Thesaurus.com - The world's favorite online theet al</article-title>
          .,
          <source>Psychoeducative social robots for an healthier saurus!</source>
          ,
          <year>2022</year>
          .
          <article-title>lifestyle using artificial intelligence: a case-study</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3118</volume>
          ,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iacobelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Eyetracking system with low-end hardware: development and evaluation</article-title>
          ,
          <source>Information</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>644</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iacobelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pelella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , et al.,
          <article-title>A fast and accessible neural network based eye-tracking system for real-time psychometric and hci applications</article-title>
          ,
          <source>in: CEUR WORKSHOP PROCEEDINGS</source>
          , volume
          <volume>3870</volume>
          ,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puglisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          , et al.,
          <article-title>Exploiting robots as healthcare resources for epidemics management and support caregivers</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3686</volume>
          ,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>N.</given-names>
            <surname>Boutarfaia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <article-title>Deep learning for eeg-based motor imagery classification: Towards enhanced human-machine interaction and assistive robotics</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3695</volume>
          ,
          <year>2023</year>
          , p.
          <fpage>68</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Q.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark</article-title>
          ,
          <source>Technical Report arXiv:1605.02677</source>
          , arXiv,
          <year>2016</year>
          . ArXiv:
          <volume>1605</volume>
          .02677 [
          <article-title>cs] type: article.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Mikels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Fredrickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Larkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Lindberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Maglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Reuter-Lorenz</surname>
          </string-name>
          ,
          <article-title>Emotional category data on images from the international afective picture system</article-title>
          ,
          <source>Behavior Research Methods</source>
          <volume>37</volume>
          (
          <year>2005</year>
          )
          <fpage>626</fpage>
          -
          <lpage>630</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>