<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ICYRIME</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Real-Time Machine Learning Based Solution for Privacy Enforcement in Video Recordings and Live Streaming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pietro Manganelli Conforti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Emanuele</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Mandelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer, Control and Management Engineering, Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>9</volume>
      <fpage>53</fpage>
      <lpage>59</lpage>
      <abstract>
        <p>In the past years the world has had to deal with a whole new situation brought about by Covid-19. Everyone's routine changed, and we started spending far more time than before in virtual meetings, virtual chats and the like. With this, many privacy problems arose from all the video data generated by a single user. Google and Zoom introduced the possibility to blur out the background while using a front-facing camera, but this did not solve many privacy concerns, ranging from showing people in videos without their permission to the leaking of sensitive data and information from videos uploaded online. We propose a solution built on computer vision techniques such as image segmentation and classification for context recognition: a privacy enforcement solution capable of fitting the user's personal needs, selectively blurring out specific objects from a video based on the user's preferences for each room in which they are.</p>
      </abstract>
      <kwd-group>
        <kwd>Image segmentation</kwd>
        <kwd>Context Recognition</kwd>
        <kwd>Detectron2</kwd>
        <kwd>Privacy enforcement</kwd>
        <kwd>Covid-19</kwd>
        <kwd>Alexnet</kwd>
        <kwd>Transfer learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the past years there has been a solid shift of the entire world population towards a more active presence online. Covid-19 has further pushed many activities to be carried out digitally. Virtual meeting applications like Zoom had 10 million daily meeting participants in December 2019, but by April 2020 that number had increased to reach up to 300 million [<xref ref-type="bibr" rid="ref1">1</xref>]. It is estimated that in 2024 only 25% of business meetings will take place in person [<xref ref-type="bibr" rid="ref2">2</xref>]. Studies started during 2020 have demonstrated that people nowadays spend on average far more time in virtual meetings than before [<xref ref-type="bibr" rid="ref3">3</xref>], leading to many concerns for the single person. Users have started experiencing stress related to not being competent in the use of the technology, but most importantly to "Zoom fatigue", due to it being always "on" [<xref ref-type="bibr" rid="ref4">4</xref>]. Many privacy-related issues have been crippling the user experience ever since, such as exposing private and personal spaces on camera, unintentionally framing a person who did not give consent to be on video, or leaking sensitive information through careless online posting. Many solutions have been promptly developed to prevent such things from happening, providing virtual meeting room services with privacy-safeguarding functionalities like background blurring and virtual backgrounds [<xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5, 6, 7, 8</xref>]. We present in this paper a novel computer vision based approach for privacy enforcement in video data, capable of filtering out from a video a list of objects that a user does not want to show, based on the recognition of the environment being framed, in order to blur out objects according to both the user's needs and the context in which they are.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>With the advancement of technology, people have been sharing a continuously growing amount of personal data online. In addition to life-logging devices [<xref ref-type="bibr" rid="ref9">9</xref>], social media have recently stepped in, quickly ending up dominating the landscape of mass-produced data with "visual data" (i.e. images and videos). For instance, in 2020, the first year of the pandemic, users generated and shared via Facebook a total of 10.5 million videos [<xref ref-type="bibr" rid="ref10">10</xref>]. This impactful amount of data brought many privacy-related issues to the attention of experts and users; studies started identifying and observing how easily privacy could be violated just by unintentionally sharing personal data contained inside images and videos, and subsequently started proposing privacy models to formally approach and tackle said scenarios [<xref ref-type="bibr" rid="ref11">11</xref>]. The scientific world went quickly from defining sub-fields like Privacy-Preserving Machine Learning (PPML) [<xref ref-type="bibr" rid="ref12">12</xref>] to adopting deep learning models for image disguising [<xref ref-type="bibr" rid="ref13">13</xref>], context recognition [<xref ref-type="bibr" rid="ref14">14</xref>] as well as image-based localization [<xref ref-type="bibr" rid="ref15">15</xref>], and again computer vision based frameworks [<xref ref-type="bibr" rid="ref16">16</xref>], as novel solutions for preserving privacy in first-person-vision image sequences, placing computer vision, artificial intelligence and data-driven approaches as state-of-the-art techniques for preserving privacy online. Among the many available solutions for privacy preservation and safeguarding, one is still missing that allows single users to selectively censor objects from visual data depending on their personal needs and preferences. The proposed work aims therefore to provide experience-lacking users with an intuitive, easy-to-use tool for privacy enforcement in video data based on computer vision techniques.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation</title>
      <p>Users' sensitive data must be kept private. The proposed software tracks such information by means of various modules, whose pipeline is shown in fig. 1. Memory buffers are used in between modules to guarantee flexibility towards input videos regardless of their properties, as well as to stabilize the output, overcoming the common flickering experienced in this kind of application. Thanks to such buffers it is possible to store past frames and reuse them to statistically smooth the final output; past frames are reused according to the level of confidence with which the class recognition module made its prediction.</p>
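      <p>As a rough orientation, the following is a minimal sketch of the per-frame flow, in Python. The two model calls and the per-room preference table prefs are hypothetical placeholders (not the authors' actual implementation) standing in for the networks described in sections 3.2 and 3.3; the sketch only illustrates how the room is classified, the user's banned classes for that room looked up and blurred, and the output written back at the input frame rate.</p>
      <preformat>
# Python sketch: overall per-frame video-processing loop (OpenCV I/O).
import cv2

def recognize_context(frame):
    """Placeholder for the AlexNet-based room classifier (section 3.3)."""
    return "Kitchen"

def blur_banned_objects(frame, banned_classes):
    """Placeholder for Detectron2 segmentation + Gaussian blur (section 3.2)."""
    return frame

prefs = {"Kitchen": {45}}  # hypothetical per-room banned COCO class ids

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    room = recognize_context(frame)      # which room are we in?
    banned = prefs.get(room, set())      # the user's banned objects there
    out.write(blur_banned_objects(frame, banned))

cap.release()
out.release()
      </preformat>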
      <p>Input videos can be directly uploaded to the system or streamed from cameras (i.e. webcams). The proposed solution separates the overarching learning problem into two sub-problems, namely context recognition and image segmentation; this approach guarantees robustness through modularity and simplifies the overall functioning of the software. Recognizing the users, their emotional state [<xref ref-type="bibr" rid="ref17 ref18 ref19">17, 18, 19</xref>], their attentive state [<xref ref-type="bibr" rid="ref20 ref21 ref22">20, 21, 22</xref>], and the surrounding context [<xref ref-type="bibr" rid="ref23 ref24 ref25">23, 24, 25</xref>] makes it possible to selectively obscure specific elements based on the preferences the same user expressed at registration time; such data is stored in a database for later inference. Context recognition has been tackled with a neural network inspired by AlexNet [<xref ref-type="bibr" rid="ref26">26</xref>], a famous deep convolutional neural network designed for image classification. Together with an RFID application [<xref ref-type="bibr" rid="ref14">14</xref>], Detectron2, a very powerful instance segmentation network published by Facebook in 2019 [<xref ref-type="bibr" rid="ref27">27</xref>], is used to identify the user's context-specific, privacy-related data within video frames with no ambiguity; by combining the output masks produced by Detectron2 with all the information retrieved before, a particular region of the frame is identified and filtered with a Gaussian transform. The disambiguation of similar or identical contexts can be tackled and solved with the support of RFID technology: with the introduction of a beacon that sends a constant signal, it is possible to recognize and distinguish two apparently same-looking environments. Such a discriminatory action is essential, yet simple to apply, since it is integrable in any environment with little effort or invasiveness. A similar RFID-based solution for context recognition was already presented by earlier research [<xref ref-type="bibr" rid="ref14">14</xref>]. Finally, the desired effect is obtained by processing and collecting all the frames of the video and setting the right frame rate.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>Distinct datasets have been used for the two different learning tasks, namely image segmentation and context recognition. The choice of the Detectron2 network for the image segmentation task leaves little to no choice but to use the 2017 version of the COCO dataset [<xref ref-type="bibr" rid="ref28">28</xref>], with which the network has been demonstrated to be performative. COCO is a dataset composed of two groups of elements: images and annotations. The images contain a vast variety of objects, for a total of 80 different categories of elements. The network was capable of recognizing them all, and even apparently odd objects were left untouched and not removed. Together with the set of images, COCO comprises a set of so-called "annotations" that contain information related to the position of the object masks, their bounding box and their location in the image reference frame.</p>
        <p>As for the context recognition part, a slightly modified version of a dataset available on Kaggle [<xref ref-type="bibr" rid="ref29">29</xref>] has been used; such a dataset is composed of five different classes symbolizing five different kinds of rooms, two of which, namely living room and dining room, have been merged together. Each element is originally an RGB picture of a fixed size of 224x224x3, which has been resized to 227x227x3 to better fit AlexNet's expected input.</p>
        <p>As part of the training and testing process, a defined set of image processing techniques has been organized into a pipeline, as sketched below. Such a transformation pipeline has been implemented using the Albumentations library [<xref ref-type="bibr" rid="ref30">30</xref>], an easy-to-use and intuitive library for image processing; it consists of: ShiftScaleRotate, for shifting or rotating images; RGBShift, for randomly altering the RGB channels' values; RandomBrightnessContrast, for randomly changing images' brightness and contrast; MultiplicativeNoise, for randomly adding noise; Normalize, for normalizing data; and HueSaturationValue, for randomly changing images' saturation.</p>
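        <p>A minimal sketch of such a pipeline, using the Albumentations API [<xref ref-type="bibr" rid="ref30">30</xref>]; the probability and limit parameters are illustrative assumptions, not the exact values used in our experiments.</p>
        <preformat>
# Python sketch: the training-time augmentation pipeline of section 3.1.
import albumentations as A

transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1,
                       rotate_limit=15, p=0.5),           # shift or rotate images
    A.RGBShift(p=0.5),                                    # randomly alter RGB channels
    A.RandomBrightnessContrast(p=0.5),                    # random brightness/contrast
    A.MultiplicativeNoise(multiplier=(0.9, 1.1), p=0.3),  # random multiplicative noise
    A.HueSaturationValue(p=0.3),                          # random saturation changes
    A.Resize(227, 227),                                   # AlexNet-sized input
    A.Normalize(),                                        # normalize the data
])

# augmented = transform(image=image)["image"]
        </preformat>
      </sec>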
      <sec id="sec-3-2">
        <title>3.2. Image Segmentation Network</title>
        <p>Detectron2 is Facebook AI Research's library [<xref ref-type="bibr" rid="ref27">27</xref>] that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron [<xref ref-type="bibr" rid="ref31">31</xref>], which is in turn based on the maskrcnn-benchmark model [<xref ref-type="bibr" rid="ref32">32</xref>]. It supports a great number of computer vision research projects thanks to its flexibility, output capabilities and available documentation.</p>
        <p>Among the available Detectron2 architectures, Mask R-CNN with a feature pyramid backbone (mask-rcnn-fpn) has been chosen. Such an architecture is mainly built from three modules: a Backbone Network, a Region Proposal Network and a Box Head.</p>
        <p>The Backbone Network, whose role is to extract multi-scale feature maps with different receptive fields starting from the input image, is based on the Feature Pyramid Network [<xref ref-type="bibr" rid="ref33">33</xref>] technique. In this way, areas of interest seen from different points of view are identified and passed to the two next modules. The Region Proposal Network detects object regions (the so-called "proposal boxes") based on the multi-scale features; these, together with the feature maps, serve as input for the Box Head, the Region-of-Interest (RoI) module. This last module warps the feature maps using the proposal boxes into multiple fixed-size features, and retrieves the fine-tuned box locations and classification results via fully-connected layers.</p>
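        <p>A minimal sketch, using Detectron2's model zoo API [<xref ref-type="bibr" rid="ref27">27</xref>], of how a frame can be segmented and the user's banned object classes blurred with a Gaussian transform, as described in section 3. The banned-class set, the score threshold and the blur kernel size are illustrative assumptions.</p>
        <preformat>
# Python sketch: instance segmentation with Detectron2 + selective Gaussian blur.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # illustrative confidence threshold
predictor = DefaultPredictor(cfg)

def blur_banned_objects(frame, banned_classes):
    """Blur every detected instance whose COCO class id the user banned."""
    instances = predictor(frame)["instances"].to("cpu")
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    result = frame.copy()
    for mask, cls in zip(instances.pred_masks.numpy(),
                         instances.pred_classes.numpy()):
        if int(cls) in banned_classes:
            result[mask] = blurred[mask]  # replace only the masked pixels
    return result
        </preformat>
      </sec>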
      <sec id="sec-1-2">
        <title>3.3. Context Recognition Network</title>
        <p>The context recognition task belongs to a classification
problem, where each frame of the video is treated as an
 −  1 =</p>
        <p>1 ∑︁  1 = 0.81

=1
Which is simply an overall F1 score among all classes. In
our case, we can see a macro F1 score of 0.81.</p>
        <sec id="sec-1-2-1">
          <title>It is also possible to take vision of the confusion matrix in figure 3, showing how the diferent samples from the test set were classified during the test phase.</title>
        </sec>
      </sec>
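        <p>As a concrete illustration, the following is a minimal transfer-learning sketch in the spirit of the setup described above: a pretrained AlexNet [<xref ref-type="bibr" rid="ref26">26</xref>] with its final layer replaced for our four room classes. The frozen feature extractor and the hyperparameters are illustrative assumptions, not the authors' exact settings.</p>
        <preformat>
# Python sketch: fine-tuning AlexNet via transfer learning (PyTorch/torchvision).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # LivingRoom (merged with DiningRoom), Bathroom, Bedroom, Kitchen

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False  # freeze the convolutional feature extractor

model.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # new classification head

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(1, 3, 227, 227)  # one 227x227x3 frame, as in section 3.1
logits = model(x)                # shape: (1, NUM_CLASSES)
        </preformat>
      </sec>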
      <sec id="sec-1-3">
        <title>3.4. Output stabilization techniques</title>
        <p>Videos in daily life scenarios, likely happen to contain
temporary blank frames, as well as artifacts, due to user
or scene related conditions. A context recognition
network will therefore generate a classification label that
will be trivially assigned, leading to an instability
problem. We dealt with this problem through the introduction
of a "memory bufer", that is capable of stabilizing
statistically the result.</p>
        <p>This is achieved by endowing the system with two
bufers, one for the context recognition network that
stores the predicted context classes, and one used to track
the instance segmentation network output and to store
the classes predicted with enough accuracy. Storage of
past data allows to create a time relation between
successive frames, thus enforcing the output of each
network and stabilizing the final one. This method allows
to correlate information inside the video with the least
expenditure of resources. There are two bufers.
to change the output and in this short period of time is
wrongly classified with the previous stable label. This is
strongly compensated by the stability provided and the
delay is short enough to be dificult to notice.
3.4.2. Instance segmentation memory bufer
The other output stabilization technique is the instance
segmentation memory bufer dedicated to the Detectron
output. The rationale here is regarding the threshold
used by the model to decide if an element belongs or not
to a certain class. Being a privacy concern to conceal
as much as possible the sensible information inside the
frames, false positives in exchange of a higher number
of true positives are preferable. Therefore, two kinds
of thresholds are considered: the basic one and the
opFigure 3: The confusion matrix generated with the use of the timal one. The first one is lower than the second one
scikit library [35] and is the minimal value considered acceptable to take
into account the output of the network. If the output
confidence regarding a specific instance inside the frame
falls below this value is considered too unclear and it is
3.4.1. Context memory bufer not counted. The second threshold instead represents
The first bufer visible in the top of fig.1 is the one ded- the optimal value of confidence used by the system in
icated to the output of Alexnet. The assumption taken order to properly recognize an element with enough
acby this approach is that the context changes don’t take curacy. This information is used in order to track the
place suddenly but instead are related to a smooth trend. last elements appeared to the network, appending them
For instance, if the frame  recognizes a specific context, inside the bufer.
there is a high probability that also the frame  + 1 will If the networks find an instance of a class inside the bufer
carry out similar information and represents the same even with a lower confidence value in respect to the
opcontext. Thus, doing an average among the past frames timal threshold it is still considered acceptable, therefore
increases the overall accuracy by smoothing the output processed and eventually concealed. In this way it gets
trend. easier for the network to work with moving objects
beThe length of the bufer is set dynamically in relation to cause this method allows it to trace them even in the case
the   value retrieved from the video and the informa- of uncertainty due to movement.
tion obtained by the frames from the last half of second The bufer length is dynamically related to the   value
gets stored within. and stores information regarding the set of frames
conThe trade-of of this method is a small delay from the cerning the last three seconds. If an element is
recogcontext recognition module because the output context nized by the network after this time interval in order to
label has to stabilize for at least half of the bufer ℎ be evaluated it needs to overcome the optimal confidence
threshold again.
The trade-of of this system, as mentioned above, is a
higher frequency of false positives, which can be
misleading for the final result and inversely proportional
in number to the two threshold values. Overall, the
accuracy following this approach improved by a discrete
percentage, mainly in the more dynamic scenarios.</p>
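          <p>A minimal sketch of the two buffers follows; the paper fixes the windows at half a second and three seconds respectively, while the two confidence thresholds here are illustrative assumptions.</p>
          <preformat>
# Python sketch: the two stabilization buffers of section 3.4.
from collections import Counter, deque

class ContextBuffer:
    """Majority vote over the context labels of the last fps/2 frames."""
    def __init__(self, fps):
        self.labels = deque(maxlen=max(1, int(fps // 2)))

    def update(self, label):
        self.labels.append(label)
        return Counter(self.labels).most_common(1)[0][0]  # stable label

class InstanceBuffer:
    """Dual-threshold tracking of recently seen object classes."""
    BASIC, OPTIMAL = 0.5, 0.8  # illustrative confidence thresholds

    def __init__(self, fps):
        self.recent = deque(maxlen=int(fps * 3))  # classes seen in the last 3 s

    def accept(self, cls, confidence):
        if confidence >= self.OPTIMAL:
            self.recent.append(cls)
            return True
        # Below the optimal threshold: accept only if the class was recently
        # seen with optimal confidence (this eases tracking of moving
        # objects), and never below the basic threshold.
        return confidence >= self.BASIC and cls in self.recent
          </preformat>
        </sec>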
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Results</title>
      <p>The results of the system are evaluated according to how consistently the full procedure works with respect to specific ground-truth information associated with each test video. Knowing in advance this information, namely the settings of a set of test videos as well as the list of elements inside them, we can measure the overall accuracy of the system.</p>
      <p>For instance, if a video displays a specific context with a certain number of known elements inside it, we can check how many times those elements are found by the two networks and by the two memory buffers in the output trend.</p>
      <p>In order to achieve this, an evaluation procedure was implemented that, given an input video, follows steps similar to those of the system, but keeps track of the number of times the output produced from each input frame is correct with respect to the total number of frames. Three test videos were used, providing the following scenarios:</p>
      <list list-type="bullet">
        <list-item>
          <p>Kitchen (fig. 4.a), where we want to blur out a bowl on a table, provided that the system recognizes the context. The instance segmentation network can identify various objects such as a table, an oven and bottles. Due to the user's preferences, the stationary object we want to blur out from all the video frames is just one: a black bowl.</p>
        </list-item>
        <list-item>
          <p>Bathroom (fig. 4.b), where the user wants to blur out the WC. In this scenario, the instance segmentation network also recognizes the bidet as a WC, given the similarity in their structure.</p>
        </list-item>
        <list-item>
          <p>Bedroom (fig. 4.c), where there is a bowl placed on a flat surface behind the bed, and the user wants to blur it out. This scenario can be potentially challenging, since the portion of the room we are framing is very restricted, and the only object that can be considered a strong feature is the bed.</p>
        </list-item>
      </list>
      <p>The results are shown in the following table. Five evaluation values are reported for each test:</p>
      <list list-type="bullet">
        <list-item>
          <p>accuracy of the context recognition network with respect to the total number of frames (C.R.), indicating the percentage of success of the context recognition network applied to the frames of the video. This value does not include the improvements brought by the memory buffer.</p>
        </list-item>
        <list-item>
          <p>accuracy of the context recognition network plus memory buffer with respect to the total number of frames (C.R. + B.), indicating the percentage of success of the context recognition network combined with the memory buffer for the context recognition task.</p>
        </list-item>
        <list-item>
          <p>accuracy of the instance segmentation network in finding the objects of interest in a frame with respect to the total number of frames (I.S.), indicating the percentage of success of the instance segmentation task with respect to the objects of interest for the user. This value does not include the improvements brought by the memory buffer.</p>
        </list-item>
        <list-item>
          <p>accuracy of the instance segmentation network plus memory buffer in finding the objects of interest in a frame with respect to the total number of frames (I.S. + B.), indicating the percentage of success of the instance segmentation task with respect to the objects of interest for the user, this time including the improvements brought by the memory buffer.</p>
        </list-item>
        <list-item>
          <p>overall accuracy of the whole system, indicating, as the name states, the overall accuracy of the whole pipeline. This accuracy is obtained as the product between the accuracy of the instance segmentation (considering the buffer) and that of the context recognition (considering the buffer), as formalized below.</p>
        </list-item>
      </list>
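      <p>For clarity, the combined score can be written as the product of the two buffered accuracies:</p>
      <disp-formula>\[ \mathrm{Acc}_{\mathrm{overall}} = \mathrm{Acc}_{\mathrm{C.R.+B.}} \times \mathrm{Acc}_{\mathrm{I.S.+B.}} \]</disp-formula>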
      <sec id="sec-2-1">
        <title>In table 2 tests’ results are reported, confirming that mem</title>
        <p>ory bufers contribute to increase the accuracy of both
tasks, translating in overall better system’s accuracy.
It must be noted that accuracy can be further improved
by fine tuning the thresholds required by the instance
segmentation task. A general thumb rule is that, if the
accuracy is similar between the system using the bufers
and the system not using it, it is possible to improve the
performance through such fine tuning.</p>
        <p>Input
Kitchen
Bathroom</p>
        <p>Bedroom</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusions</title>
      <p>In this paper we presented a machine learning powered solution for privacy enforcement in video data, a data-driven implementation to safeguard the privacy of any user who may be forced to spend plenty of hours in videos and/or in video meetings. Such a solution addresses a so far untouched problem that was not formally faced by the field in the past years, during which our daily time has become increasingly skewed towards the time spent online.</p>
      <p>Our implementation presents good performance even in the presence of noisy or foggy videos, while performing almost perfectly in the most common scenarios: videos with common perspectives extracted from general recordings made with mobile devices. The adaptability of the system to the needs of different users, both for the objects of interest and for the context of interest, makes the solution we propose a solid step forward in the field of privacy enforcement for video data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <article-title>The zoom revolution: 10 eye-popping stats from tech's new superstar</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Standaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muylle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <article-title>How shall we meet? understanding the importance of meeting mode capabilities for diferent meeting objectives</article-title>
          ,
          <source>Information Management</source>
          (
          <year>2021</year>
          ). doi:https:// doi.org/10.1016/j.im.
          <year>2020</year>
          .
          <volume>103393</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Azizi</surname>
          </string-name>
          ,
          <source>The state of video conferencing</source>
          <year>2022</year>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Karl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Peluchette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aghakhani</surname>
          </string-name>
          ,
          <article-title>Virtual work meetings during the covid-19 pandemic: The good, bad, and ugly</article-title>
          ,
          <source>Small Group Research</source>
          <volume>0</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1177/10464964211015286.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wozniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , E. Tramontana, G. Capizzi,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lo</surname>
          </string-name>
          <string-name>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Nowicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Starczewski</surname>
          </string-name>
          ,
          <article-title>A multiscale image compressor with rbfnn and discrete wavelet decomposition</article-title>
          ,
          <source>in: Proceedings of the International Joint Conference on Neural Networks</source>
          , volume
          <volume>2015</volume>
          <source>-September</source>
          ,
          <year>2015</year>
          . doi:
          <volume>10</volume>
          .1109/IJCNN.
          <year>2015</year>
          .
          <volume>7280461</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A new iterative fir filter design approach using a gaussian approximation</article-title>
          ,
          <source>IEEE Signal Processing Letters</source>
          <volume>25</volume>
          (
          <year>2018</year>
          )
          <fpage>1615</fpage>
          -
          <lpage>1619</lpage>
          . doi:
          <volume>10</volume>
          .1109/LSP.
          <year>2018</year>
          .
          <volume>2866926</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Połap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , E. Tramontana,
          <string-name>
            <given-names>R.</given-names>
            <surname>Damaševičius</surname>
          </string-name>
          ,
          <article-title>Is the colony of ants able to recognize graphic objects?</article-title>
          ,
          <source>Communications in Computer and Information Science</source>
          <volume>538</volume>
          (
          <year>2015</year>
          )
          <fpage>376</fpage>
          -
          <lpage>387</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -24770-0_
          <fpage>33</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Połap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gabryel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Nowicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , E. Tramontana,
          <article-title>Can we process 2d images using artificial bee colony?</article-title>
          ,
          <source>in: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)</source>
          , volume
          <volume>9119</volume>
          ,
          <year>2015</year>
          , p.
          <fpage>660</fpage>
          -
          <lpage>671</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -19324-3_
          <fpage>59</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <article-title>Dredging up the past: Lifelogging, memory</article-title>
          and surveillance, University of Chicago Law Review
          <volume>12</volume>
          (
          <year>2008</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dobrilova</surname>
          </string-name>
          ,
          <source>The most astonishing facebook statistics in</source>
          <year>2022</year>
          ,
          <year>2022</year>
          . URL: https://techjury.net/blog/ facebook-statistics/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Masoodian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <article-title>Privacy issues for online personal photograph collections</article-title>
          ,
          <source>Journal of Theoretical and Applied Electronic Commerce Research</source>
          <volume>5</volume>
          (
          <year>2010</year>
          ). doi:
          <volume>10</volume>
          .4067/ S0718-18762010000200003.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baracaldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving machine learning: Methods, challenges</article-title>
          and directions,
          <year>2021</year>
          . arXiv:
          <volume>2108</volume>
          .
          <fpage>04417</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Image disguising for privacypreserving deep learning</article-title>
          ,
          <source>in: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security</source>
          ,
          <article-title>Association for Com- ing a machine learning approach with gan-based puting</article-title>
          <string-name>
            <surname>Machinery</surname>
          </string-name>
          , New York, NY, USA,
          <year>2018</year>
          .
          <article-title>URL: data augmentation technique trained using a cushttps://doi</article-title>
          .org/10.1145/3243734.3278511. tom dataset,
          <source>OBM Neurobiology 6</source>
          (
          <year>2022</year>
          ). doi:10.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Farinella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , G. Nicotra, S. Riccobene,
          <volume>21926</volume>
          /obm.neurobiol.2204139.
          <article-title>A context-driven privacy enforcement system for</article-title>
          [25]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iacobelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. Napoli,</surname>
          </string-name>
          <article-title>A machine learning autonomous media capture devices, Multimedia based real-time application for engagement detecTools</article-title>
          and
          <source>Applications</source>
          <volume>78</volume>
          (
          <year>2019</year>
          )
          <fpage>14091</fpage>
          -
          <lpage>14108</lpage>
          . URL: tion,
          <source>in: CEUR Workshop Proceedings</source>
          , volume https://doi.org/10.1007/s11042-019-7376-z.
          <volume>3695</volume>
          ,
          <year>2023</year>
          , p.
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Speciale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Schönberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Sinha</surname>
          </string-name>
          , [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <surname>Imagenet M. Pollefeys</surname>
          </string-name>
          ,
          <article-title>Privacy preserving image-based local- classification with deep convolutional neural netization</article-title>
          , CoRR abs/
          <year>1903</year>
          .05572 (
          <year>2019</year>
          ). works (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>A. T.-Y. Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Biglari-Abhari</surname>
            ,
            <given-names>K. I.-K.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            , [27]
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kirillov</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Massa</surname>
            , W.-Y. Lo,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          ,
          <article-title>Trusting the computer in computer vision: A Detectron2</article-title>
          , https://github.com/facebookresearch/ privacy-afirming framework,
          <source>in: 2017 IEEE Confer- detectron2</source>
          ,
          <year>2019</year>
          . ence on Computer Vision and Pattern Recognition [28]
          <string-name>
            <surname>T.-Y. Lin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Bourdev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>GirWorkshops (CVPRW</article-title>
          ),
          <year>2017</year>
          . shick, J. Hays,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Zitnick</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I. E.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tibermacine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Guettala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Microsoft coco: Common objects in conC</article-title>
          . Napoli,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <source>Enhancing sentiment anal- text</source>
          ,
          <year>2015</year>
          . arXiv:
          <volume>1405</volume>
          .0312.
          <article-title>ysis on seed-iv dataset with vision transformers: [29] RobinReni, House rooms image dataset, 2020. A comparative study</article-title>
          , in: ACM International URL: https://www.kaggle.com/robinreni/ Conference Proceeding Series,
          <year>2023</year>
          , p.
          <fpage>238</fpage>
          -
          <lpage>246</lpage>
          .
          <article-title>house-rooms-image-dataset</article-title>
          .
          <source>doi:10.1145/3638985</source>
          .3639024. [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Buslaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. I.</given-names>
            <surname>Iglovikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Khvedchenya</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Pari-
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          , C. Napoli, nov,
          <string-name>
            <given-names>M.</given-names>
            <surname>Druzhinin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Kalinin</surname>
          </string-name>
          ,
          <article-title>Albumentations: Analysis pre and post covid-19 pandemic rorschach Fast and flexible image augmentations, Informatest data of using em algorithms</article-title>
          and gmm mod- tion
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <article-title>125</article-title>
          . URL: http://dx.doi.org/10.3390/ els, in
          <source>: CEUR Workshop Proceedings</source>
          , volume
          <volume>3360</volume>
          ,
          <year>info11020125</year>
          .
          <year>2022</year>
          , p.
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          . [31]
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          , I. Radosavovic, G. Gkioxari, P. Dollár,
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alfarano</surname>
          </string-name>
          , G. De Magistris,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mongelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          , Detectron,
          <year>2018</year>
          . URL: https://github.com/ J. Starczewski,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A novel convmixer trans- facebookresearch/detectron. former based architecture for</article-title>
          violent behavior de- [32]
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          , R. Girshick, maskrcnn
          <article-title>-benchmark: Fast, tection</article-title>
          ,
          <source>in: Lecture Notes in Computer Science modular reference implementation of Instance (including subseries Lecture Notes in Artificial In- Segmentation and Object Detection algorithms telligence and Lecture Notes in Bioinformatics)</source>
          , vol- in PyTorch, https://github.com/facebookresearch/ ume 14126 LNAI,
          <year>2023</year>
          , p.
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          . doi:
          <volume>10</volume>
          .1007/ maskrcnn-benchmark,
          <year>2018</year>
          . Accessed: [
          <source>Insert date 978-3-031-42508-0_1</source>
          . here].
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iacobelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , Eye- [33]
          <string-name>
            <surname>T.-Y. Lin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Hariharan, tracking system with low-end hardware: Devel- S. Belongie, Feature pyramid networks for object opment and evaluation, Information (Switzerland) detection</article-title>
          ,
          <year>2017</year>
          . arXiv:
          <volume>1612</volume>
          .
          <fpage>03144</fpage>
          . 14 (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .3390/info14120644. [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khvostikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aderghal</surname>
          </string-name>
          , J. Benois-Pineau,
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>An advanced solu- A.</article-title>
          <string-name>
            <surname>Krylov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Catheline, 3d cnn-based classification based on machine learning for remote emdr tion using smri and md-dti images for alzheimer therapy</article-title>
          ,
          <source>Technologies</source>
          <volume>11</volume>
          (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .3390/ disease studies (
          <year>2018</year>
          ).
          <year>technologies11060172</year>
          . [35]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wa- B. Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>Prettenhofer, jda, Psychoeducative social robots for an healthier</article-title>
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Passos, lifestyle using artificial intelligence: a case-study,</article-title>
          <string-name>
            <surname>D. Cournapeau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Perrot</surname>
          </string-name>
          , E. Duchin: CEUR Workshop Proceedings, volume
          <volume>3118</volume>
          , esnay,
          <source>Scikit-learn: Machine learning in Python</source>
          ,
          <year>2021</year>
          , p.
          <fpage>26</fpage>
          -
          <lpage>33</lpage>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Magistris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cardia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Coppa</surname>
          </string-name>
          ,
          <volume>2825</volume>
          -
          <fpage>2830</fpage>
          . S. Russo,
          <article-title>Contagion prevention of covid-19 by means of touch detection for retail stores</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3092</volume>
          ,
          <year>2021</year>
          , p.
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tedeschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Human attention assessment us-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>