<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Comparative Analysis of YOLO Architectures for Human Body Part Detection: Towards Symbiotic AI in Human-AI Interaction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vita Santa Barletta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Caivano</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Dimauro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimiliano Morga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Maria Ricchiuti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Beatrice Scavo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Valentino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SER&amp;Practices, Spin-off of the University of Bari Aldo Moro</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli studi di Bari Aldo Moro</institution>
          ,
          <addr-line>Piazza Umberto I, 70121 Bari, Apulia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cyber Social Security requires effective tools for the identification and automated moderation of harmful visual content, such as non-consensual nudity, sextortion, and online pornography. Addressing this issue requires not only accurate AI-based moderation tools but also systems that align with ethical, trustworthy, and human-centered design principles. In this study, we present a comparative analysis of two versions of the YOLO framework (YOLOv5 and YOLO11), evaluated across their respective model sizes (n, s, m, l, x) and tested with both pretrained and randomly initialized weights. The goal is to determine the most effective configuration for the task of nudity detection. To this end, we constructed a dedicated dataset of over 5,000 annotated images across ten sensitive classes, with a focus on semantic balance and annotation quality. The models were tested under various configurations, revealing that YOLO11m with pretrained weights offers the best trade-off between accuracy and computational efficiency. The results confirm the potential of YOLO-based models for real-time automated moderation applications, while also highlighting the need for further improvements in localization accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Cyber Social Security</kwd>
        <kwd>Trustworthy AI System</kwd>
        <kwd>Nudity Detection</kwd>
        <kwd>YOLO Framework</kwd>
        <kwd>Real-Time Object Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The growth of the internet and digital technologies has enhanced social interaction but has also
increased the availability of harmful material. Cyber Social Security therefore requires mechanisms
capable of automatically detecting non-consensual image sharing and deterring online sexual
exploitation, with nudity detection as a core image-analysis task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Traditional discriminative models rely on skin color or shape cues
and use multi-stage pipelines for explicit-content classification. Such approaches
suffer from high false positive rates, poor generalization, increased computing or time
costs, and fragmented execution.
      </p>
      <p>
        In addition, recent research explores the potential of human-AI symbiosis and human body part
detection for advanced human-machine interaction. Willcox &amp; Rosenberg [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose a Symbiont AI that
learns to assist humans in real-time through Embodied Symbiotic Learning, fostering a partnership with
shared expectations. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the authors emphasize augmented cognition to enhance human-machine
symbiosis through mutual understanding and support. In the realm of human body part detection,
Kuang et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduce a method integrating human body part information to improve Human
Object Interaction detection. Meanwhile, Xu et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] present AIP-Net, an anchor-free instance-level human
part detection network that achieves state-of-the-art performance on the COCO Human Parts Dataset
and demonstrates practical application in human-robot interaction. These advancements collectively
contribute to the development of more effective and intuitive human-AI interactions, leveraging body
part information and symbiotic learning approaches.
      </p>
      <p>Therefore, considering the need for new social-security tools and the literature on nudity
detection in social contexts, this paper describes a nudity-detection system that relies exclusively on
the YOLO architecture, trained to locate and mark nude regions in static images. Whereas
two-stage approaches favor accuracy at the cost of inference speed and ease of use, single-stage
models offer a better overall balance for real-time use. In this study, we attempt to determine which
variants of YOLOv5 and YOLO11 enable real-time moderation of explicit content based on accuracy,
efficiency, and resource consumption.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The automated detection of pornographic and sexually explicit content is a central challenge within
the broader field of Cyber Social Security [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], where it supports the mitigation of digital harms such
as online grooming, sextortion, and unwanted exposure—particularly in vulnerable populations [
        <xref ref-type="bibr" rid="ref1 ref8">8, 1</xref>
        ].
Effective content moderation systems are critical for law enforcement, platform compliance, and the
maintenance of healthy digital ecosystems.
      </p>
      <p>
        Early approaches to visual explicit content detection primarily relied on color-based models to
identify skin-toned regions under various lighting and pose conditions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Although computationally
efficient, these methods exhibited high false positive rates, often misclassifying sports scenes or
skin-colored backgrounds. To address this, shape-based techniques introduced spatial constraints to better
delineate potentially explicit regions [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], yet these approaches still lacked semantic understanding and
generalization.
      </p>
      <p>
        To improve robustness, mid-level representations such as the Bag of Visual Words (BoVW) were
introduced, combining local feature descriptors with classifiers like SVMs for enhanced
discrimination [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In video settings, the inclusion of motion-based features—such as MPEG-4 motion vectors,
histograms of motion (MHIST), and periodicity detection (PER)—further enhanced detection accuracy,
as shown by Jansohn et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        A major leap occurred with the advent of deep learning, particularly Convolutional Neural Networks
(CNNs). AGNet, an ensemble of AlexNet and GoogLeNet, achieved 89.2% accuracy on the NPDI dataset
by aggregating predictions across frames [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, its lack of temporal modeling limited its
effectiveness in video contexts. To address this, Perez et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] extended GoogLeNet to incorporate
sequential motion features, improving F1-score by 4–5% over AGNet. Subsequent work emphasized
multi-task learning to enhance semantic richness. AttM-CNN, for instance, combined pornography
detection with age estimation using a dual-branch CNN based on ResNet and Inception architectures [
        <xref ref-type="bibr" rid="ref15">15,
16, 17</xref>
        ]. Trained on over two million images, the model reached 92.7% accuracy, outperforming forensic
tools like NuDetective by more than 20%.
      </p>
      <p>More recently, the focus has shifted toward computational efficiency and real-time deployment.
Mallmann et al. [18] introduced PPCensor, a CNN-based pipeline that reframes nudity detection as an
object detection task. By applying localized obfuscation to private body regions, the system allows for
granular moderation without discarding entire frames, while maintaining near real-time performance
on edge hardware.</p>
      <p>In parallel, transformer-based architectures have gained attention for their ability to capture global
context. He et al. [19] demonstrated that Vision Transformers (ViTs) significantly outperform traditional
CNNs such as ResNet in classifying sensitive content, thanks to their self-attention mechanisms.</p>
      <p>YOLO-based methods have also emerged as promising alternatives for adult content detection.
Typically, these systems follow a two-stage architecture: first detecting people or sensitive body parts
using YOLO, followed by a secondary classification network [20, 21]. While effective, this separation
introduces architectural complexity and additional inference latency.</p>
      <p>
        Our work departs from this paradigm by employing YOLO in a fully end-to-end manner. We train
the network directly to detect explicit regions without auxiliary classifiers, resulting in a single-stage
architecture that reduces latency and simplifies deployment—particularly in real-time applications.
Unlike prior video-based methods that apply naive frame-by-frame processing [
        <xref ref-type="bibr" rid="ref14">14, 18</xref>
        ], our system
focuses on static image analysis, leveraging YOLO’s speed and spatial precision to isolate nudity with
high fidelity. This provides a solid foundation for future extensions to multimodal, temporally aware
moderation systems in large-scale platforms.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. YOLO</title>
      <p>YOLO (You Only Look Once) is a unified, real-time approach to object detection proposed by Redmon
et al. (2016) [22], which reformulates the detection problem as a single regression task that directly
maps from image pixels to bounding box coordinates and class probabilities.</p>
      <p>The architecture of YOLO is based on a unified convolutional neural network that processes the entire
image in a single pass. The image is divided into a grid of size S × S, where each cell is responsible
for detecting objects whose center falls within it. Each cell predicts B bounding boxes, each with a
confidence score that reflects both the probability of the presence of an object and the spatial accuracy of
the prediction, calculated using the Intersection over Union (IoU) metric. In parallel, each cell provides
a single conditional probability distribution over the C classes, which is computed only if the cell
contains an object.</p>
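      <p>As an illustration of this confidence formulation, the following Python sketch computes the IoU between two boxes and the resulting box confidence; the boxes and objectness value are made-up examples, not outputs of our models.</p>
      <preformat>
# Illustrative sketch: YOLO's per-box confidence is the objectness
# probability multiplied by the IoU with the ground-truth box.
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

p_object = 0.9                                   # predicted objectness
confidence = p_object * iou((10, 10, 50, 50), (12, 8, 48, 52))
print(round(confidence, 3))                      # about 0.743
      </preformat>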
      <p>YOLO was chosen for its:
• Speed – since it treats the problem as a regression task, it does not involve a complex pipeline;
• Contextualization ability – as it has a global view of the image during both training and testing;
• Generalization capability – as it learns generalized representations of objects.</p>
      <sec id="sec-3-1">
        <title>3.1. YOLOv5</title>
        <p>
YOLOv5 incorporates the Cross Stage Partial Network (CSPNet) [23] into its backbone (Darknet). CSPNet
reduces redundant gradient information during training, thereby improving the model’s efficiency. It
splits the feature map into two flows: one is processed through a series of convolutional blocks, while
the other remains unchanged. In the end, the two flows are concatenated, reducing the overall number
of parameters and computational cost (in terms of FLOPs), without compromising performance.</p>
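        <p>A minimal PyTorch sketch of this split-and-merge idea follows; it is a simplified stand-in, not the actual Ultralytics CSP implementation, and the block and parameter names are ours.</p>
        <preformat>
import torch
import torch.nn as nn

class CSPBlock(nn.Module):
    """Simplified CSP-style block: transform half the channels, bypass the rest."""
    def __init__(self, channels, n_convs=2):
        super().__init__()
        half = channels // 2
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(half, half, 3, padding=1),
                          nn.BatchNorm2d(half), nn.SiLU())
            for _ in range(n_convs)])

    def forward(self, x):
        a, b = x.chunk(2, dim=1)      # split the feature map into two flows
        a = self.convs(a)             # only one flow passes through the conv blocks
        return torch.cat([a, b], 1)   # concatenate the two flows at the end

print(CSPBlock(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
        </preformat>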
        <p>In the neck, the model adopts the Path Aggregation Network (PANet) [24], which enhances
information transmission between different levels of the network by adding a bottom-up path to the traditional
top-down structure of the Feature Pyramid Network (FPN). This enables better propagation of both
low- and high-resolution features, contributing to more accurate object localization.</p>
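        <p>The sketch below illustrates this data flow only, under the simplifying assumption of channel-matched pyramid levels; a real neck applies learned convolutions at every fusion step.</p>
        <preformat>
import torch
import torch.nn.functional as F

def fpn_pan(c3, c4, c5):
    # Top-down path (FPN): push high-level semantics to high-resolution maps.
    p5 = c5
    p4 = c4 + F.interpolate(p5, scale_factor=2)
    p3 = c3 + F.interpolate(p4, scale_factor=2)
    # Extra bottom-up path (PANet): push precise localization back up.
    n3 = p3
    n4 = p4 + F.max_pool2d(n3, 2)
    n5 = p5 + F.max_pool2d(n4, 2)
    return n3, n4, n5

feats = fpn_pan(torch.randn(1, 256, 64, 64),
                torch.randn(1, 256, 32, 32),
                torch.randn(1, 256, 16, 16))
print([f.shape for f in feats])
        </preformat>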
        <p>Finally, the head of the network consists of three convolutional layers. The activation functions used
are SiLU and Sigmoid: the former is applied in the hidden layers, while the latter is used in the output
layer. The model outputs three types of predictions: the classes of the detected objects, their bounding
boxes, and their objectness scores. The CIoU (Complete Intersection over Union) loss is used to compute
the localization loss.</p>
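        <p>For reference, a compact single-box implementation of the CIoU term is sketched below (the loss minimized during training is 1 - CIoU). This is our own illustrative code, not the YOLOv5 source.</p>
        <preformat>
import math

def ciou(a, b):
    """Complete IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    iou = inter / union
    # Normalized distance between box centers, relative to the
    # diagonal of the smallest enclosing box.
    cx = ((a[0]+a[2]) - (b[0]+b[2])) / 2
    cy = ((a[1]+a[3]) - (b[1]+b[3])) / 2
    ex = max(a[2], b[2]) - min(a[0], b[0])
    ey = max(a[3], b[3]) - min(a[1], b[1])
    rho2 = (cx*cx + cy*cy) / (ex*ex + ey*ey)
    # Aspect-ratio consistency term.
    v = 4 / math.pi**2 * (math.atan((a[2]-a[0]) / (a[3]-a[1]))
                          - math.atan((b[2]-b[0]) / (b[3]-b[1])))**2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 - alpha * v

print(round(ciou((10, 10, 50, 50), (12, 8, 48, 52)), 3))
        </preformat>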
      </sec>
      <sec id="sec-3-2">
        <title>3.2. YOLO11</title>
        <p>
YOLO11 [25] represents a significant advancement of the YOLO framework. The main innovations
introduced in YOLO11 include:
• C3k2 Block: a more efficient variant of the classic CSP Bottleneck module. It uses two
convolutions with smaller kernels instead of a single larger one, reducing computational cost while
maintaining good performance. Its behavior can vary based on the c3k parameter, allowing for
deeper structures when needed.
• C2PSA Block: introduces a spatial attention mechanism that helps the model focus more
effectively on the most relevant areas of the image, improving detection accuracy, especially in
complex scenes or with small or partially occluded objects.
• CBS Blocks (Convolution-BatchNorm-SiLU): combine convolution, batch normalization, and
SiLU activation to enhance the quality of the extracted features, making the learning process
more stable and effective, and contributing to greater accuracy.</p>
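        <p>To make the attention idea concrete, here is a rough, self-contained sketch of spatial re-weighting in the spirit of C2PSA; the actual C2PSA module is defined in the Ultralytics codebase and is considerably more elaborate.</p>
        <preformat>
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Score each spatial location and re-weight the feature map accordingly."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        attn = torch.sigmoid(self.score(x))  # (N, 1, H, W), values in [0, 1]
        return x * attn                      # relevant regions are emphasized

print(SpatialAttention(128)(torch.randn(1, 128, 40, 40)).shape)
        </preformat>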
        <p>With respect to the traditional YOLO architecture, the innovations introduced are arranged as follows:
• Backbone: replacement of the C2f block with the more efficient C3k2, retention of the SPPF
block, and introduction of the new C2PSA to enhance spatial attention.
• Neck: use of the C3k2 block to improve speed and reduce computational complexity, along with
integration of the C2PSA block to increase the relevance of features, especially for
difficult-to-detect objects.
• Head: combined use of C3k2 and CBS blocks to process feature maps and increase detection
accuracy. This section ends with 2D convolutional layers and the Detect module, which produces
the final output (bounding boxes, confidence scores, and classes). The behavior of the C3k2 block
is governed by the c3k parameter, which adjusts its internal structure.</p>
        <p>For both YOLO versions, the Ultralytics platform offers model implementations in two configurations:
• one pre-trained on the COCO dataset;
• another initialized with randomly assigned weights.</p>
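        <p>With the Ultralytics Python API, the two configurations are obtained as follows (shown for YOLO11m; the same pattern applies to the other sizes and to YOLOv5).</p>
        <preformat>
from ultralytics import YOLO

pretrained = YOLO("yolo11m.pt")      # weights pre-trained on COCO
from_scratch = YOLO("yolo11m.yaml")  # architecture only, random initialization
        </preformat>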
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setting</title>
      <p>The primary objective of our experimental evaluation is to identify the most suitable YOLO model
variant for end-to-end nudity detection in static images. To this end, we systematically investigate
how architectural variations within the YOLO family affect both detection accuracy and computational
efficiency.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>For the development of our automatic human body part detection system, a dedicated dataset was
constructed to address the specific requirements of the task. The dataset was developed through an
iterative pipeline comprising repeated cycles of web-based image collection, manual annotation, and
empirical evaluation of model performance. Particular attention was given to enhancing dataset quality
and coverage through successive refinement steps, which included targeted augmentation of
underrepresented classes and exclusion of low-quality or ambiguous samples. This process allowed for the
progressive improvement of the dataset in terms of both class balance and semantic diversity.</p>
        <p>The final version of the dataset, employed for training the YOLOv5 and YOLO11 object-detection
models, consists of 5 090 images annotated with 8 247 bounding boxes. While the total number of
samples remains relatively limited, the dataset reflects a considerable investment of time and manual
effort, and its composition was carefully curated to optimize the training process for the intended
detection task.</p>
        <p>The dataset includes annotations for the following ten classes, encompassing both anatomical features
and sexually explicit content: anus, breast, buttocks, penis, vagina, oral-sex, penetration, penetration
position, masturbation, porn.</p>
        <p>Annotations were performed using bounding boxes in accordance with a consistent labeling protocol
designed to ensure inter-annotator agreement and reduce noise in the training data. Class frequencies
were regularly monitored throughout the dataset-construction process, and specific measures were taken
to mitigate class imbalance and prevent model bias. The resulting dataset thus provides a task-specific
and well-structured foundation for the supervised training of explicit-content detection systems.</p>
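        <p>As a hypothetical illustration of this frequency-monitoring step, the following script counts per-class bounding boxes over YOLO-format label files (one "class x_center y_center width height" row per box); the directory layout and file naming are assumptions, and class indices follow the order of the list above.</p>
        <preformat>
from collections import Counter
from pathlib import Path

CLASSES = ["anus", "breast", "buttocks", "penis", "vagina", "oral-sex",
           "penetration", "penetration position", "masturbation", "porn"]

counts = Counter()
for label_file in Path("dataset/labels/train").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            counts[CLASSES[int(line.split()[0])]] += 1

for name in CLASSES:
    print(f"{name:22s} {counts[name]}")
        </preformat>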
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model Variants</title>
        <p>The evaluation focuses on a comparative analysis of multiple YOLO architectures, emphasizing both
the well-established YOLOv5 family and the more recent YOLO11 series. The aim is to determine the
optimal model configuration that balances detection performance with computational efficiency for the
specific task of nudity detection.</p>
        <p>The following model configurations were evaluated:
• YOLOv5: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x
• YOLO11: YOLO11n, YOLO11s, YOLO11m, YOLO11l, YOLO11x</p>
        <sec id="sec-4-2-1">
          <title>Each model was trained under two initialization strategies:</title>
          <p>• Pre-trained weights from the COCO dataset;
• Random weight initialization.</p>
          <p>Training was conducted for 100 epochs using the default hyperparameters provided by the respective
implementations. All experiments employed consistent data augmentation strategies and loss functions.
Input resolution and batch size were adapted per model to optimize GPU utilization while maintaining
experimental comparability.</p>
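        <p>A minimal sketch of such a run with the Ultralytics API is given below; the dataset YAML name is a placeholder, and the remaining hyperparameters are left at their defaults, as in our experiments.</p>
        <preformat>
from ultralytics import YOLO

for weights in ("yolo11m.pt", "yolo11m.yaml"):   # pretrained vs. from scratch
    model = YOLO(weights)
    model.train(data="nudity.yaml", epochs=100)  # defaults for other settings
        </preformat>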
        <p>This setup allows for:
• Comparative analysis across lightweight, mid-sized, and high-capacity models.
• Identification of the YOLO variant offering the best trade-off between detection performance and
computational efficiency.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evaluation Metrics</title>
        <sec id="sec-4-3-1">
          <title>Performance was assessed using the following metrics:</title>
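        <p>For clarity, the sketch below recalls how these quantities are defined; a detection counts as a true positive when its IoU with a same-class ground-truth box reaches the threshold (0.5 for mAP@0.5).</p>
        <preformat>
def precision(tp, fp):
    """Fraction of predicted boxes that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of ground-truth boxes that are found."""
    return tp / (tp + fn)

# mAP@0.5:0.95 averages per-class average precision over ten IoU thresholds.
thresholds = [0.5 + 0.05 * i for i in range(10)]   # 0.50, 0.55, ..., 0.95
print(precision(80, 20), recall(80, 40), thresholds)
        </preformat>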
          <p>To ensure systematic monitoring and reproducibility, we utilized Weights &amp; Biases and Comet
throughout the training and evaluation phases. These tools enabled comprehensive tracking of:
• Precision, recall, and mAP over training steps;
• Learning curves and loss values;
• All relevant training hyperparameters.</p>
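        <p>A minimal Weights and Biases logging sketch is shown below for illustration; the project name and metric values are placeholders, and in practice the Ultralytics trainer can report these metrics through its built-in integrations.</p>
        <preformat>
import wandb

wandb.init(project="nudity-detection-yolo")      # placeholder project name
for step, (p, r, map50) in enumerate([(0.31, 0.35, 0.29), (0.40, 0.45, 0.41)]):
    wandb.log({"precision": p, "recall": r, "mAP50": map50}, step=step)
wandb.finish()
        </preformat>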
          <p>This experimental protocol supports a fair, reproducible, and well-documented comparison of YOLO
models of varying complexity under realistic deployment conditions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section presents the comparative evaluation of the YOLO11 (Table 1) and YOLOv5 (Table 2)
architectures on the task of nudity detection, performed on a curated dataset of over 5,000 annotated images
across ten sensitive semantic classes. The goal was to assess the detection capabilities of each model
variant, considering both pretrained and randomly initialized configurations, and to identify the optimal
trade-off between accuracy and computational efficiency in view of real-time human-AI collaborative
applications.</p>
      <p>Among the YOLOv5 variants, YOLOv5x achieved the best results, with a mAP@0.5 of 0.367 and
mAP@0.5:0.95 of 0.215. Precision and recall were 0.412 and 0.440 respectively, reflecting a reasonably
balanced detection performance. Smaller configurations such as YOLOv5n and YOLOv5s showed a
significant drop in recall, limiting their applicability in critical moderation tasks. The inclusion of
pretrained weights yielded modest improvements across all sizes.
</p>
      <p>In contrast, YOLO11 models consistently outperformed YOLOv5 in both accuracy and recall. The
YOLO11m configuration achieved the best results overall, with a mAP@0.5 of 0.438, mAP@0.5:0.95 of
0.243, and a recall of 0.516, outperforming all other configurations. Notably, models trained from scratch
showed a marked decrease in performance—e.g., YOLO11m without pretrained weights reached only
0.291 in mAP@0.5—highlighting the importance of transfer learning, particularly in domain-specific
visual tasks such as nudity detection.</p>
      <p>A direct comparison between YOLOv5x and YOLO11m, summarized in Table 3, demonstrates the
superior capability of YOLO11m, especially in detecting nuanced and sensitive content. These results
underscore the potential of YOLO11-based architectures to support automated moderation systems that
are both accurate and efficient.</p>
      <p>From a human-AI symbiosis perspective, high recall and precision rates are essential to ensure user
trust, system transparency, and ethical alignment. YOLO11m’s performance enhances the reliability
of AI-based moderators in identifying harmful visual content, minimizing both false positives and
false negatives. Furthermore, the adaptability shown by pretrained configurations supports future
personalization and domain transfer, critical for sensitive contexts such as healthcare, education, or
platform moderation.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Extensive empirical evaluation revealed that the Medium (M) and Large (L) configurations of the YOLO
architecture demonstrated the most favorable performance for human-body-part detection, particularly
when initialized with pre-trained weights. These configurations offered an optimal compromise between
detection accuracy and computational efficiency, rendering them appropriate for deployment in practical
applications such as online-safety monitoring and content moderation.</p>
      <p>Nonetheless, as illustrated by the visual results, the models’ overall detection performance remained
suboptimal. Although the networks exhibited a capacity to identify relevant anatomical features, they
frequently encountered difficulties in achieving precise object localization and accurate delineation of
bounding boxes. Among all evaluated variants, the YOLO11m model with pre-trained weights
proved to be the most effective, yielding the highest precision scores. However, it still exhibited notable
shortcomings in terms of boundary accuracy and spatial consistency.</p>
      <p>These observations suggest that, while YOLO-based models hold promise for the task of body-part
detection, further enhancements—such as more meticulous annotation, the incorporation of additional
training samples, or architectural refinements—are required to improve localization precision and
overall detection robustness.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This work was partially supported by the following projects: SERICS - “Security and Rights In the
CyberSpace - SERICS” (PE00000014) under the MUR National Recovery and Resilience Plan funded
by the European Union - NextGenerationEU; Patto territoriale “Sistema universitario pugliese” – CUP
F61B23000370006; Accordo Quadro CrASte - “Cyber Academy for Security and Intelligence”.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Barletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Caivano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dimauro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morga</surname>
          </string-name>
          ,
          <article-title>Exploring artificial intelligence challenges for monitoring cyber child abuse</article-title>
          , volume
          <volume>3978</volume>
          ,
          <year>2025</year>
          . URL: https://www.scopus.com/inward/ record.uri?eid=
          <fpage>2</fpage>
          -
          <lpage>s2</lpage>
          .
          <fpage>0</fpage>
          -
          <lpage>105008760266</lpage>
          &amp;partnerID=
          <volume>40</volume>
          &amp;md5=
          <fpage>fed46dfcbf71cc59bd9344aec2c4f01b</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Willcox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <article-title>Symbiont ai and embodied symbiotic learning</article-title>
          ,
          <source>Proceedings of the Future Technologies Conference (FTC)</source>
          <year>2021</year>
          , Volume
          <volume>1</volume>
          (
          <year>2021</year>
          ). URL: https://api.semanticscholar. org/CorpusID:239802003.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Grigsby</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence for advanced human-machine symbiosis</article-title>
          ,
          <source>in: Interacción</source>
          ,
          <year>2018</year>
          . URL: https://api.semanticscholar.org/CorpusID:51612552.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>A human-object interaction detection method inspired by human body part information</article-title>
          ,
          <source>in: 2020 12th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>342</fpage>
          -
          <lpage>346</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICMTMA50254.
          <year>2020</year>
          .
          <volume>00082</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Leng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Aip-net: An anchor-free instance-level human part detection network</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>573</volume>
          (
          <year>2024</year>
          )
          <article-title>127254</article-title>
          . URL: https://www.sciencedirect.com/science/article/ pii/S0925231224000250. doi:https://doi.org/10.1016/j.neucom.
          <year>2024</year>
          .
          <volume>127254</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Antoniol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Battista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Caivano</surname>
          </string-name>
          , G. Calvano, G. Campesi,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cascione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Curci</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. de Gemmis</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Gattulli</surname>
            ,
            <given-names>R. La</given-names>
          </string-name>
          <string-name>
            <surname>Scala</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Scardigno</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          <string-name>
            <surname>Sciacovelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Senaldi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Sorianello</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Tamburrano</surname>
          </string-name>
          ,
          <article-title>Cyber social security (css): A lens on methods for extraction of social sensor data</article-title>
          , volume
          <volume>3978</volume>
          ,
          <year>2025</year>
          . URL: https://www.scopus.com/inward/record.uri?eid=
          <fpage>2</fpage>
          -
          <lpage>s2</lpage>
          .
          <fpage>0</fpage>
          -
          <lpage>105008758722</lpage>
          &amp; partnerID=
          <volume>40</volume>
          &amp;md5=
          <fpage>f6717d25f68d5394e464db890b6ad62</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Barletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Caivano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calvano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Curci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piccinno</surname>
          </string-name>
          , Craste:
          <article-title>Human factors and perception in cybersecurity education</article-title>
          , volume
          <volume>3713</volume>
          ,
          <year>2024</year>
          , p.
          <fpage>75</fpage>
          -
          <lpage>81</lpage>
          . URL: https://www.scopus.com/inward/ record.uri?eid=
          <fpage>2</fpage>
          -
          <lpage>s2</lpage>
          .
          <fpage>0</fpage>
          -
          <lpage>85198753881</lpage>
          &amp;partnerID=
          <volume>40</volume>
          &amp;md5=
          <fpage>35f9b858e583d214bb7a53c0a7dbf0da</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Baldassarre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Barletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bavaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Caivano</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. P. De Matteis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lippolis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Piccinno</surname>
          </string-name>
          ,
          <article-title>Llms to detect cyber child abuse in the in textual conversations</article-title>
          , volume
          <volume>3978</volume>
          ,
          <year>2025</year>
          . URL: https://www.scopus.com/inward/record.uri?eid=
          <fpage>2</fpage>
          -
          <lpage>s2</lpage>
          .
          <fpage>0</fpage>
          -
          <lpage>105008757382</lpage>
          &amp;partnerID=
          <volume>40</volume>
          &amp;md5=
          <fpage>91bfb48c91b5043174e33d19f2ed45dd</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gevers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          ,
          <article-title>Color-based object recognition</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>32</volume>
          (
          <year>1999</year>
          )
          <fpage>453</fpage>
          -
          <lpage>464</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0031320398000363. doi:https: //doi.org/10.1016/S0031-
          <volume>3203</volume>
          (
          <issue>98</issue>
          )
          <fpage>00036</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Q.-F.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , W. Zeng, G. Wen,
          <string-name>
            <given-names>W.-Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Shape-based adult images detection</article-title>
          ,
          <source>in: Third International Conference on Image and Graphics (ICIG'04)</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>150</fpage>
          -
          <lpage>153</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICIG.
          <year>2004</year>
          .
          <volume>128</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pimenidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          ,
          <article-title>Bag-of-visual-words models for adult image classification and filtering</article-title>
          ,
          <source>in: 2008 19th International Conference on Pattern Recognition</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICPR.
          <year>2008</year>
          .
          <volume>4761366</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jansohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ulges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Breuel</surname>
          </string-name>
          ,
          <article-title>Detecting pornographic video content by combining image features with motion information</article-title>
          ,
          <source>in: Proceedings of the 17th ACM International Conference on Multimedia, MM '09</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2009</year>
          , p.
          <fpage>601</fpage>
          -
          <lpage>604</lpage>
          . URL: https://doi.org/10.1145/1631272.1631366. doi:
          <volume>10</volume>
          .1145/1631272.1631366.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Moustafa</surname>
          </string-name>
          ,
          <article-title>Applying deep learning to classify pornographic images and videos, 2015</article-title>
          . URL: https://arxiv.org/abs/1511.08899. arXiv:
          <volume>1511</volume>
          .
          <fpage>08899</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Avila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moraes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Testoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Valle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goldenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <article-title>Video pornography detection through deep learning techniques and motion information</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>230</volume>
          (
          <year>2017</year>
          )
          <fpage>279</fpage>
          -
          <lpage>293</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/ S0925231216314928. doi:https://doi.org/10.1016/j.neucom.
          <year>2016</year>
          .
          <volume>12</volume>
          .017.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>González-Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Alegre</surname>
          </string-name>
          , E. Fidalgo,
          <article-title>Attm-cnn: Attention and metric learning based cnn for pornography, age and child sexual abuse (csa) detection in images</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>445</volume>
          (
          <year>2021</year>
          )
          <fpage>81</fpage>
          -
          <lpage>104</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S092523122100312X. doi:https://doi.org/10.1016/j.neucom.
          <year>2021</year>
          .
          <volume>02</volume>
          .056.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>