<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>DaQuaMRec</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Inside the Frame: A Plan for Audio-Visual Feature Analysis of Video Recommendations for Children</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erasmo Purificato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>European Commission, Joint Research Centre (JRC)</institution>
          ,
          <addr-line>Ispra</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>22</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Algorithmic recommendation systems are increasingly shaping children's digital consumption, but there is limited understanding of how audio-visual features impact the visibility and popularity of videos aimed at young audiences. Regulatory frameworks, such as the EU Digital Services Act, demand greater transparency and accountability, particularly regarding content targeted at minors. However, current systems often overlook the influence of content design on engagement. In this position paper, we propose a research agenda to systematically analyze interpretable visual and audio features, such as color vividness, motion intensity, vocal dynamics, and musicality. By linking these elements to engagement outcomes, we aim to discover consistent patterns that can inform the design of child-sensitive recommendation systems, algorithmic audits, and compliance with policies, as well as establish a foundation for more accountable algorithmic media environments for children.</p>
      </abstract>
      <kwd-group>
        <kwd>Video recommendation</kwd>
        <kwd>Audio-visual feature extraction</kwd>
        <kwd>Algorithmic transparency</kwd>
        <kwd>Children’s engagement</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation and Literature Gaps</title>
      <p>
        Recommendation systems play a crucial role in shaping the digital media consumption of millions of
users, particularly children, who are increasingly exposed to algorithmically curated content on platforms
such as YouTube [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While traditional approaches to recommendation rely heavily on behavioral signals
(e.g., click-through rate, watch time) [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ], recent attention has turned toward incorporating
content-level features to enhance both recommendation quality [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and fairness [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In parallel, concerns from
policymakers and researchers have grown regarding the excessive engagement of children with digital
platforms, with emerging regulations, such as the EU Digital Services Act (DSA) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the related
recently published guidelines on the protection of minors<sup>1</sup>, placing new obligations on platforms to
limit manipulative design and addictive recommendation patterns, especially when targeting minors.
Despite this, little is known about the role of low-level audio-visual features (e.g., color saturation,
motion patterns, pitch dynamics) in driving engagement and influencing recommendation outcomes. In
practice, such features are often disregarded in the design or audit of recommendation pipelines, despite
their potentially significant influence on viewer retention and algorithmic amplification.
      </p>
      <p>To address this gap, we propose a new research perspective that systematically analyzes the role of
intrinsic audio-visual features in shaping the popularity and virality of children’s videos. We argue
that understanding how low-level multimedia signals correlate with engagement is essential for ensuring
transparent, fair, and child-appropriate recommendation systems. Our position is that such content
features, despite their potential psychological and perceptual impact, are critically understudied in both
academic research and regulatory scrutiny. Pursuing this perspective requires a multimodal
analysis pipeline that can extract interpretable audio-visual descriptors from full-length video content.
By investigating how these features vary across high-visibility and low-visibility content aimed at
children, future research can uncover systematic engagement patterns and evaluate their alignment
with child protection principles and legal obligations under frameworks such as the DSA.</p>
      <p>
        Despite interest in video content analysis, especially for child safety and moderation, gaps remain in
understanding how intrinsic audio-visual features influence the visibility and popularity of children’s
videos on platforms like YouTube. Most existing research focuses on high-level content classification
or metadata-driven moderation. Deep learning (DL) models are commonly used to detect
inappropriate content [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ], while popularity is often predicted using metadata such as views, likes, comments,
emotional valence, or linguistic style [
        <xref ref-type="bibr" rid="ref11 ref12 ref8">8, 11, 12, 13</xref>
        ]. However, these approaches overlook how low-level
video attributes, such as color saturation, motion patterns, or pitch dynamics, contribute to engagement
or recommendation outcomes. Although general video-mining work measures broad properties, such as visual
variation or content richness [14], a systematic analysis of these features in the context of children’s
media is still missing. Studies on children’s preferences tend to focus on static elements (e.g., book
cover characteristics [15, 16] or music genres [17]) for personalizing recommender systems, without
addressing how dynamic video features affect algorithmic amplification. Evaluation frameworks for
children’s YouTube content often emphasize educational or design quality [18, 19, 20], but not the role
of intrinsic audio-visual traits in shaping appeal. The “Elsagate” phenomenon [21, 22, 23] exposed the
limitations of metadata filtering, underscoring the need for deeper analysis of the multimedia attributes
that drive engagement, irrespective of content appropriateness. Moreover, many high-performing
classification models offer limited interpretability. Tools like class activation maps highlight relevant image
regions but do not explain how specific audio-visual features influence viewer behavior [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Research
on video engagement in educational settings (e.g., comparing lecture capture to infographic videos)
shows that visual dynamics affect engagement [24]. However, these findings do not necessarily generalize
to children’s entertainment content. Crowdsourcing has been used for content moderation, allowing human
judgment to identify and segment inappropriate content [25]; yet this approach relies on human perceptions
of appropriateness and offers no insight into which multimedia elements drive video virality.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Planned Methodology</title>
      <p>We propose to systematically extract low-level audio-visual features from children’s videos to uncover
correlations between content characteristics and their popularity or recommendation exposure. We
argue that formalizing the influence of content-level attributes can provide valuable insights for fairer
and more transparent recommenders. As a first iteration, we plan to compare the 25 most-viewed YouTube
videos for children with the 25 most-viewed YouTube videos overall.</p>
      <p>On the visual side, we plan to compute global color statistics (saturation, brightness, contrast) to
capture the overall vividness and visual salience of each video, factors previously shown to affect
children’s attention [26]. Texture descriptors, such as local binary patterns, will be used to characterize
the level of visual detail in each scene [27]. Motion characteristics will be quantified with dense optical
flow [28], summarized as the proportion of highly dynamic frames, an indicator of editing pace and
visual stimulation. We also intend to include features that reflect social and narrative structure, such
as face presence and wave-ratio, to approximate the visibility of on-screen presenters or characters [29].
To identify differences in animation style, we plan to compute depth statistics and assess their consistency across
frames to infer whether the video adopts a flat 2-D layout or 3-D CGI production [30]. To complement
these handcrafted features, we plan to extract vector embeddings from pre-trained deep video models
such as VideoMAE [31] and SlowFast [32], which can summarize spatial and temporal patterns to
describe the video’s overall visual style and movement patterns.</p>
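      <p>To make some of the handcrafted visual descriptors above concrete, the following dependency-free Python sketch computes three of them. It is an illustration under stated assumptions, not the planned implementation: frames are assumed to be already decoded into lists of RGB tuples (for color statistics) and 2-D grayscale grids (for texture); brightness uses BT.601 luma weights and contrast is taken as RMS contrast; the texture function is the basic 8-neighbour local binary pattern rather than the multiresolution, rotation-invariant variant of [27]; and the motion function only thresholds per-frame mean optical-flow magnitudes assumed to have been computed beforehand (for instance with OpenCV’s Farnebäck implementation). All function names and the motion threshold are hypothetical.</p>
      <preformat preformat-type="code">
```python
import colorsys
import statistics

# Assumed frame representations (hypothetical, for illustration only):
#   RGB frame  -> flat list of (r, g, b) tuples with channels in 0..255
#   gray frame -> 2-D list (rows of ints) of grayscale intensities


def color_stats(rgb_frame):
    """Mean HSV saturation, mean luma brightness and RMS contrast of one frame."""
    saturations, lumas = [], []
    for r, g, b in rgb_frame:
        _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        saturations.append(s)
        # ITU-R BT.601 luma weights, rescaled to [0, 1]
        lumas.append((0.299 * r + 0.587 * g + 0.114 * b) / 255)
    return {
        "saturation": statistics.fmean(saturations),
        "brightness": statistics.fmean(lumas),
        "contrast": statistics.pstdev(lumas),  # RMS contrast of the luma channel
    }


def lbp_histogram(gray_frame):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Each interior pixel gets an 8-bit code: bit k is set when the k-th
    neighbour (clockwise from top-left) is at least as bright as the center.
    Border pixels are skipped.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(gray_frame), len(gray_frame[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            center = gray_frame[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray_frame[y + dy][x + dx] >= center:
                    code += 2 ** bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * 256


def dynamic_frame_ratio(flow_magnitudes, threshold=1.0):
    """Share of frames whose mean optical-flow magnitude exceeds `threshold`.

    `flow_magnitudes` holds one mean flow magnitude per consecutive frame pair,
    computed beforehand with a dense optical-flow method; the default threshold
    is an arbitrary placeholder, not a calibrated value.
    """
    if not flow_magnitudes:
        return 0.0
    dynamic = sum(1 for m in flow_magnitudes if m > threshold)
    return dynamic / len(flow_magnitudes)
```
      </preformat>
      <p>Per-frame values such as these would still need to be pooled (for example, averaged) into one vector per video before any comparison; the pooling strategy is left open here.</p>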
      <p>On the audio side, we plan to extract rhythmic and harmonic descriptors, such as spectral contrast,
melody, and tempo, to distinguish between speech, music, and song-based content [33, 34].
Additional features, including short-term energy, zero-crossing rate, spectral centroid and roll-off, harmonic
ratio, pitch, and silence ratio, will serve as proxies for vocal intensity and excitement level, which
have been linked to children’s emotional arousal [35].</p>
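      <p>Several of the audio proxies listed above can likewise be sketched in plain Python. This minimal version assumes mono audio given as a list of float samples in [-1, 1] with a known sample rate; the frame length, the 85% roll-off fraction and the silence energy threshold are assumed values, and the naive DFT is only adequate for the short frames of an illustration (an FFT-based audio library would be used in practice). Harmonic ratio, pitch, tempo and melody are not covered, and all function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import cmath
import math

# Audio is assumed to be a list of float samples in [-1, 1] at rate `sr` (Hz).
# Frame length, hop, roll-off fraction and silence threshold are illustrative.


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (only for short illustrative frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sr / len(frame) for k in range(len(mags))]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def spectral_rolloff(frame, sr, fraction=0.85):
    """Lowest bin frequency at which cumulative spectral energy reaches `fraction`."""
    power = [m * m for m in magnitude_spectrum(frame)]
    target, cumulative = fraction * sum(power), 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= target:
            return k * sr / len(frame)
    return 0.0


def silence_ratio(signal, frame_len, hop, energy_threshold=1e-4):
    """Share of frames whose short-term energy falls below `energy_threshold`."""
    fs = frames(signal, frame_len, hop)
    quiet = sum(1 for f in fs if energy_threshold > short_term_energy(f))
    return quiet / len(fs) if fs else 0.0
```
      </preformat>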
      <p>The extracted features will be used in comparative analyses between content explicitly targeted at
children and general-audience videos to assess whether children’s videos exhibit distinctive design
signatures. Ultimately, this planned methodology seeks to inform both the auditing of content-driven
engagement mechanisms and the development of recommender systems that are more accountable and
appropriate for child audiences.</p>
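      <p>The planned comparison does not yet specify a statistical procedure. As one possible choice, sketched here under stated assumptions, each video is represented by a dict of already-pooled feature values, and each feature is compared between the children’s and general-audience groups with Cliff’s delta (a rank-based effect size that makes no normality assumption, convenient for samples as small as 25 per group) and a two-sided permutation test on the difference in group means. Function names, the number of permutations and the seed are hypothetical.</p>
      <preformat preformat-type="code">
```python
import random
import statistics


def cliffs_delta(xs, ys):
    """Cliff's delta: P(X > Y) - P(Y > X), estimated over all cross-group pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    smaller = sum(1 for x in xs for y in ys if y > x)
    return (greater - smaller) / (len(xs) * len(ys))


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(xs) - statistics.fmean(ys))
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:]))
        # Small tolerance so floating-point noise does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)


def compare_groups(children, general, n_perm=10_000, seed=0):
    """Per-feature effect size and p-value for two lists of feature dicts.

    Every dict is assumed to share the same keys (one pooled value per feature).
    """
    report = {}
    for name in children[0]:
        xs = [video[name] for video in children]
        ys = [video[name] for video in general]
        report[name] = {
            "cliffs_delta": cliffs_delta(xs, ys),
            "p_value": permutation_p_value(xs, ys, n_perm, seed),
        }
    return report
```
      </preformat>
      <p>With many features tested at once, the resulting p-values would also need a multiple-comparison correction (e.g., Holm or Benjamini–Hochberg) before any design signature is reported as distinctive.</p>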
    </sec>
    <sec id="sec-3">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] J. Radesky, E. Bridgewater, S. Black, A. O'Neil, Y. Sun, A. Schaller, H. M. Weeks, S. W. Campbell, Algorithmic Content Recommendations on a Video-Sharing Platform Used by Children, JAMA Network Open 7 (2024) e2413855. doi:10.1001/jamanetworkopen.2024.13855.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] P. Wang, Y. Jiang, C. Xu, X. Xie, Overview of Content-Based Click-Through Rate Prediction Challenge for Video Recommendation, in: Proceedings of the 27th ACM International Conference on Multimedia, MM '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2593–2596. doi:10.1145/3343031.3356085.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Y. Zheng, C. Gao, J. Ding, L. Yi, D. Jin, Y. Li, M. Wang, DVR: Micro-Video Recommendation Optimizing Watch-Time-Gain under Duration Bias, in: Proceedings of the 30th ACM International Conference on Multimedia, MM '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 334–345. doi:10.1145/3503161.3548428.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] H. Zhao, L. Zhang, J. Xu, G. Cai, Z. Dong, J.-R. Wen, Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 528–539. doi:10.1145/3604915.3608797.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Y. Deldjoo, M. G. Constantin, H. Eghbal-Zadeh, B. Ionescu, M. Schedl, P. Cremonesi, Audio-visual encoding of multimedia content for enhancing movie recommendations, in: Proceedings of the 12th ACM Conference on Recommender Systems, RecSys '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 455–459. doi:10.1145/3240323.3240407.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] W. Chen, L. Chen, Y. Ni, Y. Zhao, Causality-Inspired Fair Representation Learning for Multimodal Recommendation, ACM Transactions on Information Systems (2025). doi:10.1145/3744240.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] European Union (EU), Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market For Digital Services and amending Directive 2000/31/EC (Digital Services Act), 2022. URL: https://eur-lex.europa.eu/eli/reg/2022/2065/oj/eng.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] K. Yousaf, T. Nawaz, A Deep Learning-Based Approach for Inappropriate Content Detection and Classification of YouTube Videos, IEEE Access 10 (2022) 16283–16298. doi:10.1109/ACCESS.2022.3147519.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] R. Tahir, F. Ahmed, H. Saeed, S. Ali, F. Zafar, C. Wilson, Bringing the kid back into YouTube kids: Detecting inappropriate content on video streaming platforms, in: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM '19, Association for Computing Machinery, New York, NY, USA, 2020, pp. 464–469. doi:10.1145/3341161.3342913.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] A. El Bakri, A. Yehia, M. Ali, R. Osama, Z. Adel, O. Gamal, S. N. Saleh, The Eye: An AI-Powered Video Streaming Platform to Protect Children from Inappropriate Content, in: 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES), 2024, pp. 529–532. doi:10.1109/NILES63360.2024.10753227.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] S. Wu, M.-A. Rizoiu, L. Xie, Beyond Views: Measuring and Predicting Engagement in Online Videos, Proceedings of the International AAAI Conference on Web and Social Media 12 (2018). doi:10.1609/icwsm.v12i1.15031.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] L. Stappen, A. Baird, M. Lienhart, A. Bätz, B. Schuller, An Estimation of Online Video User Engagement From Features of Time- and Value-Continuous, Dimensional Emotions, Frontiers in Computer Science 4 (2022).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] A. C. Munaro, R. H. Barcelos, E. C. F. Maffezzolli, J. P. S. Rodrigues, E. C. Paraiso, To engage or not engage? The features of video content on YouTube affecting digital consumer engagement, Journal of Consumer Behaviour 20 (2021) 1336–1352. doi:10.1002/cb.1939.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] X. Li, M. Shi, X. S. Wang, Video mining: Measuring visual information using automatic methods, International Journal of Research in Marketing 36 (2019) 216–231. doi:10.1016/j.ijresmar.2019.02.004.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Y. Beyhan, M. S. Pera, Covering Covers: Characterization Of Visual Elements Regarding Sleeves, in: Adjunct Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, UMAP '23 Adjunct, Association for Computing Machinery, New York, NY, USA, 2023, pp. 28–33. doi:10.1145/3563359.3597404.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. Milton, L. Batista, G. Allen, S. Gao, Y.-K. D. Ng, M. S. Pera, “Don’t Judge a Book by its Cover”: Exploring Book Traits Children Favor, in: Proceedings of the 14th ACM Conference on Recommender Systems, RecSys '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 669–674. doi:10.1145/3383313.3418490.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] L. Spear, A. Milton, G. Allen, A. Raj, M. Green, M. D. Ekstrand, M. S. Pera, Baby Shark to Barracuda: Analyzing Children’s Music Listening Behavior, in: Proceedings of the 15th ACM Conference on Recommender Systems, RecSys '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 639–644. doi:10.1145/3460231.3478856.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] M. M. Neumann, C. Herodotou, Evaluating YouTube videos for young children, Education and Information Technologies 25 (2020) 4459–4475. doi:10.1007/s10639-020-10183-7.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] D. Poveda, M. Matsumoto, E. Sundin, H. Sandberg, C. Aliagas, J. Gillen, Space and practices: Engagement of children under 3 with tablets and televisions in homes in Spain, Sweden and England, Journal of Early Childhood Literacy 20 (2020) 500–523. doi:10.1177/1468798420923715.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] J. Zhang, Y. Huang, M. Gao, Video Features, Engagement, and Patterns of Collective Attention Allocation: An Open Flow Network Perspective, Journal of Learning Analytics 9 (2022) 32–52.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] J. Balanzategui, Examining the "Elsagate" Phenomenon: Disturbing Children’s YouTube Content and New Frontiers in Children’s Culture, AoIR Selected Papers of Internet Research (2019). doi:10.5210/spir.v2019i0.10921.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] W. Han, M. Ansingkar, Discovery of Elsagate: Detection of Sparse Inappropriate Content from Kids Videos, in: 2020 Zooming Innovation in Consumer Technologies Conference (ZINC), 2020, pp. 46–47. doi:10.1109/ZINC50678.2020.9161808.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] P. Soustas, M. Edwards, The Elsagate Corpus: Characterising Commentary on Alarming Video Content, in: R. Mitkov, S. Ezzini, T. Ranasinghe, I. Ezeani, N. Khallaf, C. Acarturk, M. Bradbury, M. El-Haj, P. Rayson (Eds.), Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, UK, 2024, pp. 147–152.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] S. Lackmann, P.-M. Léger, P. Charland, C. Aubé, J. Talbot, The Influence of Video Format on Engagement and Performance in Online Learning, Brain Sciences 11 (2021) 128. doi:10.3390/brainsci11020128.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] S. K. Mridha, B. Sarkar, S. Chatterjee, M. Bhattacharyya, ViSSa: Recognizing the appropriateness of videos on social media with on-demand crowdsourcing, Information Processing &amp; Management 57 (2020) 102189. doi:10.1016/j.ipm.2019.102189.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] D. R. Anderson, H. L. Kirkorian, Attention and Television, in: Psychology of Entertainment, Routledge, 2006.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 971–987. doi:10.1109/TPAMI.2002.1017623.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] G. Farnebäck, Two-Frame Motion Estimation Based on Polynomial Expansion, in: J. Bigun, T. Gustavsson (Eds.), Image Analysis, Springer, Berlin, Heidelberg, 2003, pp. 363–370. doi:10.1007/3-540-45103-X_50.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] R. F. Kizilcec, J. N. Bailenson, C. J. Gomez, The instructor’s face in video instruction: Evidence from two large-scale field studies, Journal of Educational Psychology 107 (2015) 724–739. doi:10.1037/edu0000013.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] R. Ranftl, A. Bochkovskiy, V. Koltun, Vision Transformers for Dense Prediction, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 12179–12188.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] Z. Tong, Y. Song, J. Wang, L. Wang, VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, Advances in Neural Information Processing Systems 35 (2022) 10078–10093.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] C. Feichtenhofer, H. Fan, J. Malik, K. He, SlowFast Networks for Video Recognition, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6202–6211.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] G. Tzanetakis, P. Cook, Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing 10 (2002) 293–302. doi:10.1109/TSA.2002.800560.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] J. Salamon, E. Gomez, Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics, IEEE Transactions on Audio, Speech, and Language Processing 20 (2012) 1759–1770. doi:10.1109/TASL.2012.2188515.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] H. E. Kragness, M. J. Eitel, A. M. Baksh, L. J. Trainor, Evidence for early arousal-based differentiation of emotions in children’s musical production, Developmental Science 24 (2021) e12982. doi:10.1111/desc.12982.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>