<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>through Deep Learning Tools: A Case Study on Rudolf Nureyev</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvia Garzarella</string-name>
          <email>silvia.garzarella3@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Stacchio</string-name>
          <email>lorenzo.stacchio@unimc.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Cascarano</string-name>
          <email>pasquale.cascarano2@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allegra De Filippo</string-name>
          <email>allegra.defilippo@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Cervellati</string-name>
          <email>elena.cervellati@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gustavo Marfia</string-name>
          <email>gustavo.marfia@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <kwd-group>
          <kwd>Artificial Intelligence</kwd>
          <kwd>Data Labeling</kwd>
          <kwd>Deep Learning</kwd>
          <kwd>Cultural Heritage</kwd>
          <kwd>Dance</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Political Sciences, Communication and International Relations, University of Macerata</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of the Arts, University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The cultural heritage of theatrical dance involves diverse sources requiring complex multi-modal approaches. Since manual analysis methods are labor-intensive and thus limited to few data samples, we here discuss the use of the DanXe framework, which combines different AI paradigms for comprehensive dance material analysis and visualization. However, DanXe lacks models and datasets specific to dance domains. To address this, we propose a human-in-the-loop (HITL) extension to DanXe that accelerates multi-modal data labeling through a semi-automatic, high-quality labeling process. This approach aims to create detailed datasets while providing humans with a set of user-friendly and effective tools for advancing multi-modal dance analysis and optimizing AI methodologies for dance heritage documentation. To date, we have designed a novel middleware that adapts data generated by visual Deep Learning (DL) models within DanXe to visual annotation tools, empowering domain experts with a user-friendly tool to preserve all the components included in the choreographic creation and enriching the process of metadata creation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The cultural heritage of theatrical dance consists of a multitude of sources, both tangible and
intangible. These sources are diverse by nature and type, location, and preservation methods,
creating a complex constellation that requires a diverse set of skills to be effectively enhanced [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Acknowledging this complexity is inherently tied to a comprehensive and integrated analysis
of theory and practice, with significant implications in terms of accessibility [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Considering in particular choreography, while historiographical approaches are essential for working
with written documentation, thorough analysis requires an understanding that often involves
observing execution techniques [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>International Workshop on Artificial Intelligence and Creativity (CREAI), co-located with ECAI 2024
†These authors contributed equally.</p>
      <p>
        Due to this challenge, we have attempted to envision a framework that allows for an integrated
approach to theatrical dance’s documentary assets, using the artistic and cultural legacy of
dancer Rudolf Nureyev (1938-1993) as a case study [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The decision to focus on Nureyev
stemmed from the distinctive nature of the documentary heritage associated with him. He
was one of the first dancers to experience extensive and varied media coverage, given the
period during which his career developed (the 1960s and 1980s). This widespread mediatization,
during a transformative era for both dance and media, underscores the unique and multifaceted
nature of his legacy, making him a pivotal example for the dance domain. However, it is worth
noting that in this case, as in others like it, the large amount of data available (e.g., dance
videos, playbills, or biographical documents) and their international distribution often lead
dance experts to apply multi-modal analysis to a limited number of samples [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Such approaches exhibit three main limits: (a) even for an expert, the process of analyzing
such data by hand is time-consuming; (b) the outcomes of such analyses are hard to
organize and visualize in an effective way (e.g., to discover correlations); (c) manual work prevents
discovering semantic knowledge that can only be found by adopting a multi-modal analytical
approach on a vast amount of data [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">8, 7, 9, 10</xref>
        ]. To face all such challenges, Computational
Dance (COMD) paradigms represent a possible solution. However, COMD is underserved by
comprehensive datasets, limiting the potential for in-depth research and development [
        <xref ref-type="bibr" rid="ref11 ref12 ref7">7, 11, 12</xref>
        ].
This lack is even greater when considering multi-modal dance datasets: the majority of datasets
were collected for uni-modal analysis, in particular for the choreographic one [
        <xref ref-type="bibr" rid="ref11 ref12 ref7">7, 11, 12</xref>
        ].
Such datasets would be fundamental to optimizing AI methodologies capable of automatically
extracting knowledge and labels from dance digital material [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Along this line, a recent work introduced a unified multi-modal analysis tool, DanXe, an
Extended Artificial Intelligence framework that blends (i) AI algorithms for digitization and
automated analysis of both tangible and intangible materials, with the goal of crafting a digital
replica of dance cultural heritage, and (ii) XR solutions for immersive visualization of the
derived insights. This framework introduces a novel space for the concurrent analysis of all
elements that define the essence of dance [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For the use case considered here, the AI
analysis module of DanXe can be effectively used to extract knowledge from different kinds
of dance heritage materials, since it employs different Deep Learning (DL)-based models to
examine textual, audio, visual, and 3D data, providing
a foundational framework for multi-modal dance analysis. However, such a framework does
not resort to models specifically designed for the dance arena in domains other than
choreography, again exhibiting a lack of models and datasets.
      </p>
      <p>
        Tools like DanXe can be employed to digitalize dance heritage and at the same time accelerate
the labeling of multi-modal dance data, which can be used to train multi-modal models, that
can be employed to improve heritage preservation and analysis. Nevertheless, the integration
of human experts is required to ensure the quality of generated data and provide novel and
connected knowledge to those. For this reason, we here propose an extension of the DanXe
framework to inject a human-in-the-loop (HITL) component that leverages the initial AI-inferred
annotations as a foundation, enabling a semi-automatic approach to provide high-quality labels.
This approach aims to facilitate the creation of richer and more accurate datasets to support the
optimization of future heritage preservation models. We here contextualize such an approach
for a multi-modal dance data annotation process, considering the specific case of choreography,
where there is a lack of datasets that capture fine-grained labels of specific dance moves, often
focusing on the general style [
        <xref ref-type="bibr" rid="ref11 ref13 ref14">13, 14, 11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Materials and Methods</title>
      <p>We here provide a detailed overview of the materials and methodologies employed in our
study. We begin with the Video Dataset subsection, which describes the collection and
characteristics of the video data used for analysis. Following this, the AI Augmented Human
Annotator subsection outlines the HITL approach that leverages an AI Dance toolbox to
enhance human annotation efficiency and accuracy. Finally, the Visual Annotation Tool
Integration Middleware subsection discusses the middleware designed and implemented
to seamlessly integrate the synthetic AI-generated annotations in a visual annotation tool,
facilitating a cohesive and streamlined workflow for annotation domain experts. Each subsection
aims to elucidate the integral components and techniques critical to our research process.</p>
      <sec id="sec-3-1">
        <title>2.1. Video dataset</title>
        <p>The dataset was created using materials from the case study, which were originally recorded on
film, distributed in cinemas and on VHS, and later digitized. The original recordings often
suffered from wear and tear (e.g., film damage, darkening). Additionally, the original intended
use, designed for cinemas or home video viewing, included video direction elements such as
close-ups, zooms, and fade-ins/outs. These elements often cover movements and are not ideal
for a comprehensive recording of the performance. The process of selecting a video for building
the initial basic dataset was therefore inevitably influenced by the need for well-lit footage, the
highest possible definition, and minimal directorial interventions. To further reduce noise (e.g.
background dancers, extras), it was decided to analyze a solo performance: Nureyev’s adagio of
Prince Siegfried in Swan Lake (Act I). In the analysis of this adagio, we focused on the initial
20 seconds of choreography. Here, the dancer transitions from a static pose, embodying their
character without movement, to a sequence of steps performed in place. These steps showcase
a range of volumes and heights, adding depth and dimension to the performance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. AI Augmented Human Annotator</title>
        <p>
          Considering our main use case, choreography-related data, various labels and pieces of information can
be inferred, including music, dance styles, individual dance moves, and background descriptions.
Some of this information could be inferred with a high degree of accuracy by modern DL
approaches, like the ones introduced in the DanXe platform [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Despite this rich potential,
it remains challenging for human experts to adjust, enhance, and integrate novel labels or
information clearly and visually on top of this generated data. On this line, visual annotation
tools (VATs) could be exploited [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. In fact, the primary advantage of VATs is their ability to
significantly reduce the manual effort required from users, even those who are non-experts. By
incorporating various functionalities for manual, semi-automatic, and automatic annotations
through advanced AI algorithms, VATs could accelerate high-quality data labeling [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], thanks
also to the natural quantitative and qualitative approach introduced in such a process [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
Given such considerations, we employed the DanXe visual annotation module as a black box
capable of inferring different relevant data for the dance visual domain, such as textual data
within pictures, human pose estimation, and semantic segmentation, and we defined a novel
visual-annotation-based framework on top of it. This is visually represented in Figure 1.
        </p>
        <p>This annotation layer assumes that all synthesized AI label data are stored in a local database
after their inference. A data adaptation middleware ingests and transforms the various data
formats inferred by different AI models, ensuring compatibility with the visual annotation
tool at hand. This setup enables human annotators to use the tool to correct and add new
labels on top of the existing information. Subsequently, the updated annotations are re-adapted
and stored in the database, following the inverse chain of processes. This iterative approach
facilitates the efficient enhancement of dance video annotations, leveraging both AI and human
expertise. To implement such a framework, a fundamental step amounts to defining a smart
middleware able to bridge different file formats and data structures coming from AI models
and make them interpretable by different visual annotation tools. For this reason, in the
following, we describe the general architecture of the middleware we defined to ingest and
adapt annotations coming from different AI tools to visual annotation tools.</p>
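        <p>The iterative loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual DanXe implementation: the record structure and the function names (to_vat, from_vat, annotation_round_trip) are our own assumptions.</p>
        <p>
```python
# Minimal sketch of the HITL annotation loop: AI-inferred labels are adapted
# for the visual annotation tool (VAT), corrected by the human expert, and
# re-adapted for storage. Names and record layout are illustrative assumptions.
def to_vat(ai_record):
    """Adapt one AI-inferred record to the VAT's key-value structure."""
    return {"frame": ai_record["frame"], "labels": list(ai_record["labels"])}

def from_vat(vat_record):
    """Inverse adaptation: bring the corrected record back to storage form."""
    return {"frame": vat_record["frame"], "labels": list(vat_record["labels"])}

def annotation_round_trip(database, expert_fix):
    """Run one iteration of the semi-automatic labeling loop."""
    updated = []
    for record in database:
        vat_view = to_vat(record)            # middleware: AI format -> VAT format
        corrected = expert_fix(vat_view)     # human expert edits in the VAT
        updated.append(from_vat(corrected))  # middleware: VAT format -> storage
    return updated

# Example: the expert adds a dance-move label on top of the AI output.
db = [{"frame": 0, "labels": ["pose"]}]
result = annotation_round_trip(
    db, lambda r: {**r, "labels": r["labels"] + ["arabesque"]})
```
        </p>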
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Visual Annotation Tool Integration Middleware</title>
        <p>In response to the growing complexity of data formats and their interpretation across different
tasks (e.g., Human Pose Estimation), a middleware solution has been developed to foster
interoperability between diverse AI models and various visual annotation tools. This middleware
serves as a bridge, facilitating seamless communication and data exchange between different
components of the annotation pipeline. Its architecture is reported in Figure 2. By implementing
standardized interfaces and protocols, it enables the integration of multiple deep learning models,
each specializing in different aspects of visual analysis, such as pose estimation or object
detection. Simultaneously, the middleware converts the ingested data to match
the interfaces of a range of different visual annotation tools, providing a unified platform for annotators
to interact with and refine the output of these models. Through this interoperability, the
annotation process is augmented, offering annotators the flexibility to leverage the strengths
of different models while at the same time having user-friendly interfaces. Moreover, by
automating certain aspects of annotation and providing semi-automatic functionalities, the
middleware accelerates the annotation workflow, significantly reducing the time and effort
required to generate high-quality annotated datasets.</p>
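        <p>The standardized interface between multiple models and multiple annotation tools can be illustrated with a small converter registry, where each (model, tool) pair maps an AI output format to an annotation-tool format. This is a hypothetical sketch under our own naming; it is not the middleware's actual API.</p>
        <p>
```python
# Sketch of a converter registry: each (model, tool) pair registers a function
# that maps the model's output format to the annotation tool's input format.
# All names here are illustrative assumptions.
CONVERTERS = {}

def register(model, tool):
    """Decorator registering a converter for a (model, tool) pair."""
    def wrap(fn):
        CONVERTERS[(model, tool)] = fn
        return fn
    return wrap

@register("alphapose", "vidat")
def alphapose_to_vidat(record):
    # Keep only the fields the target tool understands: here, (x, y) pairs
    # taken from the flat [x1, y1, c1, x2, y2, c2, ...] key-point array.
    xs = record["keypoints"][0::3]
    ys = record["keypoints"][1::3]
    return {"pointList": [{"id": i, "x": x, "y": y}
                          for i, (x, y) in enumerate(zip(xs, ys))]}

def convert(model, tool, record):
    """Dispatch a record through the registered converter, if any."""
    try:
        return CONVERTERS[(model, tool)](record)
    except KeyError:
        raise ValueError(f"no converter from {model} to {tool}")
```
        </p>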
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Results</title>
      <p>We concretely applied our introduced methodology to accelerate the annotation of single dance
moves from a multi-modal perspective (i.e., linking human pose estimation and single dance moves). To
the best of our knowledge, this is the first attempt to do so through a custom-defined middleware
and semi-automatic approach.</p>
      <p>In particular, we took as a use case choreographic human pose estimation by using the
AlphaPose models (https://github.com/MVIG-SJTU/AlphaPose) that were included in the DanXe
pipeline. AlphaPose allows extracting and tracking multi-person poses, codified into 17 body
key points when the model trained on the COCO dataset is used [17]. In our case, we applied it
to a variation from Swan Lake performed by Rudolf Nureyev in 1967. The human pose estimation
extracted by AlphaPose is stored in a JSON file which contains one record for each frame
where a person was detected. Some visual representations of the inferred key points are
reported in Figure 3, while an example of the resulting JSON file is provided in Listing 1.</p>
      <p>Listing 1: Human pose estimation JSON data generated by AlphaPose on a single image
(abridged; only the values discussed in the text are shown).
{
  "category_id": 1,
  "keypoints": [
    311.995, 307.967, ...,
    ...
  ],
  "score": 3.001,
  ...
}</p>
      <p>In this example, there was only one person identified (category ID 1), indicating the human
ID within the considered frame. The key points array contains precise x and y coordinates along
with confidence scores for various body joints, exemplified by the first key point positioned at
(311.995, 307.967). We do not include the confidence score provided for each inferred key point
for simplicity of description. As mentioned, there are 17 key points, corresponding to the nose,
eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. Each key point serves as a precise
indicator of a specific body part’s location within the image frame. The overall confidence in
the pose estimation is quantified by a score of 3.001. Also, information related to the bounding
box enclosing the detected human figure is inferred but was not reported for simplicity.</p>
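      <p>For illustration, the flat AlphaPose key-point array (x, y, confidence triplets for the 17 COCO joints) can be unpacked into named joints as follows; a small Python sketch assuming the standard COCO joint ordering.</p>
      <p>
```python
# Unpack the flat [x1, y1, c1, x2, y2, c2, ...] key-point array into named
# joints, following the standard COCO 17-joint ordering (an assumption here).
COCO_JOINTS = [
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle",
]

def unpack_keypoints(record):
    """Turn the flat key-point array into a dict of named joints."""
    kps = record["keypoints"]
    triplets = zip(kps[0::3], kps[1::3], kps[2::3])
    return {name: {"x": x, "y": y, "confidence": c}
            for name, (x, y, c) in zip(COCO_JOINTS, triplets)}

# Example record with the first joint at (311.995, 307.967), as in Listing 1;
# the remaining joints are zero-filled placeholders for illustration.
record = {"category_id": 1, "score": 3.001,
          "keypoints": [311.995, 307.967, 0.93] + [0.0] * 48}
joints = unpack_keypoints(record)
```
      </p>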
      <p>Starting from this representation, we then considered adapting it for our target visual
annotation tool, Vidat (https://github.com/anucvml/vidat). Vidat is a high-quality video annotation tool
for computer vision and machine learning applications that is simple and efficient to use for
non-experts and supports multiple annotation types, including temporal segments, object
bounding boxes, semantic and instance regions, and human pose (skeleton). Moreover, it is
completely data-driven: all the data can be stored and loaded by encoding them in a predefined
key-value structure (i.e., a JSON file). Our goal was to load the annotated data from AlphaPose
in a format readable by Vidat. However, the Vidat skeleton structure description does not take
into account the elbow data. This means that we first filtered out the data for each detection
and then re-adapted the remaining information to match the reading structure of the Vidat tool.
The resulting JSON is reported in Listing 2.</p>
      <p>Listing 2: JSON representation of video annotations and configurations.
{
  ...,
  "objectAnnotationListMap": {},
  "regionAnnotationListMap": {},
  "actionAnnotationList": [],
  "skeletonAnnotationListMap": {
    "0": [{
      ...,
      "pointList": [
        { "id": 0, "name": "nose", "x": 312.0, "y": 308.0 },
        { "id": 1, "name": "left eye", "x": 312.0, "y": 304.0 },
        { "id": 2, "name": "right eye", "x": 315.0, "y": 305.0 },
        ...
      ],
      "centerX": 315.67,
      "centerY": 346.13
    }]
  },
  "config": {
    "objectLabelData": [...],
    "actionLabelData": [...],
    "skeletonTypeData": [...]
  }
}</p>
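      <p>The filtering and re-adaptation step that produces a Vidat-style skeleton can be sketched as follows. This is a simplified illustration under our own assumptions (the COCO joint ordering, the set of dropped joints, and the skeleton center computed as the mean of the kept points); it is not the middleware's actual code.</p>
      <p>
```python
# Sketch of the AlphaPose -> Vidat adaptation: drop joints the target skeleton
# does not support, re-index the rest, and compute a skeleton center.
# Joint names, dropped joints, and field names are illustrative assumptions.
COCO_JOINTS = [
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle",
]

def alphapose_to_vidat_skeleton(record, drop=("left elbow", "right elbow")):
    """Filter unsupported joints and re-adapt the rest to a Vidat-style
    pointList with a skeleton center."""
    kps = record["keypoints"]  # flat [x1, y1, c1, x2, y2, c2, ...]
    points = []
    for i, name in enumerate(COCO_JOINTS):
        if name in drop:  # the target skeleton omits these joints
            continue
        points.append({"id": len(points), "name": name,
                       "x": round(kps[3 * i], 2), "y": round(kps[3 * i + 1], 2)})
    center_x = sum(p["x"] for p in points) / len(points)
    center_y = sum(p["y"] for p in points) / len(points)
    return {"pointList": points,
            "centerX": round(center_x, 2), "centerY": round(center_y, 2)}
```
      </p>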
      <p>The provided JSON encapsulates metadata crucial for video annotation and analysis. Within
its structure, key parameters such as video dimensions, frame rate, and duration are outlined,
essential for Vidat temporal analysis and processing (not reported in the example for
simplicity). The inclusion of keyframe listings offers strategic markers for video segmentation and
analysis, facilitating efficient data handling. Furthermore, the presence of object and region
annotation maps anticipates future expansion into object detection and spatial
characterization. The delineation of action annotation lists underscores the intention to annotate dynamic
data. Particularly noteworthy is the skeleton annotation list, which furnishes detailed skeletal
representations. The configuration segment provides an extensive catalog of object and action
label data, coupled with skeleton-type specifications, forming the cornerstone for semantic
understanding and classification in video content.</p>
      <p>Finally, since this JSON is aligned with the original video frames, it can be loaded into the
Vidat visual annotation tool. Our dance domain expert used the inferred human key-point labels
to add new dance move labels, supported by the already-generated dance poses. Each label
corresponds to a name (e.g., arabesque) and a time interval, representing the duration of the
step execution. After completing the label descriptions, the next step would normally amount to
the frame-by-frame labeling of human movement; however, those labels were already automatically
generated, so the domain expert only corrected minor interpolations or mismatches between
the skeleton and the video image. Finally, the dance domain expert annotated dance moves
linked with one or more inferred poses. The resulting JSON can be stored at any moment, and
will now include both skeleton and dance move label data. The outputs of such a process are
visually reported in Figure 4.</p>
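      <p>The dance-move labels added by the expert (a name plus a time interval) map naturally onto the action annotation list of the Vidat JSON. A minimal sketch, assuming a simple start/end/description entry structure (the exact Vidat fields may differ):</p>
      <p>
```python
# Sketch: append an expert-provided dance-move label, defined by a name and a
# time interval (seconds), to a Vidat-style annotation dictionary.
# The entry's field names are an assumption, not the exact Vidat schema.
def add_dance_move(annotation, name, start_s, end_s):
    """Record a dance-move label (e.g. 'arabesque') over [start_s, end_s]."""
    entry = {"start": start_s, "end": end_s, "description": name}
    annotation.setdefault("actionAnnotationList", []).append(entry)
    return annotation

# Example: label the interval 3.2 s - 5.8 s as an arabesque.
ann = {"skeletonAnnotationListMap": {"0": []}}
add_dance_move(ann, "arabesque", 3.2, 5.8)
```
      </p>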
    </sec>
    <sec id="sec-5">
      <title>4. Discussion and Conclusion</title>
      <p>The introduction of the DanXe framework represents a significant leap forward in digitizing and
analyzing dance heritage materials, offering promising capabilities for the automatic annotation
of archive videos. Supported by human oversight and augmented by XR technologies, the
proposed multi-modal, semi-automatic annotation framework signifies a substantial advancement
in cultural heritage conservation, especially in cases involving intangible heritage alongside
tangible assets. Given the unique nature of the analyzed case study (that of an archival collection
related to a dancer’s legacy), the annotations cannot be limited to just recognizing steps but
must also allow for tracking props, stage settings, and performers involved. This would enable
the preservation of all scenic components that contributed to a choreographic creation, ensuring
better preservation, facilitating restaging processes, and enriching the process of metadata
creation, which is typically limited to principal performers or even just the choreographer.
Providing a tool that can support the work of scholars and archivists, without replacing their
expertise but leveraging it to validate semi-automatic acquisitions, not only represents a valuable
contribution in expediting their work but also enriches the metadata associated with archival
sources, thus enabling user research. This approach promises to generate richer, more accurate
datasets, ultimately fostering a deeper understanding and appreciation of the art form.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partly funded by: (i) the PNRR - M4C2 - Investimento 1.3, Partenariato Esteso
PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 8 “Pervasive AI”, funded
by the European Commission under the NextGeneration EU program.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Adshead-Lansdale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Layson</surname>
          </string-name>
          ,
          <article-title>Dance history: An introduction</article-title>
          ,
          <source>Routledge</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>De Marinis</surname>
          </string-name>
          ,
          <article-title>Il corpo dello spettatore. performance studies e nuova teatrologia</article-title>
          ,
          <source>Sezione di Lettere</source>
          (
          <year>2014</year>
          )
          <fpage>188</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Giannasca</surname>
          </string-name>
          ,
          <article-title>Dance in the ontological perspective of a document theory of art, Danza e ricerca. laboratorio di studi, scritture</article-title>
          , visioni
          <volume>10</volume>
          (
          <year>2018</year>
          )
          <fpage>325</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Randi</surname>
          </string-name>
          ,
          <article-title>Primi appunti per un progetto di edizione critica coreica, SigMa-Rivista di Letterature comparate</article-title>
          ,
          <source>Teatro e Arti dello spettacolo 4</source>
          (
          <year>2020</year>
          )
          <fpage>755</fpage>
          -
          <lpage>771</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Franco</surname>
          </string-name>
          ,
          <article-title>Corpo-archivio: mappatura di una nozione tra incorporazione e pratica coreografica (</article-title>
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kavanagh</surname>
          </string-name>
          ,
          <article-title>Rudolf Nureyev: the life</article-title>
          ,
          <source>Penguin UK</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>El Raheb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          ,
          <article-title>Dance in the world of data and objects</article-title>
          , in: International Conference on Information Technologies for Performing Arts, Media Access, and Entertainment, Springer,
          <year>2013</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Naveda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leman</surname>
          </string-name>
          ,
          <article-title>Representation of samba dance gestures, using a multi-modal analysis approach</article-title>
          , in: Enactive08,
          <source>Edizione ETS</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Medukg: a deep-learning-based approach for multi-modal educational knowledge graph construction</article-title>
          ,
          <source>Information</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>91</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Church</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rothwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Downie</surname>
          </string-name>
          , S. DeLahunta,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Blackwell</surname>
          </string-name>
          ,
          <article-title>Sketching by programming in the choreographic language agent</article-title>
          .,
          <source>in: PPIG</source>
          ,
          <year>2012</year>
          , p.
          <fpage>16</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Finedance: A fine-grained choreography dataset for 3d full body dance generation</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>10234</fpage>
          -
          <lpage>10243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stacchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garzarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cascarano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cervellati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marfia</surname>
          </string-name>
          ,
          <article-title>Danxe: an extended artificial intelligence framework to analyze and promote dance heritage</article-title>
          ,
          <source>Digital Applications in Archaeology and Cultural Heritage</source>
          (
          <year>2024</year>
          )
          <fpage>e00343</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Alemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Françoise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          ,
          <article-title>Groovenet: Real-time music-driven dance movement generation using artificial neural networks</article-title>
          ,
          <source>networks</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <article-title>Dance with melody: An lstm-autoencoder approach to music-oriented dance synthesis</article-title>
          ,
          <source>in: Proceedings of the 26th ACM international conference on Multimedia</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1598</fpage>
          -
          <lpage>1606</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciocca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Napoletano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schettini</surname>
          </string-name>
          ,
          <article-title>An interactive tool for manual, semiautomatic and automatic video annotation</article-title>
          ,
          <source>Computer Vision and Image Understanding</source>
          <volume>131</volume>
          (
          <year>2015</year>
          )
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stacchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Angeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lisanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marfia</surname>
          </string-name>
          ,
          <article-title>Applying deep learning approaches to</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>