<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Avatar for Spanish Sign Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>María Pilar Agustín-Llach</string-name>
          <email>maria-del-pilar.agustin@unirioja.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vanessa Alvear</string-name>
          <email>vanessa.alvear@irsoluciones.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>César Domínguez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel García-Domínguez</string-name>
          <email>manuel.garciad@unirioja.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jónathan Heras</string-name>
          <email>jonathan.heras@unirioja.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Félix Lanas</string-name>
          <email>felix.lanas@unirioja.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gadea Mata</string-name>
          <email>gadea.mata@unirioja.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Munarriz-Senosiain</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Ochoa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirari San Martín</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>(M. San Martín)</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Spanish Sign Language, LSE, Glosses, Avatar</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Filologías Modernas, Universidad de La Rioja</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Departamento de Matemáticas y Computación, Universidad de La Rioja</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Innovación Riojana de Soluciones IT S.L.</institution>
          ,
          <addr-line>Logroño</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Approximately 70,000 individuals use Spanish Sign Language (LSE) as their primary means of communication. However, due to the limited prevalence of sign language proficiency among the general population, deaf individuals often face significant challenges in various environments. Therefore, the development of technological systems that facilitate communication between deaf and hearing individuals is essential. In the LSEAvatar project, we address how to translate messages from spoken Spanish into LSE to facilitate communication for deaf individuals. To achieve this goal, we will employ natural language processing techniques along with deep learning models that convert audio or text into LSE glosses. Ultimately, the project aims for these glosses to be interpreted by an avatar, enhancing access to information and communication for deaf individuals. This project counts on the collaboration, advice, and validation of the Association of the Deaf of La Rioja.</p>
      </abstract>
      <kwd-group>
        <kwd>Spanish Sign Language</kwd>
        <kwd>LSE</kwd>
        <kwd>Glosses</kwd>
        <kwd>Avatar</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        According to the World Federation of the Deaf, approximately 70 million people worldwide are deaf [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
In Spain, the population with hearing disabilities amounts to around 1,230,000 people [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], with
approximately 70,000 individuals [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] using sign language as their primary means of communication.
However, due to the limited prevalence of sign language proficiency among the general population, deaf
individuals often face significant challenges in various environments, making daily interactions difficult,
particularly in the absence of interpreters for translation assistance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Given that sign language is the
primary mode of communication for the deaf community, the development of technological systems
that facilitate communication between deaf and hearing individuals is essential [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Specifically, tools
are needed to enable hearing persons to understand signed messages and to help them get their oral
messages across in a signed modality so that deaf individuals can comprehend them. The present project
focuses on the latter aspect.
      </p>
      <p>
        Currently, one of the most widely used tools allowing deaf individuals to access spoken Spanish, from
now on LOE (which stands for Lengua Oral Española), is the use of transcriptions or subtitles available
in television programs and films. Additionally, applications such as Google’s real-time transcription
tool [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] can be used to generate text transcriptions that deaf individuals can read. However, there are
several reasons why deaf individuals may prefer using Spanish Sign Language (from now on, LSE, which
stands for Lengua de Signos Española) rather than reading written text. Firstly, many deaf individuals,
especially those who are deaf from birth, prefer accessing information through LSE since it is their
native language. Secondly, some individuals may struggle to read subtitles at the speed at which they
change on the screen. Finally, subtitles generally fail to capture other aspects of oral language, such as
intonation patterns, volume, rhythm, and specific accents.
      </p>
      <p>The aforementioned considerations have led to the establishment of the following objective for the
LSEAvatar project. In this project, we aim to develop an avatar capable of translating LOE into LSE
using Natural Language Processing (NLP) and Computer Vision techniques. To achieve such a goal, the
following specific objectives have been outlined:
• Datasets: Collection of diverse datasets to train the models used by the avatar. Specifically, a
repository of signed video clips will be built using open-source LSE dictionaries. Additionally, a
dataset will be created to convert LOE into glosses — an intermediate written representation of
sign concepts.
• Models: Development and implementation of various machine learning and deep learning models
to support different components of the avatar. These include computer vision models for capturing
facial, arm, and hand movements necessary for signing, and rendering these movements onto the
avatar; and language models for transcribing audio into text (that is, Speech to Text technology),
and converting written LOE into LSE glosses.
• Integration: Design and implementation of the avatar, ensuring seamless integration with the
models so that the avatar can generate sign language output from spoken or written input.
• Validation: Evaluation of the avatar’s effectiveness by members of the Association of the Deaf of
La Rioja.
• Deployment: Implementation of the avatar on various platforms, such as mobile applications and
web pages, and exploring its potential integration into videos as an alternative to subtitles.</p>
      <p>
        The development of this avatar will represent a significant step forward in fostering inclusion and
integration for deaf individuals. At present, the only existing avatar with a similar purpose is the
one presented in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], which focuses solely on signing individual words using the LSE alphabet. As
such, it does not incorporate specific signs or the grammatical rules required for accurate translation
between LOE and LSE. Similar projects to LSEAvatar have been proposed in other Spanish-speaking
countries; for instance, in Mexico [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or Ecuador [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; however, they work with the sign languages of
those countries, which differ from LSE. Additionally, services such as SVisual [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], available in police
stations, connect LSE users with interpreters. However, such systems are not available in most everyday
situations, and usually, deaf people have to pay to access this kind of service. Therefore, our project
aims to significantly expand communication opportunities for deaf individuals across a wide range of
environments and scenarios, ultimately providing an inclusive and accessible tool.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Architecture</title>
      <p>The avatar that will be built in the LSEAvatar project consists of three modules depicted in Figure 1. All
the models, datasets and code associated with the project will be publicly released with an open-source
license.</p>
      <p>
        The first module of the avatar is responsible for transcribing LOE audio into LOE text. To achieve
this, pre-trained multilingual speech-to-text models such as Whisper [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or Seamless [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] will be used.
These models will be evaluated by taking into account the dialect of the speakers and their speaking rate.
      </p>
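      <p>For illustration purposes, the following minimal Python sketch shows how such a transcription could be obtained with the open-source openai-whisper package; the chosen model size and the file name speech_es.wav are assumptions made only for this example, not the final implementation.</p>
      <preformat>
# Minimal transcription sketch with the openai-whisper package.
# The audio file name and the model size are illustrative assumptions.
import whisper

model = whisper.load_model("large-v3")          # also: "tiny", "base", "small", "medium"
result = model.transcribe("speech_es.wav", language="es")
print(result["text"])                           # LOE transcription of the audio
</preformat>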
      <p>
        The second module focuses on translating LOE text into LSE glosses. This can be seen as a machine
translation problem, for which the most successful approach to date is sequence-to-sequence models
based on transformers [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In our case, we will fine-tune and compare several of these models, using
different open-source language models in Spanish (such as Bertin [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or Maria [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]) and multilingual
models, such as mt5 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], as starting points.
      </p>
      <p>
        The third module will be responsible for extracting the necessary movements to sign the different LSE
signs from the glosses. To carry out this development, we will use Spanish Sign Language Dictionaries,
which contain videos of more than 10,000 signs performed by deaf professionals specialized in LSE.
From these videos, we will extract the necessary signing movements using open-source libraries such
as MediaPipe [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] or OpenPose [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which enable tracking of key points on the hands, arms, face, and
body. For words without an associated sign, the fingerspelling alphabet will be used.
      </p>
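      <p>As an illustrative sketch of this step, the following Python fragment extracts per-frame pose, hand, and face landmarks from a video with MediaPipe Holistic; the video name sign.mp4 and the way the landmarks are stored are assumptions made for the example rather than the final pipeline.</p>
      <preformat>
# Minimal keypoint-extraction sketch with mediapipe and opencv-python.
# The input video name and the storage format are illustrative assumptions.
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
cap = cv2.VideoCapture("sign.mp4")
frames_landmarks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB images, while OpenCV reads BGR frames.
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frames_landmarks.append({
        "pose": results.pose_landmarks,
        "left_hand": results.left_hand_landmarks,
        "right_hand": results.right_hand_landmarks,
        "face": results.face_landmarks,
    })
cap.release()
holistic.close()
</preformat>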
      <p>
        Finally, to implement the avatar, we will use Blender [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a free and open-source platform dedicated
to modelling, animation, and the creation of three-dimensional graphics.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>In this section, we present the results obtained in the construction of each module of the project.</p>
      <sec id="sec-3-1">
        <title>3.1. Speech to text</title>
        <p>
          The first module is in charge of transcribing LOE audio into LOE text; therefore, we have analysed several
pretrained automatic speech recognition (ASR) models on the open-source COSER corpus (which stands
for Corpus Oral y Sonoro del Español Rural, in English Audible Corpus of Rural Spanish) [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], which
captures the varieties of spoken European Spanish — the results of our analysis were presented in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
In particular, we considered seven models: six Whisper-based [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], tiny, base, small, medium, large-v2, and
large-v3; and one Seamless [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] model, the SeamlessM4T v2 large model.
        </p>
        <p>The mean and standard deviation of the performance of each ASR model is shown in Figure 2. As we
can see in those results, the mean performance in terms of Word Error Rate (WER) of the models ranges
from 0.81 in the case of the Whisper tiny model, to 0.292 in the case of the Whisper large v3 model (the
lower, the better). Moreover, we can notice that, as expected, increasing the size of the Whisper model
reduces the errors produced by the model. However, there is no significant difference between the
large v2, large v3 and medium versions of Whisper; hence, in this context, the version trained with
more data does not provide a significant benefit. Finally, the performance of the Seamless model is only
better than the tiny version of Whisper; therefore, this model does not seem a suitable alternative to
the family of Whisper models.</p>
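        <p>For reference, WER can be computed over a set of reference and hypothesis transcriptions as in the following minimal sketch, which assumes the open-source jiwer package and uses toy sentences instead of the COSER data.</p>
        <preformat>
# Minimal Word Error Rate (WER) computation sketch with the jiwer package.
# The reference and hypothesis transcriptions are toy examples.
import jiwer

references = ["hola buenos días a todos", "hoy vamos a hablar de lengua de signos"]
hypotheses = ["hola buenos días a todo", "hoy vamos hablar de lengua de signos"]

# jiwer computes the edit-distance-based WER over the whole set of sentences.
print(jiwer.wer(references, hypotheses))
</preformat>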
        <p>Additionally, we have studied how much time it takes for each ASR model to process 1 minute of
audio using an NVIDIA GeForce RTX 3080 GPU (see Table 1); note that the two large versions of Whisper
take the same time. In the case of the Whisper models, the bigger the model, the slower; namely, the
tiny version took approximately 4.75 seconds to process the audio, but the large models took almost 24
seconds. It is worth noticing that the Seamless model is the fastest of the analysed ASR systems, even
faster than the Whisper tiny model, but as we previously mentioned, its performance is not on par with
the bigger models of the Whisper family.</p>
        <p>From these results, we can conclude that the large v3 version of Whisper produces the most accurate
transcriptions; however, it is considerably slower than its smaller counterparts. In the context of
building the transcriptions from a recorded video, processing time is not usually an issue, since the
automatic transcription process can be run in the background, and once it finishes, its output can be fed to the
following module of our architecture. Nevertheless, large models might require special hardware to run
in a reasonable time and are not suitable to be used for real-time processing; in such cases, the medium
version of Whisper provides a good trade-off between transcription accuracy and inference speed.
For this project, we decided to implement the large v3 version of Whisper since our system will initially
work in an offline manner and will not require real-time processing. Therefore, it is better to obtain
more accurate transcriptions even if they take more time to process.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. LOE to glosses</title>
        <p>
          The second module is in charge of translating LOE text to LSE glosses. We have approached this task as
a translation problem and trained several sequence-to-sequence models. In order to train those models,
we have used the synLSE dataset presented in [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. The synLSE dataset is, as far as we are aware, the
only dataset devoted to translating Spanish text into glosses. It is a synthetically created
corpus, and this might introduce potential limitations in terms of linguistic variability, naturalness, and
generalization to real-world scenarios; hence, further research is necessary to incorporate real-world
data.
        </p>
        <p>
          Using the synLSE dataset, we fine-tuned three multilingual models (MBart Large 50 [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], Marian
MT [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] and T5-Small [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]) on Google Colab, using the default hyperparameters shown in Table 2
and the functionality provided by the Huggingface library [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Finally, we evaluated the
models using the BLEU, BLEU-3, BLEU-4, and ROUGE-L metrics [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
        </p>
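        <p>The following minimal sketch illustrates this kind of fine-tuning with the Huggingface transformers and datasets libraries; the CSV file names, the column names text and gloss, and the hyperparameters are assumptions made for the example and do not reproduce the exact training setup of the project.</p>
        <preformat>
# Minimal sequence-to-sequence fine-tuning sketch with Huggingface transformers.
# File names, column names and hyperparameters are illustrative assumptions; for
# mBART-style models the source/target language codes would also be set on the tokenizer.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

checkpoint = "facebook/mbart-large-50"          # one of the evaluated backbones
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical CSV files with a "text" (LOE sentence) and a "gloss" (LSE glosses) column.
dataset = load_dataset("csv", data_files={"train": "synlse_train.csv",
                                          "test": "synlse_test.csv"})

def preprocess(batch):
    # Tokenise the source sentences and the gloss targets.
    model_inputs = tokenizer(batch["text"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["gloss"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(output_dir="loe2gloss", num_train_epochs=3,
                                per_device_train_batch_size=8,
                                predict_with_generate=True)
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=tokenized["train"],
                         eval_dataset=tokenized["test"],
                         data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()
</preformat>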
        <p>
          The performance of the different models on the test set of the synLSE dataset is presented in Table 3.
From those results, we can conclude that the best model was obtained with the MBart architecture,
with a BLEU-4 of 56.47 and a ROUGE-L of 88.89, considerably improving the baseline results presented
in [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
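        <p>For completeness, the following sketch shows how such metrics could be computed with the Huggingface evaluate library; the predicted and reference glosses are toy examples and the metric configuration is only indicative.</p>
        <preformat>
# Minimal metric-computation sketch with the Huggingface evaluate library.
# The predicted and reference glosses are toy examples.
import evaluate

predictions = ["CASA IR YO MAÑANA", "COLEGIO IR NO YO"]
references = ["CASA IR YO MAÑANA", "MAÑANA COLEGIO IR NO YO"]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# BLEU-4 (max_order=4) against single references; BLEU-3 would use max_order=3.
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references], max_order=4)["bleu"])
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
</preformat>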
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Avatar</title>
        <p>The last module of the project is the construction of an avatar that signs LSE glosses. The construction
of this module is still ongoing, and the preliminary steps are presented here.</p>
        <p>
          First of all, we have built a dataset of LSE glosses captured from three different sources: ARASAAC [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ],
Sématos [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and Spreadthesign [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. A total of 23,488 videos were downloaded, 14,368 of them corresponding to
unique signs — the number of videos and unique signs from each site can be seen in Table 4.
        </p>
        <p>
          The next step is the extraction of the movements from each video. We have studied several 3D
Human Pose Estimation models for this step. To the best of our knowledge, there are two different
approaches. In the first approach, models directly estimate 3D locations of human joints; whereas, in
the second approach, a model first estimates 2D locations of human joints, and then another model lifts
the estimations to 3D. Some estimators we have tested are: for direct 3D estimations Mediapipe [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ];
for 2D estimations OpenPose [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and AlphaPose [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]; and for lifting 2D estimations MotionBERT [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] and the method presented in [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. Figure 3 illustrates an example obtained with these approaches
in which a person’s pose from a video frame is converted into a Blender armature by extracting the
locations of the human’s joints. Furthermore, the reader can watch a video of the armature signing
“LSEAvatar” in LSE using fingerspelling at the link https://www.youtube.com/shorts/BB-KhEqMAu4.
        </p>
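        <p>As a sketch of how the extracted joint locations could drive the armature, the following Python fragment keyframes per-frame joint positions onto the pose bones of a Blender armature through Blender's bpy API; the armature name Avatar, the bone names, and the keypoints.json file are assumptions made for this example.</p>
        <preformat>
# Minimal sketch of keyframing extracted joint positions onto a Blender armature.
# It is meant to be run inside Blender; the armature name, bone names and the
# keypoints file are hypothetical.
import json
import bpy

armature = bpy.data.objects["Avatar"]
with open("keypoints.json") as f:        # hypothetical per-frame 3D joint locations
    frames = json.load(f)                # e.g. [{"RightHand": [x, y, z], ...}, ...]

for frame_index, joints in enumerate(frames):
    for bone_name, location in joints.items():
        bone = armature.pose.bones.get(bone_name)
        if bone is None:
            continue
        bone.location = location
        bone.keyframe_insert(data_path="location", frame=frame_index)
</preformat>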
        <p>Figure 3: (a) Frame of a person signing the letter m from the Spreadthesign dataset; (b) skeleton extracted from the frame in Figure 3a; (c) 3D armature obtained from the skeleton in Figure 3b, generated in Blender.</p>
        <p>
          Several tasks remain as future work to advance the development of the avatar. These include
the seamless integration of multiple video segments to generate coherent and natural sign language
sentences, the incorporation of a full-body model into the existing 3D armature, and the comprehensive
validation of the constructed avatar. The evaluation of such technologies is typically carried out
manually by experts, who assess dimensions such as grammatical accuracy in sign language, naturalness
of expression, readability, and cultural appropriateness [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. Additionally, some prior studies have
employed back-translation techniques as a complementary evaluation method [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. In this work,
we plan to adopt both expert-based and back-translation approaches to assess the performance of
LSEAvatar.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Social Impact</title>
      <p>
        LSEAvatar aims to benefit all individuals who use Spanish Sign Language (LSE) as their primary means
of communication, which amounts to approximately 70,000 people [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Additionally, when a deaf
person requires simultaneous translation through an interpreter, the deaf individual currently bears
both the cost and the responsibility of securing the interpreter. The tool developed as a product of the
LSEAvatar project will benefit deaf individuals, who will gain direct access to relevant information, as
well as public administrations and businesses, which will be able to disseminate information to a larger
audience.
      </p>
      <p>This tool must be validated by LSE experts for it to be helpful to the deaf community. To this end, we
are collaborating with the Association of the Deaf of La Rioja. Specifically, between three and eight
individuals from the association will participate in the validation process. Members of this association
will be the first to test the tool and, consequently, benefit from its use.</p>
      <p>As the project expands and more users rely on the avatar for translation, a scalable computing
infrastructure will be required to handle the workload. This entails leveraging cloud computing technologies
that allow for vertical and horizontal scaling as needed. Scalability also involves continuously improving
and updating the avatar’s various components to enhance translation accuracy and fluency. Finally, as
the project grows, it will be crucial to ensure that the avatar is compatible with a wide range of devices
and platforms, such as mobile applications, websites, and smart devices, necessitating a flexible and
adaptive development approach.</p>
      <p>Furthermore, this project has the potential to be adapted to other sign languages, thereby increasing
its reach and impact. It is important to consider that, just as there are numerous spoken languages, the
same applies to sign languages, with over 200 different sign languages estimated worldwide. In fact, the
sign languages used in Spanish-speaking countries vary, meaning that the avatar developed for Spain
cannot be directly used elsewhere. However, the avatar will have a modular design, requiring only
the development of a translation model from the spoken language to the corresponding sign language,
along with the provision of a dataset of sign language videos to adapt the avatar accordingly.</p>
    </sec>
    <sec id="sec-5">
      <title>Funding institutions</title>
      <p>This work was partially supported by INDRA through the call “Convocatoria de Ayudas de Proyectos de
Investigación en Tecnologías Accesibles 2024” and by the Government of La Rioja through Proyecto INICIA
2023/01 and AFIANZA 2024/01, and by Grant PID2024-155834NB-I00 funded by MICIU/AEI/10.13039/501100011033
and by ERDF/EU.</p>
    </sec>
    <sec id="sec-6">
      <title>Research groups</title>
      <p>The development of LSEAvatar is conducted by members of two research groups of the University of La Rioja: the
Grupo de Informática de la Universidad de La Rioja (https://investigacion.unirioja.es/grupos/45/detalle),
and the Grupo de Lingüística Aplicada de la Universidad de La Rioja (https://investigacion.unirioja.es/grupos/4/detalle).</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We are grateful to the Asociación de Personas Sordas de La Rioja for their help in the development of this
project. We thank M. Ivashechkin for his help with the task of 3D Human Pose Estimation.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] World Federation of the Deaf, Our work,
          <year>2024</year>
          . URL: https://wfdeaf.org/our-work/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>INE</surname>
          </string-name>
          , Utilización de la lengua de signos por sexo y edad.
          <source>Población</source>
          de 6 y más años con discapacidad de audición,
          <year>2024</year>
          . URL: https://www.ine.es/, accessed:
          <fpage>2025</fpage>
          -03-13.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Confederación</given-names>
            <surname>Estatal de Personas Sordas</surname>
          </string-name>
          (CNSE),
          <source>Personas sordas</source>
          ,
          <year>2024</year>
          . URL: https://www.cnse.es/index.php/personas-sordas, accessed:
          <fpage>2024</fpage>
          -03-20.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rodríguez-Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Martínez-Otzeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sierra</surname>
          </string-name>
          ,
          <article-title>A Hierarchical Approach for Spanish Sign Language Recognition: From Weak Classification to Robust Recognition System</article-title>
          ,
          <source>in: Intelligent Systems and Applications</source>
          , Elsevier,
          <year>2023</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nuñez-Marcos</surname>
          </string-name>
          , et al.,
          <article-title>A survey on Sign Language machine translation</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>213</volume>
          (
          <year>2023</year>
          )
          <article-title>118993</article-title>
          . doi:
          <volume>10</volume>
          .1016/j.eswa.
          <year>2023</year>
          .
          <volume>118993</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Google</surname>
            ,
            <given-names>Live</given-names>
          </string-name>
          <string-name>
            <surname>Transcribe</surname>
          </string-name>
          &amp; Notification,
          <year>2024</year>
          . URL: https://play.google.com/store/apps/details?id=com.google.audio.hearing.visualization.accessibility.scribe&amp;hl=en_US, accessed:
          <fpage>2025</fpage>
          -03-14.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Morillas-Espejo</surname>
          </string-name>
          , E. Martinez-Martin,
          <article-title>Sign4all: A Low-Cost Application for Deaf People Communication</article-title>
          ,
          <source>IEEE Access 11</source>
          (
          <year>2024</year>
          )
          <fpage>98776</fpage>
          -
          <lpage>98786</lpage>
          . URL: https://ieeexplore.ieee.org/document/10242052.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Morillas-Espejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Martinez-Martin</surname>
          </string-name>
          ,
          <article-title>A virtual avatar for sign language signing</article-title>
          ,
          <source>in: International Conference on Soft Computing Models in Industrial and Environmental Applications</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Martinez-Seis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichardo-Lagunas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hernández-Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rivera-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <article-title>Automatic translation of sentences to mexican sign language: Rule-based machine translation and animation synthesis in avatar</article-title>
          ,
          <source>Computación y Sistemas</source>
          <volume>29</volume>
          (
          <year>2025</year>
          )
          <fpage>145</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Salamea-Palacios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Salcedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peralta-Marin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Sacoto-Cabrera</surname>
          </string-name>
          ,
          <article-title>Prototype of a text to ecuadorian sign language translator using a 3d virtual avatar</article-title>
          ,
          <source>in: 2024 IEEE Colombian Conference on Communications and Computing (COLCOM)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Servicio</surname>
            <given-names>SVisual</given-names>
          </string-name>
          <source>en las comisarías de la Policía Nacional</source>
          <volume>091</volume>
          ,
          <year>2024</year>
          . URL: https://www.svisual.org/, accessed:
          <fpage>2025</fpage>
          -05-26.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          , T. Xu,
          <string-name>
            <given-names>G.</given-names>
            <surname>Brockman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McLeavey</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          ,
          <article-title>Robust speech recognition via large-scale weak supervision</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>28492</fpage>
          -
          <lpage>28518</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-A.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Meglioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Duquenne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Elsahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Heffernan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>SeamlessM4T-Massively</surname>
            <given-names>Multilingual</given-names>
          </string-name>
          &amp; Multimodal Machine Translation,
          <source>arXiv preprint arXiv:2308.11596</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>J. De la Rosa</surname>
            ,
            <given-names>E. G.</given-names>
          </string-name>
          <string-name>
            <surname>Ponferrada</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>P. G. d. P.</given-names>
          </string-name>
          <string-name>
            <surname>Salas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Grandury, BERTIN: Efficient Pre-training of a Spanish Language Model using Perplexity Sampling</article-title>
          ,
          <source>arXiv preprint arXiv:2207.06814</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gutiérrez-Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Armengol-Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Llop-Palao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Silveira-Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Armentano-Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rodriguez-Penagos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <source>MarIA: Spanish Language Models, arXiv preprint arXiv:2107.07253</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddhant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Raffel, mT5: A massively multilingual pre-trained text-to-text transformer</article-title>
          , arXiv preprint arXiv:
          <year>2010</year>
          .
          <volume>11934</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lugaresi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McClanahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Uboweja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-L. Chang</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          <string-name>
            <surname>Yong</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
          </string-name>
          , et al.,
          <article-title>Mediapipe: A framework for building perception pipelines</article-title>
          , arXiv preprint arXiv:
          <year>1906</year>
          .
          <volume>08172</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Hidalgo</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Sheikh</surname>
          </string-name>
          ,
          <article-title>OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hess</surname>
          </string-name>
          ,
          <article-title>Blender foundations: The essential guide to learning blender 2.5</article-title>
          ,
          <string-name>
            <surname>Routledge</surname>
          </string-name>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Fernández-Ordóñez</surname>
          </string-name>
          , Inés, COSER.
          <source>Corpus Oral y Sonoro del Español Rural</source>
          ,
          <year>2005</year>
          . URL: https://corpusrural.es/,
          <source>accessed: 2025-08-05. ISBN 978-84-616-4937-2 ISLRN 100-664-657-480-2.</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>San Martín</surname>
          </string-name>
          , J. Heras, G. Mata, S. Gómez,
          <string-name>
            <surname>Is</surname>
            <given-names>ASR</given-names>
          </string-name>
          <article-title>the right tool for the construction of Spoken Corpus Linguistics in European Spanish?</article-title>
          ,
          <source>Procesamiento del lenguaje natural 73</source>
          (
          <year>2024</year>
          )
          <fpage>165</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Perea-Trigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Á</surname>
          </string-name>
          .
          <string-name>
            <surname>Martínez-del Amor</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          <string-name>
            <surname>Álvarez-García</surname>
            ,
            <given-names>L. M.</given-names>
          </string-name>
          <string-name>
            <surname>SoriaMorillo</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          <string-name>
            <surname>Vegas-Olmos</surname>
          </string-name>
          ,
          <article-title>Synthetic corpus generation for deep learning-based translation of spanish sign language</article-title>
          ,
          <source>Sensors</source>
          <volume>24</volume>
          (
          <year>2024</year>
          )
          <fpage>1472</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <article-title>Multilingual translation with extensible multilingual pretraining and finetuning</article-title>
          , arXiv preprint arXiv:
          <year>2008</year>
          .
          <volume>00401</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Junczys-Dowmunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grundkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dwojak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Heafield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neckermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Seide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Germann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Fikri</given-names>
            <surname>Aji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bogoychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F. T.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birch</surname>
          </string-name>
          , Marian: Fast Neural Machine Translation in C++,
          <source>in: Proceedings of ACL</source>
          <year>2018</year>
          ,
          <article-title>System Demonstrations, Association for Computational Linguistics</article-title>
          , Melbourne, Australia,
          <year>2018</year>
          , pp.
          <fpage>116</fpage>
          -
          <lpage>121</lpage>
          . URL: http://www.aclweb.org/anthology/P18-4020.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the Limits of Transfer Learning with a Unified Text-to-</article-title>
          <string-name>
            <surname>Text</surname>
            <given-names>Transformer</given-names>
          </string-name>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          . URL: http://jmlr.org/papers/v21/20-074.html.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          , et al.,
          <article-title>Huggingface's transformers: State-of-the-art natural language processing</article-title>
          , arXiv preprint arXiv:
          <year>1910</year>
          .
          <volume>03771</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>E.</given-names>
            <surname>Chatzikoumi</surname>
          </string-name>
          ,
          <article-title>How to evaluate machine translation: A review of automated and human metrics</article-title>
          ,
          <source>Natural Language Engineering</source>
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>137</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>ARASAAC</surname>
          </string-name>
          , Last accessed on
          <year>June 2025</year>
          . URL: https://arasaac.org/.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Sématos</surname>
          </string-name>
          , Last accessed on
          <year>June 2025</year>
          . URL: https://www.sematos.eu/lse.html.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Spreadthesign</surname>
          </string-name>
          , Last accessed on
          <year>June 2025</year>
          . URL: https://spreadthesign.com/es.es/search/.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H.-S.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Lu, AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          <volume>45</volume>
          (
          <year>2023</year>
          )
          <fpage>7157</fpage>
          -
          <lpage>7173</lpage>
          . URL: https://doi.ieeecomputersociety.org/10.1109/TPAMI.2022.3222784. doi:10.1109/TPAMI.2022.3222784.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33] W. Zhu, X. Ma, Z. Liu, L. Liu, W. Wu, Y. Wang, MotionBERT: A Unified Perspective on Learning Human Motion Representations, in: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE Computer Society, Los Alamitos, CA, USA, 2023, pp. 15039-15053. URL: https://doi.ieeecomputersociety.org/10.1109/ICCV51070.2023.01385. doi:10.1109/ICCV51070.2023.01385.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34] M. Ivashechkin, O. Mendez, R. Bowden, Improving 3D Pose Estimation for Sign Language, arXiv preprint arXiv:2308.09525 (2023).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35] Z. Yuan, Z. Ruiquan, Y. Dengfeng, C. Yidong, Translation Quality Evaluation of Sign Language Avatar, in: Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations), 2024, pp. 405-415.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36] R. Zuo, F. Wei, Z. Chen, B. Mak, J. Yang, X. Tong, A simple baseline for spoken language to sign language translation with 3D avatars, in: European Conference on Computer Vision, Springer, 2024, pp. 36-54.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>