<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research and development of a subtitle management system using artificial intelligence</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrii M. Striuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladyslav V. Hordiienko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Academy of Cognitive and Natural Sciences</institution>
          ,
          <addr-line>54 Universytetskyi Ave., Kryvyi Rih, 50086</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kryvyi Rih National University</institution>
          ,
          <addr-line>11 Vitalii Matusevych Str., Kryvyi Rih, 50027</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kryvyi Rih State Pedagogical University</institution>
          ,
          <addr-line>54 Universytetskyi Ave., Kryvyi Rih, 50086</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>415</fpage>
      <lpage>427</lpage>
      <abstract>
        <p>Subtitles play a vital role in making video content accessible to a wider audience, including individuals with hearing impairments and those who do not understand the spoken language. However, the manual creation of subtitles is a time-consuming and labor-intensive process. This paper proposes an AI-powered subtitle management system that automates the generation and management of subtitles for video content. The system leverages state-of-the-art automatic speech recognition (ASR) and machine translation (MT) technologies to generate accurate and synchronized subtitles in multiple languages. The proposed system architecture consists of a speech recognition module, a machine translation module, a subtitle segmentation and formatting module, and a user-friendly interface. The paper provides a comprehensive literature review of the related work in the field of AI-based subtitle generation, covering key aspects such as speech recognition techniques, machine translation approaches, multimodal methods, and evaluation methodologies. The implications of the proposed system for subtitle generation pipelines are discussed, highlighting its potential to enhance efficiency, scalability, and accessibility. The limitations of the current system and directions for future research are also outlined. This research contributes to the advancement of AI-powered subtitle generation and aims to make video content more inclusive and accessible to a global audience.</p>
      </abstract>
      <kwd-group>
        <kwd>subtitles</kwd>
        <kwd>artificial intelligence</kwd>
        <kwd>speech recognition</kwd>
        <kwd>machine translation</kwd>
        <kwd>video accessibility</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Background and motivation</title>
        <p>
          In today’s digital age, video content has become an integral part of communication, education, and
entertainment. However, the accessibility of video content remains a challenge for individuals with
hearing impairments or those who do not understand the spoken language. Subtitles play a crucial role
in making video content more inclusive and accessible to a wider audience [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          Despite the importance of subtitles, the process of manually creating them is time-consuming,
labor-intensive, and prone to errors [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. It requires skilled human translators to listen to the audio, transcribe
the dialogue, and synchronize the subtitles with the video timestamps. This manual process often
results in delays in the availability of subtitles and limits the scalability of subtitle generation for large
volumes of video content.
        </p>
        <p>
          Advancements in artificial intelligence (AI) technologies, particularly in the fields of automatic speech
recognition (ASR) and machine translation (MT), have opened up new possibilities for automating the
subtitle generation process [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ]. AI-powered systems can significantly reduce the time and effort
required for subtitle creation while maintaining high levels of accuracy and quality.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Research objectives</title>
        <p>The primary objective of this research is to develop an AI-powered subtitle management system that
automates the generation and management of subtitles for video content. The proposed system aims to
leverage state-of-the-art ASR and MT technologies to generate accurate and synchronized subtitles in
multiple languages.</p>
        <p>Furthermore, this research aims to provide a comprehensive literature review of the existing
techniques, applications, and evaluation methodologies in the field of AI-based subtitle generation. By
synthesizing the current state of knowledge, we aim to identify the challenges, opportunities, and future
research directions in this domain.</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Paper contributions and organization</title>
        <p>The main contributions of this paper are as follows:
• We propose an AI-powered subtitle management system that automates the generation and
management of subtitles for video content, leveraging state-of-the-art ASR and MT technologies.
• We provide an extensive literature review of the related work in the field of AI-based subtitle
generation, covering key aspects such as speech recognition, machine translation, multimodal
approaches, and evaluation methodologies.
• We present the architecture and key components of the proposed subtitle management system,
including the speech recognition module, machine translation module, subtitle segmentation and
formatting module, and user interface.</p>
        <p>The remainder of this paper is organized as follows. Section 2 provides a comprehensive overview
of the related work in the field of AI-based subtitle generation. Section 3 describes the proposed
AI-powered subtitle management system, including its architecture, components, and functionalities.
Finally, Section 4 concludes the paper and summarizes the key findings and contributions.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>Extensive research has been conducted in the field of AI-based subtitle generation, spanning various
techniques, applications, and evaluation methodologies. This section provides a comprehensive overview
of the related work, focusing on key aspects such as speech recognition, machine translation, multimodal
approaches, and subtitle evaluation metrics.</p>
      <sec id="sec-2-1">
        <title>2.1. Speech recognition for subtitle generation</title>
        <p>
          Automatic speech recognition (ASR) plays a crucial role in the subtitle generation pipeline by converting
spoken audio into textual transcripts. Researchers have explored various ASR techniques to improve the
accuracy and efficiency of subtitle generation. Radha and Pradeep [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] proposed an automated subtitle
generation system using hidden Markov models (HMMs) for speech recognition. They demonstrated the
effectiveness of their approach on English-language videos and highlighted the importance of accurate
speech recognition for subtitle quality.
        </p>
        <p>
          Convolutional neural networks (CNNs) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] have also been employed for ASR in subtitle generation
tasks. Ramani et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] developed an automatic subtitle generation system using CNNs for speech
recognition, achieving promising results on real-time video subtitling. They emphasized the significance
of audio preprocessing techniques and the choice of media player for seamless subtitle integration.
        </p>
        <p>
          Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have
gained popularity in ASR for subtitle generation. Kiran et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] proposed a subtitle generation
system using sequence-to-sequence RNNs for speech recognition and video scene indexing. Their
approach demonstrated improved accuracy and the ability to handle longer video sequences compared
to traditional methods.
        </p>
        <p>
          The application of ASR techniques to specific domains, such as lecture videos, has also been explored.
Che et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] developed an automatic lecture subtitle generation system using ASR and evaluated
its performance against manual subtitling. They found that the ASR-generated subtitles significantly
reduced the time and effort required for subtitle creation while maintaining comparable quality. Similarly,
Sridhar et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] proposed a hybrid approach combining acoustic and linguistic features for subtitle
generation in computer science lecture videos, achieving improved accuracy in detecting discourse
boundaries.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Machine translation for multilingual subtitles</title>
        <p>Machine translation (MT) is essential for generating subtitles in multiple languages, enabling video
content to reach a wider global audience. Researchers have investigated various MT approaches,
including statistical and neural models, to improve the quality and efficiency of multilingual subtitle
generation.</p>
        <p>
          Karakanta et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] conducted a comparative study of different MT approaches for subtitle
generation, including phrase-based statistical MT and neural MT. They evaluated the performance of these
approaches on a multilingual subtitle corpus and highlighted the challenges in preserving linguistic
and cultural nuances in translated subtitles.
        </p>
        <p>
          Neural MT architectures, such as sequence-to-sequence models with attention mechanisms, have
shown promising results in subtitle translation tasks. Du and Lu [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] proposed a neural MT system
specifically designed for subtitle translation, incorporating features such as character-level encoding and
domain adaptation. Their system achieved significant improvements in translation quality compared to
traditional MT approaches.
        </p>
        <p>
          The quality of AI-generated subtitles compared to human translations has also been a focus of
research. Calvo-Ferrer [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] conducted a study comparing the quality of subtitles generated by machine
translation systems with those created by human translators. They found that while MT systems have
made significant progress, human translators still outperform them in terms of accuracy and contextual
understanding.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Multimodal and end-to-end subtitle generation</title>
        <p>Multimodal approaches that leverage both visual and linguistic information have emerged as promising
directions for subtitle generation. These approaches aim to capture the contextual and visual cues
present in the video to enhance the accuracy and coherence of the generated subtitles.</p>
        <p>
          Shanmugam et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] proposed a multimodal subtitle generation system that combines visual
features extracted from the video frames with linguistic information from the audio transcripts. Their
approach demonstrated improved synchronization and contextual relevance of the generated subtitles
compared to unimodal methods.
        </p>
        <p>
          Martín et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] developed a multimodal subtitle generation framework that incorporates visual,
acoustic, and linguistic features using deep learning techniques. They evaluated their system on a
dataset of educational videos and showed significant improvements in subtitle quality and alignment.
        </p>
        <p>
          End-to-end subtitle generation, where the entire process from speech recognition to subtitle
generation is performed by a single model, has also gained attention. Valor Miró et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] proposed an
end-to-end subtitle generation system that directly translates speech into subtitles in multiple
languages. Their approach achieved comparable performance to pipeline-based methods while reducing
the complexity and error propagation.
        </p>
        <p>
          Hotta et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] developed an end-to-end speech-to-text translation system specifically designed for
subtitle generation. Their system incorporated techniques such as attention mechanisms and beam
search to improve the quality and fluency of the generated subtitles.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Subtitle evaluation metrics and methodologies</title>
        <p>Evaluating the quality and effectiveness of AI-generated subtitles is crucial for assessing their usability
and acceptability. Researchers have proposed various evaluation metrics and methodologies to measure
the performance of subtitle generation systems.</p>
        <p>
          Automatic evaluation metrics, such as word error rate (WER) and bilingual evaluation understudy
(BLEU), have been widely used to assess the accuracy and fluency of generated subtitles. Ramani et al.
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] employed WER as a metric to evaluate the performance of their CNN-based subtitle generation
system, demonstrating its effectiveness in measuring transcript accuracy.
        </p>
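        <p>For reference, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis transcript and a reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, not the evaluation code of any cited system; the function name and the whitespace tokenization are assumptions made for illustration.</p>
        <preformat>
```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] = edit distance between the processed reference prefix and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```
        </preformat>
        <p>Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.</p>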
        <p>
          Kaulage et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] utilized the BLEU score to evaluate the quality of machine-translated subtitles in
their multilingual subtitle generation system. They highlighted the importance of considering both the
accuracy and fluency of the translations when assessing subtitle quality.
        </p>
        <p>
          Human evaluation methodologies have also been employed to assess the subjective quality and user
experience of AI-generated subtitles. Al Sawi and Allam [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] conducted a comparative analysis of
human-generated and AI-generated Arabic subtitles, using qualitative and quantitative approaches to
evaluate the subtitle quality and viewer comprehension.
        </p>
        <p>
          Kuroiwa et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] proposed a human-in-the-loop approach for subtitle generation, combining AI
techniques with human intervention to improve the accuracy and cultural appropriateness of the
generated subtitles. They emphasized the importance of human expertise in overcoming the limitations
of AI systems in understanding cultural nuances.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Applications in education and entertainment</title>
        <p>AI-based subtitle generation has found significant applications in various domains, particularly in
education and entertainment. Researchers have explored the benefits and challenges of deploying
AI-powered subtitle systems in these contexts.</p>
        <p>
          In the educational domain, Qiu [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] proposed an automatic subtitle generation system for teaching
videos using cloud computing techniques. They demonstrated the effectiveness of their approach
in reducing the time and effort required for subtitle creation, thereby enhancing the accessibility of
educational content.
        </p>
        <p>
          Martín et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] developed an automatic subtitle generation system specifically tailored for
educational videos produced by the Government of La Rioja, Spain. Their system aimed to improve the
accessibility of important educational content for individuals with hearing impairments.
        </p>
        <p>
          Also in the educational domain, Malakul and Park [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] investigated the effects of using an auto-subtitle
system in educational videos on learning comprehension, cognitive load, and satisfaction. They found
that AI-generated subtitles significantly improved the learning experience for non-native speakers and
individuals with hearing impairments.
        </p>
        <p>
          Kuroiwa et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] explored the challenges and opportunities of AI-based subtitle generation for
anime content. They proposed a hybrid approach combining AI techniques with human intervention
to improve the accuracy and cultural appropriateness of the generated subtitles, highlighting the
importance of human expertise in overcoming linguistic and cultural barriers.
        </p>
        <p>The related work discussed in this section highlights the diverse techniques, applications, and
evaluation methodologies in the field of AI-based subtitle generation. Researchers have made significant
strides in developing effective speech recognition, machine translation, and multimodal approaches
for subtitle generation. However, challenges remain in terms of improving the accuracy, fluency, and
contextual understanding of AI-generated subtitles, particularly in handling linguistic and cultural
nuances. The evaluation of subtitle quality using both automatic metrics and human assessment is
crucial for ensuring the usability and acceptability of AI-generated subtitles in real-world applications.</p>
        <p>As the demand for accessible and multilingual video content continues to grow, AI-based subtitle
generation systems are expected to play an increasingly important role in facilitating the creation
and dissemination of subtitles. Future research directions include the development of more advanced
and integrated AI techniques, the incorporation of domain-specific knowledge, and the exploration of
user-centric evaluation methodologies to ensure the effectiveness and user satisfaction of AI-generated
subtitles.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed subtitle management system</title>
      <p>This section presents the proposed AI-powered subtitle management system, which aims to automate
the generation and management of subtitles for video content. The system leverages
state-of-the-art speech recognition and machine translation technologies to generate accurate and synchronized
subtitles in multiple languages. The proposed system architecture, key components, and functionalities
are described in detail.</p>
      <sec id="sec-3-1">
        <title>3.1. System architecture overview</title>
        <p>The proposed subtitle management system follows a modular architecture, consisting of several
interconnected components that work together to achieve automated subtitle generation and management.
Figure 1 provides an overview of the system architecture, highlighting the main modules and their
interactions.</p>
        <p>The system architecture consists of the following main components:
• The speech recognition module converts the audio content of the video into
textual transcripts. It employs acoustic and language models to recognize
speech and generate time-aligned transcriptions.
• The machine translation module takes the transcripts generated by the speech recognition module
and translates them into the desired target languages. It uses neural machine
translation techniques to produce translations that preserve the context and
meaning of the original content.
• The subtitle segmentation and formatting module segments the translated
text into appropriate subtitle blocks and applies formatting and styling to ensure readability
and compliance with subtitle standards.
• The user interface allows users to upload videos, select target
languages, and manage generated subtitles.
• The database stores video metadata, transcripts, translations, and subtitle files for efficient retrieval
and management.</p>
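        <p>The component list above can be read as a pipeline in which each module is replaceable behind a narrow interface. The following Python sketch illustrates one possible arrangement under stated assumptions: the names Segment, SubtitlePipeline, and the stub functions are hypothetical and introduced only for illustration, and the stubs stand in for real ASR and MT back ends that the paper does not specify.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


# Each module is modelled as a plain callable so it can be swapped independently.
Recognizer = Callable[[str], List[Segment]]   # video path -> timed transcript
Translator = Callable[[str, str], str]        # (text, target language) -> text
Formatter = Callable[[List[Segment]], str]    # timed segments -> subtitle file


@dataclass
class SubtitlePipeline:
    recognize: Recognizer
    translate: Translator
    format_subtitles: Formatter

    def run(self, video_path: str, target_languages: List[str]) -> Dict[str, str]:
        """Return one subtitle file body per requested target language."""
        transcript = self.recognize(video_path)  # ASR runs once per video
        results: Dict[str, str] = {}
        for language in target_languages:
            translated = [
                Segment(seg.start, seg.end, self.translate(seg.text, language))
                for seg in transcript
            ]
            results[language] = self.format_subtitles(translated)
        return results


# Hypothetical stub modules standing in for real ASR and MT back ends.
def fake_recognizer(video_path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hello"), Segment(1.5, 3.0, "world")]


def fake_translator(text: str, language: str) -> str:
    return f"[{language}] {text}"


def plain_formatter(segments: List[Segment]) -> str:
    return "\n".join(f"{s.start:.1f}-{s.end:.1f} {s.text}" for s in segments)


pipeline = SubtitlePipeline(fake_recognizer, fake_translator, plain_formatter)
subtitles = pipeline.run("lecture.mp4", ["uk", "de"])
```
        </preformat>
        <p>In this arrangement the timestamps produced by speech recognition are carried through translation unchanged, so a different recognizer or translator can be substituted without touching the other modules.</p>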
        <p>The modular architecture of the proposed system enables flexibility, scalability, and ease of
maintenance. Each component can be independently developed, tested, and updated, allowing for continuous
improvement and adaptation to advancements in speech recognition and machine translation
technologies.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Speech recognition module</title>
        <p>The speech recognition module plays a crucial role in the subtitle generation pipeline by accurately
converting the spoken audio content into textual transcripts. Figure 2 illustrates the workflow of the
speech recognition module.</p>
        <p>[Figure 2: Workflow of the speech recognition module. Blocks shown: audio input, audio preprocessing, acoustic model, language model, speaker diarization, decoding, transcript output.]</p>
        <p>The speech recognition module incorporates the following key components and techniques:
• The acoustic model is trained on a large dataset of speech samples and their corresponding
transcriptions. It learns the relationship between audio features and phonemes, enabling it to
recognize speech patterns and map them to textual representations.
• The language model captures the statistical properties of the target language, including word
sequences and grammar. It helps in improving the accuracy of speech recognition by providing
contextual information and constraining the search space of possible transcriptions.
• Speaker diarization is the process of segmenting the audio stream into speaker-specific segments.
It allows the system to identify and differentiate between multiple speakers in the video, enabling
accurate attribution of subtitles to the corresponding speakers.
• Before feeding the audio content into the speech recognition module, various preprocessing
techniques are applied to enhance the quality and remove noise. These techniques include audio
normalization, noise reduction, and speaker adaptation.</p>
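        <p>To make the interplay between the acoustic model and the language model concrete, a common decoding formulation selects the hypothesis that maximizes the acoustic log-probability plus a weighted language-model log-probability. The Python sketch below illustrates that general idea only; the candidate transcripts, their scores, and the lm_weight value are invented for illustration and do not come from the proposed system.</p>
        <preformat>
```python
import math
from typing import Dict


def rescore(acoustic_logp: Dict[str, float],
            language_logp: Dict[str, float],
            lm_weight: float = 0.5) -> str:
    """Pick the hypothesis with the best weighted sum of log-probabilities."""
    def combined(hypothesis: str) -> float:
        return acoustic_logp[hypothesis] + lm_weight * language_logp[hypothesis]
    return max(acoustic_logp, key=combined)


# Hypothetical scores: the acoustic model slightly prefers the first candidate,
# but the language model finds that word sequence much less plausible.
acoustic = {
    "wreck a nice beach": math.log(0.55),
    "recognize speech": math.log(0.45),
}
language = {
    "wreck a nice beach": math.log(0.001),
    "recognize speech": math.log(0.02),
}

best = rescore(acoustic, language)
acoustic_only = rescore(acoustic, language, lm_weight=0.0)
```
        </preformat>
        <p>With the language model switched off (lm_weight set to 0) the acoustically preferred candidate wins, which shows how the language model constrains the search toward plausible word sequences.</p>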
        <p>The speech recognition module employs state-of-the-art deep learning architectures, such as CNNs
and RNNs, to achieve high accuracy in transcribing speech. The module is trained on diverse speech
datasets, including various accents, dialects, and languages, to ensure robustness and generalization
capabilities.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Machine translation module</title>
        <p>The machine translation module is responsible for translating the transcripts generated by the speech
recognition module into the desired target languages. Figure 3 depicts the architecture of the machine
translation module.</p>
        <p>[Figure 3: Architecture of the machine translation module. Blocks shown: parallel training data, embedding layer, encoder, encoded source, attention mechanism, decoder, named entity handling, domain adaptation.]</p>
        <p>The machine translation module utilizes an encoder-decoder architecture, which has become the de
facto standard in neural machine translation. The key components of the machine translation module
are as follows:
• The encoder takes the source language transcript as input and converts it into a sequence of
vector representations, one per input token. It employs techniques such as word embeddings and recurrent neural
networks to capture the semantic and syntactic information of the input sequence.
• The decoder takes the encoded representation produced by the encoder and generates the target
language translation. It uses attention mechanisms to selectively focus on relevant parts of the
input sequence during the decoding process, enabling the generation of accurate and fluent
translations.
• The machine translation module incorporates techniques to handle out-of-vocabulary words and
named entities. This includes subword tokenization, which breaks down rare words into smaller
units, and named entity recognition, which identifies and preserves named entities during the
translation process.
• To improve translation quality for specific domains, such as educational or entertainment content,
the machine translation module can be fine-tuned on domain-specific parallel corpora. This
allows the module to learn domain-specific terminology and style, resulting in more accurate and
contextually relevant translations.</p>
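        <p>The attention step described for the decoder can be illustrated with a small numerical example. The Python sketch below uses plain dot-product scoring followed by a softmax, which is one common variant and is assumed here rather than taken from the paper; the toy encoder states and decoder state are invented for illustration.</p>
        <preformat>
```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(decoder_state: Vector,
           encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (attention weights, context vector) for one decoding step."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    # Softmax, shifted by the maximum score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Three toy encoder states (one per source token) and a decoder state that
# points in the same direction as the second one.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([0.0, 2.0], encoder_states)
```
        </preformat>
        <p>The weights sum to one and are largest for the source token whose encoder state best matches the current decoder state, which is what allows the decoder to focus on the relevant part of the input at each step.</p>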
        <p>The machine translation module is trained on large-scale parallel corpora, consisting of sentence
pairs in the source and target languages. Advanced training techniques, such as teacher forcing and
back-translation, are employed to improve the quality and fluency of the generated translations.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Subtitle segmentation and formatting</title>
        <p>The subtitle segmentation and formatting module takes the translated text and performs necessary
segmentation and formatting to generate properly structured subtitle files. Figure 4 illustrates the
process of subtitle segmentation and formatting.</p>
        <p>[Figure 4: Subtitle segmentation and formatting process. Blocks shown: translated text, text segmentation, timing synchronization, formatting and styling, SRT/WebVTT subtitle file.]</p>
        <p>The subtitle segmentation and formatting module incorporates the following key steps:
1. The translated text is segmented into appropriate subtitle blocks based on factors such as sentence
boundaries, dialogue turns, and reading speed. The segmentation ensures that each subtitle block
is concise, readable, and synchronized with the audio.
2. The timing synchronization step aligns the segmented subtitle blocks with the corresponding
timestamps in the video. It takes into account the start and end times of each subtitle block,
ensuring that the subtitles appear at the appropriate moments and remain synchronized with the
audio.
3. The formatting and styling step applies proper formatting and styling to the subtitle text, following
established subtitle standards and guidelines. This includes setting font properties, such as size
and color, and applying text formatting, such as italics or bold, to emphasize specific words or
phrases.
4. The segmented and formatted subtitle blocks are combined to generate standard subtitle file
formats, such as SubRip Text (SRT) or Web Video Text Tracks (WebVTT). These subtitle files can
be easily integrated with video players and streaming platforms.</p>
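        <p>The final step above, producing an SRT file from timed subtitle blocks, can be sketched as follows. This is a minimal Python illustration, not the system's implementation: the function names are hypothetical, the 42-character line limit is a commonly used subtitling guideline assumed here rather than a value stated in the paper, and the greedy word wrapping is a simplification of the segmentation criteria listed in step 1.</p>
        <preformat>
```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def wrap_lines(text: str, max_chars: int = 42) -> List[str]:
    """Greedily wrap a cue into lines of at most max_chars characters."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def to_srt(cues: List[Cue]) -> str:
    """Serialize timed cues as a SubRip (SRT) document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append("\n".join([str(index), timing, *wrap_lines(text)]))
    return "\n\n".join(blocks) + "\n"


cues = [
    (0.0, 2.5, "Welcome to the lecture."),
    (2.5, 6.04, "Today we look at how subtitles are segmented and formatted."),
]
srt_document = to_srt(cues)
```
        </preformat>
        <p>Each SRT block consists of a sequential index, a timing line, and one or more text lines, with blocks separated by a blank line. WebVTT output would differ mainly in its file header and in using a period instead of a comma before the milliseconds.</p>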
        <p>The subtitle segmentation and formatting module ensures that the generated subtitles adhere to
industry standards and best practices, enhancing the readability and usability of the subtitles for viewers.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. User interface and interaction design</title>
        <p>The proposed subtitle management system includes a user-friendly interface that allows users to
seamlessly interact with the system and manage the subtitle generation process. Figure 5 presents a
high-level overview of the user interface and interaction design.</p>
        <p>The user interface incorporates the following key features and functionalities:
• Video upload: users can easily upload their video files to the system through a simple and intuitive
interface. The system supports various video formats and provides options for selecting the
desired target languages for subtitle generation.
• Language selection: the interface allows users to choose the target languages for subtitle generation.
Users can select multiple languages simultaneously, enabling the creation of multilingual subtitles
for their videos.
• Subtitle preview and editing: the system provides a subtitle preview feature that allows users
to review the generated subtitles alongside the video. Users can make necessary edits and
adjustments to the subtitles, ensuring their accuracy and synchronization with the video content.
• Subtitle download and integration: once the subtitles are generated and reviewed, users can easily
download the subtitle files in standard formats. The interface provides instructions and guides on
how to integrate the subtitle files with popular video players and platforms.
• Subtitle management: the system offers a centralized subtitle management feature, allowing users
to organize, search, and manage their generated subtitles. Users can view their subtitle history,
update existing subtitles, and delete unwanted subtitle files.</p>
        <p>The user interface is designed with usability and accessibility in mind, ensuring that users with
varying technical backgrounds can easily navigate and utilize the subtitle management system. The
interface incorporates responsive design principles, enabling access from different devices, including
desktops, laptops, tablets, and smartphones.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and future work</title>
      <sec id="sec-4-1">
        <title>4.1. Summary of findings</title>
        <p>This research presented an AI-powered subtitle management system that automates the generation
and management of subtitles for video content. The proposed system leveraged state-of-the-art speech
recognition and machine translation technologies to generate accurate and synchronized subtitles in
multiple languages.</p>
        <p>The system architecture was designed to be modular, scalable, and adaptable to advancements in
AI technologies. It consisted of key components such as the speech recognition module, machine
translation module, subtitle segmentation and formatting module, and user interface.</p>
        <p>The speech recognition module utilized advanced acoustic and language models, along with
techniques like speaker diarization and audio preprocessing, to accurately convert spoken audio into textual
transcripts. The machine translation module employed an encoder-decoder architecture with attention
mechanisms to translate the transcripts into desired target languages while preserving context and
meaning.</p>
        <p>The subtitle segmentation and formatting module ensured that the translated text was properly
segmented, synchronized, and formatted according to subtitle standards and guidelines. The user
interface provided an intuitive platform for uploading videos, selecting target
languages, previewing and editing subtitles, and managing subtitle files.</p>
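        <p>To make the segmentation and formatting step more concrete, the sketch below wraps the text of one cue
into short lines and renders it as a SubRip (SRT) entry. It is a minimal illustration rather than the
system's actual module: the function names are hypothetical, and the limit of 42 characters per line is an
assumption based on a commonly cited subtitling convention, since the value used by the system is not stated.</p>
        <preformat><![CDATA[
```python
import textwrap


def format_timestamp(seconds: float) -> str:
    """Render seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def wrap_lines(text: str, max_chars: int = 42) -> list[str]:
    """Break text at word boundaries into lines of at most max_chars."""
    return textwrap.wrap(text, width=max_chars)


def to_srt_block(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT entry: counter, time range, then the text lines."""
    lines = wrap_lines(text)
    header = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return "\n".join([str(index), header, *lines]) + "\n"


block = to_srt_block(
    1, 3.5, 6.25,
    "Subtitles make video content accessible to a wider audience.",
)
```
]]></preformat>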
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Implications for subtitle generation pipelines</title>
        <p>The proposed AI-powered subtitle management system has significant implications for the efficiency
and scalability of subtitle generation pipelines. By automating the process of speech recognition,
translation, and subtitle formatting, the system can greatly reduce the time and effort required for
manual subtitle creation.</p>
        <p>The modular architecture of the system allows for easy integration with existing video platforms
and workflows. It enables content creators, educational institutions, and entertainment providers to
generate high-quality subtitles for their video content quickly and cost-effectively.</p>
        <p>The system’s ability to generate subtitles in multiple languages opens up new opportunities for content
localization and global accessibility. It facilitates the dissemination of educational and entertainment
content to a wider audience, breaking down language barriers and promoting inclusivity.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Limitations and directions for further research</title>
        <p>While the proposed subtitle management system demonstrates promising results, there are certain
limitations and areas for further research:
• Language coverage: the current system focuses on a limited set of languages for subtitle generation.
Expanding the language coverage to include more diverse and low-resource languages would
enhance the system’s applicability and reach.
• Domain adaptation: the performance of the speech recognition and machine translation modules
can be further improved by fine-tuning them on domain-specific datasets. Investigating techniques
for domain adaptation, such as transfer learning and unsupervised adaptation, would enhance the
system’s effectiveness in various domains like education, entertainment, and specialized fields.
• Contextual understanding: although the system incorporates techniques to handle named entities
and preserve context during translation, there is room for improvement in capturing and
conveying subtle nuances, idiomatic expressions, and cultural references. Exploring advanced natural
language processing techniques, such as contextual embeddings and knowledge graphs, could
enhance the system’s ability to generate more contextually accurate and culturally appropriate
subtitles.
• User feedback: incorporating user feedback and interaction mechanisms into the system could greatly
improve its usability and adaptability. Allowing users to provide feedback on generated subtitles, suggest
corrections, and contribute to the system’s learning process would lead to continuous improvement
in subtitle quality and user satisfaction.
• Multimodal cues: integrating visual and acoustic cues from the video content, such as scene changes,
speaker identification, and emotion recognition, could further enhance the accuracy and
synchronization of the generated subtitles.</p>
        <p>Future research directions could focus on addressing these limitations and expanding the capabilities
of the AI-powered subtitle management system. Collaborations between researchers, language experts,
and industry stakeholders would be crucial in driving innovation and advancing the state of the art in
automated subtitle generation.</p>
        <p>Declaration on Generative AI: During the preparation of this work, the authors used Claude 3 Opus to draft
content and generate the literature review. After using this service, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
</article>