<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bridging Cultures through AI: The Art of Multilingual Storytelling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shazia Mannan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asha Hegde</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharal Coelho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The use of Artificial Intelligence (AI) in visual storytelling has opened new avenues for cultural exchange, but producing consistent and culturally relevant graphics for multilingual stories remains challenging. This paper introduces an AI-powered pipeline for multilingual story illustration, focusing on English and Hindi texts influenced by culturally rich sources such as the Panchatantra. Addressing limitations such as fixed character sets that limit diversity and unrestricted character sets that compromise consistency, we use publicly accessible datasets with appropriate attribution to produce coherent visuals. Our proposed methodology combines T5-large for abstractive summarization of story segments and Stable Diffusion XL (SDXL) for generating cartoon-style illustrations with a fixed stylistic prompt to ensure narrative relevance and visual uniformity. Results demonstrate high narrative fidelity and cultural appropriateness, evaluated through expert criteria on relevance, quality, and consistency. However, issues such as overly generic summarization and model biases highlight areas for improvement. By bridging linguistic gaps, our study promotes inclusive storytelling and paves the way for AI systems that can adapt to diverse cultures within global storylines.</p>
      </abstract>
      <kwd-group>
        <kwd>Stable Diffusion XL</kwd>
        <kwd>Text-to-Image</kwd>
        <kwd>Visual Storytelling</kwd>
        <kwd>Artificial Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial Intelligence (AI) has evolved toward image recognition and generation, extending its
capabilities to visual data. Large Language Models (LLMs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Text-to-Image (T2I) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] systems have
demonstrated remarkable abilities in creating contextually rich textual and visual narratives. T2I
generation is the task of producing a visually realistic image that matches a given textual
description [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. AI systems can now produce high-quality, contextually aware, and stylistically varied
images directly from textual descriptions, automating visual storytelling, a task that previously depended
almost entirely on the talent and creativity of human illustrators [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Within this broader shift,
multilingual story illustration has emerged as an important and socially significant domain. This paradigm
change raises pressing questions about cultural authenticity, linguistic diversity, and visual consistency,
but it also holds great potential to democratize creativity and accelerate the creation of
illustrated content [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Stories from culturally diverse regions such as India span a remarkable variety of languages, customs, and
visual elements. Oral folktales, modern children’s books, and classical compilations like the Panchatantra
and Hitopadesha often rely on culturally rooted symbols, idioms, and artistic conventions, in addition to
words, to convey meaning. However, existing AI illustration pipelines are largely trained on datasets with
a Western cultural bias, which results in illustrations that poorly reflect local aesthetics, environmental
settings, and symbolic representations. T2I generation has achieved exceptional quality and has been
extensively applied to a variety of tasks thanks to the rapid development of diffusion models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. From a
technical perspective, current T2I systems still struggle with fundamental bottlenecks in multilingual
storytelling, such as narrative comprehension across languages, character and visual consistency, and
cultural adaptability and inclusivity.
      </p>
      <p>To address these gaps, we participated in the MUSIA shared task: Multilingual Story Illustration: Bridging Cultures
through AI Artistry. This work proposes an AI-powered pipeline designed for end-to-end multilingual storytelling and
illustration. Our system introduces three key innovations:
1. Narrative Understanding Module – A multilingual story-processing engine capable of
segmenting narratives written in English and Hindi into semantically coherent plot points and
generating concise, structured summaries optimized for illustration.
2. Character-Consistency Mechanism – A module that maintains visual identity across multiple
illustrations, ensuring coherence in character attributes, environment, and stylistic representation
throughout the narrative.
3. Cultural Evaluation Framework – An integrated evaluation methodology that combines
automated quality metrics with human expert judgments to assess cultural relevance, narrative
accuracy, and visual coherence, helping to bridge the gap between technical performance
and real-world applicability.</p>
      <p>The goal of the MUSIA shared task is to enable culturally aligned, linguistically inclusive, and visually
coherent storytelling at scale. As a result, this work has significant applications in education, such as
illustrated textbooks and children’s books, and in entertainment, including digital comics, graphic novels,
and animated shorts. Beyond its technical contributions, this research highlights how AI can serve as a
medium for cultural preservation and exchange, bridging divides across languages, geographies, and
artistic traditions, and making literature more accessible to global audiences.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Visual storytelling involves generating a series of images based on narrative text prompts to create a
continuous and coherent story. Sudharsan et al. introduced the KAHANI pipeline, which focuses on
non-Western cultural narratives [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Their framework uses GPT-4 Turbo and Stable Diffusion XL. Maharana
and Bansal [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed an approach to enhance story-to-image generation. By employing parse trees,
dense captioning, and dual learning frameworks, their method improves narrative segmentation, spatial
coherence, and visual consistency. This work provides valuable technical insights into the challenges of
maintaining coherence across sequential illustrations.
      </p>
      <p>
        From an educational and user-centered perspective, Han and Cai [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] examined the role of generative
AI in children’s visual storytelling. Their research incorporated feedback from parents, teachers,
and researchers and ultimately proposed a prototype system called AIStory for literacy development.
Their findings highlight the importance of usability and child-friendly design in visual storytelling
applications. Antony and Huang [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed ID.8, a co-creative system for collaborative visual storytelling,
which emphasizes creativity, author control, and user experience in generating illustrative narratives.
Similarly, Kim et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] explored multimodal story-to-image frameworks with a particular focus on
character preservation and narrative consistency, addressing one of the key technical bottlenecks in
long-form illustrated storytelling.
      </p>
      <p>In summary, previous works have advanced the field through innovations in cultural grounding,
linguistic integration, and user-centered design. However, most existing systems remain monolingual or
culturally limited, with insufficient attention to multilingual narratives, script diversity, and sustained
character coherence across culturally rich storylines. Building upon these foundations, our work
contributes a multilingual, culturally adaptive pipeline that integrates narrative segmentation,
character-consistency mechanisms, and human-centered evaluation to bridge cultural divides and enable inclusive
AI-driven storytelling.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task</title>
      <p>
        The MUSIA shared task focuses on generating visual illustrations for multilingual stories,
particularly in English and Hindi [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. A key challenge in this domain lies in character
representation: approaches that rely on a limited, predefined set of characters often fail to capture
the richness and diversity inherent in culturally rooted narratives such as the Panchatantra, whereas
allowing for an unrestricted number of characters can compromise visual consistency, weakening the
reader’s connection with the story.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Our methodology adopts a hybrid AI pipeline that integrates Natural Language Processing (NLP)
for narrative understanding with diffusion-based generative models for illustration. The pipeline is
designed to process stories in both English and Hindi, thereby ensuring multilingual compatibility. The
complete pipeline is organized into three main stages: dataset preparation, narrative summarization,
and illustration generation.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset Preparation</title>
        <p>The initial stage focuses on preparing the textual corpus for downstream processing. Each story is
provided as a full text file and preprocessed to remove extraneous whitespace. The cleaned text is then
segmented into sentences using period-based splitting to maintain logical narrative units. To ensure
balanced image generation across a given story, the text is partitioned into sub-stories. Specifically, the
sentences are divided into equal-sized chunks, based on the number of images specified for the story.
Each chunk, therefore, represents a key narrative segment corresponding to one illustration frame.</p>
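        <p>The following Python sketch illustrates this preparation step. It is a minimal reconstruction of the description above: the function name and the handling of the Hindi danda (।) alongside periods are illustrative assumptions, not the exact implementation.</p>
        <preformat>
import re

def prepare_substories(story_text, num_images):
    """Split a cleaned story into num_images sentence chunks, one per frame."""
    # Remove extraneous whitespace.
    cleaned = re.sub(r"\s+", " ", story_text).strip()
    # Period-based sentence splitting (danda handling is assumed).
    sentences = [s.strip() for s in re.split(r"[.\u0964]", cleaned) if s.strip()]
    # Partition sentences into equal-sized chunks based on the image count.
    chunk_size = max(1, len(sentences) // num_images)
    chunks = [sentences[i:i + chunk_size]
              for i in range(0, len(sentences), chunk_size)]
    # Each chunk becomes one sub-story, i.e., one illustration frame.
    return [". ".join(chunk) + "." for chunk in chunks[:num_images]]
        </preformat>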
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Narrative Summarization</title>
        <p>Once the text is segmented, each sub-story undergoes abstractive summarization to distill the narrative
into concise prompts suitable for image generation. We employed the T5-large model
(https://huggingface.co/google-t5/t5-large) from the Hugging Face Transformers library, a state-of-the-art
model for abstractive summarization tasks.</p>
        <p>For each sub-story, the model is configured with the following parameters: max_length=50,
min_length=35, and do_sample=False. These constraints ensure summaries that are both concise
and semantically accurate, while remaining deterministic across runs. The summarization step captures
the essence of the plot, filtering out redundant information and highlighting the critical narrative events.
This refined representation serves as the semantic backbone for the subsequent illustration prompts,
enabling the diffusion model to focus on the most relevant story elements.</p>
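        <p>A minimal sketch of this step using the Hugging Face summarization pipeline is shown below; the generation parameters are those reported above, while the surrounding function is illustrative.</p>
        <preformat>
from transformers import pipeline

# Load the T5-large abstractive summarizer once and reuse it per sub-story.
summarizer = pipeline("summarization", model="google-t5/t5-large")

def summarize_substory(substory):
    """Condense a sub-story into a concise, deterministic illustration prompt."""
    result = summarizer(
        substory,
        max_length=50,    # upper bound on summary length
        min_length=35,    # keep enough narrative detail
        do_sample=False,  # greedy decoding, so summaries are reproducible
    )
    return result[0]["summary_text"]
        </preformat>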
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Illustration Generation</title>
        <p>The last stage of the pipeline involves generating visual illustrations using the Stable Diffusion XL (SDXL)
base model (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) from Stability AI. To optimize
performance, the model is loaded with 16-bit floating-point
precision and SafeTensors, thereby reducing memory usage and improving inference efficiency. The
pipeline execution environment is GPU-enabled (Kaggle platform), with CUDA acceleration prioritized
when available, to support the computational demands of large-scale diffusion models.</p>
        <p>To maintain stylistic uniformity across all illustrations, a fixed prompt prefix is prepended to each
summary. The prefix is concatenated with the corresponding narrative summary to form the final input
prompt for SDXL. Each prompt is then passed to the diffusion pipeline ( pipe(prompt).images[0])
to generate a single image, which is subsequently stored in PNG format within language-specific
directories (e.g., generated_images_hindi for Hindi stories).</p>
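        <p>A sketch of this stage with the diffusers library follows. The model identifier, 16-bit precision, SafeTensors loading, and the pipe(prompt).images[0] call mirror the description above; the wording of the stylistic prefix and the output file naming are assumptions, since the exact prompt is not reproduced here.</p>
        <preformat>
import os
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL in 16-bit floating point with SafeTensors to reduce memory usage.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda" if torch.cuda.is_available() else "cpu")

# Fixed stylistic prefix for visual uniformity (wording is illustrative).
STYLE_PREFIX = "cartoon-style storybook illustration, warm colors, "

def illustrate(summary, out_dir, frame_idx):
    """Render one PNG illustration frame for a summarized sub-story."""
    os.makedirs(out_dir, exist_ok=True)
    image = pipe(STYLE_PREFIX + summary).images[0]
    image.save(os.path.join(out_dir, f"frame_{frame_idx}.png"))

# e.g., illustrate(summary, "generated_images_hindi", 0)
        </preformat>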
        <p>Through this multi-stage design, our methodology ensures that story texts are transformed into
visually coherent and culturally adaptive illustrations, while preserving narrative flow and character
consistency across different languages and story genres.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>The dataset used for the multilingual story illustration task was divided into training and testing sets,
covering narratives in both English and Hindi. It was carefully curated from culturally rich sources,
particularly Panchatantra-inspired stories, and each entry consisted of textual narratives paired with
illustrations in the case of training data, or mapping files for the test data that indicated how many
images were required to be generated per story. The training set comprised 360 English stories and 185
Hindi stories, organized into a “Stories” folder containing the texts and an “Images” folder containing
the corresponding illustrations. The testing set included 40 English stories and 30 Hindi stories, with a
similar folder structure for narratives and an accompanying mapping file to guide the image generation
process.</p>
      <p>Since the field of multilingual story illustration currently lacks standardized automatic evaluation
metrics, the evaluation of results was performed entirely through human judgment. The evaluation
was carried out across three key dimensions: relevance, which measured how effectively the generated
illustrations captured the important moments of the story; visual quality, which reflected the artistic
and innovative aspects of the illustrations; and consistency, which examined whether the illustrations
maintained a coherent visual narrative when considered together.</p>
      <p>To ensure reliability in the evaluation, Cohen’s kappa was calculated to measure inter-annotator
agreement. Only those stories for which the agreement value exceeded 0.75, indicating a strong level of
consistency among the annotators, were considered in the analysis. As this threshold was achieved for
almost all the stories, the dataset used for evaluation remained comprehensive. The scores were further
categorized into three buckets: values between 4 and 5 were classified as good, a score of 3 was treated
as moderate, and scores between 1 and 2 were considered fair. For each metric, the number of stories
falling into these categories was counted, and performance was judged based on the distribution of
stories across the buckets. In this way, a higher number of stories evaluated as good indicated stronger
performance, followed by those rated as moderate and then fair.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The proposed work demonstrated its effectiveness in generating illustrations for multilingual stories,
achieving strong alignment between narrative content and visual output. In the case of Hindi narratives,
the segmentation and summarization stages ensured that each frame captured critical plot moments, such
as character interactions and moral resolutions, while the fixed stylistic prompt preserved consistency
in character design and cultural aesthetics. The use of warm color palettes and hand-drawn textures
contributed to evoking traditional Indian artistic sensibilities. Similarly, the English narratives yielded
coherent results, underscoring the pipeline’s language-agnostic adaptability.</p>
      <p>Despite the promising outcomes, certain limitations remain. The T5 summarizer sometimes generated
outputs that were overly generic for complex narratives. Similarly, while SDXL proved robust in
producing visually compelling illustrations, issues of character consistency occasionally surfaced, especially
when depicting culturally specific elements such as traditional attire or symbolic artifacts. From a
practical standpoint, the high computational demands of diffusion-based models restrict accessibility in
resource-constrained environments, limiting broader adoption.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This research proposed an AI-driven framework for multilingual story illustration, combining narrative
summarization with generative visual models to enhance the accessibility and cultural inclusiveness
of storytelling. By focusing on English and Hindi narratives, the study demonstrated how technology
can bridge linguistic and cultural gaps, making stories more engaging and visually coherent across
diverse audiences. Human evaluation based on relevance, quality, and consistency further validated the
effectiveness of the approach.</p>
      <p>Despite these promising outcomes, several challenges remain. Character consistency, narrative
summarization precision, and mitigation of cultural or linguistic biases require further refinement.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>In preparing this work, the author(s) utilized Grok3 for grammar and spelling checks. Paraphrasing
was handled via QuillBot. With these tools, the author(s) reviewed and revised the content as required,
while assuming full responsibility for the publication’s integrity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>A survey on evaluation of large language models</article-title>
          ,
          <source>ACM transactions on intelligent systems and technology 15</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>10684</fpage>
          -
          <lpage>10695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Saharia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghasemipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Gontijo</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Karagol</given-names>
            <surname>Ayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          , et al.,
          <article-title>Photorealistic text-to-image diffusion models with deep language understanding</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>36479</fpage>
          -
          <lpage>36494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Mirrorgan: Learning text-to-image generation by redescription</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1505</fpage>
          -
          <lpage>1514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lui</surname>
          </string-name>
          ,
          <article-title>Story illustration using generative adversarial networks (GANs)</article-title>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>AI illustrator: Art illustration generation based on generative adversarial network</article-title>
          ,
          <source>in: 2020 IEEE 5th International conference on image, vision and computing (ICIVC)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shaheen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mallick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Bargal</surname>
          </string-name>
          ,
          <article-title>Vista: Visual storytelling using multi-modal adapters for text-to-image diffusion models</article-title>
          ,
          <source>arXiv preprint arXiv:2506.12198</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sudharsan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Budhiraja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Khullar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vashistha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Segal</surname>
          </string-name>
          , et al.,
          <article-title>Kahani: Culturally-nuanced visual storytelling pipeline for non-western cultures</article-title>
          ,
          <source>arXiv e-prints</source>
          (
          <year>2024</year>
          ) arXiv-
          <fpage>2410</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Maharana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <article-title>Integrating visuospatial, linguistic and commonsense structure into story visualization</article-title>
          ,
          <source>arXiv preprint arXiv:2110.10834</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <article-title>Design implications of generative ai systems for visual storytelling for young learners</article-title>
          ,
          <source>in: Proceedings of the 22nd annual ACM interaction design and children conference</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>470</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Antony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>ID.8: Co-creating visual stories with generative ai</article-title>
          ,
          <source>ACM Transactions on Interactive Intelligent Systems</source>
          <volume>14</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Heo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nang</surname>
          </string-name>
          ,
          <article-title>A multi-modal story generation framework with ai-driven storyline guidance</article-title>
          ,
          <source>Electronics</source>
          <volume>12</volume>
          (
          <year>2023</year>
          )
          <fpage>1289</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Malviya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Overview of the Shared Task on Multilingual Story Illustration: Bridging Cultures through AI Artistry</article-title>
          ,
          <source>in: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          , FIRE '25, Association for Computing Machinery, New York, NY, USA,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Malviya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Findings of the Shared Task on Multilingual Story Illustration: Bridging Cultures through AI Artistry</article-title>
          ,
          <source>in: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          , FIRE '25, CEUR Working Notes,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>