<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RetaiLLM: A Multilingual, Optimised LLM-based Chatbot System for Retail Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro José Vivancos-Vicente</string-name>
          <email>pedro.vivancos@vocali.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Salvador Castejón-Garrido</string-name>
          <email>juans.castejon@vocali.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camilo Caparrós-Laiz</string-name>
          <email>camilo.caparrosl@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronghao Pan</string-name>
          <email>ronghao.pan@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Antonio García-Díaz</string-name>
          <email>joseantonio.garcia8@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Valencia-García</string-name>
          <email>valencia@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Informática y Sistemas, Universidad de Murcia, Campus de Espinardo</institution>
          ,
          <addr-line>30100 Murcia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VÓCALI SISTEMAS INTELIGENTES S.L. Parque Científico de Murcia, Carretera de Madrid km 388. Complejo de Espinardo</institution>
          ,
          <addr-line>30100 Murcia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>RetaiLLM enhances chatbot solutions powered by Large Language Models (LLMs) adapted for small retail businesses. Although LLMs have impressive conversational capabilities, they struggle in retail scenarios due to specific challenges. These include hallucinations, the need for additional fine-tuning to handle domain-specific vocabulary and interactions, and the computational demands of real-time deployment. These limitations can result in misinformation, loss of customer trust and increased support costs, issues that are especially critical for small businesses with limited resources. Furthermore, many existing chatbot systems struggle with multilingual interactions and integration with popular communication platforms. To overcome these challenges, VÓCALI has developed RetaiLLM, a project that combines LLMs with contextual information from retail management companies and online sources. Using a combination of quantisation techniques and a Retrieval-Augmented Generation approach, RetaiLLM optimises LLM deployment and provides a hybrid search mechanism that combines semantic vector search with fuzzy text-based retrieval. This ensures precise, contextually relevant answers, reduces hallucinations and provides a correction mechanism.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Quantization</kwd>
        <kwd>Hallucination</kwd>
        <kwd>Chatbot</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the context of retail management, many small businesses lack the economic, human or technological
resources required to respond quickly and effectively to customer queries. Although Large Language
Models (LLMs) have impressive conversational capabilities, they are still not widely used in
real-world retail scenarios due to challenges such as hallucinations, the need for additional fine-tuning and
high computational requirements. Additionally, current chatbot systems often struggle to integrate
seamlessly with popular communication platforms, and they do not always provide a satisfactory
multilingual experience [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These limitations hinder the adoption of advanced conversational solutions
by small and medium-sized retailers.
      </p>
      <p>To address these challenges, we present RetaiLLM: a multilingual, LLM-based chatbot system
optimised for small retail businesses. The project’s primary goal is to develop an efficient, reliable, and
customisable solution that can easily integrate into the communication channels commonly used by
these businesses. This solution will provide accurate, contextually relevant answers without requiring
large-scale infrastructure. RetaiLLM incorporates various techniques, such as model quantisation,
Retrieval-Augmented Generation (RAG), hallucination detection, plugin-based customisation and
emotion recognition. These components are implemented within a flexible, multi-platform, multi-lingual
architecture equipped with a graphical interface and an administration dashboard for chatbot
configuration and monitoring.</p>
      <p>The project is divided into four main objectives: (OB1) development of a multilingual, reliable and
efficient LLM-based chatbot system for retail management; (OB2) design of input/output communication
interfaces; (OB3) implementation of a plugin system; and (OB4) development of an administration
dashboard for chatbot creation and monitoring.</p>
      <p>This project is a collaboration between VÓCALI, a company specialised in Natural Language
Processing (NLP) and speech technologies, and the TECNOMOD research group at the University of Murcia.
RetaiLLM is funded by the CDTI and the European Union (ERDF) under the project code IDI-20240115.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background information</title>
      <p>
        Recent advances in LLMs have significantly improved the performance of conversational systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Models such as ChatGPT, LLaMA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or Gemma [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] demonstrate strong capabilities in understanding
and generating natural language. However, their adoption in real-world applications, especially in small
business environments such as retail, remains limited due to their high resource requirements, lack of
domain adaptation, and the risk of generating incorrect or fabricated content [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>To make these technologies more accessible, several optimisation techniques have been proposed [6].
One of the most important is quantisation, which reduces the model size and memory consumption
by lowering numerical precision (e.g., to 8-bit or 4-bit formats). Combined with QLoRA (Quantized
Low-Rank Adaptation) [7], models can be fine-tuned with minimal computational cost, making them
viable for deployment on local servers or mid-range GPUs [8].</p>
      <p>Another key approach is RAG, which improves response accuracy by incorporating relevant external
knowledge during inference. By combining semantic vector search with traditional keyword-based
retrieval, RAG enables chatbots to provide contextual and factual responses without retraining the
model. In some cases, this process can be enhanced by using structured knowledge representations
such as ontologies [9, 10].</p>
      <p>One of the biggest disadvantages of LLMs is their tendency to generate hallucinated content [11, 12].
Instead of relying on larger models to verify responses, our approach focuses on detecting and correcting
factual errors by comparing the chatbot’s responses with retrieved documents, improving reliability in
customer-facing scenarios.</p>
    </sec>
    <sec id="sec-3">
      <title>3. System architecture</title>
      <p>In short, the system is a modular, multilingual chatbot platform designed to support small retail
businesses using state-of-the-art natural language processing techniques. It connects to popular messaging
services via customisable interfaces, processes user queries with optimised LLMs and enhances
responses using a RAG mechanism. To ensure fluency and factual accuracy, it incorporates modules for
hallucination detection and response regeneration. The system also includes an emotion recognition
module that can classify user input into different emotional states, providing valuable insight into
customer satisfaction. The system’s architecture also supports plugins that can be used to trigger
actions such as reservations or order tracking, and it provides a dashboard for chatbot configuration
and usage monitoring. The system’s overall structure is shown in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Efficient LLM-Based Chatbot System</title>
        <p>This module comprises two components: the I/O interfaces and the LLMs. To ensure seamless integration
with popular communication platforms such as Telegram, Facebook and email, the system includes a
RESTful API with specific endpoints for each channel. These endpoints convert incoming messages
into a consistent internal format. Using a webhook-based architecture, external services can forward
user messages to the system, which processes them asynchronously and returns responses adapted
to each platform’s format. The LLM component, on the other hand, is responsible for generating
coherent, context-aware responses based on user input and retrieved information. This component is
divided into four main functional blocks: (1) model selection, where appropriate instruction-tuned LLMs
are selected based on performance and resource constraints; (2) internationalisation, which ensures
multilingual support through language detection and translation; (3) distillation and optimisation, which
reduce model size and improve inference speed through quantisation and QLoRA; and (4) hallucination
detection and correction, which aims to identify and mitigate factual inconsistencies in the model’s
responses.</p>
        <!-- Figure 1 (block diagram): efficient and distilled chatbot system based on LLMs. Blocks: I/O interfaces (messaging, emails); LLMs (LLM selection, internationalisation and localisation, hallucination detection, distillation and optimisation, emotion detection); plugin system for reservations and orders (plugin manager, integration with third-party applications and customer databases); chatbot administration module (dashboard, creating and configuring chatbots); users; administrators; customer information systems. -->
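The channel-specific endpoints described above can be pictured as a thin normalisation layer that maps each platform's payload onto one internal message format. The following sketch illustrates that idea only; the payload field names and the `InternalMessage` type are illustrative assumptions, not the actual RetaiLLM API.

```python
# Sketch of the I/O normalisation layer: each channel endpoint maps its
# payload onto one internal message format before the LLM pipeline runs.
# Field names ("chat", "text", "from", ...) are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class InternalMessage:
    channel: str      # "telegram", "facebook" or "email"
    user_id: str      # channel-specific sender identifier
    text: str         # plain-text body of the user query

def from_telegram(payload: dict) -> InternalMessage:
    msg = payload["message"]
    return InternalMessage("telegram", str(msg["chat"]["id"]), msg["text"])

def from_email(payload: dict) -> InternalMessage:
    # E-mail bodies are longer; here we just strip the quoted reply tail.
    body = payload["body"].split("\nOn ", 1)[0].strip()
    return InternalMessage("email", payload["from"], body)

tg = from_telegram({"message": {"chat": {"id": 42}, "text": "Opening hours?"}})
print(tg.channel, tg.user_id, tg.text)
```

Downstream components then operate on `InternalMessage` alone, so adding a new channel only requires one new adapter function.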
        <p>Firstly, model selection identifies models that strike a balance between response quality and
computational efficiency, rendering them suitable for use in small business environments. To this end, we
evaluated several instruction-tuned LLMs that support multilingual input and are compatible with
quantisation techniques.</p>
        <p>The selected candidates include Gemma 2 (9B and 2B) and LLaMA (3.1 8B and 3.2 3B) models, which
are optimised to follow natural language instructions and adapt to different communication contexts.
In this case, an exhaustive study was carried out to select these models. This included analysing the
disk space requirements and performance of the models in different hardware configurations, as well as
evaluating their response time on a specific dataset. This allowed us to identify models that met the
technical requirements of the available hardware while maintaining acceptable efficiency in a production
scenario. These models are integrated into a conversation module that enables the system to retain
and reuse information from previous interactions with users. Prompting strategies have therefore been
evaluated and adapted for each communication channel. For example, longer and more formal messages
are generated for email, while shorter and more direct messages are generated for instant messaging.
To improve the relevance and factual accuracy of the answers generated, the RAG module conducts a
hybrid search of a document collection, combining semantic similarity via sentence embeddings, such as
Distiluse Base Multilingual, Paraphrase Multilingual, All MiniLM and Multi-QA, with a fuzzy keyword
search based on edit distance. The retrieval engine, implemented via Elasticsearch, supports both dense
vector and raw text indexing. User queries are encoded and compared using k-nearest neighbours with
cosine similarity to retrieve the most relevant passages. These are then ranked and incorporated into
the query, enabling the LLM to generate more accurate, contextualised responses.</p>
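The hybrid ranking described above can be sketched as a weighted combination of a dense semantic score and a fuzzy string score. In this toy sketch, hand-made two-dimensional vectors stand in for sentence embeddings, `difflib` stands in for the edit-distance keyword match, and the `alpha` weight is an illustrative assumption; the real system uses Elasticsearch kNN over dense vector and raw text indices.

```python
# Toy sketch of the hybrid RAG ranking: a cosine-similarity score over
# (stand-in) embeddings is blended with a fuzzy string-matching score.
import difflib
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_score(query_vec, doc_vec, query_text, doc_text, alpha=0.7):
    dense = cosine(query_vec, doc_vec)                         # semantic similarity
    fuzzy = difflib.SequenceMatcher(None, query_text.lower(),
                                    doc_text.lower()).ratio()  # fuzzy keyword match
    return alpha * dense + (1 - alpha) * fuzzy

docs = [
    ("We open 9am to 6pm", [0.9, 0.1]),
    ("Free shipping over 50 EUR", [0.1, 0.9]),
]
q_text, q_vec = "opening hours", [1.0, 0.0]
ranked = sorted(docs, key=lambda d: hybrid_score(q_vec, d[1], q_text, d[0]), reverse=True)
print(ranked[0][0])  # the passage about opening hours ranks first
```

The top-ranked passages would then be injected into the prompt so the LLM answers from verified business content rather than from its parametric memory.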
        <p>Secondly, to facilitate multilingual communication, the system incorporates an internationalisation
module that utilises automatic language identification and translation. Incoming messages are identified
and translated into a reference language for processing. Once the response has been generated, the
system ensures it is delivered in the user’s original language by carrying out a consistency check and, if
necessary, performing a final translation. This strategy guarantees clear and consistent communication,
regardless of the user’s language.</p>
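The round-trip flow of the internationalisation module can be sketched as follows. Here `detect_language` and `translate` are hypothetical stand-ins for the real language-identification and machine-translation services, and English is assumed as the reference processing language; only the control flow reflects the description above.

```python
# Flow sketch of the internationalisation module: detect the user's
# language, process the query in a reference language, and translate the
# reply back. The two helper functions below are hypothetical stubs.
def detect_language(text: str) -> str:
    return "es" if any(w in text.lower() for w in ("hola", "precio")) else "en"

def translate(text: str, src: str, dst: str) -> str:
    TABLE = {("es", "en"): {"¿precio?": "price?"}}
    return TABLE.get((src, dst), {}).get(text, text)

def answer(query: str, generate) -> str:
    user_lang = detect_language(query)
    ref_query = translate(query, user_lang, "en")   # normalise to reference language
    reply = generate(ref_query)                     # LLM works in the reference language
    if user_lang != "en":
        reply = translate(reply, "en", user_lang)   # deliver in the user's language
    return reply

print(answer("¿precio?", lambda q: "10 EUR"))
```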
        <p>Thirdly, in order to ensure that LLMs can be used eficiently in environments with limited resources,
we applied quantisation and QLoRA. However, these techniques have drawbacks, such as longer training
times and an increased risk of hallucinations due to quantisation noise. We tested the models using
two hardware setups: one based on an AMD EPYC MILAN 7313 CPU with an NVIDIA RTX 4090 GPU
and another using an Intel Xeon 6530 CPU with an NVIDIA L40S GPU. Quantising the models to
8-bit and 4-bit significantly reduced memory usage and increased generation speed, with the smallest
models showing the most significant gains. For instance, the LLaMA and Qwen variants experienced
notable speed increases at 4-bit quantisation, particularly on high-performance hardware. While larger
models benefited from reduced size, their speed improvements were more modest, likely due to their
greater architectural complexity. Hardware differences also played a significant role: the NVIDIA L40S
GPU system consistently outperformed the RTX 4090 setup, particularly with larger models and under
heavier loads. These results emphasise the importance of balancing model size, quantisation level, and
hardware capabilities when selecting LLMs for the efficient deployment of chatbots.</p>
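As a minimal illustration of the principle behind these savings (not the bitsandbytes/QLoRA implementation used in the project), the sketch below linearly quantises 32-bit float weights to 8-bit integers. The 4x memory reduction comes at the cost of a small rounding error, the "quantisation noise" mentioned above.

```python
# Minimal illustration of the weight-quantisation trade-off: floats are
# mapped to 8-bit integers (roughly 4x smaller than float32), introducing
# a bounded rounding error ("quantisation noise").
def quantise(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]    # int8 representation
    return q, scale

def dequantise(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.53, 0.91, -0.07]
q, scale = quantise(w, bits=8)
restored = dequantise(q, scale)
error = max(abs(a - b) for a, b in zip(w, restored))
print(q, round(error, 4))
```

The per-weight error is bounded by half the quantisation step, which is why 4-bit quantisation (a larger step) yields further speed and memory gains but a higher risk of degraded or hallucinated outputs.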
        <p>The process begins with intent classification, for which Transformer-based models are used that
have been trained on domain-specific YAML datasets. Each intent is then linked to a prompt and a set
of decoding parameters, such as temperature, top-k or top-p. These parameters reduce the likelihood
of hallucinated outputs. Intents such as price enquiries or business hours are treated with stricter
settings to prioritise factual accuracy. Once the LLM has generated a response, the system uses a
custom Named Entity Recognition (NER) module developed for this project to extract relevant entities
such as dates, locations, prices and phone numbers. These entities are then cross-checked against the
retrieved RAG context, which contains verified business information from the vector database. If any
discrepancies are found, the LLM regenerates the response using the corrected context and constraints
in an iterative process until all critical entities align with the factual data. While formal benchmarking
of the consistency module is ongoing, preliminary internal testing has shown promising results in terms
of detecting and correcting factual errors.</p>
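The iterative consistency check described above can be sketched as a regenerate-until-aligned loop. In this sketch a regular expression over prices stands in for the project's custom NER module, and `toy_llm` is a hypothetical model; only the loop structure mirrors the mechanism described.

```python
# Sketch of the iterative consistency check: entities extracted from the
# draft answer are compared against the verified RAG context, and the
# answer is regenerated until no unsupported entity remains.
import re

def extract_prices(text: str) -> set:
    # Stand-in for the custom NER module (dates, locations, prices, phones).
    return set(re.findall(r"\d+(?:\.\d+)?\s*EUR", text))

def consistent_answer(generate, context: str, max_rounds: int = 3) -> str:
    facts = extract_prices(context)
    answer = generate(hint=None)
    for _ in range(max_rounds):
        wrong = extract_prices(answer) - facts        # entities not backed by context
        if not wrong:
            break
        answer = generate(hint=facts)                 # regenerate with corrected context
    return answer

# A toy "LLM" that hallucinates a price until the facts are injected.
def toy_llm(hint=None):
    return "It costs 12 EUR." if hint is None else "It costs 10 EUR."

print(consistent_answer(toy_llm, context="Catalogue price: 10 EUR"))  # "It costs 10 EUR."
```

Bounding the number of rounds (`max_rounds`) keeps latency predictable even when the model fails to converge on the verified facts.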
        <p>Finally, a two-step detection and correction mechanism has been developed. First, a model trained
with transformer-based architectures analyses the user query to determine whether a high degree of
factual precision is required (e.g. when asking for prices or contact information). In such cases, the
system adjusts the decoding parameters of the LLM, reducing the temperature to 0.25, the top-k to
30, and the top-p to 0.7, in order to promote more accurate and deterministic responses. For standard
scenarios, more flexible values are employed (temperature 0.7, top-k 80 and top-p 1.0) to prioritise
fluency. After generation, a named entity comparison is performed between the model output and the
retrieved context. If any discrepancies are found, the response is regenerated iteratively until it matches the source
information. Finally, a consistency check is performed on the multilingual output to ensure that the
translations retain the factual content, and then the final response is delivered to the user.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Plugin System for Reservations and Orders</title>
        <p>To extend the system’s functionality, a plugin architecture was implemented that enables administrators
to activate or deactivate specific actions, such as booking reservations, retrieving order details or
managing surveys. To support these real-world interactions, the chatbot engine’s intent detection
model was enhanced to recognise a broader set of user intents linked to plugin-triggered actions. When
a relevant intent is detected, the system prompts the user for any missing information and interacts
with external services to complete the task. The architecture is fully extensible, enabling new plugins
to be easily integrated as future requirements arise.</p>
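The activation/deactivation and dispatch behaviour described above can be sketched as a small registry. The class shape, intent names and handler signature are illustrative assumptions; only the pattern (register, toggle, dispatch, fall back when disabled) reflects the architecture.

```python
# Sketch of the plugin architecture: plugins register the intents they
# handle, administrators toggle them, and a detected intent dispatches to
# the active plugin (or falls back to the plain LLM answer).
class PluginRegistry:
    def __init__(self):
        self._plugins = {}   # intent name -> [handler, enabled flag]

    def register(self, intent, handler, enabled=True):
        self._plugins[intent] = [handler, enabled]

    def set_enabled(self, intent, enabled):
        self._plugins[intent][1] = enabled

    def dispatch(self, intent, **slots):
        handler, enabled = self._plugins.get(intent, (None, False))
        if not enabled:
            return None          # no active plugin: answer with the LLM alone
        return handler(**slots)

registry = PluginRegistry()
registry.register("book_reservation",
                  lambda date, people: f"Booked for {people} on {date}")
print(registry.dispatch("book_reservation", date="2025-06-01", people=2))
```

New plugins only need to register their intents and handlers, which is what makes the architecture extensible as new requirements arise.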
        <p>These plugins incorporate customer-specific information using database adapters that connect to
external client databases and import their contents into the vector database. This allows structured
business data, such as product catalogues or customer records, to be included in the context used when
generating responses. These connections are configurable, and administrators can add or remove them
based on the specific needs of each deployment.</p>
        <p>An example of this system can be seen in Figure 2, where the administrator can configure a user
authentication plugin. Through a dedicated settings panel, the administrator is able to select which
attributes should be requested from users when they need to log in via the chat interface. Available
options include name, phone number, national ID, email address, and date of birth.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Chatbot Administration Module</title>
        <p>The platform includes an administration dashboard offering a centralised interface for managing chatbot
configurations and monitoring usage. Administrators can customise prompts, datasets, plugins and
communication channels, and access usage analytics such as conversation volume, reservation activity
and word cloud visualisations. Calendar-based views and detailed statistics support effective supervision
and ongoing system optimisation.</p>
        <p>To analyse user behaviour and detect satisfaction levels, a transformer-based text emotion
classification module is included. It has been trained on various corpora, including EmoContext [13], LiSSS [14],
EMOVO [15] and MELD [16] among other datasets [17]. The model estimates the emotional content of
each user message. These emotions are then integrated into the conversation logs and used to generate
aggregated user sentiment statistics for administrators.</p>
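The aggregation step, from per-message emotion labels to dashboard statistics, can be sketched as below. `classify_emotion` is a hypothetical lexicon stand-in for the transformer classifier trained on EmoContext, LiSSS, EMOVO and MELD; only the aggregation logic is the point here.

```python
# Sketch of how per-message emotion labels feed the dashboard statistics.
# classify_emotion is a hypothetical stand-in for the transformer model.
from collections import Counter

def classify_emotion(message: str) -> str:
    lexicon = {"thanks": "joy", "great": "joy", "late": "anger", "refund": "anger"}
    for word, emotion in lexicon.items():
        if word in message.lower():
            return emotion
    return "neutral"

def sentiment_report(conversation_log):
    # Aggregate message-level labels into proportions for administrators.
    counts = Counter(classify_emotion(m) for m in conversation_log)
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}

log = ["Thanks, great service!", "My order is late again", "What time do you open?"]
print(sentiment_report(log))
```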
        <p>Finally, there is a graphical user interface (GUI) that enables administrators to efficiently configure and
deploy chatbot instances. Through this interface, users can customise plugin access, select input/output
channels (e.g. messaging apps, email or the web) and define the data sources for each chatbot. Once
configuration is complete, the system automatically generates the deployment settings required for
each selected interface, enabling tailored chatbot instances to be launched with minimal effort.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and further work</title>
      <p>This project marks a significant milestone in the development of flexible, efficient, multilingual chatbot
systems designed to meet the needs of small businesses. RetaiLLM combines optimised LLMs with
a modular architecture that includes plugin integration, emotion detection, hallucination control
and multilingual support, enabling seamless deployment across multiple communication channels.
Its extensibility and configuration interface make RetaiLLM suitable for a wide range of real-world
scenarios, reducing the technical barriers to adoption in sectors such as retail.</p>
      <p>In future iterations of the system, we plan to enhance the chatbot’s emotional intelligence by
incorporating multimodal emotion detection. This involves combining textual and prosodic features
from voice inputs [18, 19]. This will enable the system to better understand user intent and satisfaction,
particularly in voice-based interactions. Furthermore, we intend to enhance the customisation of LLM
prompts via user feedback loops, facilitating dynamic prompt adjustment based on actual usage. Other
developments will include extending plugin functionality to support more complex business workflows
and integrating external knowledge graphs to improve response generation in domain-specific scenarios.</p>
      <p>In future stages, we will carry out a quantitative evaluation of RetaiLLM. To achieve this, we will
conduct thorough benchmarking of its multilingual capabilities and hallucination mitigation
strategies, and analyse the efficiency gains derived from using retrieval-augmented generation (RAG) and
quantisation techniques. We also plan to run comparative studies with existing LLM-based retail
assistants to validate our approach. To promote reproducibility and encourage adoption within the
research and development community, we also intend to publish implementation details, including
model configurations, dataset sizes and key hyperparameters.</p>
      <p>Future work will focus on exploring strategies to ensure compliance with data protection regulations,
such as the European GDPR. This is particularly important when RetaiLLM is integrated with private
customer databases or internal retail systems. This will involve adopting techniques for data
anonymisation and pseudonymisation, implementing access control mechanisms and establishing secure data
processing pipelines. We also intend to examine the implications of storing conversational histories,
implementing user consent protocols and ensuring transparency in automated decision-making. These
are all key aspects of the ethical deployment of LLM-based solutions in real-world retail environments.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is being funded by CDTI and the European Regional Development Fund (FEDER / ERDF)
through project RetaiLLM IDI-20240115.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepL for grammatical and spelling correction.
After using this tool, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[6] L. Donisch, S. Schacht, C. Lanquillon, Inference optimizations for large language models:
Effects, challenges, and practical considerations, 2024. URL: https://arxiv.org/abs/2408.03130.
arXiv:2408.03130.</p>
      <p>[7] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient Finetuning of Quantized
LLMs, 2023. URL: https://arxiv.org/abs/2305.14314. arXiv:2305.14314.</p>
      <p>[8] A. Chavan, R. Magazine, S. Kushwaha, M. Debbah, D. Gupta, Faster and Lighter LLMs: A
Survey on Current Challenges and Way Forward, 2024. URL: https://arxiv.org/abs/2402.01799.
arXiv:2402.01799.</p>
      <p>[9] J. M. Ruiz-Sánchez, R. Valencia-García, J. T. Fernández-Breis, R. Martínez-Béjar, P. Compton,
An approach for incremental knowledge acquisition from text, Expert Systems with Applications
25 (2003) 77–86. URL: https://www.sciencedirect.com/science/article/pii/S0957417403000083.
doi:10.1016/S0957-4174(03)00008-3.</p>
      <p>[10] J. A. García-Díaz, M. Cánovas-García, R. Valencia-García, Ontology-driven aspect-based sentiment
analysis classification: An infodemiological case study regarding infectious diseases in Latin
America, Future Generation Computer Systems 112 (2020) 641–657.</p>
      <p>[11] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin,
T. Liu, A survey on hallucination in large language models: Principles, taxonomy, challenges,
and open questions, ACM Transactions on Information Systems 43 (2025) 1–55. URL:
http://dx.doi.org/10.1145/3703155. doi:10.1145/3703155.</p>
      <p>[12] Z. Xu, S. Jain, M. Kankanhalli, Hallucination is inevitable: An innate limitation of large language
models, 2025. URL: https://arxiv.org/abs/2401.11817. arXiv:2401.11817.</p>
      <p>[13] A. Chatterjee, K. N. Narahari, M. Joshi, P. Agrawal, SemEval-2019 task 3: EmoContext contextual
emotion detection in text, in: J. May, E. Shutova, A. Herbelot, X. Zhu, M. Apidianaki, S. M.
Mohammad (Eds.), Proceedings of the 13th International Workshop on Semantic Evaluation,
Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 39–48. URL:
https://aclanthology.org/S19-2005/. doi:10.18653/v1/S19-2005.</p>
      <p>[14] J.-M. Torres-Moreno, L.-G. Moreno-Jiménez, LiSSS: A toy corpus of Spanish Literary Sentences for
Emotions detection, 2020. URL: https://arxiv.org/abs/2005.08223. arXiv:2005.08223.</p>
      <p>[15] G. Costantini, I. Iaderola, A. Paoloni, M. Todisco, EMOVO corpus: an Italian emotional speech
database, in: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno,
J. Odijk, S. Piperidis (Eds.), Proceedings of the Ninth International Conference on Language
Resources and Evaluation (LREC’14), European Language Resources Association (ELRA), Reykjavik,
Iceland, 2014, pp. 3501–3504. URL: https://aclanthology.org/L14-1478/.</p>
      <p>[16] S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, R. Mihalcea, MELD: A Multimodal
Multi-Party Dataset for Emotion Recognition in Conversations, 2019. URL: https://arxiv.org/abs/1810.02508.
arXiv:1810.02508.</p>
      <p>[17] A. Salmerón-Ríos, J. A. García-Díaz, R. Pan, R. Valencia-García, Fine grain emotion analysis in
Spanish using linguistic features and transformers, PeerJ Computer Science 10 (2024) e1992.</p>
      <p>[18] R. Pan, J. A. García-Díaz, M. Á. Rodríguez-García, R. Valencia-García, Spanish MEACorpus 2023:
A multimodal speech–text corpus for emotion analysis in Spanish from natural environments,
Computer Standards &amp; Interfaces 90 (2024) 103856.</p>
      <p>[19] R. Pan, J. A. García-Díaz, M. Á. Rodríguez-García, F. García-Sánchez, R. Valencia-García, Overview
of EmoSPeech at IberLEF 2024: Multimodal Speech-text Emotion Recognition in Spanish,
Procesamiento del Lenguaje Natural 73 (2024) 359–368.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nikhil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Yeligatla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Chaparala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. T.</given-names>
            <surname>Chalavadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>An Analysis on Conversational AI: The Multimodal Frontier in Chatbot System Advancements</article-title>
          , in: 2024 Second International Conference on Inventive Computing and
          <article-title>Informatics (ICICI)</article-title>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>383</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          , et al.,
          <source>A survey of large language models</source>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2303.18223. arXiv:
          <volume>2303</volume>
          .
          <fpage>18223</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grattafiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jauhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Dahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Letman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mathur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schelten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          , et al.,
          <source>The llama 3 herd of models</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.21783. arXiv:
          <volume>2407</volume>
          .
          <fpage>21783</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Team</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riviere</surname>
          </string-name>
          , et al.,
          <source>Gemma</source>
          <volume>2</volume>
          :
          <article-title>Improving open language models at a practical size</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2408.00118. arXiv:
          <volume>2408</volume>
          .
          <fpage>00118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hyland-Wood</surname>
          </string-name>
          ,
          <article-title>A primer on large language models and their limitations</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2412.04503. arXiv:
          <volume>2412</volume>
          .
          <fpage>04503</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>