<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UC-UCO-Plenitas Team - Exploring in the PRESTA 2025 challenge: Question Answering over Tabular Data in Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yoan Martínez-López</string-name>
          <email>yoan.martinez@plenitas.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mayte Guerra Saborit</string-name>
          <email>mayte.guerra@reduc.edu.cu</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ansel Rodríguez-Gónzalez</string-name>
          <email>ansel@cicese.edu.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Madera</string-name>
          <email>julio.madera@reduc.edu.cu</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Orquidea López Correoso</string-name>
          <email>analopezcorreoso@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos de Castro Lozano</string-name>
          <email>carlosdecastrolozano@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Miguel Ramírez Uceda</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Carlos Arévalo Fernández</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CICESE-UT3</institution>
          ,
          <addr-line>Nayarit</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IPVCE "Máximo Gómez Báez"</institution>
          ,
          <addr-line>Circunvalación Norte, Camaguey</addr-line>
          ,
          <country country="CU">Cuba</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Plénitas</institution>
          ,
          <addr-line>C/ Le Corbusier s/n, 14005 Córdoba</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universidad de Camaguey</institution>
          ,
          <addr-line>Circunvalación Norte, Camino Viejo Km 5 y 1/2, Camaguey</addr-line>
          ,
          <country country="CU">Cuba</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universidad de Córdoba</institution>
          ,
          <addr-line>Córdoba</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>The UC-UCO-Plenitas team participated in the PRESTA 2025 challenge, focused on question answering over tabular data in Spanish using the DataBenchSPA benchmark. This benchmark is the first of its kind in the Spanish language and includes real-world tables with diverse data types, designed to evaluate system capabilities in answering natural language questions. The team implemented a solution leveraging GPT-4o, a multimodal large language model developed by OpenAI, known for its real-time, multi-input processing capabilities. GPT-4o was used to handle text-based question answering tasks, with the final system developed in under 150 lines of code, integrating the evaluation functions provided by the organizers. Among 23 competing teams, the UC-UCO-Plenitas team secured 7th place, achieving 66.0% accuracy, showcasing the model's potential and competitive performance against other state-of-the-art approaches. While not reaching the top three, the team's results highlight opportunities for further performance optimization through better prompt design and fine-tuning. The paper also provides insights into deep learning architectures, particularly transformers, and emphasizes the role of large language models (LLMs) in advancing natural language understanding over structured datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>DataBenchSPA</kwd>
        <kwd>Spanish Language Benchmark</kwd>
        <kwd>GPT-4o</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Lexical complexity relates to complexity of words. Its assessment can be beneficial in a number of
ifelds, ranging from education to communication. For instance, lexical complexity studies can assist in
providing language learners with learning materials suitable for their proficiency level or aid in text
simplification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These studies are also a central part of reading comprehension, as lexical complexity
can predict which words might be dificult to understand and could hinder the readability of the text.
Lexical complexity studies typically make use of Natural Language Processing and Machine Learning
methods [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Previous similar studies focus on ComplexWord Identification (CWI), which is a process
of identifying complex words in a text [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this case, lexical complexity is assumed to be binary
words are either complex or not. LCP Shared Task 2021 addresses this limitation by introducing a new
dataset designed for continuous rather than binary complexity prediction [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].In this paper, we present
our participation in the PRESTA 2025: Question Answering over Tabular Data in Spanish, describing
our methodological approach, model choices, and results across the diferent subtasks.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        Dataset The IberLEF 2025 shared task is centered on Question Answering over Tabular Data, making
use of the newly developed DataBenchSPA benchmark [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. DataBenchSPA is the first
Spanish-language benchmark that features real-world tabular datasets with a substantial number of rows and
columns [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. This benchmark includes a wide variety of data types, facilitating the evaluation of
different question formats that are specific to each type of data. The task encourages participants to
develop systems capable of answering questions based on daily-use datasets included in DataBenchSPA.
The expected answers may be numerical values, categorical values, boolean outputs, or lists comprising
elements of various types. While DataBenchSPA serves as the training and validation dataset, a separate
test set is released specifically for the competition phase. Each system receives a set of (dataset, question)
pairs and is required to return an answer that is then compared against a predefined gold standard.
Participants are allowed to use any method of their choice to compute the answers. To facilitate
participation, the organizers provide a Python library that enables a straightforward submission process
using fewer than 150 lines of code. This library also includes the official evaluation function used during
the competition, allowing teams to evaluate their systems locally on the development set [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Deep Learning Deep learning is a type of machine learning that uses artificial neural networks
with many layers (hence "deep") to model complex patterns in data. It’s especially good at handling
unstructured data like images, text, and audio. Deep learning enables computers to learn from data
much like the human brain does. Instead of being programmed with specific rules, a deep learning
system figures out the rules on its own by training on large datasets[
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        Transformers Transformers are a type of deep learning architecture introduced by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. They’re
built around a self-attention mechanism that lets the model weigh the importance of diferent tokens
(words or subwords) in an input sequence when generating representations or predictions. Unlike
recurrent or convolutional networks, Transformers process all tokens in parallel, which makes them
highly eficient on modern hardware and capable of modeling long -range dependencies. Computes
attention scores between every pair of positions in the input, producing a weighted sum of value
vectors that captures contextual relationships. Runs several attention “heads” in parallel, letting the
model attend to diferent types of relationships simultaneously. After attention, each position is passed
through a small fully connected network (the same one for every position) to mix features. Since
Transformers lack recurrence, they add sine/cosine or learned embeddings to each token to encode its
position in the sequence. Each sub-layer (attention or feed-forward) is wrapped with skip connections
and normalization for stable training. Transformers power many state-of-the-art models—BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
GPT, T5, etc.—and have become the go-to architecture for NLP, and increasingly for vision and audio
tasks, due to their scalability and strong performance.
      </p>
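      <p>As an illustrative sketch (not code from the paper), the scaled dot-product attention described above can be written in a few lines of NumPy; the function name, shapes, and toy inputs here are our own choices.</p>

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise attention scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V, weights

# Three toy token embeddings of dimension 4, reused as queries, keys, and values.
x = np.eye(3, 4)
out, w = scaled_dot_product_attention(x, x, x)
```

Each row of the weight matrix is a probability distribution over input positions, which is what lets every token attend to every other token in parallel.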
      <p>
        Large Language Models (LLMs) Large Language Models are deep learning models trained on
massive amounts of text data to understand, generate, and manipulate human language. These models
use neural networks—typically transformer architectures—to learn patterns in text and perform tasks
like Text generation, Translation, Summarization, Sentiment analysis, Question answering and Code
generation. A Large Language Model is a type of artificial intelligence (AI) trained to understand,
generate, and interact using human language. These models are built using deep learning techniques
(usually transformers) and are trained on vast amounts of text data — including books, websites, and
documents — to learn grammar, facts, reasoning, and context[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Examples of well-known LLMs include
GPT (OpenAI) [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], LLaMA (Meta)[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Mistral (Mistral AI)[
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ], Claude (Anthropic)[
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ],
Qwen (Alibaba)[
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ], DeepSeek(Deepseek AI)[
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ] and Gemma (Google DeepMind)[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. They are
trained on trillions of words (from books, websites, etc.) using self-supervised learning. Most use the
Transformer architecture, enabling them to understand context and relationships in language. Text is
broken into "tokens" (words or subwords), and the model predicts the next token.
      </p>
      <p>
        GPT4o GPT is a type of Large Language Model (LLM) developed by OpenAI [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. It’s designed
to understand and generate natural language text, making it capable of performing a wide range of
language-based tasks such as writing, summarizing, translating, and answering questions. GPT in
Simple Terms: 1) Generative: It can create new text based on a prompt. 2) Pre-trained: It’s trained on
a massive amount of text before being fine-tuned for specific tasks and 3)Transformer: It’s built on a
powerful deep learning architecture called the transformer, which allows it to understand context in long
passages of text. Also, GPT-4o (pronounced “GPT-4 omni”) is the latest multimodal model developed by
OpenAI, released in May 2024 [
        <xref ref-type="bibr" rid="ref13 ref25">25, 13</xref>
        ]. It represents a major upgrade to the GPT-4 family. GPT-4o is
natively multimodal [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], meaning it can handle: Text, Images, Audio, Video (limited interactions) and
Real-time speech input/output. Unlike previous versions that used separate systems (like Whisper for
audio and DALL·E for images)[
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], GPT-4o processes all these inputs in a unified model. They abilities
are:
• Hold natural conversations with tone and emotion.
• See and describe images, including charts and diagrams.
• Listen and respond to live speech.
• Translate, transcribe, summarize, and interpret audio in real-time.
      </p>
      <p>• Solve math problems from handwritten notes or photos.</p>
      <p>Setup and functionality This section describes the setup and functionality of an automated system designed to answer
natural language questions about tabular datasets, in the context of the PRESTA 2025 (IberLEF)
competition. The system uses OpenAI's GPT-4o language model to generate Python code that directly answers
each question by manipulating the data using the pandas and numpy libraries. Its workflow is structured
into four main stages: prompt generation, model query, code execution, and answer evaluation.</p>
      <p>To get started, Python 3.8 or higher is required, along with the installation of key dependencies:
pandas, numpy, openai, nest_asyncio, and the databench_eval library, the latter provided as part of the
official evaluation setup. Once the environment is ready, a valid OpenAI API key must be configured.
Although the base code defines it as a constant, best practice is to store the key in an environment variable
using export OPEN_AI_KEY="sk-..." and access it within the script using os.getenv("OPEN_AI_KEY").
It is also recommended to download the datasets in Parquet format and store them under the path
/̇databenchSPAdatasetall.parquet, although HuggingFace datasets can also be used via the "hf://" prefix.</p>
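      <p>A minimal sketch of the recommended key handling (the helper name get_api_key is ours; the environment-variable name follows the setup above):</p>

```python
import os

def get_api_key():
    """Read the OpenAI key from the environment instead of a hard-coded constant."""
    key = os.getenv("OPEN_AI_KEY")
    if key is None:
        raise RuntimeError('OPEN_AI_KEY is not set; run: export OPEN_AI_KEY="sk-..."')
    return key
```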
      <p>The system loads the development question set (qa_dev) using the load_qa function. For each question,
it dynamically constructs a prompt that instructs the model to generate a single line of code within a
predefined answer(df) function. This line should directly return the answer to the question using only
the provided DataFrame. For example, for the question "What is the average number of bedrooms?" on
the airbnb dataset, the system generates a prompt like:</p>
      <p>"You are a pandas code generator. Your goal is to complete the function provided... def answer(df:
pd.DataFrame): return".</p>
      <p>The model might respond with a line such as df['bedrooms'].mean(), which the system executes
dynamically using exec() to compute the final answer. This technique allows automatic evaluation
of the model's reasoning capabilities on tabular data. The generated results are saved in a file named
predictions.txt, which can then be submitted as an official run. Additionally, performance is measured
by comparing the generated responses to ground truth labels using the Evaluator module. This baseline
system achieves approximately 49% accuracy on the development set.</p>
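      <p>The generate-and-execute loop can be sketched with the model call stubbed out; the helper name run_generated_line is illustrative, while the answer(df) convention and the exec() step follow the system described above.</p>

```python
import pandas as pd

def run_generated_line(df, line):
    """Wrap the model's one-line completion in answer(df) and execute it."""
    namespace = {"pd": pd}
    # Unsafe on untrusted output; as noted below, this should run sandboxed.
    exec("def answer(df):\n    return " + line, namespace)
    return namespace["answer"](df)

# Toy stand-in for the airbnb example; in the real system the completion
# below would be returned by GPT-4o.
df = pd.DataFrame({"bedrooms": [1, 2, 3]})
completion = "df['bedrooms'].mean()"
result = run_generated_line(df, completion)
```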
      <p>An optional utility function, column_generator(), is also included. It generates prompts to filter and
select only the most relevant columns needed to answer each question. This is useful for reducing
input size when dealing with models that have context limitations. Despite its effectiveness, this system
carries potential security risks due to the use of exec() for code execution, and should only be used in
controlled environments.</p>
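      <p>The exact prompt built by column_generator() is not reproduced here; a hypothetical column-selection prompt of the kind described might look like:</p>

```python
def column_prompt(question, columns):
    """Hypothetical sketch: ask the model to pick only the relevant columns."""
    return (
        "From the columns [" + ", ".join(columns) + "], "
        "list only those needed to answer: " + question
    )

prompt = column_prompt("What is the average number of bedrooms?",
                       ["bedrooms", "price", "neighbourhood"])
```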
      <p>Evaluation An automated evaluation function is currently provided to handle most of the
assessment process. When a participant uploads a submission, the default evaluation function from the
databench_eval package is executed, comparing the submission against the ground truth set. This
function has been modified to be less strict than the one used in the initial experiment. The adjustment
accommodates slight variations in formatting, allowing smaller models to avoid penalties for minor
errors. Given the heuristic nature of the evaluation, the characteristics of the models being used, and
the open-ended nature of the task, the organizers will manually review the top-scoring submissions
before selecting a winner.</p>
      <p>Types of Answers Expected
The expected answer types are:
• Boolean: Valid answers include True/False, Y/N, Yes/No (all case insensitive).
• Category: A value from a cell (or a substring of a cell) in the dataset.
• Number: A numerical value from a cell in the dataset, which may represent a computed statistic
(e.g., average, maximum, minimum).
• List[category]: A list containing a fixed number of categories. The expected format is: "[’cat’,
’dog’]". Pay attention to the wording of the question to determine if uniqueness is required or if
repeated values are allowed.</p>
      <p>• List[number]: Similar to List[category], but with numbers as its elements.</p>
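      <p>As an illustration (not the official databench_eval logic), a lenient boolean check covering the formats listed above might look like:</p>

```python
def boolean_match(predicted, gold):
    """Accept True/False, Y/N, Yes/No in any casing, per the rules above."""
    p = str(predicted).strip().lower()
    if p in {"true", "y", "yes"}:
        return gold is True
    if p in {"false", "n", "no"}:
        return gold is False
    return False
```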
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Competition phase In the competition, our team employed GPT-4o as the primary method for solving
the tasks. Out of 23 total participants, the
UC-UCO-Plenitas team secured 7th place with an accuracy of 66.0%. This result reflects a solid performance
considering the number of competitors, although there is clear room for improvement to reach the
top positions. The top three teams (itunlp, sonrobok4, and hcerezo) achieved significantly higher
accuracies (85-87%), indicating a high level of performance in the task. Teams ranked from 4th to 6th
(e.g., LyS Group, quang3010, and ScottyPoseidon) also outperformed UC-UCO-Plenitas by a noticeable
margin. Despite not reaching the podium, UC-UCO-Plenitas demonstrated competitive capability,
especially when compared to other mid-to-lower-ranked teams. With further refinement, particularly
in precision and fine-tuning of classification strategies, the model could potentially move up in future
rankings. See Table 1.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>The participation of the UC-UCO-Plenitas team in the PRESTA 2025 competition demonstrated the
growing applicability of Large Language Models, particularly GPT-4o, in the domain of question
answering over tabular data. Achieving a 66.0% accuracy rate, the team’s performance places them in
the top third of participating systems and highlights the feasibility of using GPT-4o in data-intensive
Spanish-language NLP tasks. The results validate the use of minimal coding approaches with powerful
pre-trained models and point toward the potential of improved performance through enhanced prompt
engineering and fine-tuning strategies. The competition served as a valuable benchmark for evaluating
the capabilities of modern LLMs in multilingual and structured data contexts. Future efforts will focus on
improving system generalization and interpretability to reach the performance levels of the top-ranking
teams.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4 and Grammarly for grammar
and spelling checking. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-6">
      <title>A. Online Resources</title>
      <p>The results are available via
• PRESTA Codabench,
• PRESTA Dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddharthan</surname>
          </string-name>
          ,
          <article-title>A survey of research on text simplification. itl-international journal of applied linguistics. special issue on readability and text simplification</article-title>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          , L. Specia,
          <article-title>Inferring psycholinguistic properties of words, in: Proceedings of the 2016 conference of the north american chapter of the association for computational linguistics: Human language technologies</article-title>
          ,
          <year>2016</year>
          , pp.
          <fpage>435</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shardlow</surname>
          </string-name>
          ,
          <article-title>A comparison of techniques to automatically identify complex words., in: 51st annual meeting of the association for computational linguistics</article-title>
          <source>proceedings of the student research workshop</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shardlow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Semeval
          <article-title>-2021 task 1: Lexical complexity prediction</article-title>
          ,
          <source>arXiv preprint arXiv:2106.00473</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Á</surname>
          </string-name>
          .
          <string-name>
            <surname>González-Barba</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Osés-Grijalba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Cámara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          , Overview of PRESTA at IberLEF 2025:
          <article-title>Question Answering Over Tabular Data In Spanish, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Y. Bengio, G. Hinton,
          <article-title>Deep learning</article-title>
          , nature
          <volume>521</volume>
          (
          <year>2015</year>
          )
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rusk</surname>
          </string-name>
          ,
          <article-title>Deep learning</article-title>
          ,
          <source>Nature Methods</source>
          <volume>13</volume>
          (
          <year>2016</year>
          )
          <fpage>35</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers</article-title>
          ),
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnsen</surname>
          </string-name>
          ,
          <article-title>Large language models (LLMs)</article-title>
          ,
          <source>Maria Johnsen</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A commentary of gpt-3 in mit technology review 2021</article-title>
          ,
          <source>Fundamental Research</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>831</fpage>
          -
          <lpage>833</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K. I.</given-names>
            <surname>Roumeliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Tselikas</surname>
          </string-name>
          ,
          <article-title>Chatgpt and open-ai models: A preliminary review</article-title>
          ,
          <source>Future Internet</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <fpage>192</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Analytica</surname>
          </string-name>
          ,
          <article-title>Meta llama leak raises risk of ai-linked harms</article-title>
          ,
          <source>Emerald Expert Briefings</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>O.</given-names>
            <surname>Aydin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Karaarslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Erenay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bacanin</surname>
          </string-name>
          ,
          <article-title>Generative AI in academic writing: A comparison of DeepSeek, Qwen, ChatGPT, Gemini, Llama, Mistral, and Gemma</article-title>
          ,
          <source>arXiv preprint arXiv:2503.04765</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hamzah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sulaiman</surname>
          </string-name>
          ,
          <article-title>Multimodal integration in large language models: A case study with Mistral LLM</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>Mistral AI</string-name>
          , Mixtral of experts (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Priyanshu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Maurya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <article-title>AI governance and accountability: An analysis of Anthropic's Claude</article-title>
          ,
          <source>arXiv preprint arXiv:2407.01557</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Adetayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Aborisade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Sanni</surname>
          </string-name>
          ,
          <article-title>Microsoft copilot and anthropic claude ai in education and library service</article-title>
          ,
          <source>Library Hi Tech News</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>Qwen technical report</article-title>
          ,
          <source>arXiv preprint arXiv:2309.16609</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>Alibaba Cloud</string-name>
          , Qwen2.5 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ruan</surname>
          </string-name>
          , et al.,
          <article-title>DeepSeek-V3 technical report</article-title>
          ,
          <source>arXiv preprint arXiv:2412.19437</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.-L.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>DeepSeek: Paradigm shifts and technical evolution in large AI models</article-title>
          ,
          <source>IEEE/CAA Journal of Automatica Sinica</source>
          <volume>12</volume>
          (
          <year>2025</year>
          )
          <fpage>841</fpage>
          -
          <lpage>858</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>Gemma Team</string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mesnard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dadashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhupatiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rivière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Love</surname>
          </string-name>
          , et al.,
          <article-title>Gemma: Open models based on Gemini research and technology</article-title>
          ,
          <source>arXiv preprint arXiv:2403.08295</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K. I.</given-names>
            <surname>Roumeliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Tselikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Nasiopoulos</surname>
          </string-name>
          ,
          <article-title>Leveraging large language models in tourism: A comparative study of the latest GPT Omni models and BERT NLP for customer review classification and sentiment analysis</article-title>
          ,
          <source>Information</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>792</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Athiwaratkun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Mixture-of-agents enhances large language model capabilities</article-title>
          ,
          <source>arXiv preprint arXiv:2406.04692</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Kern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. C.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <article-title>Assessing novelty, feasibility and value of creative ideas with an unsupervised approach using GPT-4</article-title>
          ,
          <source>British Journal of Psychology</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>