<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Separating Linguistic Competence from Factual Knowledge in Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaime Collado-Montañez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science (University of Jaén)</institution>
          ,
          <addr-line>Campus Las Lagunillas, s/n, Jaén, 23071</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Deep neural networks have significantly advanced natural language processing techniques, enabling the development of large language models that exhibit impressive capabilities in language understanding and generation. However, these models often internalize vast amounts of factual knowledge, which can lead to issues such as hallucinations and the use of outdated information. This research explores the hypothesis that linguistic competence, the ability to understand and produce natural language, can be codified separately from memorized factual knowledge in neural networks. By developing “fundamental language models” that focus on language understanding and reasoning without internalizing factual data, we aim to create smaller, more efficient models that access up-to-date factual knowledge through external sources using techniques like Retrieval Augmented Generation. Our main objective is to understand the functioning of Large Language Models as reasoning engines, with a special focus on language models for Spanish.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Model</kwd>
        <kwd>Fundamental Language Model</kwd>
        <kwd>Hallucination</kwd>
        <kwd>Retrieval Augmented Generation</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Deep neural networks have transformed the landscape of natural language processing techniques,
enabling the development of Large Language Models (LLMs) through training on massive text
collections using neural networks based on the transformer architecture [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This has led to the creation of
autoregressive systems for text generation with a high capacity for understanding human language.
Thus, LLMs have become the core of an increasing number of artificial intelligence tools, with
notable examples such as GPT-3 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and LLaMA-2 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which are specially trained for natural language
conversation.
      </p>
      <p>However, alongside their remarkable abilities, LLMs also face significant challenges. One prominent
issue is hallucination, which can lead to the propagation of misinformation or the generation of content
that appears plausible but is not grounded in fact.</p>
      <p>The objective of this research is to explore architectures for solving artificial intelligence (AI) tasks
in which an LLM acts as the central reasoning engine, enhancing its capabilities with external tools such
as knowledge bases. The model at the core of this architecture, which we call a Fundamental Language
Model (FLM), could be obtained by removing the parts of the neural network that store factual knowledge
while preserving those related to reasoning and language understanding. Alternatively, such a model
could be obtained by pretraining on large datasets curated to exclude factual information.</p>
      <p>The remainder of this work is organized as follows: Section 2 presents an overview of the relevant
literature concerning LLM emergent abilities and problems; Section 3 shows the main hypothesis and
objectives planned for this research. Finally, Section 4 details the methodology followed during the
development of this thesis, and Section 5 concludes with some specific research elements proposed for
discussion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        Transformer models are language models pretrained to understand language structure through
self-supervised learning on huge amounts of data. Encoder transformers such as BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or RoBERTa [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
mainly use Masked Language Modeling (MLM), while decoder or generative transformers like LLaMA [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
Mistral [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and GPT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are trained using Causal Language Modeling (CLM).
      </p>
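      <p>The two objectives can be contrasted with a toy sketch (plain Python; the function names are illustrative, it operates on words rather than the subword ids real tokenizers produce, and it omits refinements such as BERT’s 80/10/10 masking scheme): MLM hides random tokens and predicts them from bidirectional context, while CLM predicts each token from its left context only.</p>

```python
import random

def mlm_example(tokens, mask_prob=0.15, seed=1):
    """Masked Language Modeling (BERT/RoBERTa style): hide random tokens
    and predict them from the full bidirectional context."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append("[MASK]")
            labels.append(tok)    # loss is computed on masked positions
        else:
            inputs.append(tok)
            labels.append(None)   # no loss on visible positions
    return inputs, labels

def clm_examples(tokens):
    """Causal Language Modeling (GPT/LLaMA style): predict each token
    from its left context only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "the cat sat on the mat".split()
print(mlm_example(tokens))
print(clm_examples(tokens)[:2])
```

      <p>In practice both objectives are trained at scale with cross-entropy loss over the vocabulary; the structural difference above is what separates encoder models from the autoregressive LLMs discussed below.</p>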
      <p>
        According to the following general definition of emergence, as stated by the Nobel prize-winning
physicist Philip Anderson [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: “Emergence is when quantitative changes in a system result in qualitative
changes in behavior”, the rapid growth of such models, especially the generative ones, into
what we call LLMs is allowing them to showcase new emergent abilities such as reasoning [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Along
with these abilities, self-supervised pretraining also leads LLMs to memorize large amounts of
factual data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that, in some cases, may lead to problems such as hallucinations [12] and outdated
answers when training data does not include the latest events and news. Hallucinations in LLMs refer to
instances where the model generates responses that are not factual or grounded in reality but rather are
inferred from patterns in the training data. These hallucinations can occur when the model synthesizes
information based on statistical correlations in the data rather than true understanding [13].
      </p>
      <p>In addition to that, the use of large corpora of texts from various sources in the generation of
pre-trained models results in the model capturing stereotypical patterns present in the texts. This
issue, known as bias, is related to explainability, but work on it focuses on the detection, evaluation, and
mitigation of gender, profession, origin, ethnicity, or religion stereotypes present in trained models [14].
The problem has become a topic of interest beyond the field of AI algorithm research and is known as
fairness [15] due to its ethical and legal implications.</p>
      <p>Additionally, although they seem powerful in terms of results and predictions, large language models
have their own limitations. The most significant is opacity or lack of transparency [16]. This means
that the logic and internal functioning of these models are hidden from the user, which is a serious
disadvantage because it prevents a human, whether expert or not, from verifying, interpreting, and
understanding the system’s reasoning and how decisions are made. In other words, any sufficiently
complex system acts as a black box when it is easier to experiment with than to understand [17].</p>
      <p>The study of "fundamental" language models can help address bias issues and contribute to
explainability by focusing on the core competencies of natural language understanding and separating
knowledge from language-based reasoning. This hypothesis would be applicable to the project’s
challenges: the analysis of harmful and beneficial content, in its detection, characterization, and
generation.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Hypothesis and Objectives</title>
      <p>The hypothesis behind this line of research is that the ability to understand and produce natural
language, i.e., linguistic competence, can be codified separately from memorized factual knowledge in
neural networks.</p>
      <p>This hypothesis would imply that we can build FLMs that do not store facts but retain language
understanding and reasoning capabilities. Thus, we could train smaller models that access factual
knowledge through external sources and techniques such as Retrieval Augmented Generation (RAG),
potentially removing all sources of hallucination and outdated information.</p>
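      <p>The retrieval step of RAG can be sketched as follows (a minimal, illustrative example: real pipelines use dense embeddings and an approximate-nearest-neighbour index rather than bag-of-words overlap, and the function names here are hypothetical). The retrieved passages are prepended to the question, and the resulting prompt is what the FLM would actually see:</p>

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Rank documents by similarity to the query; keep the top k."""
    q = Counter(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Assemble a RAG-style prompt: retrieved evidence + the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Jaén is a city in Andalusia, Spain, known for olive oil production.",
    "The transformer architecture relies on self-attention.",
]
print(build_prompt("Which city in Spain is known for olive oil?", docs))
```

      <p>The design point is that factual content lives in the document store, which can be updated at any time without retraining the model that consumes the prompt.</p>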
      <p>With this purpose, the main objective of this thesis is to understand the functioning of LLMs as
reasoning engines, with a special focus on language models for Spanish. In order to achieve this main
objective, the following secondary objectives have been defined:
1. Study of the internal encoding of knowledge for language understanding.
2. Decomposition of the language model’s capabilities into different skills.
3. Enhancement of LLM-based AI capabilities through the use of external knowledge bases.
4. Improvement of explainability by decomposing the AI task resolution process.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>The following methodology aims to systematically explore the hypothesis that linguistic competence
can be separated from factual knowledge in LLMs, focusing on Spanish language models:
• Literature review and initial study: Conduct a comprehensive literature review on the current
techniques and advancements in LLMs, focusing on Spanish language models, and identify key
resources, including scientific forums (AAAI, NeurIPS, ACL) and reference bulletins
(PapersWithCode, The Batch).
• Experimental design and evaluation: Define experimental setups for each objective, including
initial hypotheses, required datasets, and evaluation metrics. Participate in evaluation forums
(CLEF, SemEval, IberLEF) to benchmark against other models and solutions.
• Study of internal encoding of knowledge: Perform a series of experiments to analyze the internal
representations of LLMs, utilizing techniques such as layer-wise analysis, and compare these
internal representations across different models to identify common patterns and structures.
• Decomposition of language model capabilities: Design tasks and benchmarks to isolate and
evaluate individual skills and use controlled experiments to test the models’ performance on
these tasks.
• Enhancement through external knowledge bases: Develop methods to connect LLMs with external
databases and knowledge bases using techniques like RAG and conduct experiments to compare
the performance of enhanced models against traditional models.
• Dissemination of findings: Prepare and submit research papers to high-impact journals and
conferences.</p>
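      <p>As an illustration of the kind of experiment the layer-wise analysis involves, a common technique is the linear probe: a simple classifier trained on frozen hidden states from one layer; if it reaches high accuracy, the layer linearly encodes the probed property. The sketch below is illustrative only and uses synthetic vectors in place of real model activations:</p>

```python
import numpy as np

def train_probe(reps, labels, lr=0.5, epochs=300, seed=0):
    """Fit a logistic-regression probe on frozen representations; high
    accuracy suggests the layer linearly encodes the probed property."""
    rng = np.random.default_rng(seed)
    X = np.hstack([reps, np.ones((len(reps), 1))])   # add bias column
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))             # sigmoid
        w -= lr * X.T @ (p - labels) / len(labels)   # gradient step
    return w

def probe_accuracy(w, reps, labels):
    X = np.hstack([reps, np.ones((len(reps), 1))])
    return float(np.mean((X @ w > 0) == labels))

# Synthetic stand-in for the hidden states of one layer: a binary
# property (e.g. grammatical number) is injected along one direction.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)
reps = rng.normal(size=(200, 16))
reps[:, 0] += 3.0 * (labels - 0.5)   # the signal a probe should find
w = train_probe(reps, labels)
print(probe_accuracy(w, reps, labels))
```

      <p>Comparing probe accuracy across layers and across models is one way to locate where linguistic properties, as opposed to factual associations, are encoded.</p>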
      <p>This outline is in its initial stages, focusing on setting up foundational explorations and experiments
to investigate the separation of linguistic competence and factual knowledge in language models.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Research elements proposed for discussion</title>
      <p>The following specific research elements provide a comprehensive framework for discussing the
hypothesis and objectives, facilitating a deeper exploration of the potential benefits, limitations, and
implications of using fundamental language models in AI systems:
• Does separating linguistic competence from factual knowledge reduce hallucinations
and improve the accuracy of generated information? By separating linguistic competence
from internalized factual knowledge, we aim to mitigate instances where models produce
inaccurate or speculative content based on outdated or erroneous information.
• How effective is RAG in supplementing FLMs with accurate and up-to-date factual
knowledge? RAG techniques enable FLMs to retrieve relevant information from external sources
during generation, potentially enhancing the models’ factual accuracy by accessing the latest and
most relevant data available.
• How does the separation of linguistic competence and factual knowledge impact the
explainability and transparency of FLMs? This inquiry explores whether separating linguistic
competence from factual knowledge enhances the model’s ability to explain its reasoning process
transparently, thereby improving trust and interpretability in AI-driven decision-making.
• To what extent can FLMs retain comprehensive language understanding and reasoning
capabilities without internal factual knowledge? This element assesses whether FLMs, despite not
internalizing factual data, can maintain the robust language understanding and reasoning capabilities
necessary for complex AI tasks.</p>
      <p>This work has been funded by the scholarship (FPI-PRE2022-105603) from the Ministry of Science,
Innovation and Universities of the Spanish Government. I am grateful to my thesis supervisors Arturo
Montejo-Ráez and L. Alfonso Ureña-López for their guidance and help during the work done up to
now.</p>
      <p>[12] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, T. Liu,
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open
questions, 2023. arXiv:2311.05232.
[13] W. Wang, B. Haddow, A. Birch, W. Peng, Assessing factual reliability of large language model
knowledge, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the
North American Chapter of the Association for Computational Linguistics: Human Language
Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City,
Mexico, 2024, pp. 805–819. URL: https://aclanthology.org/2024.naacl-long.46. doi:10.18653/v1/2024.naacl-long.46.
[14] I. Garrido-Muñoz, A. Montejo-Ráez, F. Martínez-Santiago, L. A. Ureña-López, A survey on
bias in deep NLP, Applied Sciences 11 (2021). URL: https://www.mdpi.com/2076-3417/11/7/3184.
doi:10.3390/app11073184.
[15] P. Hacker, Teaching fairness to artificial intelligence: existing and novel strategies against
algorithmic discrimination under EU law, Common Market Law Review 55 (2018).
[16] M. Du, N. Liu, X. Hu, Techniques for interpretable machine learning, Communications of the
ACM 63 (2019) 68–77.
[17] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, D. Sculley, Google Vizier: A service for
black-box optimization, in: Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’17, Association for Computing Machinery, New
York, NY, USA, 2017, pp. 1487–1495. URL: https://doi.org/10.1145/3097983.3098043. doi:10.1145/3097983.3098043.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , L. u. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/ 3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/ 1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Albert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Almahairi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Babaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bashlykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhosale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bikel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Blecher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Ferrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cucurull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Esiobu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hartshorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Inan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kardas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kerkez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khabsa</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Kloumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korenev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Koura</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>A. Lachaux</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lavril</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Liskovich</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Martinet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Mihaylov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mishra</surname>
            , I. Molybog,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Poulton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Reizenstein</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Rungta</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Saladi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Schelten</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>E. M.</given-names>
          </string-name>
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>X. E.</given-names>
          </string-name>
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Kuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Zarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kambadur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stojnic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Scialom</surname>
          </string-name>
          ,
          <article-title>Llama 2: Open foundation and fine-tuned chat models</article-title>
          ,
          <year>2023</year>
          . arXiv:2307.09288.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <year>2019</year>
          . arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2302.13971.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bamford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chaplot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>de las Casas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bressand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lengyel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Lavaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Sayed</surname>
          </string-name>
          ,
          <article-title>Mistral 7B</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.06825.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Improving language understanding by generative pre-training</article-title>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <article-title>More is different: Broken symmetry and the nature of the hierarchical structure of science</article-title>
          ,
          <source>Science</source>
          <volume>177</volume>
          (
          <year>1972</year>
          )
          <fpage>393</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <article-title>Emergent abilities of large language models</article-title>
          ,
          <year>2022</year>
          . arXiv:2206.07682.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-S.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <article-title>How do large language models acquire factual knowledge during pretraining?</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2406.11813. arXiv:2406.11813.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>