<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LLMs on the Fly: Text-to-JSON for Custom API Calling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miguel Escarda-Fernández</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iñigo López-Riobóo-Botana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Santiago Barro-Tojeiro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lara Padrón-Cousillas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonia Gonzalez-Vázquez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Carreiro-Alonso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Gómez-Area</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ITG (Instituto Tecnológico de Galicia)</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the rapidly evolving landscape of Natural Language Processing (NLP), there is a growing demand for agile and intuitive tools due to the increasing model capabilities, primarily in the field of Large Language Models (LLMs). In recent months, we have seen great progress in the Natural Language Generation (NLG) landscape, with a proliferation of generative AI applications leveraging LLMs for a vast number of tasks. The power of LLMs resides in their ability to generalize almost any NLP task to the problem of next-token prediction, thus simplifying the traditional NLP pipelines consisting of intensive data labeling and domain-specific fine-tuning for a single task. Moreover, LLMs are enhanced (1) with external knowledge bases, which improve their reasoning and domain understanding, and (2) with external tools, which improve their ability to perform actions. We present a novel approach that harnesses the power of LLMs to transform natural language inputs into structured data representations, facilitating seamless interaction with custom APIs for real-time data visualization. We explore the integration of the Flythings® Technologies API for Internet of Things (IoT) device solutions in the Industry 4.0 domain. This system demonstration presents a chat-based virtual assistant that allows users to query the status of monitored machines and devices. The core component of the application is an LLM that serves as a bridge between user queries and machine-readable JSON objects, which adhere to a predefined schema following the Flythings standard. Our LLM output facilitates the interaction with the Flythings API, leading to the generation of visualizations that illustrate IoT device status in real time.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>LLM</kwd>
        <kwd>Fine-tuning</kwd>
        <kwd>agents</kwd>
        <kwd>assistants</kwd>
        <kwd>visualization</kwd>
        <kwd>API tools</kwd>
        <kwd>IoT</kwd>
        <kwd>Monitoring</kwd>
        <kwd>Industry 4.0</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>(...) tuning and deployment of the optimized and production-ready LLM. In Section 4, we illustrate the practical examples carried out and the real-world utility of our tool, presenting its limitations in Section 5. We conclude with Section 6 by summarizing our findings and outlining the future directions of our research.</p>
    </sec>
    <sec id="sec-rw">
      <title>2. Related Work</title>
      <p>In recent months, we have seen a myriad of LLM research papers addressing the topic of context-aware LLMs through in-context learning. This capability enables them to generalize to almost any NLP task, commonly unseen during pre-training and fine-tuning stages [3, 5, 6]. This direction has led the research community to explore the integration of LLMs with external tools such as document stores [7] or APIs [8], enhancing their generalization capabilities even further. LLM agents [<xref ref-type="bibr" rid="ref2">9</xref>] are a new concept that arose from providing LLMs with (1) extensive up-to-date data pools beyond their fixed knowledge representations and (2) functions or tools to perform actions and automate processes [<xref ref-type="bibr" rid="ref11 ref4">10, 11, 12, 13</xref>]. Such a two-fold strategy reduces the need for regular re-training. For example, Gorilla [8] leverages a multitude of APIs and documentation through document retrievers, highlighting the effectiveness of this framework.</p>
      <p>Moreover, the reasoning capabilities of LLMs are influenced by the prompt strategies followed [5, 14, 15], where how natural language instructions are written significantly affects the performance [16]. More complex prompting strategies like ReAct [<xref ref-type="bibr" rid="ref2">9</xref>] became popular, combining reasoning and planning techniques by adding reasoning traces and task-specific actions to the prompt. These strategies benefit the integration of the LLM with external sources. In this new landscape, new benchmark frameworks were proposed [17, 18], which aim at designing reliable and robust evaluation methodologies.</p>
      <p>The introduction of Generative Information Extraction (GIE) has further boosted the NLP field [19]. Recent studies [20] propose LLMs to generate structured information from natural language. Some closely-related tasks, like text-to-SQL [21, 22], involve the transformation of natural language into SQL for querying external tools (i.e., databases). This generative approach proves to be effective even in scenarios involving complex schemas with millions of entities [23]. The ability of LLMs to manage these large schemas without dropping performance (effectively generating the target query following a specific format) is particularly significant for our research. We propose a generation step aiming at transforming natural language queries (sent to our virtual assistant) into structured JSON objects with the relevant parameters for the integration of the FlyThings® API.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Proposed Method</title>
      <p>In this section, we present our methodology, covering all the steps involved in our pipeline. We describe our data preparation stage, including the seed data creation and data augmentation process. We also formulate our supervised fine-tuning (SFT) method for our information extraction task, as well as the inference optimizations taken into account for our LLM deployment. The overall process is depicted in Figure 1.</p>
      <p>[Figure 1: Overall pipeline. Recoverable labels: (1) JSON output generator; (2) data augmentation instruction ("Your task is to generate in Spanish 3 alternative inputs for a specific JSON output (...) This is the output schema: {json_schema}"); input-output pool; (4) AWQ quantization; (5) LLM inference.]</p>
      <sec id="sec-2-1">
        <title>3.1. Seed Data</title>
        <p>In the absence of pre-existing user data for our task, dependent on the FlyThings® technology, we started creating a dataset. We collected feedback from the Flythings team, who provided us with the initial examples of potential user inputs and expected outputs. In this way, we got a seed dataset consisting of 6 outputs, each of them with 3 different ways of expressing the input, in accordance with the Flythings team. Given these pairs, we agreed on a specification, defining a JSON schema as the golden rule. Our pipeline starts with (1) a template-based method for generating new JSON outputs as described in Figure 1, randomly selecting one of the available options for each of the JSON fields, following the schema depicted in Figure 2. In this way, we got a pool of examples for the next data augmentation step.</p>
      </sec>
      <sec id="sec-2-aug">
        <title>3.2. Data Augmentation</title>
        <p>Our seed dataset was scarce and limited in scope, lacking input query diversity. Therefore, we followed a data augmentation approach. We created a custom pipeline for generating alternative input queries, given the reference (input, output) pairs from the seed data. For this task, we leveraged the Mixture of Experts (MoE) LLM Mixtral-8x7B-Instruct-v0.1 from Mistral AI [24].</p>
        <p>We aimed at generating variant inputs for each JSON output from the previous pool depicted in Figure 1, so that we could increase the available (input, output) pairs. We used the original seed as reference within the instruction illustrated in Figure 3, generating 3 variations of the input for each target through few-shot in-context learning [6]. This process corresponds to the (2) data augmentation step depicted in Figure 1. We increased our dataset up to 355 curated samples for the following SFT stage.</p>
      </sec>
      <sec id="sec-2-sft">
        <title>3.3. Supervised Fine-Tuning</title>
        <p>Before diving into the details of the fine-tuning process, it is important to understand why supervised fine-tuning was necessary in the first place. While zero-shot or few-shot (i.e., in-context) learning [25] can be effective for general NLP tasks, it entails challenges when the task is very specific and requires a thorough generation process, limiting hallucinations [26]. In our case, we faced some issues with the in-context learning approach for classifying and extracting the corresponding fields for the Flythings® task. On the one hand, (1) zero-shot learning, which involves making direct predictions without any previous examples in the training distribution, had problems with detailed input queries requiring complex JSON outputs, for which the corresponding JSON schema in the instruction was not enough. This led to classification inaccuracies in the generation step. Similarly, (2) few-shot learning, which relies on providing the model with some examples of the task in the initial instruction, was limited and biased by the quality and expressiveness of the provided sequences at inference time. In short, these two methods neither captured the complexity nor the specificity of our domain, leading us to sub-optimal performance in terms of both accuracy and reliability. Recognizing these limitations, we transitioned to a fine-tuning approach to tailor the model for our specific needs.</p>
        <p>During the fine-tuning stage, we assessed multiple models up to 7 billion parameters, considering the trade-off between the model performance and our hardware limitations. We finally chose the instruction fine-tuned model teknium/OpenHermes-2.5-Mistral-7B, based on the mistralai/Mistral-7B-Instruct-v0.1 model, and leveraged the dataset from our previous data augmentation step (Section 3.2).</p>
      </sec>
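      <sec id="sec-ex-template">
        <title>Illustrative sketch: template-based output generation</title>
        <p>The (1) template-based generation of JSON outputs described in Section 3.1 can be sketched as follows. This is a minimal illustration rather than the production pipeline: the option pools and the field subset are assumptions made for this sketch, since the paper only shows the abridged schema of Figure 2.</p>
        <preformat>
```python
import json
import random

# Hypothetical option pools per schema field (assumptions for this sketch;
# the real options come from the Flythings specification).
FIELD_OPTIONS = {
    "property": ["tap 2", "temperature", "humidity"],
    "foi": ["greenhouse water", "boiler room"],
    "asIncremental": [True, False],
    "type": ["chart", "table", "indicator"],
    "subtype": ["line", "bar"],
    "period": ["DAILY", "WEEKLY", "MONTHLY"],
}

def generate_output(rng):
    """Randomly pick one available option per JSON field (Section 3.1, step 1)."""
    return {
        "series": [{
            "property": rng.choice(FIELD_OPTIONS["property"]),
            "foi": rng.choice(FIELD_OPTIONS["foi"]),
            "asIncremental": rng.choice(FIELD_OPTIONS["asIncremental"]),
        }],
        "visualization": {
            "config": {
                "type": rng.choice(FIELD_OPTIONS["type"]),
                "subtype": rng.choice(FIELD_OPTIONS["subtype"]),
            },
            "body": {"period": rng.choice(FIELD_OPTIONS["period"])},
        },
    }

# Sampling with different seeds yields the pool of target outputs
# that feeds the data augmentation step.
pool = [generate_output(random.Random(i)) for i in range(10)]
print(json.dumps(pool[0], indent=2))
```
        </preformat>
        <p>Each sampled object is a valid target output under the schema, so the pool can be paired later with LLM-generated input queries.</p>
      </sec>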
      <sec id="sec-2-2">
        <p>[Figure 2: the JSON output schema (abridged).]</p>
        <preformat>{
  "series": [
    {
      "property": String,
      "foi": String,
      "module": String,
      "asIncremental": Boolean
    }
  ],
  "visualization": {
    "config": {
      "type": Enum,
      "subtype": Enum
    },
    "body": {
      "period": Enum,
      (...)
      "temporalScaleType": Enum
    }
  }
}</preformat>
        <p>[Figure 3: data augmentation instruction example.]</p>
        <preformat>Instruction: Your task is to generate 3 alternative inputs for a
specific JSON output. {rules_to_follow}
This is the output schema:
{"series": [{ "property": "tap 2", "foi": "greehouse water",
"asIncremental": True }], "visualization": {"config": {"type":
"chart", "subtype": "line"}, "body": { "temporalScale": "DAILY",
"temporalScaleType": "CHANGES" }}}

Input1: View the accumulated status changes for tap 2 of the
greenhouse water device on a daily graph.
Input2: Observe the daily graph that displays the collective
status alterations of tap 2 in the greenhouse watering device.
Input3: Examine the daily chart showing the aggregate
changes in the status of greenhouse water device's tap 2.</preformat>
        <p>For efficient fine-tuning we followed the QLoRA [27] approach. Similar to LoRA (Low-Rank Adaptation of large language models) [28], which freezes the pre-trained model weights and adds trainable rank decomposition matrices to each transformer block (eliminating the need for full fine-tuning), QLoRA goes a step further by quantizing the weights of the frozen backbone LLM, adding the LoRA adapters with paged optimizers to manage memory spikes. This results in more efficient memory management for fine-tuning [27]. The models are available at https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B and https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1.</p>
      </sec>
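      <sec id="sec-ex-validate">
        <title>Illustrative sketch: checking outputs against the schema</title>
        <p>Since the JSON schema of Figure 2 acts as the golden rule for generated outputs, a lightweight structural check can be sketched as below. The validator is an illustration under assumptions: it covers only the abridged fields shown in the paper, not the full Flythings specification.</p>
        <preformat>
```python
# Expected types for the abridged "series" fields of Figure 2
# (assumption: only the fields shown in the paper are covered).
SERIES_TYPES = {"property": str, "foi": str, "module": str, "asIncremental": bool}

def validate(obj):
    """Return True if obj structurally follows the abridged schema."""
    if not isinstance(obj, dict):
        return False
    series = obj.get("series")
    if not isinstance(series, list) or not series:
        return False
    for entry in series:
        if not isinstance(entry, dict):
            return False
        for key, value in entry.items():
            expected = SERIES_TYPES.get(key)
            if expected is None or not isinstance(value, expected):
                return False
    # The visualization config must at least declare a type.
    vis = obj.get("visualization")
    config = vis.get("config", {}) if isinstance(vis, dict) else {}
    return isinstance(config.get("type"), str)

ok = validate({
    "series": [{"property": "tap 2", "foi": "greenhouse water",
                "asIncremental": True}],
    "visualization": {"config": {"type": "chart", "subtype": "line"},
                      "body": {"temporalScale": "DAILY"}},
})
print(ok)
```
        </preformat>
        <p>Such a check can filter malformed LLM generations before they reach the Flythings API.</p>
      </sec>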
    </sec>
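    <sec id="sec-ex-prompt">
      <title>Illustrative sketch: building the augmentation prompt</title>
      <p>The data augmentation instruction of Figure 3 can be assembled programmatically before being sent to Mixtral-8x7B-Instruct-v0.1. This is a sketch under assumptions: the {rules_to_follow} text is elided in the paper, so a placeholder is used, and the serialization details are illustrative.</p>
      <preformat>
```python
import json

# Instruction template from Figure 3; the rules text is elided in the
# paper, so a placeholder assumption stands in for {rules_to_follow}.
TEMPLATE = (
    "Instruction: Your task is to generate 3 alternative inputs for a "
    "specific JSON output. {rules}\n"
    "This is the output schema:\n{schema}"
)

def build_prompt(target_output, rules="(rules to follow...)"):
    """Fill the Figure 3 template with one target JSON output from the pool."""
    return TEMPLATE.format(rules=rules, schema=json.dumps(target_output))

prompt = build_prompt({
    "series": [{"property": "tap 2", "foi": "greenhouse water"}],
})
print(prompt)
```
      </preformat>
      <p>One such prompt per pooled output, answered by the LLM with three paraphrased queries, yields the augmented (input, output) pairs.</p>
    </sec>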
    <sec id="sec-3">
      <title>3.4. Inference Optimization</title>
      <p>After the supervised fine-tuning stage of our model, we had to determine the inference requirements under a production environment, considering (1) our hardware limitations and (2) the need for low latency supporting real-time queries. In this way, we explored the available options for reducing the computational requirements, while maintaining (or minimally decreasing) the LLM performance. We opted for the vLLM [29] library, specifically designed for fast and efficient serving of LLMs, including, but not limited to, paged attention optimizations, continuous batching of incoming requests and optimized CUDA kernels. We compared the performance of different quantization techniques supported by vLLM, such as GPTQ [30] and AWQ [31]. We chose AWQ because it offered the best throughput while maintaining the performance (the AWQ quantization method consistently outperforms GPTQ across different model scales in their evaluation benchmark; check the original work for more details). We deployed our LLM service in the proprietary ITG clusters, using an RTX A6000 48 GB GDDR6 GPU.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Chatbot Experimentation</title>
      <p>For our experimentation, we implemented a new virtual assistant view in the FlyThings® framework. The front-end of the chatbot is in charge of loading the user context, which is the list of their available IoT devices.</p>
      <p>With the environment all set, each input query is sent to the LLM service, which generates the corresponding JSON output following the schema described in Figure 2.</p>
      <p>We identify the closest IoT device information matching the extracted device and property (and optionally module, if present) JSON fields. Then, we follow these steps: (1) if there are no matches, the user is prompted to try again; (2) if there is exclusively one match, the next step is executed; (3) if there is more than one match, a radio button is displayed for the user to choose among them. Depending on the visualization format (graph, table, indicator and so on), a request to the observation API endpoints (https://deviot.flythings.io/api/apidocs/index.html#api-03-Request_Observations) is processed, including all the chart configuration. Finally, the visual widget is loaded, showing the results to the user. We include an example in Figure 4. We also provide a video demonstration of the virtual assistant, available at https://youtu.be/qHs47rcmpHU.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Limitations</title>
      <p>In this paper we introduce the first version of the system as a proof-of-concept demo, still in its early stage of development. We focused on the data augmentation, fine-tuning and deployment stages, mainly due to time constraints. We did not perform a thorough evaluation, and we acknowledge the importance of this process; but since the project is linked to a new market product by the Flythings® company, we aligned with the team requirements, which were more oriented to fast prototyping for a first usable version of the chat interface.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>In this paper we present a novel approach for querying the Flythings® framework. We described the system architecture and the NLP pipeline for the dataset preparation, LLM fine-tuning and inference optimization stages. Our approach is generalizable to any text-to-JSON or text-to-API task following the proposed pipeline. We handle user queries in natural language with a virtual assistant, considering visual feedback. Our next steps include refining the fine-tuned LLM using preference data from users interacting with the system. We will study in more detail both the helpfulness and the accuracy of our model outputs by means of thorough evaluation and benchmarking. We plan to explore Reinforcement Learning from Human Feedback (RLHF) [32] and Direct Preference Optimization (DPO) [<xref ref-type="bibr" rid="ref23">33</xref>] for further alignment with human preferences. We also foresee future applications of Virtual Reality (VR), which would improve usability under real conditions and enhance user experience. We aim to broaden the current functionality beyond querying IoT devices, adding more complex Flythings® IoT operations, such as managing device actions, alerts or dashboards.</p>
    </sec>
    <sec id="sec-ack">
      <title>Acknowledgments</title>
      <p>This ongoing R&amp;D project is supported by the CEL.IA network initiative (https://itg.es/cervera-celia/) through the CDTI (Centro para el Desarrollo Tecnológico Industrial) (grant CER-20211022) by the Ministerio de Ciencia e Innovación. This research is also possible thanks to the ITG-Flythings collaboration. We would like to express our gratitude to the Flythings developers team, for their continuous support and feedback to enhance our LLM generation capabilities and integration within their systems.</p>
    </sec>
    <sec id="sec-app-a">
      <title>A. Flythings</title>
      <p>The FlyThings® platform is an all-in-one tool for IoT device management for many different productive sectors. It is designed for the analysis and forecasting of data records of IoT devices, considering any of the data types available at scale. FlyThings® handles a wide variety of sensors, systems and applications for specific use cases including, but not limited to, smart industries or intelligent energy. FlyThings® helps in the decision-making process, yielding better results for enterprises, with ad hoc offerings including modular Big Data as a Service (BDaaS) with standard APIs for data management and visualization. Check https://itg.es/en/monitoring-iot-platform-flythings/ for more details.</p>
    </sec>
    <sec id="sec-refs-body">
      <title>References (entries [16]–[33])</title>
      <p>(...) arXiv:2201.07207.</p>
      <p>[16] Anthropic, Long context prompting for Claude 2.1, 2023. URL: https://www.anthropic.com/news/claude-2-1-prompting.</p>
      <p>[17] Q. Xu, F. Hong, B. Li, C. Hu, Z. Chen, J. Zhang, On the Tool Manipulation Capability of Open-source Large Language Models, arXiv preprint arXiv:2305.16504 (2023).</p>
      <p>[18] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, et al., ToolLLM: Facilitating large language models to master 16000+ real-world APIs, arXiv preprint arXiv:2307.16789 (2023).</p>
      <p>[19] D. Xu, W. Chen, W. Peng, C. Zhang, T. Xu, X. Zhao, X. Wu, Y. Zheng, E. Chen, Large Language Models for Generative Information Extraction: A Survey, ArXiv abs/2312.17617 (2023). URL: https://api.semanticscholar.org/CorpusID:266690657.</p>
      <p>[20] A. Dunn, J. Dagdelen, N. Walker, S. Lee, A. S. Rosen, G. Ceder, K. Persson, A. Jain, Structured information extraction from complex scientific text with fine-tuned large language models, 2022. arXiv:2212.05238.</p>
      <p>[21] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Cao, R. Geng, N. Huo, X. Zhou, C. Ma, G. Li, K. C. C. Chang, F. Huang, R. Cheng, Y. Li, Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs, 2023. arXiv:2305.03111.</p>
      <p>[22] R. Srivastava, Defog SQLCoder, 2023. URL: https://github.com/defog-ai/sqlcoder.</p>
      <p>[23] M. Josifoski, N. De Cao, M. Peyrard, F. Petroni, R. West, GenIE: Generative information extraction, in: M. Carpuat, M.-C. de Marneffe, I. V. Meza Ruiz (Eds.), Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 4626–4643. URL: https://aclanthology.org/2022.naacl-main.342. doi:10.18653/v1/2022.naacl-main.342.</p>
      <p>[24] Mistral AI, Mixtral of experts, 2023. URL: https://mistral.ai/news/mixtral-of-experts/ and https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1.</p>
      <p>[25] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-Shot Learners, 2020. arXiv:2005.14165.</p>
      <p>[26] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, T. Liu, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, ArXiv abs/2311.05232 (2023). URL: https://api.semanticscholar.org/CorpusID:265067168.</p>
      <p>[27] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient Finetuning of Quantized LLMs, ArXiv abs/2305.14314 (2023). URL: https://api.semanticscholar.org/CorpusID:258841328.</p>
      <p>[28] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-Rank Adaptation of Large Language Models, in: International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=nZeVKeeFYf9.</p>
      <p>[29] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, I. Stoica, Efficient Memory Management for Large Language Model Serving with PagedAttention, in: Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.</p>
      <p>[30] E. Frantar, S. Ashkboos, T. Hoefler, D. Alistarh, GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers, arXiv preprint arXiv:2210.17323 (2022).</p>
      <p>[31] J. Lin, J. Tang, H. Tang, S. Yang, X. Dang, S. Han, AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, arXiv (2023).</p>
      <p>[32] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, R. Lowe, Training language models to follow instructions with human feedback, 2022. arXiv:2203.02155.</p>
      <p>[<xref ref-type="bibr" rid="ref23">33</xref>] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, C. Finn, Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023. arXiv:2305.18290.</p>
    </sec>
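    <sec id="sec-ex-matching">
      <title>Illustrative sketch: device matching and dispatch</title>
      <p>The device-matching steps described in the chatbot experimentation (Section 4) can be sketched as below. The fuzzy similarity metric and the threshold are assumptions for this sketch; the paper does not specify how "closest" matching is computed.</p>
      <preformat>
```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Fuzzy string similarity in [0, 1] (illustrative choice of metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_devices(extracted_foi, device_names, threshold=0.6):
    """Return the user's devices whose name is close to the extracted foi field."""
    return [name for name in device_names
            if similarity(extracted_foi, name) >= threshold]

def dispatch(matches):
    """Section 4 steps: (1) no match, (2) single match, (3) several matches."""
    if len(matches) == 0:
        return "prompt_retry"        # ask the user to try again
    if len(matches) == 1:
        return "execute_request"     # proceed with the observation request
    return "show_radio_buttons"      # let the user choose among candidates

# The typo in "greehouse" still resolves to the right device.
matches = match_devices("greehouse water", ["greenhouse water", "boiler room"])
print(matches, dispatch(matches))
```
      </preformat>
      <p>After dispatch, the matched device and the visualization fields of the JSON output drive the request to the observation API and the widget that is finally rendered.</p>
    </sec>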
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>APIs</surname>
          </string-name>
          , arXiv preprint arXiv:
          <volume>2305</volume>
          .15334 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          , I. Shafran,
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>arXiv:2210.03629</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Parisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fiedel</surname>
          </string-name>
          , TALM: Tool Aug[1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          , J. Gehrke, mented Language Models,
          <source>ArXiv abs/2205.12255</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          , E. Kamar,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lund-</surname>
          </string-name>
          (
          <year>2022</year>
          ). URL: https://api.semanticscholar.org/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>berg</surname>
          </string-name>
          , et al.,
          <source>Sparks of artificial general intelli- CorpusID:249017698.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>gence: Early experiments with gpt-4</article-title>
          , arXiv preprint [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwivedi-Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dessì</surname>
          </string-name>
          , R. Raileanu,
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>arXiv:2303.12712</source>
          (
          <year>2023</year>
          ). M.
          <string-name>
            <surname>Lomeli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Cancedda</surname>
            , T. Scialom, [2]
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Shivananda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kulkarni</surname>
          </string-name>
          , D. Gu- Toolformer:
          <article-title>Language models can teach themselves</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>divada, LLMs for Enterprise and LLMOps</article-title>
          , Apress, to use tools,
          <source>arXiv preprint arXiv:2302.04761</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          Berkeley, CA,
          <year>2023</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>154</lpage>
          . URL: https://doi. [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nakano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          , L. Ouyang,
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>org/10</source>
          .1007/978-1-
          <fpage>4842</fpage>
          -9994-
          <issue>4</issue>
          _7. doi:
          <volume>10</volume>
          .1007/ C. Kim,
          <string-name><given-names>C.</given-names> <surname>Hesse</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Jain</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Kosaraju</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Saunders</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Cobbe</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Eloundou</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Krueger</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Button</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Knight</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Chess</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Schulman</surname></string-name>,
          WebGPT: Browser-assisted question-answering with human feedback,
          <source>CoRR abs/2112.09332</source> (<year>2021</year>). URL: https://arxiv.org/abs/2112.09332.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [3]
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Child</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Luan</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Amodei</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          Language Models are Unsupervised Multitask Learners,
          <year>2019</year>. URL: https://api.semanticscholar.org/CorpusID:160025533.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [4]
          <string-name><given-names>J.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bosma</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Guu</surname></string-name>,
          <string-name><given-names>A. W.</given-names> <surname>Yu</surname></string-name>,
          et al., Finetuned Language Models are Zero-Shot Learners,
          <source>ArXiv abs/2109.01652</source> (<year>2021</year>). URL: https://api.semanticscholar.org/CorpusID:237416585.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [5]
          <string-name><given-names>T.</given-names> <surname>Kojima</surname></string-name>,
          <string-name><given-names>S. S.</given-names> <surname>Gu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Reid</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Matsuo</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Iwasawa</surname></string-name>,
          Large Language Models are Zero-Shot Reasoners,
          <source>ArXiv abs/2205.11916</source> (<year>2022</year>). URL: https://api.semanticscholar.org/CorpusID:249017743.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [6]
          <string-name><given-names>D.</given-names> <surname>Dai</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Hao</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Sui</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Wei</surname></string-name>,
          Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
          (<year>2023</year>). arXiv:2212.10559.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [7]
          <string-name><given-names>P.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Perez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Piktus</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Petroni</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Karpukhin</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Küttler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>W.-t.</given-names> <surname>Yih</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Rocktäschel</surname></string-name>,
          et al., Retrieval-augmented generation for knowledge-intensive NLP tasks,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume> (<year>2020</year>) <fpage>9459</fpage>-<lpage>9474</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [8]
          <string-name><given-names>S. G.</given-names> <surname>Patil</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>J. E.</given-names> <surname>Gonzalez</surname></string-name>,
          Gorilla: Large Language Model Connected with Massive APIs,
          <source>ArXiv abs/2305.15334</source> (<year>2023</year>). URL: https://arxiv.org/abs/2305.15334.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [13]
          <string-name><given-names>S.</given-names> <surname>Yao</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Rao</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hausknecht</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Narasimhan</surname></string-name>,
          Keep CALM and Explore: Language Models for Action Generation in Text-based Games, in:
          <string-name><given-names>B.</given-names> <surname>Webber</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Cohn</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing</source>,
          Association for Computational Linguistics, Online, <year>2020</year>,
          pp. <fpage>8736</fpage>-<lpage>8754</lpage>.
          URL: https://aclanthology.org/2020.emnlp-main.704. doi:10.18653/v1/2020.emnlp-main.704.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [14]
          <string-name><given-names>J.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Schuurmans</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bosma</surname></string-name>,
          <string-name><given-names>E. H.</given-names> <surname>Chi</surname></string-name>,
          et al., Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,
          <source>ArXiv abs/2201.11903</source> (<year>2022</year>). URL: https://api.semanticscholar.org/CorpusID:246411621.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [15]
          <string-name><given-names>W.</given-names> <surname>Huang</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Abbeel</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Pathak</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Mordatch</surname></string-name>,
          Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents,
          <source>ArXiv abs/2201.07207</source> (<year>2022</year>). URL: https://arxiv.org/abs/2201.07207.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>