<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LLMs on the Fly: Text-to-JSON for Custom API Calling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miguel Escarda-Fernández</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iñigo López-Riobóo-Botana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Santiago Barro-Tojeiro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lara Padrón-Cousillas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonia Gonzalez-Vázquez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Carreiro-Alonso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Gómez-Area</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ITG (Instituto Tecnológico de Galicia)</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the rapidly evolving landscape of Natural Language Processing (NLP), there is a growing demand for agile and intuitive tools due to the increasing model capabilities, primarily in the field of Large Language Models (LLMs). In recent months, we have seen great progress in the Natural Language Generation (NLG) landscape, with a proliferation of generative AI applications leveraging LLMs for a vast number of tasks. The power of LLMs resides in their ability to generalize almost any NLP task to the problem of next-token prediction, thus simplifying the traditional NLP pipelines consisting of intensive data labeling and domain-specific fine-tuning for a single task. Moreover, LLMs are enhanced (1) with external knowledge bases, which improve their reasoning and domain understanding, and (2) with external tools, which improve their ability to perform actions. We present a novel approach that harnesses the power of LLMs to transform natural language inputs into structured data representations, facilitating seamless interaction with custom APIs for real-time data visualization. We explore the integration of the Flythings® Technologies API for Internet of Things (IoT) device solutions in the Industry 4.0 domain. This system demonstration presents a chat-based virtual assistant that allows users to query the status of monitored machines and devices. The core component of the application is an LLM that serves as a bridge between user queries and machine-readable JSON objects, which adhere to a predefined schema following the Flythings standard. Our LLM output facilitates the interaction with the Flythings API, leading to the generation of visualizations that illustrate IoT device status in real time.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>LLM</kwd>
        <kwd>Fine-tuning</kwd>
        <kwd>agents</kwd>
        <kwd>assistants</kwd>
        <kwd>visualization</kwd>
        <kwd>API tools</kwd>
        <kwd>IoT</kwd>
        <kwd>Monitoring</kwd>
        <kwd>Industry 4.0</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>(...) tuning and deployment of the optimized and production-ready LLM. In Section 4, we illustrate the practical examples carried out and the real-world utility of our tool, presenting its limitations in Section 5. We conclude with Section 6 by summarizing our findings and outlining the future directions of our research.</p>
    </sec>
    <sec id="sec-rw">
      <title>2. Related Work</title>
      <p>In recent months, we have seen a myriad of LLM research papers addressing the topic of context-aware LLMs through in-context learning. This capability enables them to generalize to almost any NLP task, commonly unseen during pre-training and fine-tuning stages [3, 5, 6]. This direction has led the research community to explore the integration of LLMs with external tools such as document stores [7] or APIs [8], enhancing their generalization capabilities even further. LLM agents [<xref ref-type="bibr" rid="ref2">9</xref>] are a new concept that arose from providing LLMs with (1) extensive up-to-date data pools beyond their fixed knowledge representations and (2) functions or tools to perform actions and automate processes [<xref ref-type="bibr" rid="ref11 ref4">10, 11, 12, 13</xref>]. Such a two-fold strategy reduces the need for regular re-training. For example, Gorilla [8] leverages a multitude of APIs and documentation through document retrievers, highlighting the effectiveness of this framework.</p>
      <p>Moreover, the reasoning capabilities of LLMs are influenced by the prompt strategies followed [5, 14, 15], where how natural language instructions are written significantly affects the performance [16]. More complex prompting strategies like ReAct [<xref ref-type="bibr" rid="ref2">9</xref>] became popular, combining reasoning and planning techniques by adding reasoning traces and task-specific actions to the prompt. These strategies benefit the integration of the LLM with external sources. In this new landscape, new benchmark frameworks were proposed [17, 18], which aim at designing reliable and robust evaluation methodologies.</p>
      <p>The introduction of Generative Information Extraction (GIE) has further boosted the NLP field [19]. Recent studies [20] propose LLMs to generate structured information from natural language. Some closely-related tasks, like text-to-SQL [21, 22], involve the transformation of natural language into SQL for querying external tools (i.e., databases). This generative approach proves to be effective even in scenarios involving complex schemas with millions of entities [23]. The ability of LLMs to manage these large schemas without dropping performance (effectively generating the target query following a specific format) is particularly significant for our research. We propose a generation step aiming at transforming natural language queries (sent to our virtual assistant) into structured JSON objects with the relevant parameters for the integration of the FlyThings® API.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Proposed Method</title>
      <p>In this section, we present our methodology, covering all the steps involved in our pipeline. We describe our data preparation stage, including the seed data creation and data augmentation process. We also formulate our supervised fine-tuning (SFT) method for our information extraction task, as well as the inference optimizations taken into account for our LLM deployment. The overall process is depicted in Figure 1.</p>
      <p>[Figure 1: Overall pipeline. Recoverable labels: (1) JSON output generator; (2) data augmentation instruction ("Your task is to generate in Spanish 3 alternative inputs for a specific JSON output (...) This is the output schema: {json_schema}"); input-output pool; (4) AWQ quantization; (5) LLM inference.]</p>
      <sec id="sec-2-1">
        <title>3.1. Seed Data</title>
        <p>In the absence of pre-existing user data for our task, dependent on the FlyThings® technology, we started creating a dataset. We collected feedback from the Flythings team, who provided us with the initial examples of potential user inputs and expected outputs. In this way, we got a seed dataset consisting of 6 outputs, each of them with 3 different ways of expressing the input, in accordance with the Flythings team. Given these pairs, we agreed on a specification, defining a JSON schema as the golden rule. Our pipeline starts with (1) a template-based method for generating new JSON outputs as described in Figure 1, randomly selecting one of the available options for each of the JSON fields, following the schema depicted in Figure 2. In this way, we got a pool of examples for the next data augmentation step.</p>
      </sec>
      <sec id="sec-2-aug">
        <title>3.2. Data Augmentation</title>
        <p>Our seed dataset was scarce and limited in scope, lacking input query diversity. Therefore, we followed a data augmentation approach. We created a custom pipeline for generating alternative input queries, given the reference (input, output) pairs from the seed data. For this task, we leveraged the Mixture of Experts (MoE) LLM Mixtral-8x7B-Instruct-v0.1 from Mistral AI [24].</p>
        <p>We aimed at generating variant inputs for each JSON output from the previous pool depicted in Figure 1, so that we could increase the available (input, output) pairs. We used the original seed as reference within the instruction illustrated in Figure 3, generating 3 variations of the input for each target through few-shot in-context learning [6]. This process corresponds to the (2) data augmentation step depicted in Figure 1. We increased our dataset up to 355 curated samples for the following SFT stage.</p>
      </sec>
      <sec id="sec-2-sft">
        <title>3.3. Supervised Fine-Tuning</title>
        <p>Before diving into the details of the fine-tuning process, it is important to understand why supervised fine-tuning was necessary in the first place. While zero-shot or few-shot (i.e., in-context) learning [25] can be effective for general NLP tasks, it entails challenges when the task is very specific and requires a thorough generation process, limiting hallucinations [26]. In our case, we faced some issues with the in-context learning approach for classifying and extracting the corresponding fields for the Flythings® task. On the one hand, (1) zero-shot learning, which involves making direct predictions without any previous examples in the training distribution, had problems with detailed input queries requiring complex JSON outputs, for which the corresponding JSON schema in the instruction was not enough. This led to classification inaccuracies in the generation step. Similarly, (2) few-shot learning, which relies on providing the model with some examples of the task in the initial instruction, was limited and biased by the quality and expressiveness of the provided sequences at inference time. In short, these two methods neither captured the complexity nor the specificity of our domain, leading us to sub-optimal performance in terms of both accuracy and reliability. Recognizing these limitations, we transitioned to a fine-tuning approach to tailor the model for our specific needs.</p>
        <p>During the fine-tuning stage, we assessed multiple models up to 7 billion parameters, considering the trade-off between the model performance and our hardware limitations. We finally chose the instruction fine-tuned model teknium/OpenHermes-2.5-Mistral-7B, based on the mistralai/Mistral-7B-Instruct-v0.1 model, and leveraged the dataset from our previous data augmentation step (Section 3.2).</p>
      </sec>
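      <sec id="sec-ex-template">
        <title>Illustrative sketch: template-based output generation</title>
        <p>The (1) template-based generation of JSON outputs described in Section 3.1 can be sketched as follows. This is a minimal illustration rather than the production pipeline: the option pools and the field subset are assumptions made for this sketch, since the paper only shows the abridged schema of Figure 2.</p>
        <preformat>
```python
import json
import random

# Hypothetical option pools per schema field (assumptions for this sketch;
# the real options come from the Flythings specification).
FIELD_OPTIONS = {
    "property": ["tap 2", "temperature", "humidity"],
    "foi": ["greenhouse water", "boiler room"],
    "asIncremental": [True, False],
    "type": ["chart", "table", "indicator"],
    "subtype": ["line", "bar"],
    "period": ["DAILY", "WEEKLY", "MONTHLY"],
}

def generate_output(rng):
    """Randomly pick one available option per JSON field (Section 3.1, step 1)."""
    return {
        "series": [{
            "property": rng.choice(FIELD_OPTIONS["property"]),
            "foi": rng.choice(FIELD_OPTIONS["foi"]),
            "asIncremental": rng.choice(FIELD_OPTIONS["asIncremental"]),
        }],
        "visualization": {
            "config": {
                "type": rng.choice(FIELD_OPTIONS["type"]),
                "subtype": rng.choice(FIELD_OPTIONS["subtype"]),
            },
            "body": {"period": rng.choice(FIELD_OPTIONS["period"])},
        },
    }

# Sampling with different seeds yields the pool of target outputs
# that feeds the data augmentation step.
pool = [generate_output(random.Random(i)) for i in range(10)]
print(json.dumps(pool[0], indent=2))
```
        </preformat>
        <p>Each sampled object is a valid target output under the schema, so the pool can be paired later with LLM-generated input queries.</p>
      </sec>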
      <sec id="sec-2-2">
        <p>[Figure 2: the JSON output schema (abridged).]</p>
        <preformat>{
  "series": [
    {
      "property": String,
      "foi": String,
      "module": String,
      "asIncremental": Boolean
    }
  ],
  "visualization": {
    "config": {
      "type": Enum,
      "subtype": Enum
    },
    "body": {
      "period": Enum,
      (...)
      "temporalScaleType": Enum
    }
  }
}</preformat>
        <p>[Figure 3: data augmentation instruction example.]</p>
        <preformat>Instruction: Your task is to generate 3 alternative inputs for a
specific JSON output. {rules_to_follow}
This is the output schema:
{"series": [{ "property": "tap 2", "foi": "greehouse water",
"asIncremental": True }], "visualization": {"config": {"type":
"chart", "subtype": "line"}, "body": { "temporalScale": "DAILY",
"temporalScaleType": "CHANGES" }}}

Input1: View the accumulated status changes for tap 2 of the
greenhouse water device on a daily graph.
Input2: Observe the daily graph that displays the collective
status alterations of tap 2 in the greenhouse watering device.
Input3: Examine the daily chart showing the aggregate
changes in the status of greenhouse water device's tap 2.</preformat>
        <p>For efficient fine-tuning we followed the QLoRA [27] approach. Similar to LoRA (Low-Rank Adaptation of large language models) [28], which freezes the pre-trained model weights and adds trainable rank decomposition matrices to each transformer block (eliminating the need for full fine-tuning), QLoRA goes a step further by quantizing the weights of the frozen backbone LLM, adding the LoRA adapters with paged optimizers to manage memory spikes. This results in more efficient memory management for fine-tuning [27]. The models are available at https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B and https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1.</p>
      </sec>
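      <sec id="sec-ex-validate">
        <title>Illustrative sketch: checking outputs against the schema</title>
        <p>Since the JSON schema of Figure 2 acts as the golden rule for generated outputs, a lightweight structural check can be sketched as below. The validator is an illustration under assumptions: it covers only the abridged fields shown in the paper, not the full Flythings specification.</p>
        <preformat>
```python
# Expected types for the abridged "series" fields of Figure 2
# (assumption: only the fields shown in the paper are covered).
SERIES_TYPES = {"property": str, "foi": str, "module": str, "asIncremental": bool}

def validate(obj):
    """Return True if obj structurally follows the abridged schema."""
    if not isinstance(obj, dict):
        return False
    series = obj.get("series")
    if not isinstance(series, list) or not series:
        return False
    for entry in series:
        if not isinstance(entry, dict):
            return False
        for key, value in entry.items():
            expected = SERIES_TYPES.get(key)
            if expected is None or not isinstance(value, expected):
                return False
    # The visualization config must at least declare a type.
    vis = obj.get("visualization")
    config = vis.get("config", {}) if isinstance(vis, dict) else {}
    return isinstance(config.get("type"), str)

ok = validate({
    "series": [{"property": "tap 2", "foi": "greenhouse water",
                "asIncremental": True}],
    "visualization": {"config": {"type": "chart", "subtype": "line"},
                      "body": {"temporalScale": "DAILY"}},
})
print(ok)
```
        </preformat>
        <p>Such a check can filter malformed LLM generations before they reach the Flythings API.</p>
      </sec>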
    </sec>
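    <sec id="sec-ex-prompt">
      <title>Illustrative sketch: building the augmentation prompt</title>
      <p>The data augmentation instruction of Figure 3 can be assembled programmatically before being sent to Mixtral-8x7B-Instruct-v0.1. This is a sketch under assumptions: the {rules_to_follow} text is elided in the paper, so a placeholder is used, and the serialization details are illustrative.</p>
      <preformat>
```python
import json

# Instruction template from Figure 3; the rules text is elided in the
# paper, so a placeholder assumption stands in for {rules_to_follow}.
TEMPLATE = (
    "Instruction: Your task is to generate 3 alternative inputs for a "
    "specific JSON output. {rules}\n"
    "This is the output schema:\n{schema}"
)

def build_prompt(target_output, rules="(rules to follow...)"):
    """Fill the Figure 3 template with one target JSON output from the pool."""
    return TEMPLATE.format(rules=rules, schema=json.dumps(target_output))

prompt = build_prompt({
    "series": [{"property": "tap 2", "foi": "greenhouse water"}],
})
print(prompt)
```
      </preformat>
      <p>One such prompt per pooled output, answered by the LLM with three paraphrased queries, yields the augmented (input, output) pairs.</p>
    </sec>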
    <sec id="sec-3">
      <title>3.4. Inference Optimization</title>
      <p>After the supervised fine-tuning stage of our model, we had to determine the inference requirements under a production environment, considering (1) our hardware limitations and (2) the need for low latency supporting real-time queries. In this way, we explored the available options for reducing the computational requirements, while maintaining (or minimally decreasing) the LLM performance. We opted for the vLLM [29] library, specifically designed for fast and efficient serving of LLMs, including, but not limited to, paged attention optimizations, continuous batching of incoming requests and optimized CUDA kernels. We compared the performance of different quantization techniques supported by vLLM, such as GPTQ [30] and AWQ [31]. We chose AWQ because it offered the best throughput while maintaining the performance (the AWQ quantization method consistently outperforms GPTQ across different model scales in their evaluation benchmark; check the original work for more details). We deployed our LLM service in the proprietary ITG clusters, using an RTX A6000 48 GB GDDR6 GPU.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Chatbot Experimentation</title>
      <p>For our experimentation, we implemented a new virtual assistant view in the FlyThings® framework. The front-end of the chatbot is in charge of loading the user context, which is the list of their available IoT devices.</p>
      <p>With the environment all set, each input query is sent to the LLM service, which generates the corresponding JSON output following the schema described in Figure 2.</p>
      <p>We identify the closest IoT device information matching the extracted device and property (and optionally module, if present) JSON fields. Then, we follow these steps: (1) if there are no matches, the user is prompted to try again; (2) if there is exclusively one match, the next step is executed; (3) if there is more than one match, a radio button is displayed for the user to choose among them. Depending on the visualization format (graph, table, indicator and so on), a request to the observation API endpoints (https://deviot.flythings.io/api/apidocs/index.html#api-03-Request_Observations) is processed, including all the chart configuration. Finally, the visual widget is loaded, showing the results to the user. We include an example in Figure 4. We also provide a video demonstration of the virtual assistant, available at https://youtu.be/qHs47rcmpHU.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Limitations</title>
      <p>In this paper we introduce the first version of the system as a proof-of-concept demo, still in its early stage of development. We focused on the data augmentation, fine-tuning and deployment stages, mainly due to time constraints. We did not perform a thorough evaluation, and we acknowledge the importance of this process; but since the project is linked to a new market product by the Flythings® company, we aligned with the team requirements, which were more oriented to fast prototyping for a first usable version of the chat interface.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>In this paper we present a novel approach for querying the Flythings® framework. We described the system architecture and the NLP pipeline for the dataset preparation, LLM fine-tuning and inference optimization stages. Our approach is generalizable to any text-to-JSON or text-to-API task following the proposed pipeline. We handle user queries in natural language with a virtual assistant, considering visual feedback. Our next steps include refining the fine-tuned LLM using preference data from users interacting with the system. We will study in more detail both the helpfulness and the accuracy of our model outputs by means of thorough evaluation and benchmarking. We plan to explore Reinforcement Learning from Human Feedback (RLHF) [32] and Direct Preference Optimization (DPO) [<xref ref-type="bibr" rid="ref23">33</xref>] for further alignment with human preferences. We also foresee future applications of Virtual Reality (VR), which would improve usability under real conditions and enhance user experience. We aim to broaden the current functionality beyond querying IoT devices, adding more complex Flythings® IoT operations, such as managing device actions, alerts or dashboards.</p>
    </sec>
    <sec id="sec-ack">
      <title>Acknowledgments</title>
      <p>This ongoing R&amp;D project is supported by the CEL.IA network initiative (https://itg.es/cervera-celia/) through the CDTI (Centro para el Desarrollo Tecnológico Industrial) (grant CER-20211022) by the Ministerio de Ciencia e Innovación. This research is also possible thanks to the ITG-Flythings collaboration. We would like to express our gratitude to the Flythings developers team, for their continuous support and feedback to enhance our LLM generation capabilities and integration within their systems.</p>
    </sec>
    <sec id="sec-app-a">
      <title>A. Flythings</title>
      <p>The FlyThings® platform is an all-in-one tool for IoT device management for many different productive sectors. It is designed for the analysis and forecasting of data records of IoT devices, considering any of the data types available at scale. FlyThings® handles a wide variety of sensors, systems and applications for specific use cases including, but not limited to, smart industries or intelligent energy. FlyThings® helps in the decision-making process, yielding better results for enterprises, with ad hoc offerings including modular Big Data as a Service (BDaaS) with standard APIs for data management and visualization. Check https://itg.es/en/monitoring-iot-platform-flythings/ for more details.</p>
    </sec>
    <sec id="sec-refs-body">
      <title>References (entries [16]–[33])</title>
      <p>(...) arXiv:2201.07207.</p>
      <p>[16] Anthropic, Long context prompting for Claude 2.1, 2023. URL: https://www.anthropic.com/news/claude-2-1-prompting.</p>
      <p>[17] Q. Xu, F. Hong, B. Li, C. Hu, Z. Chen, J. Zhang, On the Tool Manipulation Capability of Open-source Large Language Models, arXiv preprint arXiv:2305.16504 (2023).</p>
      <p>[18] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, et al., ToolLLM: Facilitating large language models to master 16000+ real-world APIs, arXiv preprint arXiv:2307.16789 (2023).</p>
      <p>[19] D. Xu, W. Chen, W. Peng, C. Zhang, T. Xu, X. Zhao, X. Wu, Y. Zheng, E. Chen, Large Language Models for Generative Information Extraction: A Survey, ArXiv abs/2312.17617 (2023). URL: https://api.semanticscholar.org/CorpusID:266690657.</p>
      <p>[20] A. Dunn, J. Dagdelen, N. Walker, S. Lee, A. S. Rosen, G. Ceder, K. Persson, A. Jain, Structured information extraction from complex scientific text with fine-tuned large language models, 2022. arXiv:2212.05238.</p>
      <p>[21] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Cao, R. Geng, N. Huo, X. Zhou, C. Ma, G. Li, K. C. C. Chang, F. Huang, R. Cheng, Y. Li, Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs, 2023. arXiv:2305.03111.</p>
      <p>[22] R. Srivastava, Defog SQLCoder, 2023. URL: https://github.com/defog-ai/sqlcoder.</p>
      <p>[23] M. Josifoski, N. De Cao, M. Peyrard, F. Petroni, R. West, GenIE: Generative information extraction, in: M. Carpuat, M.-C. de Marneffe, I. V. Meza Ruiz (Eds.), Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 4626–4643. URL: https://aclanthology.org/2022.naacl-main.342. doi:10.18653/v1/2022.naacl-main.342.</p>
      <p>[24] Mistral AI, Mixtral of experts, 2023. URL: https://mistral.ai/news/mixtral-of-experts/ and https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1.</p>
      <p>[25] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-Shot Learners, 2020. arXiv:2005.14165.</p>
      <p>[26] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, T. Liu, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, ArXiv abs/2311.05232 (2023). URL: https://api.semanticscholar.org/CorpusID:265067168.</p>
      <p>[27] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient Finetuning of Quantized LLMs, ArXiv abs/2305.14314 (2023). URL: https://api.semanticscholar.org/CorpusID:258841328.</p>
      <p>[28] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-Rank Adaptation of Large Language Models, in: International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=nZeVKeeFYf9.</p>
      <p>[29] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, I. Stoica, Efficient Memory Management for Large Language Model Serving with PagedAttention, in: Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.</p>
      <p>[30] E. Frantar, S. Ashkboos, T. Hoefler, D. Alistarh, GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers, arXiv preprint arXiv:2210.17323 (2022).</p>
      <p>[31] J. Lin, J. Tang, H. Tang, S. Yang, X. Dang, S. Han, AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, arXiv (2023).</p>
      <p>[32] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, R. Lowe, Training language models to follow instructions with human feedback, 2022. arXiv:2203.02155.</p>
      <p>[<xref ref-type="bibr" rid="ref23">33</xref>] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, C. Finn, Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023. arXiv:2305.18290.</p>
    </sec>
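    <sec id="sec-ex-matching">
      <title>Illustrative sketch: device matching and dispatch</title>
      <p>The device-matching steps described in the chatbot experimentation (Section 4) can be sketched as below. The fuzzy similarity metric and the threshold are assumptions for this sketch; the paper does not specify how "closest" matching is computed.</p>
      <preformat>
```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Fuzzy string similarity in [0, 1] (illustrative choice of metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_devices(extracted_foi, device_names, threshold=0.6):
    """Return the user's devices whose name is close to the extracted foi field."""
    return [name for name in device_names
            if similarity(extracted_foi, name) >= threshold]

def dispatch(matches):
    """Section 4 steps: (1) no match, (2) single match, (3) several matches."""
    if len(matches) == 0:
        return "prompt_retry"        # ask the user to try again
    if len(matches) == 1:
        return "execute_request"     # proceed with the observation request
    return "show_radio_buttons"      # let the user choose among candidates

# The typo in "greehouse" still resolves to the right device.
matches = match_devices("greehouse water", ["greenhouse water", "boiler room"])
print(matches, dispatch(matches))
```
      </preformat>
      <p>After dispatch, the matched device and the visualization fields of the JSON output drive the request to the observation API and the widget that is finally rendered.</p>
    </sec>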
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>APIs</surname>
          </string-name>
          , arXiv preprint arXiv:
          <volume>2305</volume>
          .15334 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          , I. Shafran,
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>arXiv:2210.03629</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Parisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fiedel</surname>
          </string-name>
          , TALM: Tool Aug[1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          , J. Gehrke, mented Language Models,
          <source>ArXiv abs/2205.12255</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          , E. Kamar,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lund-</surname>
          </string-name>
          (
          <year>2022</year>
          ). URL: https://api.semanticscholar.org/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>berg</surname>
          </string-name>
          , et al.,
          <source>Sparks of artificial general intelli- CorpusID:249017698.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>gence: Early experiments with gpt-4</article-title>
          , arXiv preprint [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dwivedi-Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dessì</surname>
          </string-name>
          , R. Raileanu,
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>arXiv:2303.12712</source>
          (
          <year>2023</year>
          ). M.
          <string-name>
            <surname>Lomeli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Cancedda</surname>
            , T. Scialom, [2]
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Shivananda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kulkarni</surname>
          </string-name>
          , D. Gu- Toolformer:
          <article-title>Language models can teach themselves</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>divada, LLMs for Enterprise and LLMOps</article-title>
          , Apress, to use tools,
          <source>arXiv preprint arXiv:2302.04761</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          Berkeley, CA,
          <year>2023</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>154</lpage>
          . URL: https://doi. [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nakano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          , L. Ouyang,
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>org/10</source>
          .1007/978-1-
          <fpage>4842</fpage>
          -9994-
          <issue>4</issue>
          _7. doi:
          <volume>10</volume>
          .1007/ C. Kim,
          <string-name><given-names>C.</given-names> <surname>Hesse</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Jain</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Kosaraju</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Saunders</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Cobbe</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Eloundou</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Krueger</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Button</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Knight</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Chess</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Schulman</surname></string-name>,
          WebGPT: Browser-assisted question-answering with human feedback,
          <source>CoRR abs/2112.09332</source> (<year>2021</year>). URL: https://arxiv.org/abs/2112.09332.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [3]
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Child</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Luan</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Amodei</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          Language Models are Unsupervised Multitask Learners,
          <year>2019</year>. URL: https://api.semanticscholar.org/CorpusID:160025533.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [4]
          <string-name><given-names>J.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bosma</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Zhao</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Guu</surname></string-name>,
          <string-name><given-names>A. W.</given-names> <surname>Yu</surname></string-name>,
          et al., Finetuned Language Models are Zero-Shot Learners,
          <source>ArXiv abs/2109.01652</source> (<year>2021</year>). URL: https://api.semanticscholar.org/CorpusID:237416585.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [5]
          <string-name><given-names>T.</given-names> <surname>Kojima</surname></string-name>,
          <string-name><given-names>S. S.</given-names> <surname>Gu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Reid</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Matsuo</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Iwasawa</surname></string-name>,
          Large Language Models are Zero-Shot Reasoners,
          <source>ArXiv abs/2205.11916</source> (<year>2022</year>). URL: https://api.semanticscholar.org/CorpusID:249017743.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [6]
          <string-name><given-names>D.</given-names> <surname>Dai</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Sun</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Hao</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Sui</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Wei</surname></string-name>,
          Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
          (<year>2023</year>). arXiv:2212.10559.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [7]
          <string-name><given-names>P.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Perez</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Piktus</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Petroni</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Karpukhin</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Küttler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>W.-t.</given-names> <surname>Yih</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Rocktäschel</surname></string-name>,
          et al., Retrieval-augmented generation for knowledge-intensive NLP tasks,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume> (<year>2020</year>) <fpage>9459</fpage>-<lpage>9474</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [8]
          <string-name><given-names>S. G.</given-names> <surname>Patil</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>J. E.</given-names> <surname>Gonzalez</surname></string-name>,
          Gorilla: Large Language Model Connected with Massive APIs,
          <source>ArXiv abs/2305.15334</source> (<year>2023</year>). URL: https://arxiv.org/abs/2305.15334.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [13]
          <string-name><given-names>S.</given-names> <surname>Yao</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Rao</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Hausknecht</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Narasimhan</surname></string-name>,
          Keep CALM and Explore: Language Models for Action Generation in Text-based Games, in:
          <string-name><given-names>B.</given-names> <surname>Webber</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Cohn</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing</source>,
          Association for Computational Linguistics, Online, <year>2020</year>,
          pp. <fpage>8736</fpage>-<lpage>8754</lpage>.
          URL: https://aclanthology.org/2020.emnlp-main.704. doi:10.18653/v1/2020.emnlp-main.704.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [14]
          <string-name><given-names>J.</given-names> <surname>Wei</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Schuurmans</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bosma</surname></string-name>,
          <string-name><given-names>E. H.</given-names> <surname>Chi</surname></string-name>,
          et al., Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,
          <source>ArXiv abs/2201.11903</source> (<year>2022</year>). URL: https://api.semanticscholar.org/CorpusID:246411621.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [15]
          <string-name><given-names>W.</given-names> <surname>Huang</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Abbeel</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Pathak</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Mordatch</surname></string-name>,
          Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents,
          <source>ArXiv abs/2201.07207</source> (<year>2022</year>). URL: https://arxiv.org/abs/2201.07207.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>