Towards an LLM-based Intelligent Assistant for Industry 5.0 Roberto Figliè1,* , Tommaso Turchi1 , Giacomo Baldi2 and Daniele Mazzei1,2 1 Computer Science Department, University of Pisa, Pisa, 56127, Italy 2 Zerynth, Pisa, 56124, Italy Abstract Industry 4.0 (I4.0) has revolutionised industrial operations by enabling remote monitoring and control of machines, thereby enhancing productivity through data analysis. Dashboards have traditionally been the primary interface for accessing and interpreting data. However, they can lack adaptability and may overwhelm users with information. It is important to consider alternative methods of presenting data to avoid information overload. In response, Industry 5.0 (I5.0) has emerged, advocating for a human-centric approach. Advancements in technology, specifically auto-regressive Large Language Models (LLMs), have enabled the development of Intelligent Cognitive Assistants (ICAs) that enhance user interactions through natural language dialog. This paper presents the initial steps towards constructing an LLM-based ICA for I5.0 applications. Our system integrates industrial data from IoT-connected machines into a chatbot interface, with the aim of simplifying the decision-making process for managers and operators. Through expert evaluation, we are iteratively refining our prototype before conducting usability tests with end-users. This will lay the groundwork for future developments in human-centric industrial solutions. Keywords Human-Computer Interaction, Artificial Intelligence, Industry 5.0, Large Language Models, Chatbots 1. Introduction Since its emergence, Industry 4.0 (I4.0) has demonstrated its potential in the industrial market by connecting a wide range of machines to the cloud, allowing them to be monitored and controlled remotely. This enabled companies to become more productive [1] as they could analyse in an efficient way the performance of production, their bottlenecks, their waste, etc. In this context, the primary means of interaction between the end user and the machine data or its analysis was inherited from the pre-I4.0 era: the dashboard. Dashboards provide a familiar interface that presents a visual representation of key performance indicators (KPIs), real-time data, and actionable insights [2]. They allow operators, managers, and decision-makers to monitor operations, identify trends, and make informed decisions in a timely manner [3]. Proceedings of the 1st International Workshop on Designing and Building Hybrid Human–AI Systems (SYNERGY 2024), Arenzano (Genoa), Italy, June 03, 2024. * Corresponding author. $ roberto.figlie@phd.unipi.it (R. Figliè); tommaso.turchi@unipi.it (T. Turchi); g.baldi@zerynth.com (G. Baldi); daniele.mazzei@unipi.it (D. Mazzei)  0000-0002-7208-6865 (R. Figliè); 0000-0001-6826-9688 (T. Turchi); 0000-0001-8383-3355 (D. Mazzei) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings However, as Industry 4.0 continues to evolve, there is a growing recognition of the limitations of traditional dashboards. Although they provide valuable information, they often present data in a static or pre-defined format [4], which limits flexibility and adaptability to changing operational needs. Furthermore, the sheer volume and complexity of data generated in Industry 4.0 environments can overwhelm users, causing information overload and making it challenging to extract meaningful insights efficiently. In response to these and many other considerations, there has recently been a shift towards a human-centric approach, commonly referred to as Industry 5.0 (I5.0). On the other hand, technology has not halted its progress, and it can even enhance the human-centeredness of industrial solutions. This is demonstrated by the recent success of auto- regressive large language models (LLMs). Although more sophisticated and technologically impressive than previous approaches to natural language, LLMs have sparkled a new interest in "old" interaction methods, such as natural language dialog. However, in contrast to traditional chatbots, an LLM-based chatbot can respond to users’ queries by following non-predefined flows, allowing for a wider range of possibilities within the dialog. While chatbots integrated into Business Intelligence systems have traditionally been used to formulate queries for data retrieval, it is now possible to envision a new era of Intelligent Cognitive Assistants (ICAs) —an AI system that assists and enhances users in various tasks by understanding, reasoning and learning from interactions [5]— that synergistically collaborate with users. In industrial scenarios where efficient knowledge transfer is increasingly important [6], such systems are particularly relevant and can streamline the work of decision-makers and operators. This article presents the initial steps to construct an ICA for I5.0 based on LLMs, in order to establish the groundwork for subsequent work. The article presents the core architecture of the first testable iteration in section 2. This is followed by an expert evaluation (section 3.1) and an outline of goals and methodology for future usability testing (section 3.2). 2. Chatbot Prototype Development The ICA’s initial version is a chatbot developed in partnership with Zerynth (https://zerynth.com/), an Italian company that offers Industrial IoT devices and soft- ware to digitise manufacturing processes. The main aim of this iteration was to develop a chatbot that could demonstrate its core capabilities in information retrieval tasks to the company’s customers (the intended users of the system). The assistant was tested with experts to refine it before testing with end users to gather feedback for future design iterations. (see Section 3). 2.1. Methods The prototype was developed with the GPT-4 model from OpenAI [7], which has the ability to call functions, i.e. predefined methods that take input parameters from the LLM, process them (or not), and return the generated output to the LLM. This allows the assistant to retrieve data from exogenous sources that are not included in its training data, thus enabling communica- tion with Zerynth’s APIs to retrieve machine data information. Furthermore, using multiple functions enables the system to select the most appropriate one based on the user’s request. Figure 1: Assistant architecture. This design allows the system to determine the optimal way to interact with the user, similar to mixed-initiative approaches. Additionally, the system’s choice of function and response type is influenced by the conversation’s history, making the assistant aware of any possible omissions due to previous mentions. 2.2. Architecture As shown in Fig. 1, the assistant architecture is based on 3 functions, 3 types of prompts and 2 data sources. The functions are: • Current week retriever provides direct connection with Zerynth’s APIs to retrieve real-time or recent data. • Historical retriever, uses pre-processed data to simulate the aggregation and filtering of historical data. • Question helper assists the user in formulating useful questions depending on the user type. The prompts provide both the functions and the LLM with the right context to correctly understand and parameterize the user’s request. They also help to appropriately link and understand retrieved data to formulate an answer. The data sources are queried based on the parameters extracted by the LLM. Specifically, they are: • The Zerynth database, to retrieve real-time data in a structured JSON format through a custom API connector. • Pre-processed Pandas dataframes, that only need to be filtered depending on the extracted parameters. Figure 2 shows an excerpt of a response generation flow. After the user input (1), the LLM is provided with the chat history for context (2), along with the tools definitions and parameter information. The chat history is limited to the 10 previous messages to prevent hallucinations Figure 2: Excerpt of a response generation flow. caused by exceeding the LLM’s context window. The LLM is called upon initially and, after comprehending the user’s message (3), it can independently determine whether to utilise any of the provided tools (4) or not. If it chooses to do so, the LLM will respond with a list of the tool/s to be used (4a), along with their corresponding parameters extracted from the user’s utterance. The chat history can be useful in identifying omissions by the user due to implicit references to previous messages, which is important for correctly identifying parameters. Therefore, the mentioned tool can be run with its parameters (4b). If the LLM has selected additional tools, they will be executed in parallel at this stage. The retrieved data (4c) will be augmented with new context information to facilitate the LLM ’reading’ process and ensure it adheres to the appropriate language and domain knowledge. Once the response is received, the LLM will be called again (4d) to utilize the retrieved data and formulate a response (5). If no tool is required, the first LLM call will directly generate a response, which will be displayed on the chatbot user interface (6). The colours in Figure 2 have been mapped to those in Figure 1 to facilitate matching between architecture and flow. For testing purposes, the chatbot was then embedded into Telegram as a bot. In the future, it will also have a dedicated user interface (section 4.2). Figure 3 shows an example of conversation with the chatbot in the Telegram UI. 3. Chatbot Prototype Evaluation 3.1. Experts Evaluation The chatbot prototype was evaluated by three experts from different fields: an academic with a background in HCI, an academic with mixed industrial-HCI expertise, and an industrial expert. 3.1.1. Methods The evaluation methodology used was extracted from BOT-Check, a design checklist presented in [8] along with the Chatbot Usability Scale (BUS) (refer to 3.2). The experts were presented with the chatbot and two typical domain tasks. After the test, Figure 3: Example of a conversation with the chatbot. they were asked to evaluate their experience and interaction using the BOT-Check checklist. During the evaluation, the following information was collected for each BOT-Check element: • Evaluation status, with the following possible assessments: – A check mark (✓) to indicate that the element was present and satisfactory. – A slash mark (/) to denote partial fulfillment or a somewhat present status. – An ’X’ mark (×) to signify that the element was absent or not satisfactory. • Main identified issues. • Main suggested recommendations. At the end of the collection stage, we analysed and categorised the evaluations by main themes. 3.1.2. Results The experts generally agreed that the chatbot was easy to use and flexible in adapting to different conversational styles, while also being capable of maintaining a themed and enjoyable discussion. However, there was disagreement regarding how well the chatbot met the needs of neurodiverse users and their preferences. For instance, the experts had varying opinions on the speed of answer (14th item in BOT-Check): some found it acceptable, while others believed it could be better managed, and some found it not acceptable at all. Table 1 presents the primary findings from the expert evaluations, including the usability issues identified and the corresponding recommendations for addressing them. 3.2. Usability Test Design While experts evaluation establish the foundation to first discover ways to improve the chatbot, usability testing ensures that the final product aligns with user expectations and needs. Table 1 Results of expert evaluations Issue Specific Concerns Recommendations A. Information credibility A1. Unsure data source. -Always specifiy data A2. Challenges previous an- source. swers. -Specify if response is the A3. Differences in cognitive result of an elaboration. paths. -Show elaboration processes only if explicitly requested. -Avoid response when unsure. B. Verbosity B1. Too much text to be di- -Shorter answers gested. -Answer length could de- B2. Excessive information pend on the request. can decrease answer preci- -Explicitly defer answers for sion. longer elaborations. B3. Excessive information hinders credibility. B4. The longer the answer, the longer the waiting time. C. Format and style C1. Only text available. -Make use of visualization C2. Absence of Call to Ac- strategies. tion. -Propose CTAs in relation to C3. Detachment with its ser- the available environment. vice environment and expe- -Propose more solutions rience. than excuses. C4. Absence of a real person- -Maintain a tone that is pro- ality. fessional, but harmoniously integrated into the service. D. Access privileges D1. Absence of authentica- -Check whether access to tion or login methods. specific information can be D2. Information can be ac- provided. cessed by any user profile. -Customise experience de- pending on privileges. Therefore, after the second iteration of the presented prototype, the chatbot will be tested with a selected number of Zerynth’s customers who already have previous experience with the dashboard environment to retrieve information from their IoT-ready industrial equipment. The chatbot will be embedded in the dashboard platform, as noted in the experts’ concern (C3). This will allow for the integration of more CTAs within the platform environment (C2). To collect data from these tests, users will be prompted to complete a post-test questionnaire aimed at gauging their satisfaction and usability perceptions after their first interaction with the chatbot. The questionnaire selected is the Chatbot Usability Scale (BUS-11) in its italian version [9], as the users’ primary language will be italian. BUS-11 was chosen over other usability assessment approaches, such as SUS or UMUX-Lite, for its specificity and applicability in the case studied. This assessment will evaluate user satisfaction based on five aspects: accessibility to chatbot functions, quality of chatbot functions, quality of conversation and information provided, privacy and security, and time of response. An additional aim of testing with end-users is to analyse their questions and extract qualitative data from the chatbot dialogs. This will allow us to enrich and refine the functionalities of the entire system, such as: 1. Generally improve the representation of end-users needs 2. Better associate LLM’s prompts with end-user profile. 3. Improve reply format and style. 4. Provide more precise data that could answer the questions. This encompasses: • Retrieve more data, if available. • Adding specifically requested KPIs. The first point ensures a better understanding of the end-users’ expectations and needs during their interaction with such a system for information retrieval purposes. The second one will inform the improvement of the quality of the prompts provided to the LLM. Concurrently, matching users’ profiles with prompts that describe them is crucial for customizing the dialog based on the user representation we have built. Currently, the chatbot is one-size-fits-all, except for the ’question helper’ function (section 2.2). Improving the reply format and style (third point) is also connected to the prompts. Finally, the fourth point highlights the potential for an improved retrieval process for industrial data and the definition of additional KPIs if necessary. 4. Discussion and Conclusion This article presents a work-in-progress development of an LLM-based ICA for human-centered industrial applications. The system is designed to simplify and augment decision-making processes, and to support both managers and operators in their daily activities. Currently, the proposal is represented by a chatbot that integrates industrial data from IoT-connected machines and other infrastructural elements. This first prototype facilitated the testing of the core functionalities and the collection of feedback from three experts through a heuristic evaluation based on a checklist for the design of chatbots (BOT-Check). We then established the groundwork for subsequent usability tests that will be carried out. The evaluations by experts identified several areas for improvement (Table 1), including concerns about the credibility of information (issue A), verbosity (issue B), format and style (issue C), and access privileges (issue D). Specifically, experts noted issues such as uncertainty about data sources (A1), challenging previously provided answers (A2) - for example, when previous data collection was doubtful - and presenting information in a way that differs from the user’s cognitive processes (A3). Excessive text can lead to reading difficulties, reduced precision, and reduced credibility. It can also increase waiting times to process the response. In terms of response format, it may be beneficial to include specific visualisations (C1) to present data in a more user-friendly manner. Additionally, incorporating calls to action (C2) could aid in integrating the chatbot into the dashboard and the other available services (C3). According to the industrial expert, the chatbot’s personality should be as professional as possible without any unnecessary embellishments1 . The experts had differing opinions on the matter, but it is important that the chatbot maintains objectivity and avoids subjective evaluations as much as possible. The absence of authentication methods (D1) and specific permissions to retrieve and access information (D2) prevents a secure and personalised experience. Addressing most of these concerns by applying suggested recommendations will be a crucial step to refine and improve the prototype before the usability tests. 4.1. Limitations Although this preliminary development of an LLM-based ICA shows potential for Industry 5.0 applications, it is important to acknowledge several limitations. The current work is still in its early stages, and its effectiveness in real-world industrial settings has yet to be fully validated. The primary goal of usability tests will be to achieve this validation. Furthermore, to address the complexities of diverse industrial environments, the scope of data integration and decision-making support may require further refinement. Additionally, there are limitations specific to the technology. A distinguishing factor between a dialog with an LLM-based chatbot and a traditional one is the range of errors that can be encountered. Traditional chatbots are often limited in their ability to handle unexpected user input, causing the conversation to be redirected back to its original path. In contrast, LLM-based chatbots demonstrate greater flexibility in comprehending user utterances. However, this could lead to a situation in which the LLM builds a response from scratch, rather than basing it on any real retrieved data. It is important to avoid this ’hallucinations’ and ensure that responses are based on actual data —as experts noted in issue A (Table 1)—. These limitations underscore the need for future research to comprehensively assess the usability, scalability, and effectiveness of LLM-based ICAs to contribute to I5.0 adoption in industrial contexts. 4.2. Future work To refine the chatbot, we will follow expert evaluations and integrate it into the Zerynth platform. Usability tests with end-users will be performed as presented in section 3.2. In future work, we aim to explore a broader setting where LLM-based intelligent assistants act as orchestrators of ubiquitous interaction with dynamic switching between tools, other agents, visualisations and other services. The design concept can draw inspiration from mixed-initiative interaction [10], which focuses on a flexible user-intelligent system collaboration to achieve a goal, seamlessly switching between interaction modes based on user preferences, contextual cues and opportunities. References [1] H. Özköse, G. Güney, The effects of industry 4.0 on productivity: A scientific mapping study, Technology in Society 75 (2023) 102368. URL: https://www.sciencedirect.com/ 1 In the expert words, "I want it to be as plain and professional as possible, no frills.". science/article/pii/S0160791X23001732. doi:https://doi.org/10.1016/j.techsoc. 2023.102368. [2] S. Few, Information dashboard design: The effective visual communication of data, O’Really, 2006. [3] C. A. Tavera Romero, J. H. Ortiz, O. I. Khalaf, A. Ríos Prado, Business Intelligence: Business Evolution after Industry 4.0, Sustainability 13 (2021) 10026. URL: https://www. mdpi.com/2071-1050/13/18/10026. doi:10.3390/su131810026, number: 18 Publisher: Multidisciplinary Digital Publishing Institute. [4] I. Berges, V. J. Ramírez-Durán, A. Illarramendi, Facilitating data exploration in industry 4.0, in: G. Guizzardi, F. Gailly, R. Suzana Pitangueira Maciel (Eds.), Advances in Conceptual Modeling, Springer International Publishing, Cham, 2019, pp. 125–134. [5] S. R. C. a. NSF, Intelligent Cognitive Assistants, https://www.nsf.gov/crssprgm/nano/ /reports/2016-1003_ICA_Workshop_Final_Report_2016.pdf, 2016. [6] S. Kernan Freire, M. Foosherian, C. Wang, E. Niforatos, Harnessing Large Language Models for Cognitive Assistants in Factories, in: Proceedings of the 5th International Conference on Conversational User Interfaces, CUI ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1–6. URL: https://dl.acm.org/doi/10.1145/3571884.3604313. doi:10. 1145/3571884.3604313. [7] OpenAI, OpenAI API Docs, https://platform.openai.com/docs/models/overview, ???? [8] S. Borsci, A. Malizia, M. Schmettow, F. van der Velde, G. Tariverdiyeva, D. Balaji, A. Cham- berlain, The Chatbot Usability Scale: the Design and Pilot of a Usability Scale for Interaction with AI-Based Conversational Agents, Personal and Ubiquitous Comput- ing 26 (2022) 95–119. URL: https://doi.org/10.1007/s00779-021-01582-9. doi:10.1007/ s00779-021-01582-9. [9] S. Borsci, E. Prati, A. Malizia, M. Schmettow, A. Chamberlain, S. Federici, Ciao AI: the Italian adaptation and validation of the Chatbot Usability Scale, Personal and Ubiquitous Computing 27 (2023) 2161–2170. URL: https://doi.org/10.1007/s00779-023-01731-2. doi:10. 1007/s00779-023-01731-2. [10] J. Allen, C. Guinn, E. Horvtz, Mixed-initiative interaction, IEEE Intelligent Systems and their Applications 14 (1999) 14–23. URL: https://ieeexplore.ieee.org/document/796083. doi:10.1109/5254.796083, conference Name: IEEE Intelligent Systems and their Appli- cations.