=Paper=
{{Paper
|id=Vol-2454/paper_23
|storemode=property
|title=A Systematic View on Speech Assistants for Service Technicians
|pdfUrl=https://ceur-ws.org/Vol-2454/paper_23.pdf
|volume=Vol-2454
|authors=Joachim Baumeister,Veronika Sehne,Carolin Wienrich
|dblpUrl=https://dblp.org/rec/conf/lwa/BaumeisterSW19
}}
==A Systematic View on Speech Assistants for Service Technicians==
Joachim Baumeister, Veronika Sehne, Carolin Wienrich

'''Abstract.''' The paper gives a systematic view on speech assistants in the field of technical service of industrial machines. We describe the results of a requirements analysis targeting companion technologies for service technicians, and we report on a first reference implementation. The effectiveness of the approach is evaluated in a diagnosis task scenario, and preliminary results of a user study are discussed.

'''Keywords:''' knowledge-based diagnosis system, intelligent personal assistant, speech interaction, dialog systems

===1 Motivation===
Intelligent personal assistants for private use are implemented in many devices nowadays. With their availability in smartphones, watches, and TVs, they have diffused into daily life and support users when, for instance, sending text messages, setting reminders, and starting apps or media channels. They simplify the use of existing technology by making it more intuitive and quicker to operate.

In general, an intelligent personal assistant provides a natural language interface to take requests from the user and perform corresponding actions. Today, many assistants interact with the user through a speech interface. In research, the development of (smart) speech assistants has been elaborated in many works; see, for instance, [1, 3, 9, 11, 12]. In this paper, we ask the research question whether and how the obvious benefits of such assistants can be transferred from personal life to industrial use cases. Here, we especially look for applications of speech assistants in the context of Technical Service. In general, the domain of Technical Service considers the operation, the optimization and maintenance, and the repair of often very complex industrial machines.

Supported by German BMWi Project Grant ZF4172703BZ7 (MARS project). Copyright © 2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===2 Background and Related Work===
====2.1 Assistance Systems for Service Technicians====
Today's assistance systems for service technicians are often structured like search applications of the 2000s: they provide simple textual search interfaces, so the technician needs to formulate an appropriate query manually and click through the delivered results. Only a few systems provide a kind of semantic interface that is able to return relevant pieces of information that match the intended meaning of the query rather than its literal wording. Also, the technicians are required to actually hold and touch the assistance system's device. Speech assistants would enable the technicians to formulate the information request intuitively and would also free them from having to touch a keyboard device while doing so. The aim of the present study is a user-centered development of a personal speech assistant for service technicians.

====2.2 User-Centered Design Process====
Norman and Draper [7] introduced the user-centered design (UCD) process, which condensed different approaches and methods known in the field of human-computer interaction (HCI). Similarly, DIN EN ISO 13407 describes the basic steps of user-centered design processes, consisting of the analysis of the context, the analysis of requirements, and the iterative design and evaluation of design solutions [5]. Integrating different perspectives, the approach of contextual design [4] involves users to analyze and evaluate systems (often including summative evaluations), and the approach of usability engineering [6] involves experts (often including formative evaluations) [8, 10]. The outer blue circle of Figure 1 shows the common steps of the UCD process and how they are intertwined.

Even though the UCD process is well established in HCI, context and requirement analyses for speech assistance systems are rare. Dumas et al.
[2] developed a guideline for multi-modal user interfaces including speech. However, the guideline seems rather universal and not specific enough to meet the requirements of a personal speech assistant for service technicians. Thus, the present article analyzes the context and requirements of service technicians in order to design a user-centered speech assistant, including an expert and a user evaluation.

===3 Analyzing Phase in the Present Study===
Our use-context is the domain of Technical Service, focusing particularly on the service technician. For this reason, we first introduce the domain of Technical Service of industrial machines and then the domain-specific tasks, environmental conditions, and experiences of a service technician. The use-context and first implications for the development of a speech assistant were analyzed iteratively by three experts from the domains of human-computer interaction and technical services. The analysis of the context results in four prototypical personas. We conclude the section with a description of the context-related requirements that, in the view of the experts, a targeted speech assistant needs to meet. Figure 1 shows the key points of the analyzing phase in the present study (inner red circle, green area).

Fig. 1. The outer blue circle shows a typical user-centered design process. The inner red circle shows the key points of the user-centered design process in the present article. Green, yellow, and purple areas highlight the analyzing, designing, and evaluating phases.

====3.1 Analyzing the Use-Context: The Domain of Technical Service====
Advanced industrial machinery is one of the most influential domains in today's life. Machinery produces almost all consumer goods with high efficiency. In this way, it touches all aspects of personal life: for instance, harvesting machines feed the food production for animals and humans, paper-making and printing machines produce newspapers, and automotive factories produce cars and trucks.
However, machinery needs to be operated and maintained safely for optimal performance. In case of malfunction, it needs to be brought back to work quickly. Often, the standstill of such machines yields exceptional costs in the range of hundreds of euros and more per minute. The Technical Service, and namely the service technicians, are responsible for the maintenance, performance optimization, diagnosis, and repair of industrial machinery.

====3.2 Analyzing Work-related Characteristics of a Service Technician: Domain Tasks, Environmental Conditions, and Domain Experiences====
'''Domain Tasks.''' As an overall goal, the service technician should maintain the machine state and sometimes even optimize the machine performance. To achieve these goals, we distinguish the following sub-tasks:

# Operation and monitoring of the machine performance
# Disassembly/assembly of machine components (optimization/repair)
# Diagnosis of faulty machine behavior
# Maintenance operations for a machine
# Documentation of accomplished work for commercial and knowledge management reasons

The present paper focuses on the task of diagnosis, which includes the use of an information system with guided troubleshooting on the one hand and the work around, in, on, and under the machine during the diagnosis on the other hand. The latter involves the use of special tools or working gloves, while the former requires two free hands to navigate through the different diagnosis steps. Two implications follow for the development of the speech assistant: it should be integrated into the diagnosis interface of a semantic information system, and it should allow hands-free interaction.

'''Environmental Conditions.''' The service technician accomplishes the tasks under changing environmental parameters. Basically, the experts distinguish the following main parameters that affect the quality and speed of the technician's work:

# ''Location:'' The technician is working in a workshop or "in-field". As consequences for the development of the speech assistant we see:
#* (Reduced) availability of required tools
#* (Reduced) support and knowledge provided by on-site colleagues
#* (Reduced) Internet access for remote communication
# ''Supporting Information and Infrastructure'' considers the availability of information resources in breadth and depth, e.g., technical documentation, diagnostic systems, maintenance plans, communication channels, and mobile devices. As consequences for the development of the speech assistant we see:
#* Required infrastructure for providing information resources (cloud systems and storage)
#* Different levels of information can be provided depending on the availability of mobile devices
# ''Limited Time'' may put the technician under stress to complete the task. This primary limitation influences the use of tools, the quality of work, and the type of possible tasks. As consequences for the development of the speech assistant we see:
#* Speech assistants need to be adaptable with respect to the available time and mental workload of the user. This affects the query patterns, the possible intents, and the (more or less extensive) answers provided by the assistant.

'''Domain Experience.''' The performance of the overall service task heavily depends on the experience of the technician. In general, it is difficult to classify the individual experience, but some indicators can help to distinguish different experience levels: (a) the number and type of training sessions completed in recent years and (b) the years of work in the particular domain.
Some companies define specific levels of experience based on the type of work the technician can complete individually, e.g., from Level 1 (technician can safely operate the machine) to Level 4 (technician can safely diagnose and repair faulty behaviour for which diagnostic knowledge is available) and up to Level 5 (technician can safely diagnose and repair a previously unknown issue). Consequently, a speech assistant should support the individual experience level of a service technician.

====3.3 Four Prototypical Personas of a Service Technician====
Based on the analysis of the basic work-related characteristics (a service technician's tasks, environmental conditions, and domain experiences), the experts identified four typical personas that represent different prototypes of service technicians. The personas vary in domain experience and in the time available to complete a given task. Thus, the four personas represent a typical cross-section of the domain of Technical Service.

Fig. 2. Characterization of personas in the domain of Technical Service concerning the domain experience and the time for task completion. (Axes: Time and Experience; personas: Brad Brave, Theresa Thorough, Carl Conservative, Frank Frantic.)

In Figure 2, we see a classification of the identified personas: Brad Brave ("I am happy to learn new stuff every day!") is innovation-friendly but a beginner in the domain. He is motivated to use new things and invests time into the exploration and use of information systems. Carl Conservative ("I cannot spend extra time for this additional stuff.") also has little experience in the field but has no motivation to invest time, e.g., in the exploration and use of an information system. Frank Frantic ("It is about the bolts not about the apps!") is very experienced in the domain but shows no interest in exploring new tools that might improve his work.
Theresa Thorough ("I love to improve every day!") has much experience in the domain and is also very innovation-friendly. She is interested in further improving her personal performance by trying and using advanced tools.

====3.4 Analyzing Requirements====
Following the context analysis, three experts analyzed specific requirements for developing a speech assistant for service technicians. In summary, 28 requirements were identified and clustered in five categories. Table 1 shows the five clusters and examples of the corresponding specific requirements.

Table 1. Analyzed requirements for a personal speech assistant supporting the diagnosis in Technical Service. Requirements marked with * are implemented in the current system.

{| class="wikitable"
! Requirement Cluster !! Requirement for the Development
|-
| Symptom detection || Robust symptom detection (also several symptoms in one input), disambiguation*, recognition of technical & colloquial language
|-
| Orientation within dialog || Guided dialog* including pausing*, going backwards*, skipping & repeating* questions, multi-modal interaction*
|-
| Speech recognition || Online/offline speech recognition, activation word recognition*
|-
| Explanations || Display of further information, documents, explanation of the solution & differential diagnosis
|-
| Trust in the speech assistant || Using technical language*, transparency, personalization
|}

===4 Evaluation Phase 1: Expert Evaluation of Analyzed Requirements===
In order to verify the relevance of the analyzed requirements, ten independent experts (age: 23 to 43 years, 9 male) from the field of diagnostic dialogues reviewed the 28 requirements and rated their importance on a five-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree).

Table 2 shows the requirement clusters and ratings of the second expert group. Symptom detection was rated as most important, followed by orientation within the dialog, speech recognition, and explanations. Trust in the speech assistant was rated as less important.
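As a minimal sketch of how Likert ratings of this kind can be condensed into one mean (M) and one standard deviation (SD) per requirement cluster, consider the following Python snippet. The function name `summarize_clusters`, the use of the sample standard deviation, and the example ratings are assumptions made for illustration; the ratings are not the study's raw data, and the paper does not state which SD formula was used.

```python
from statistics import mean, stdev


def summarize_clusters(ratings):
    """Return {cluster: (M, SD)} for 1-5 Likert ratings, rounded to one decimal.

    Assumption: SD is the sample standard deviation (n - 1 in the denominator).
    """
    summary = {}
    for cluster, values in ratings.items():
        if any(not 1 <= v <= 5 for v in values):
            raise ValueError(f"rating outside the 1-5 scale in {cluster!r}")
        summary[cluster] = (round(float(mean(values)), 1), round(stdev(values), 1))
    return summary


# Illustrative ratings only (hypothetical, NOT taken from the study).
example_ratings = {
    "Symptom detection": [4, 4, 5, 3, 4],
    "Trust in the speech assistant": [2, 3, 4, 3, 2],
}

for cluster, (m, sd) in summarize_clusters(example_ratings).items():
    print(f"{cluster}: M = {m}, SD = {sd}")
```

Sorting the resulting dictionary by its mean values would give an importance ranking of the clusters.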
In addition to the 28 requirements identified by the first expert group, the second expert group found 11 requirements. These 11 additional requirements fit in the requirement clusters presented above and did not change the importance pattern of the clusters. In sum, the two expert groups identified 40 requirements clustered in five groups with different importance for the development of a speech assistant for service technicians.

Table 2. Means (M) and standard deviations (SD) of the importance ratings of the second expert group on the requirement clusters.

{| class="wikitable"
! Requirement Cluster !! Second Expert Group
|-
| Symptom detection || M = 4.0, SD = 0.4
|-
| Speech recognition || M = 3.2, SD = 1.1
|-
| Orientation within dialog || M = 3.7, SD = 0.3
|-
| Explanations || M = 3.4, SD = 0.4
|-
| Trust in the speech assistant || M = 2.4, SD = 1.1
|}

The first conceptual design of the speech assistant targets the prototypical persona of Brad Brave (see Figure 2), because he is innovation-friendly and invests time into the exploration and use of information systems. Due to his limited working experience (Level 3, i.e., needs guidance), his working performance and experience can be improved by companion technologies. From the identified requirements, Brad needs the following:

* The assistant works offline and online, because it needs to be available not only in the workshop but also in the field, where the internet connection is probably worse.
* Brad often needs both hands to accomplish the service work; thus, the assistant should be operable hands-free.
* The assistant needs to support the following tasks:
** Basic information research for technical documentation
** Support of a diagnosis task by a guided diagnostic dialog

By following these requirements, the assistant should be helpful for the remaining personas as well, at least in some situations. In the subsequent section, we introduce a concept of a speech assistant for the service technician Brad Brave that takes the considerations from above into account.
===5 Design Phase: Personal Assistant for Technicians===
====5.1 Conceptual View====
The competence of a personal speech assistant can be characterized by the collection of intents it understands and can react to. Intents are formulated by the user in natural language and require one or more corresponding actions performed by the assistant. The requirements stated above yield the following main intents:

* (a) Search for information in the available technical documentation.
* (b) Provide diagnostic support for a specific problem description.

Fig. 3. Architecture of a multi-modal assistant including speech interaction. (Diagram components: User Interface, Speech Recognition, Image Recognition, Intent Identification, Action Execution, Context.)

Figure 3 shows the general architecture of the personal assistant. It consists of automated speech recognition (ASR), intent identification, an action execution module, user context management, and a user interface. For the modules Intent Identification and Action Execution, we see adapter interfaces, so that the assistant can learn new intents/actions through plug-ins. In the following, we briefly describe each module in more detail.

'''User Interface.''' ''Query Formulation:'' An advanced user interface of an assistant is able to capture the user query in different modalities, e.g., speech utterances, video images, and basic touch/text input. In the context of this paper, we focus on query formulation using speech utterances. The query needs to be elicited transparently by the system, i.e., the user sees the entered input in the system instantly while providing it. This is trivial for keyboard entries, but the text of recognized speech input and captured video images should also be displayed to the user for transparent traceability.

''Deliver Action:'' The user interface outputs the results of the actions derived by the assistant. In the simplest case, the assistant simply displays the document the user asked for.
In the case of a guided diagnosis system, the action module delivers an interactive dialog with the user.

'''Automated Speech Recognition (ASR).''' This module transforms spoken language (sound waves) into text input, so that it can later be analyzed by the subsequent intent module.

'''Intent Identification.''' The intent identification is responsible for the semantic interpretation of the recognized text. The module tries to identify the request of the user in order to find an appropriate action for the request. It includes techniques from natural language understanding and question answering. Classic approaches are based on rules and patterns, but more recently statistical learning approaches have also been introduced.

'''Action Module.''' Based on the recognized intents, an appropriate action is selected. In our scenario, we refer to the tasks that a speech assistant should support (see Section 3.4): supporting documentation research by providing useful information and facts for a given question, and supporting diagnostic questionnaires.

'''Context Update.''' The assistant is able to track the work of the technician and uses this work context to support the intent identification and the action selection. Typical context elements are the previously stated queries, the selected actions, and the environment of the current use.

====5.2 Reference Implementation====
A prototypical implementation of the speech assistant was developed for the existing information system Service Mate (http://www.servicemate.de). Originally, the application serves as an information system for the research and consumption of technical service documentation. For the implementation of the diagnosis capabilities, the existing speech assistant was extended by an interactive speech dialog. The diagnostic dialog can now be started by simply saying "start diagnosis" to the speech input.

Figure 4 depicts an example dialog of the diagnosis system. Here, the malfunction of a bicycle is analyzed by a question-answer dialog.
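The module chain from Section 5.1 (recognized text, intent identification, action execution, context update) and the "start diagnosis" trigger can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the Service Mate implementation: the class and function names (`Assistant`, `Context`, `register`, `start_diagnosis`, `search_documentation`) and the regular-expression patterns are hypothetical, speech recognition is assumed to have already produced the text, and intent identification uses the classic rule/pattern approach rather than statistical learning.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Context:
    """Work context: previously stated queries and selected actions."""
    queries: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


Action = Callable[[str, Context], str]


class Assistant:
    """Hypothetical assistant with plug-in style intent/action registration."""

    def __init__(self) -> None:
        self.context = Context()
        self._patterns: List[Tuple[re.Pattern, str]] = []
        self._actions: Dict[str, Action] = {}

    def register(self, intent: str, pattern: str, action: Action) -> None:
        """Plug in a new intent together with the action that serves it."""
        self._patterns.append((re.compile(pattern, re.IGNORECASE), intent))
        self._actions[intent] = action

    def identify_intent(self, text: str) -> Optional[str]:
        """Rule-based intent identification: the first matching pattern wins."""
        for pattern, intent in self._patterns:
            if pattern.search(text):
                return intent
        return None

    def handle(self, recognized_text: str) -> str:
        """Process ASR output: identify the intent, run the action, update context."""
        self.context.queries.append(recognized_text)
        intent = self.identify_intent(recognized_text)
        if intent is None:
            return "Sorry, I did not understand the request."
        self.context.actions.append(intent)
        return self._actions[intent](recognized_text, self.context)


def start_diagnosis(text: str, context: Context) -> str:
    # Placeholder action; a real system would open the guided dialog here.
    return "Starting the guided diagnostic dialog."


def search_documentation(text: str, context: Context) -> str:
    # Placeholder action; a real system would query the documentation index.
    return f"Searching the technical documentation for: {text}"


assistant = Assistant()
assistant.register("start_diagnosis", r"\bstart diagnosis\b", start_diagnosis)
assistant.register("search", r"\b(search|find|show)\b", search_documentation)
print(assistant.handle("Start diagnosis"))
```

Registering intents and actions through one method mirrors the adapter idea: a new capability is added by one further `register` call, without touching the dispatch logic in `handle`.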
In Figure 4, the system asks about the wear of the wheel rim, with the possible answers "visible" and "not visible". The multi-modal interface of the system allows the user to answer the question by simply touching the buttons, but the answer can also be given by speech input. After the question has been answered, the next most relevant question is asked in order to derive a possible cause for the observed fault of the bike.

Fig. 4. The diagnosis interface of Service Mate showing a specific question, a helpful image, and the possible answers to the question as touch buttons. It is also possible to answer the question by speech input.

The next section outlines the second evaluation phase. Note that the complete user evaluation is not described in this paper due to space constraints.

===6 Evaluation Phase 2: User Evaluation of Speech Assistant===
In order to investigate whether the speech assistant indeed supports the service technician, we compared the user experience of the developed speech interaction with the established touch interaction of the system as well as with the expectations prior to usage.

====6.1 Method====
'''Participants.''' The study was conducted with 26 participants (4 female) with a mean age of M = 31.23 (SD = 7.98). The participants had medium experience with the information system.

'''Material and Procedure.''' The empirical study used an exemplary but fully functional service system for a bicycle, i.e., the dBike system. Besides the technical documentation, a circuit diagram, and a 3D model, the system also contains diagnosis routines for the most relevant functions of the bike. The experiments were conducted at a bike; a ruggedized tablet computer was running the information system with the touch and speech interaction. Additionally, a separate speaker broadcast the speech instructions. A further notebook was provided for answering the questionnaires of the study.
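A diagnosis routine that accepts both touch and speech answers can be pictured as a small question tree. The following Python sketch is an assumption-laden illustration, not the dBike or Service Mate code: only the wheel-rim question with its answers "visible" and "not visible" comes from the paper's example, while the second question, the listed causes, the class `GuidedDialog`, and the commands "repeat" and "back" are hypothetical. The real system selects the next most relevant question dynamically, whereas this sketch hard-codes the successor of each answer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Question:
    text: str
    answers: Dict[str, str]  # answer label -> id of the next question or cause


# Hypothetical dialog content; only "rim_wear" follows the paper's example.
QUESTIONS = {
    "rim_wear": Question(
        "Is wear of the wheel rim visible?",
        {"visible": "cause_rim", "not visible": "brake_pads"},
    ),
    "brake_pads": Question(
        "Are the brake pads worn?",
        {"yes": "cause_pads", "no": "cause_unknown"},
    ),
}
CAUSES = {
    "cause_rim": "Worn wheel rim",
    "cause_pads": "Worn brake pads",
    "cause_unknown": "No cause found",
}


class GuidedDialog:
    """Question-answer dialog; touch labels and recognized speech both arrive as text."""

    def __init__(self, start: str) -> None:
        self.current = start
        self.history: List[str] = []

    def prompt(self) -> str:
        """Return the current question, or the derived cause at a leaf."""
        if self.current in CAUSES:
            return f"Possible cause: {CAUSES[self.current]}"
        return QUESTIONS[self.current].text

    def answer(self, user_input: str) -> str:
        """Process one answer or control command and return the next prompt."""
        text = user_input.strip().lower()
        if text == "repeat":
            return self.prompt()
        if text in ("back", "go back"):
            if self.history:
                self.current = self.history.pop()
            return self.prompt()
        if self.current in CAUSES:
            return self.prompt()
        question = QUESTIONS[self.current]
        if text not in question.answers:
            return "Please answer with: " + ", ".join(question.answers)
        self.history.append(self.current)
        self.current = question.answers[text]
        return self.prompt()


dialog = GuidedDialog("rim_wear")
print(dialog.prompt())
print(dialog.answer("not visible"))
```

Because answers are normalized to plain text before matching, a touched button and a spoken answer take the same path through `answer`, which keeps the two interaction modes interchangeable at every step.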
Participants had two tasks: (i) the diagnosis of faults with the gear shift and (ii) the diagnosis of a malfunction of the brake. The bike was specially prepared for each task. For the examination of the bike, the participants needed to wear work gloves. Hence, for using the tablet computer, participants had to take off the gloves for touch interaction. In addition, some diagnostic steps required the use of a special measuring tool.

Half of the participants conducted the diagnosis using the speech assistant; the other half used the touch interaction on the tablet (independent variable: mode of interaction). The groups differed only in the mode of interaction. On the one hand, this implies that participants using the touch interaction had to move between the bike and the tablet for each diagnostic step and had to take off the gloves for interaction. On the other hand, participants using touch interaction saw pictures with instructions. If participants using the speech interaction needed a visualization, they had to move to the tablet, too.

Prior to the interaction, all participants provided demographic information. Participants using speech interaction rated the expected usefulness and expected problems of speech interaction for the diagnosis process on a 5-point Likert scale ranging from 1 (not useful) to 5 (very useful), or from 1 (many problems) to 5 (no problems), respectively. After their interaction with the speech assistant, we qualitatively assessed positive and negative experiences. While the participants using speech interaction stated their expectations prior to the experiment, participants using touch interaction rated the expected usefulness and problems of speech interaction after the experiment on a 7-point Likert scale ranging from 1 (not useful) to 7 (very useful), or from 1 (many problems) to 7 (no problems), respectively. Other questionnaires were applied but are not reported in the present paper.
====6.2 Results====
Prior to the interaction with the speech assistant, participants rated speech assistance as rather useful for supporting the diagnosis process (M = 3.92, SD = .64, range between 1 and 5). The auditory interaction mode and the corresponding hands-free interaction were mentioned most frequently as reasons. However, problems were also expected (M = 3.62, SD = .51, range between 1 and 5). Most problems were expected for the speech recognition. Further, some participants expected problems with the comprehension of diagnostic instructions. After the interaction with the speech assistant, most participants liked the hands-free interaction, the ease of use, and the intuitive control commands. On the other hand, participants reported negative experiences concerning the restricted use of language and the fact that they could not interrupt the speech assistant while it was speaking.

After the interaction with the touch-based assistant, participants also expected speech assistance to be rather useful for supporting the diagnosis process (M = 5.58, SD = 1.31, range between 1 and 7) and moderately problematic (M = 4.83, SD = 1.40, range between 1 and 7). The reasons for their ratings were similar to those of the pre-experimental ratings. Hands-free interaction was mentioned most frequently as a useful application. Speech detection and the comprehension of diagnostic instructions were mentioned most frequently as problems. Further, participants using touch interaction reported positive experiences with pictures supporting the comprehension of the corresponding diagnosis steps. Taking off the gloves for touch interaction was experienced negatively.

====6.3 Discussion====
The results revealed the potential of speech assistance in the domain of technical services. In particular, the possibility of hands-free interaction is expected to be useful. However, the results also revealed concerns.
The use of auditory guidance might lead to comprehension problems, and pictures support the comprehension of the corresponding diagnosis steps. Thus, a multi-modal interaction providing hands-free speech interaction and the visualization of 3D models might be the best support for service technicians.

===7 Conclusions===
In this paper, we introduced speech assistants in the domain of Technical Service. Based on a user-centered requirements analysis for service technicians, we selected a number of relevant requirements to be implemented in an existing information system. We evaluated the research question whether and how speech assistants can support a service technician. In the experiment, we focused on the diagnosis task. In the future, we are planning to extend the reported user studies to a diverse range of specific research questions. In particular, a comparison of speech, touch, and multi-modal interaction is planned for future studies. Further, a field study including real service technicians is planned.

===References===
# Christensen, H., Casanueva, I., Cunningham, S., Green, P., Hain, T.: homeService: Voice-enabled assistive technology in the home using cloud-based automatic speech recognition. In: 4th Workshop on Speech and Language Processing for Assistive Technologies. pp. 29–34 (2013)
# Dumas, B., Lalanne, D., Oviatt, S.: Multimodal interfaces: A survey of principles, models and frameworks. In: Human machine interaction, pp. 3–26. Springer (2009)
# Hamerich, S.W.: Benutzerfreundliche Sprachdialoge im Automobil. In: Sprachbedienung im Automobil, pp. 61–77. Springer (2009)
# Holtzblatt, K., Beyer, H.: Contextual design: Design for life. Morgan Kaufmann (2016)
# ISO 13407: Human-centred design processes for interactive systems. ISO, Geneva (1999)
# Nielsen, J.: Heuristic evaluation. In: Usability inspection methods. pp. 25–62. John Wiley & Sons, Inc. (1994)
# Norman, D.A., Draper, S.W.: User centered system design: New perspectives on human-computer interaction. CRC Press (1986)
# Rubin, J., Chisnell, D.: Handbook of usability testing: how to plan, design and conduct effective tests. John Wiley & Sons (2008)
# Santos, J., Rodrigues, J.J., Casal, J., Saleem, K., Denisov, V.: Intelligent personal assistants based on internet of things approaches. IEEE Systems Journal 12(2), 1793–1802 (2018)
# Sarodnick, F., Brau, H.: Methoden der usability evaluation. Wissenschaftliche Grundlagen und praktische Anwendung 1 (2006)
# Sidner, C.L.: Building Spoken-Language Collaborative Interface Agents, pp. 197–226. Springer Netherlands, Dordrecht (2004)
# Wobcke, W., Ho, V.H., Nguyen, A.T.L., Krzywicki, A.: A BDI agent architecture for dialogue modelling and coordination in a smart personal assistant. IEEE/WIC/ACM International Conference on Intelligent Agent Technology pp. 323–329 (2005)