EKIN: Towards Natural Language Interaction with Industrial Production Machines

EKIN: Hacia la Interacción en Lenguaje Natural con Máquinas de Producción Industrial

Arantza del Pozo¹, Laura García-Sardiña¹, Manex Serras¹, Ander González-Docasal¹, María Inés Torres², Eneko Ruiz², Izaskun Fernández³, Cristina Aceta³, Egoitz Konde³, Daniel Aguinaga⁴, Mikel de la Cruz⁴, Iker Altuna⁵, Joseba Agirre⁵, Iker Etxebeste⁶

¹ Vicomtech Foundation, Basque Research and Technology Alliance (BRTA)
² University of the Basque Country UPV/EHU, Speech Interactive Research Group
³ Tekniker Foundation, Basque Research and Technology Alliance (BRTA)
⁴ Ikor Technology Center
⁵ Machine Tool Institute
⁶ UZEI

¹ {adelpozo, lgarcias, mserras, agonzalezd}@vicomtech.org
² {manes.torres, eneko.ruiz}@ehu.eus
³ {izaskun.fernandez, cristina.aceta, egoitz.konde}@tekniker.es
⁴ {daguinaga, mcruz}@ikor.es
⁵ {altuna, agirre}@imh.eus
⁶ {ietxebeste}@uzei.eus

Abstract: The industry and manufacturing sector could greatly benefit from 'hands-free' voice-based natural language interactions to assist operators across tasks requiring manual operations. However, the complexity of the industrial domain makes it very expensive to develop dialogue systems in this field. Also, the dominant cloud architectures for speech recognition and synthesis pose privacy, security and latency concerns. And for some languages with few resources such as Basque, there is a lack of formalised terminology and language resources for technology development. In this paper, we review the state of the art in this field and describe EKIN, a project which is being carried out to address some of the identified problems.

Keywords: Industry 4.0, Human-Machine Interaction, Basque.

Resumen: El sector de la fabricación industrial podría beneficiarse enormemente de las interacciones "manos libres" por voz, para ayudar a los operarios en tareas que requieren operaciones manuales.
Sin embargo, la complejidad del dominio industrial hace que sea muy costoso desarrollar sistemas de diálogo en este campo. Además, las arquitecturas en la nube dominantes para el reconocimiento y la síntesis de voz plantean problemas de privacidad, seguridad y latencia. Y para algunas lenguas con pocos recursos como el euskera, se carece de terminología formalizada y de recursos lingüísticos para abordar los desarrollos tecnológicos necesarios. En este artículo, revisamos el estado del arte en este campo y describimos EKIN, un proyecto que se está llevando a cabo para abordar algunos de los problemas identificados.

Palabras clave: Industria 4.0, Interacción Persona-Máquina, euskera.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Project consortium and funding body

EKIN is a research project funded by the Basque Government through the Elkartek 2020 program from the Basque Agency for Business Development SPRI, under grant agreement KK-2020/00055.

The project has a total duration of 22 months, beginning on March 1, 2020 and ending on December 31, 2021.

EKIN is being carried out by the following consortium: Vicomtech¹, the Speech Interactive Research Group of the University of the Basque Country², Tekniker³, Ikor Technology Center⁴, the Machine Tool Institute⁵ and UZEI⁶.

¹ https://www.vicomtech.org
² https://www.ehu.eus/en/web/speech-interactive/about-us
³ https://www.tekniker.es
⁴ https://ikor.es
⁵ https://www.imh.eus
⁶ https://uzei.eus

2 Context and motivation

The use of voice and natural language is changing the way we relate to technology. As a result, conversational assistants have become one of the most innovative tools to simplify human-machine interactions and make them more natural. Well-known examples of these interfaces are Apple's Siri, Google Now, Microsoft Cortana and Amazon Alexa.

Although these types of devices can work perfectly well on their own (e.g. to search the web, find songs on Spotify or report the weather forecast), they are increasingly being integrated with the home IoT, enabling interactions with locks, light switches, heaters, air conditioners and/or kitchen appliances, and are thus becoming more and more indispensable.

Similarly, voice assistants could act as a central management element in IoT-enabled Industry 4.0 manufacturing plants. Some potential use cases in industrial manufacturing are: providing support in machine maintenance, repair and overhaul operations; facilitating the programming of manufacturing machines; or supporting manufacturing and assembly tasks, among others. The operator is continuously involved in manual operations during these tasks, and searching through paper manuals or tablets for assistance slows down and considerably hinders their work.

Voice-based interfaces represent a relevant solution in this context. The obvious but important benefits of using voice to communicate with systems and machines in factories are as follows: (i) they are 'hands free' and 'eyes free', allowing operators to continue with physical tasks; (ii) they are natural for operators, requiring minimal training; and (iii) they are very flexible, allowing communication at different levels of detail and in contexts linked to multiple tasks.

Regarding industrial noise concerns for spoken interaction in manufacturing settings, recent work has shown that the combination of existing hardware noise cancellation devices and speech recognition systems is robust enough for use in manufacturing environments (Gaizauskas, 2019).

Therefore, in theory, the development of voice interaction technology should allow industrial work to be carried out more effectively, as well as obtaining a positive response from the operators. However, its presence in industrial manufacturing environments is still rare because there are several challenges to overcome:

• Although much information is available in documents of a technical nature (e.g. manuals, manufacturing and assembly dossiers, maintenance notes), it is still very expensive to provide current dialogue systems with the knowledge necessary to implement meaningful manufacturing use cases and tasks. This is due to the specificity and complexity of the domain, compared to other better-known areas of application such as the reservation of transport tickets, restaurants or hotels.

• The dominant cloud architectures for deploying speech recognition and synthesis technology, derived from the hardware requirements of neural paradigms, pose privacy, security and latency issues that concern the industry.

• In the particular case of the Basque industry, there is a tradition of oral communication in Basque in some factories that is not formalized. There is practically no specific terminology for the sector and the most common oral expressions used to interact with machines are not documented.

3 Technologies involved

3.1 Natural interaction in advanced manufacturing environments

Human-machine interfaces (HMI) in the industrial field have evolved rapidly in recent years with the development of new mobile technologies and new devices such as smartphones, tablets and/or augmented reality glasses. In the last decade, a considerable number of systems have been developed, mainly in the field of collaborative robotics, with the capacity for natural interaction between operators and machines, to varying degrees (Mavridis, 2015), (Serras et al., 2020). Although these platforms integrate advanced interfaces, their ability to semantically understand human requests is still quite limited and the effort required for their implementation is very high in most cases.

Recent research that includes semantic technologies to improve industrial human-machine interaction is mainly rule-based, which requires considerable manual labor for maintenance and/or extension (Maurtua et al., 2017). There is little work focused on machine learning techniques for multimodal human-machine communication, on improving its adaptability to new scenarios (or even languages) or on improving its performance using as few resources (both linguistic and human) as possible. In line with reducing effort, the authors in (Antonelli and Bruno, 2017) emphasize the role of ontologies, since they allow the domain to be defined in a way that is understandable by both humans and machines and contribute to reducing ambiguity across operators.

On the other hand, question-answering systems that retrieve information from a collection of unstructured documents have advanced considerably in recent years and are beginning to evolve towards conversational interfaces (Reddy, Chen, and Manning, 2019). Their adaptation to use cases in the industrial production domain would make it possible to automatically exploit the information contained in existing technical documents.

3.2 Embedded speech recognition and synthesis

With the introduction of deep neural architectures, the last few years have witnessed a significant leap in the performance of speech recognition and synthesis systems. However, until now, the high memory, processing and power consumption requirements of neural models and the computational and battery limitations of mobile devices have led to cloud integrations. In the industrial sector, the deployment of Wi-Fi facilities raises a series of difficulties linked to the extent of the facilities and their dispersion across different buildings. Also, there is a growing concern regarding privacy and security issues posed by the current cloud architectures of commercial systems.

On the other hand, important advances have been made in recent years in the development of specific hardware systems for embedded computing of neural models. Thanks to these hardware advances, compression techniques for neural models have also gained attention, yielding important advances at the software level as well. Some of the techniques used to reduce the size of the models and the latency of the responses are precision reduction, data parallelization and model compression (He et al., 2019). The combination of low-cost hardware with great computational power, together with the advances in neural model optimization, makes it possible to explore the embedding of speech recognition and synthesis technology for industrial human-machine interaction applications.

3.3 Use of Basque in the industrial sector

The use of Basque in the industrial manufacturing sector is very scarce. Although Basque was used informally in the factories of some regions where the machine tool industry has a very strong tradition, its incorporation into the real industrial world has been closely linked to the plans for its promotion carried out by the companies in the sector. As a result, the social use of Basque in industry has increased, but there are still steps to take towards incorporating it into the actual manufacturing processes. In practice, both industrial machinery software and technical documentation are still produced in the languages chosen by the machine suppliers, which usually do not formally include Basque.

Regarding the availability of specialized language resources, some terminological dictionaries of the field exist, such as the Numerical Control Dictionary and the Machine Tool Dictionary contained in the Euskalterm database⁷, the LANEKI dictionary of vocational training⁸, the Dictionary of the New Industry promoted by SPRI⁹ or the DANOBAT dictionary¹⁰. Nevertheless, the specialized language resources available in Basque are too limited to be used for the development of natural language interaction interfaces with industrial machines in that language.

⁷ https://www.euskadi.eus/euskalterm/
⁸ http://hiztegia.jakinbai.eus/
⁹ https://www.spri.eus/hiztegia/
¹⁰ https://hiztegia.danobatgroup.eus/

4 Project objectives and expected results

The EKIN project aims to advance the development of 'conversational interfaces' as a mechanism for interaction between operators and machines in industrial production plants in the Basque Country, with the aim of facilitating certain processes and improving their productivity. Specifically, the following objectives are pursued:

• To facilitate the development of natural language interaction interfaces between operators and industrial production machines, based on the information contained in technical documentation

• To optimize neural speech recognition and synthesis models, so that they can be embedded in electronic devices in the machines themselves to avoid the privacy, security and latency problems of their cloud deployments

• To formalize a terminology and a corpus of expressions of interaction with machines in Basque that will serve as a reference for the development of conversational interfaces in the industrial sector in that language

The main expected result of the project is the explicit recognition by real operators that the use of natural language voice interfaces facilitates the maintenance, programming, manufacturing and assembly tasks of industrial production machines, increasing operators' productivity and satisfaction regarding the tasks performed. The technological stack under development will result in specific interfaces for different use cases. As more specific results, we also expect to generate:

• Technological components that allow implementing operator-machine interaction systems faster and more efficiently than at present

• Speech recognition and synthesis models that can be embedded in low-performance hardware devices

• A terminology and corpus of reference expressions for operator-machine interaction in Basque

The first results of the project have been satisfactory. Considerable progress has been made in compiling a corpus of technical manuals and dossiers. Progress has also been made in designing a dialogue ontology that includes domain aspects and in investigating novel techniques for the development of question-answering systems, as well as systems for the automatic generation of dialogue acts and rules from technical documentation. Regarding the embedding of speech recognition and synthesis, an experimentation board has been designed and several neural model optimization frameworks have been explored. Finally, a significant effort has also been made to compile manuals and dossiers in Basque among project partners, from existing repositories in the field of vocational training and by contacting relevant stakeholders in the Basque industrial sector.

During the last phase of the project, efforts will be focused on finalising technological developments, implementing a prototype and having real operators evaluate it.

References

Antonelli, D. and G. Bruno. 2017. Human-Robot Collaboration using Industrial Robots. In 2017 2nd International Conference on Electrical, Automation and Mechanical Engineering (EAME 2017), pages 99–102. Atlantis Press.

Gaizauskas, R. 2019. Investigating spoken dialogue to support manufacturing processes. Technical report, The University of Sheffield, UK, April.

He, Y., T. N. Sainath, R. Prabhavalkar, I. McGraw, R. Alvarez, D. Zhao, D. Rybach, A. Kannan, Y. Wu, R. Pang, Q. Liang, D. Bhatia, Y. Shangguan, B. Li, G. Pundak, K. C. Sim, T. Bagby, S.-y. Chang, K. Rao, and A. Gruenstein. 2019. Streaming end-to-end speech recognition for mobile devices. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6381–6385.

Maurtua, I., I. Fernandez, A. Tellaeche, J. Kildal, J. Ibarguren, and B. Sierra. 2017. Natural multimodal communication for human-robot collaboration. International Journal of Advanced Robotic Systems, pages 1–12.

Mavridis, N. 2015. A review of verbal and non-verbal human-robot interactive communication. Robotics and Autonomous Systems, 63:22–35.

Reddy, S., D. Chen, and C. D. Manning. 2019. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249–266, March.

Serras, M., L. García-Sardiña, B. Simões, H. Álvarez, and J. Arambarri. 2020. Dialogue Enhanced Extended Reality: Interactive System for the Operator 4.0. Applied Sciences, 10(11):3960.