<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Sciences</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>EKIN: Towards Natural Language Interaction with Industrial Production Machines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arantza del Pozo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <email>agonzalezdg@vicomtech.org</email>
          <xref ref-type="aff" rid="aff3">1</xref>
        </contrib>
        <aff id="aff3">
          <label>1</label>
          <institution>Vicomtech Foundation, Basque Research and Technology Alliance (BRTA)</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>10</volume>
      <issue>11</issue>
      <fpage>5</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>The industry and manufacturing sector could greatly benefit from 'hands-free' voice-based natural language interactions to assist operators in tasks that require manual operations. However, the complexity of the industrial domain makes it very expensive to develop dialogue systems in this field. In addition, the dominant cloud architectures for speech recognition and synthesis raise privacy, security and latency concerns. Furthermore, for low-resource languages such as Basque, formalised terminology and language resources for technology development are lacking. In this paper, we review the state of the art in this field and describe EKIN, a project that is being carried out to address some of the identified problems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Project consortium and funding body</title>
      <p>The project has a total duration of 22 months, beginning on March 1, 2020 and ending on December 31, 2021.</p>
      <p>EKIN is being carried out by the following consortium: Vicomtech<sup>1</sup>, the Speech Interactive Research Group of the University of the Basque Country<sup>2</sup>, Tekniker<sup>3</sup>, Ikor Technology Center<sup>4</sup>, the Machine Tool Institute<sup>5</sup> and UZEI<sup>6</sup>.</p>
      <p><sup>1</sup> https://www.vicomtech.org
<sup>2</sup> https://www.ehu.eus/en/web/speechinteractive/about-us
<sup>3</sup> https://www.tekniker.es
<sup>4</sup> https://ikor.es
<sup>5</sup> https://www.imh.eus
<sup>6</sup> https://uzei.eus</p>
    </sec>
    <sec id="sec-2">
      <title>Context and motivation</title>
      <p>The use of voice and natural language is
changing the way we relate to technology. As
a result, conversational assistants have
become one of the most innovative tools to
simplify and make human-machine interactions
more natural. Well-known examples of these
interfaces are Apple's Siri, Google Now,
Microsoft Cortana or Amazon Alexa.</p>
      <p>Although these devices can work perfectly well on their own (e.g. to search the web, find songs on Spotify or report the weather forecast), they are increasingly integrated with home IoT devices, enabling interactions with locks, light switches, heaters, air conditioners and kitchen appliances, and are thus becoming increasingly indispensable.</p>
      <p>Similarly, voice assistants could act as a central management element in IoT-enabled Industry 4.0 manufacturing plants. Potential use cases in industrial manufacturing include providing support in machine maintenance, repair and overhaul operations; facilitating the programming of manufacturing machines; and supporting manufacturing and assembly tasks, among others. During these tasks the operator is continuously involved in manual operations, and searching through paper manuals or tablets for assistance slows down and considerably hinders their work.</p>
      <p>Voice-based interfaces are a relevant solution in this context. The obvious but important benefits of using voice to communicate with systems and machines in factories are as follows: (i) voice is 'hands-free' and 'eyes-free', allowing operators to continue with physical tasks; (ii) it is natural for operators, requiring minimal training; and (iii) it is very flexible, allowing communication at different levels of detail and in contexts linked to multiple tasks.</p>
      <p>
        Regarding industrial noise concerns for
spoken interaction in manufacturing settings,
recent studies have shown that the
combination of existing hardware noise cancellation
devices and speech recognition systems is
robust enough for use in manufacturing
environments
        <xref ref-type="bibr" rid="ref2">(Gaizauskas, 2019)</xref>
        .
      </p>
      <p>Therefore, in theory, voice interaction technology should allow industrial work to be carried out more effectively and be well received by operators. However, its presence in industrial manufacturing environments is still rare because several challenges remain to be overcome:
• Although much information is available in technical documents (e.g. manuals, manufacturing and assembly dossiers, maintenance notes), it is still very expensive to provide current dialogue systems with the knowledge needed to implement meaningful manufacturing use cases and tasks. This is due to the specificity and complexity of the domain, compared to better-known application areas such as booking transport tickets, restaurants or hotels.
• The dominant cloud architectures for deploying speech recognition and synthesis technology, which stem from the hardware requirements of neural models, pose privacy, security and latency issues that concern the industry.
• In the particular case of the Basque industry, some factories have a tradition of oral communication in Basque that has not been formalized. There is practically no specific terminology for the sector, and the most common oral expressions used to interact with machines are not documented.</p>
    </sec>
    <sec id="sec-3">
      <title>Technologies involved</title>
      <sec id="sec-3-1">
        <title>Natural interaction in advanced manufacturing environments</title>
        <p>Human-machine interfaces (HMI) in the industrial field have evolved rapidly in recent years with the development of new mobile technologies and devices such as smartphones, tablets and augmented reality glasses. In the last decade, a considerable number of systems have been developed, mainly in the field of collaborative robotics, that enable natural interaction between operators and machines to varying degrees (Mavridis, 2015; Serras et al., 2020). Although these platforms integrate advanced interfaces, their ability to semantically understand human requests is still quite limited, and in most cases the effort required to implement them is very high.</p>
        <p>
          Recent research that applies semantic technologies to improve industrial human-machine interaction is mainly rule-based, which requires considerable manual labor to maintain and extend (Maurtua et al., 2017). Little work has focused on machine learning techniques for multimodal human-machine communication, on improving its adaptability to new scenarios (or even languages), or on improving its performance with as few resources (both linguistic and human) as possible. With a view to reducing effort, Antonelli and Bruno
          <xref ref-type="bibr" rid="ref1">(2017)</xref>
          emphasize the role of ontologies, since they allow the domain to be defined in a way that is understandable by both humans and machines and help reduce ambiguity across operators.
        </p>
        <p>On the other hand, question-answering systems that retrieve information from collections of unstructured documents have advanced considerably in recent years and are beginning to evolve towards conversational interfaces (Reddy, Chen, and Manning, 2019). Adapting them to use cases in the industrial production domain would make it possible to automatically exploit the information contained in existing technical documents.</p>
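        <p>The document-retrieval step that such systems typically rely on can be illustrated with a minimal sketch that ranks passages from technical documents by TF-IDF-weighted term overlap with an operator's question. This lexical ranking is a deliberately simple stand-in, not the neural conversational question-answering approach of Reddy, Chen, and Manning (2019) nor the method developed in EKIN; the function names and the manual excerpts are hypothetical examples.</p>
        <preformat>
```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank_passages(question, passages):
    """Return passage indices ordered by TF-IDF overlap with the question (hypothetical helper)."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: in how many passages each term appears.
    df = Counter(term for doc in docs for term in doc)

    def idf(term):
        # Smoothed inverse document frequency; rarer terms weigh more.
        return math.log((n + 1) / (df[term] + 1)) + 1.0

    q_terms = set(tokenize(question))
    # Score = sum over shared terms of (term count in passage) * idf(term).
    scores = [sum(doc[t] * idf(t) for t in q_terms if t in doc) for doc in docs]
    # Stable sort: passages with equal scores keep their original order.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Hypothetical excerpts from a machine manual.
manual = [
    "Replace the coolant filter every 500 hours of operation.",
    "To reset the spindle alarm, press RESET and restart the spindle.",
    "The tool magazine holds up to 24 tools.",
]
best = rank_passages("How do I reset the spindle alarm?", manual)[0]
print(manual[best])  # -> To reset the spindle alarm, press RESET and restart the spindle.
```
        </preformat>
        <p>Lexical matching of this kind only finds passages that share the question's surface vocabulary; the neural question-answering models cited above aim to go beyond that limitation.</p>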
      </sec>
      <sec id="sec-3-2">
        <title>Embedded speech recognition and synthesis</title>
        <p>With the introduction of deep neural architectures, the last few years have witnessed a significant leap in the performance of speech recognition and synthesis systems. However, until now, the high memory, processing and power requirements of neural models, together with the computational and battery limitations of mobile devices, have led to cloud-based deployments. In the industrial sector, deploying Wi-Fi infrastructure poses a series of difficulties due to the size of the facilities and their dispersion across different buildings. In addition, there is growing concern about the privacy and security issues posed by the current cloud architectures of commercial systems.</p>
        <p>On the other hand, important advances have been made in recent years in the development of specific hardware for the embedded computation of neural models. Alongside these hardware advances, compression techniques for neural models have also gained attention, yielding important advances at the software level. Techniques used to reduce model size and response latency include precision reduction, data parallelization and model compression (He et al., 2019). The combination of low-cost hardware with high computational power, together with the advances in neural model optimization, makes it possible to explore embedding speech recognition and synthesis technology in industrial human-machine interaction applications.</p>
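        <p>Precision reduction, the first of the techniques listed above, can be illustrated with a minimal sketch of symmetric post-training quantization of a weight tensor from 32-bit floats to 8-bit integers. This is a generic illustration, not the method used in EKIN or by He et al. (2019); the function names quantize_int8 and dequantize are hypothetical, and a single scale per tensor and a non-empty weight list are assumed.</p>
        <preformat>
```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization of a non-empty list of floats (hypothetical helper).

    Returns the int8 codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor: the largest magnitude maps to 127.
    # An all-zero tensor gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original float weights."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.01, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Storage per weight: float32 ('f') versus signed 8-bit integer ('b').
packed = array("b", codes)
original = array("f", weights)
print(original.itemsize * len(original), "bytes ->", packed.itemsize * len(packed), "bytes")
# Rounding error per weight is at most half a quantization step (scale / 2).
print(max(abs(a - b) for a, b in zip(weights, restored)), scale / 2)
```
        </preformat>
        <p>On common platforms a float32 weight occupies four bytes and an int8 code one byte, so this roughly quarters model size at the cost of a bounded rounding error per weight. Practical toolkits usually add refinements such as per-channel scales or calibration data, which this sketch omits.</p>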
      </sec>
      <sec id="sec-3-3">
        <title>Use of Basque in the industrial sector</title>
        <p>The use of Basque in the industrial manufacturing sector is very scarce. Although Basque was used informally in the factories of some regions with a strong machine tool tradition, its incorporation into the industrial workplace has been closely linked to the promotion plans carried out by companies in the sector. As a result, the social use of Basque in industry has increased, but steps remain to be taken to incorporate it into the actual manufacturing processes. In practice, both industrial machinery software and technical documentation are still produced in the languages chosen by the machine suppliers, which usually do not formally include Basque.</p>
        <p>Regarding the availability of specialized language resources, some terminological dictionaries of the field exist, such as the Numerical Control Dictionary and the Machine Tool Dictionary contained in the Euskalterm database<sup>7</sup>, the LANEKI dictionary of vocational training<sup>8</sup>, the Dictionary of the New Industry promoted by SPRI<sup>9</sup> and the DANOBAT dictionary<sup>10</sup>. Nevertheless, the specialized language resources available in Basque are too limited to support the development of natural language interaction interfaces with industrial machines in that language.</p>
        <p><bold>Project objectives and expected results</bold></p>
        <p>The EKIN project aims to advance the development of 'conversational interfaces' as a mechanism for interaction between operators and machines in industrial production plants in the Basque Country, in order to facilitate certain processes and improve their productivity. Specifically, the following objectives are pursued:
• To facilitate the development of natural language interaction interfaces between operators and industrial production machines, based on the information contained in technical documentation
• To optimize neural speech recognition and synthesis models so that they can be embedded in electronic devices in the machines themselves, avoiding the privacy, security and latency problems of their cloud deployments
• To formalize a terminology and a corpus of expressions for interacting with machines in Basque, to serve as a reference for the development of conversational interfaces in the industrial sector in that language</p>
        <p><sup>7</sup> https://www.euskadi.eus/euskalterm/
<sup>8</sup> http://hiztegia.jakinbai.eus/
<sup>9</sup> https://www.spri.eus/hiztegia/
<sup>10</sup> https://hiztegia.danobatgroup.eus/</p>
        <p>The main expected result of the project is the explicit recognition by real operators that natural language voice interfaces facilitate the maintenance, programming, manufacturing and assembly tasks of industrial production machines, increasing their productivity and their satisfaction with the tasks performed. The technological stack under development will lead to specific interfaces for different use cases. As more specific results, we also expect to generate:
• Technological components that allow operator-machine interaction systems to be implemented faster and more efficiently than at present
• Speech recognition and synthesis models that can be embedded in low-performance hardware devices
• A terminology and corpus of reference expressions for operator-machine interaction in Basque</p>
        <p>The first results of the project have been satisfactory. Considerable progress has been made in compiling a corpus of technical manuals and dossiers. Progress has also been made in designing a dialogue ontology that includes domain aspects and in investigating novel techniques for developing question-answering systems, as well as systems for the automatic generation of dialogue acts and rules from technical documentation. Regarding the embedding of speech recognition and synthesis, an experimentation board has been designed and several neural model optimization frameworks have been explored.</p>
        <p>Finally, a significant effort has also been made to compile manuals and dossiers in Basque among project partners, from existing repositories in the field of vocational training and by contacting relevant stakeholders in the Basque industrial sector.</p>
        <p>During the last phase of the project, efforts will focus on finalising the technological developments, implementing a prototype and having real operators evaluate it.</p>
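      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Antonelli</surname>, <given-names>D.</given-names></string-name> and <string-name><given-names>G.</given-names> <surname>Bruno</surname></string-name>. <year>2017</year>. <article-title>Human-Robot Collaboration using Industrial Robots</article-title>. In <source>2017 2nd International Conference on Electrical, Automation and Mechanical Engineering (EAME 2017)</source>, pages <fpage>99</fpage>–<lpage>102</lpage>. Atlantis Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><surname>Gaizauskas</surname>, <given-names>R.</given-names></string-name>. <year>2019</year>. <article-title>Investigating spoken dialogue to support manufacturing processes</article-title>. Technical report, The University of Sheffield, UK, April.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><surname>He</surname>, <given-names>Y.</given-names></string-name>, <string-name><given-names>T. N.</given-names> <surname>Sainath</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Prabhavalkar</surname></string-name>, <string-name><given-names>I.</given-names> <surname>McGraw</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Alvarez</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Rybach</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kannan</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Pang</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Liang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Bhatia</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Shangguan</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Pundak</surname></string-name>,
          <string-name><given-names>K. C.</given-names> <surname>Sim</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Bagby</surname></string-name>, <string-name><given-names>S.-y.</given-names> <surname>Chang</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Rao</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Gruenstein</surname></string-name>.
          <year>2019</year>. <article-title>Streaming end-to-end speech recognition for mobile devices</article-title>. In <source>ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>, pages <fpage>6381</fpage>–<lpage>6385</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><surname>Maurtua</surname>, <given-names>I.</given-names></string-name>, <string-name><given-names>I.</given-names> <surname>Fernandez</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Tellaeche</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Kildal</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ibarguren</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Sierra</surname></string-name>. <year>2017</year>. <article-title>Natural multimodal communication for human-robot collaboration</article-title>. <source>International Journal of Advanced Robotic Systems</source>, pages <fpage>1</fpage>–<lpage>12</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><surname>Mavridis</surname>, <given-names>N.</given-names></string-name>. <year>2015</year>. <article-title>A review of verbal and non-verbal human-robot interactive communication</article-title>. <source>Robotics and Autonomous Systems</source>, <volume>63</volume>:<fpage>22</fpage>–<lpage>35</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name><surname>Reddy</surname>, <given-names>S.</given-names></string-name>, <string-name><given-names>D.</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>C. D.</given-names> <surname>Manning</surname></string-name>. <year>2019</year>. <article-title>CoQA: A conversational question answering challenge</article-title>. <source>Transactions of the Association for Computational Linguistics</source>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name><surname>Serras</surname></string-name> et al. <year>2020</year>. <article-title>… Interactive System for the Operator 4.0</article-title>. <source>Applied Sciences</source>.
        </mixed-citation>
      </ref>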
    </ref-list>
  </back>
</article>