=Paper=
{{Paper
|id=Vol-2350/paper2
|storemode=property
|title=Leverage White-Collar Workers with AI
|pdfUrl=https://ceur-ws.org/Vol-2350/paper2.pdf
|volume=Vol-2350
|authors=Stephan Jüngling,Angelin Hofer
|dblpUrl=https://dblp.org/rec/conf/aaaiss/JunglingH19
}}
==Leverage White-Collar Workers with AI==
Stephan Jüngling, Angelin Hofer
FHNW University of Applied Sciences Northwestern Switzerland, School of Business
Peter Merian-Strasse 86, 4052 Basel, Switzerland
stephan.juengling@fhnw.ch, angelin.hofer@students.fhnw.ch
===Abstract===
While in the manufacturing industry robots do the majority of the assembly tasks, robotics process automation (RPA), where software robots take over repetitive tasks from humans, has been introduced only recently. Many routine tasks continue to be executed without adequate assistance from tools that would be in reach of the current technical capabilities of AI. Using the example of taking meeting minutes, the paper presents some intermediate results on the capabilities and problems of currently available natural language processing systems to automatically record meeting minutes. It further highlights the potential of optimizing the allocation of tasks between humans and machines to take the particular strengths and weaknesses of both into account. In order to combine the functionality of supervised and unsupervised machine learning with rule-based AI or traditionally programmed software components, the capabilities of AI-based system actors need to be incorporated into the system design process as early as possible. Treating AI as actors enables a more effective allocation of tasks upfront, which makes it easier to come up with a hybrid workplace scenario where AI can support humans in doing their work more efficiently.

Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

===Introduction===
Most physical goods such as cars are predominantly manufactured by industrial robots. The International Federation of Robotics states that the worldwide robot density is currently 74 robot units per 10,000 employees (IFR, 2018). The allocation of this workforce to blue-collar and white-collar work is quite different. The automation of blue-collar tasks in manufacturing sites is still much more common than the automation of the document-centric tasks of white-collar workers in enterprise back-offices. The potential of machine learning and knowledge engineering is huge, and a broad variety of AI-enabled products and services are on the market to support humans with office tasks. The solutions are mostly isolated, silo-type applications, far from being seamlessly integrated into white-collar business processes.

However, in some business domains such as banking or insurance, new technologies such as robotics process automation or customer-facing chat bots have gained momentum. Robotics process automation is increasingly used to automate tedious repetitive tasks where humans currently transfer information from one application front end to the other. Especially in cases where the backend integration of the two systems is too cumbersome or takes too much implementation effort, RPA can be set up to quickly take over. Although the complexity and stability of the resulting IT software architectures might be questioned, rule-based AI and RPA are increasingly used for process automation. As a consequence, there is a fortunate side effect from regulatory requirements (e.g. implementing the four-eyes principle, clarifying responsibilities in case of failures), resulting in increasing pressure to establish explicit guidelines for the supervision and governance of AI-driven processes in enterprises and for the collaboration between humans and software robots in general.

===Human computer interaction patterns===
Although the current potential of AI in general and natural language processing (NLP) in particular would allow for more human centricity, we still largely rely on constraints from the past (Jüngling et al., 2018). Most office work and human computer interaction (HCI) are still done mainly with the mouse and the keyboard, which are older than 50 and 150 years respectively. Beyond leveraging HCI with NLP as a more human-centric interface, system designs need to reflect, in terms of input and output, the fact that AI components learn and become active. AI-driven system design, where expert systems, knowledge
repositories or deep learning capabilities of AI components are taken into account, goes much beyond the conventional way of thinking about HCI. Similar to cooperative robots, so-called cobots (Fast-Berglund et al., 2016), which are designed to collaboratively produce physical goods, supervised or unsupervised machine learning and rule-based system components should be seen as active components in a collaborative workplace and could contribute substantially to the creation of digital goods such as meeting minutes.

===Example – Recording Meeting Minutes===
Creating meeting minutes consumes time, money and resources. An average Fortune 500 company with around 50,000 employees spends 5 million USD annually on the creation of meeting minutes (IBM, 2018).

Meetings are important planning and coordination vehicles for organizations to mutually communicate and track status information in order to meet the overall business goals. Meetings that bind n resources simultaneously are n times more expensive than individual work. In many cases, at least some of the participants consider them a waste of time, especially if the meetings are too long or unstructured. On the other hand, structured meetings are real time-savers if the right people meet at the right time for the right outcome and help to reach a certain objective. In these cases, meetings are very valuable for organizations and need appropriate documentation for later recall and information retrieval. The sooner meeting minutes are available with the relevant information, decisions and action items, the better participants and absent co-workers can start working on the action items. Thus, early distribution of the meeting minutes can boost productivity.

However, recording meeting minutes is very challenging during the meetings as well as time-consuming in later rework and consolidation. While recording, it is difficult to follow the conversation flow because participants speak at a greater pace than it is possible to take notes. Moreover, the minute taker is absorbed and cannot actively participate in the meeting. All of these aspects tend to lead to inaccurate, incomplete and inconsistent meeting minutes. That said, automated recording and writing of meeting minutes would be of great value to enterprises and leads to the following research questions:

• RQ1: What problems can be identified in creating meeting minutes?
• RQ2: What are the requirements for an information system to create meeting minutes?
• RQ3: To what extent can a speech recognition system support the speech-to-text transcription so that it still contains the relevant information from the meeting?
• RQ4: To what extent does a speech recognizer separate multiple voices in a meeting?
• RQ5: To what extent does a particular NLP component extract information from the speech-to-text transcript?

===Current state of work===
As a preliminary result from the master thesis (Hofer, 2018), the following findings are distilled from observations with the following experimental setup. Several informants, some of whom regularly compile meeting minutes as part of their job, were asked to write meeting minutes of a “progress meeting” video (Beinerts, 2018). The resulting data collection was compared to each other as well as to the reference meeting minutes. Every participant got the same template and the task to capture the most important information such as decisions and action items. Since the process for automated creation of meeting minutes suggests capturing first and analyzing later, the first step is to create a speech-to-text (STT) transcript of the meeting. Two different systems, Otter.ai and the Watson STT service, were used to capture the speech (IBM Watson, 2018; Otter.ai, 2018). The automatically created transcripts were then compared to the reference transcript of the video using a plagiarism software (Copyleaks, 2018). Identical parts of each transcript were analyzed and compared to generate the desired result. The accuracy ranged from 87–95% compared to the reference transcript.

• RQ1: Main difficulties occur during and after the meeting. During the meeting, the main issue is the speed of the utterance; after the meeting, it is the reconstruction of the completeness of decisions and action items as well as the timely distribution of the minutes. Preliminary results from the feedback of the informants show that over 70% report difficulties in capturing the content of the meeting. A majority of 57.1% think that their minutes only partly represent the meeting, while 14.3% fully agree and 28.6% do not agree that their minutes reflect the content of the meeting.
• RQ2: An initial analysis and functional decomposition of the feedback forms led to the following most desired requirements. The system should be capable of recognizing and separating speech from different participants, transcribing speech to text, and extracting information. Speech recognition, speech separation, and information extraction were identified as the most important building blocks. Less desired tasks were the tracking of action items, the organization of the meeting minutes, and their distribution to the participants.
• RQ3: The extent of accurate speech recognition depends on many aspects. In order to create meeting minutes, it is invaluable to have an accurate speech transcript in order to perform later processing steps (e.g. information extraction). In addition, accuracy depends on the
particular STT system, where different products show different performance.
• RQ4: Speech separation means allocating the different text segments to their originating speakers in the transcript. Thus, all speakers need to be recognized. Overlapping speech and multiple sources of speech are critical and well known as the “cocktail party” problem (Settle et al., 2018; Yul et al., 2017). Further challenges, such as the loudness and distance of the voices, affect the separation. For optimum results of speech separation, multiple microphones are desirable. However, in the test setup only the mono audio channel of the video was used, and in the preferred application scenario a single microphone of a mobile device (e.g. smartphone, tablet) would be used as well. Several speech recognizers already have speech separation integrated, but they are usually limited to recognizing only two to three speakers. Preliminary results show that speech separation still seems to be very difficult: none of the examined speech recognizers came close to the actual speech segments of the reference transcript.
• RQ5: The desired NLP component that accurately extracts action items is still missing. A possible approach to extract relevant information is “named entity recognition” (NER), but no considerable results have been found so far that would allow automating this task entirely (Goyal et al., 2018). Reason 8, a smartphone app, claims to extract decisions and action items (Reason8.ai, 2018). Preliminary tests have shown that it seems to be difficult to extract the expected decisions and action items. Moreover, in order to capture a meeting with Reason 8, at least two devices are required.

===Leverage human tasks with augmented AI===
It is not very surprising that it is currently not possible to generate meeting minutes solely based on AI. However, the task could be decomposed into subtasks that can be allocated to humans and machines in a cobot-like scenario. Both parties could solve those parts of the problem where they are most capable. In the case of taking meeting minutes, humans who manually take notes struggle with the speed of typing the sentences, while AI-based STT conversion would outperform humans by far. On the other hand, speaker recognition is not a problem for humans, while AI components are still inaccurate and fail in our preferred application scenario.

Why are current applications not designed to use the potential of both? How could novel hybrid approaches be stimulated upfront? In many cases, requirements engineering and high-level specification of system design start with a UML use case diagram where all actors and use cases are identified. Even in this early design phase, AI needs to be taken into account. Adding additional AI-system actors in UML use case diagrams would best represent the active role that AI can have and be comparable to the role of humans, as shown in figure 1.

Figure 1 – AI-augmented application scenarios and HCI

In consequence, additional AI-based system actors lead to additional swim lanes in UML activity diagrams, as shown in figure 2.

Figure 2 – AI system actors in UML activity diagrams

This allows allocating the different activities explicitly to the different swim lanes, and the afore-mentioned AI-system actors are responsible for executing their activities. By designing systems in such a way, the mutual collaboration is stimulated and the “AI-first” scenario, which is said to be
the successor of the “cloud-first” and “mobile-first” application architectures, comes into play in a hybrid workplace with an AI-augmented system design upfront.

In such a hybrid system design, one can better focus on the different strengths and weaknesses of human and AI actors in order to improve the effectiveness of current IT applications. In most scenarios where AI tools are used today, the functionality is embedded in silo-type applications. Data scientists use the functionality provided by the graphical user interface (GUI) of their business intelligence tools. In the case of solving data classification problems, experts train the classifiers with the help of tools into which data sets are imported in order to train and apply the models. Knowledge engineers manually build up ontologies and rule-based semantic models for specific business domains, which are executed in their workbenches. Such scenarios are time-consuming and resource-intensive, and it would be a relief to delegate the knowledge base construction (KBC) process to a deep learning system (Ratner and Ré, 2018).

Although many AI services are at hand that can be called through appropriate APIs, they have to be considered “black-box” logic and are not suitable for combination with company-internal business logic, traditional software components (SC) and software design (SD) in cases where the data should stay on premises. The goal should be to facilitate a seamless way of combining the different components more interactively, and to design more hybrid systems in which the strengths and weaknesses of humans and AI can be allocated more effectively.

In the case of creating meeting minutes, one should construct a GUI that makes it possible to enable AI actors. For cobots in the manufacturing industry, different methods have been developed to plan the sharing of tasks (Michalos et al., 2018). Similar attempts should be made for software cobots. Although hybrid physical and AI-augmented software co-workspaces might deal with similar problems of orchestration, some of the physical constraints are not present in software. Typical problems of using cobots, such as separating the working areas of robots and humans due to security and safety regulations, are not relevant for software. Aspects of ergonomics are replaced by user-friendly GUI design that makes it possible to delegate tasks that are challenging for humans, such as STT transcription, quickly and seamlessly to AI actors. If the transcript is generated automatically and visualized as a live transcript, the minute taker could highlight parts with pre-determined tags or buttons for speaker recognition, action item allocation, decision detection or automatic text summarization.

===Discussion===
The number of RPAs, digital agents and software cobots is still way behind the number of physical robots and cobots in the manufacturing industry. Nevertheless, the role of AI in the digitalization process is increasing, and so is the number of business scenarios in which AI algorithms are applied, ranging from cancer diagnostics in health care and fraud detection in banking to voice-based digital assistants in the consumer sector. However, AI algorithms are rarely seen as active participants in most use cases. The focus of the system design should be extended from specifying the requirements of the components at design-time to include the specification of the business scenario at run-time and the way humans can delegate tasks to AI actors. Furthermore, not only the ability to act but also the ability to learn are new features of software components which definitely have an impact on the design of the different run-time business scenarios. Treating AI as system actors can be visualized in the early design phase with UML use case diagrams and changes the way HCI is perceived. An AI-augmented system design is also visible in UML activity diagrams, where the activities are allocated according to the strengths and weaknesses of the different actors.

In many cases, the current distribution of tasks between humans and AI actors can be optimized, as can be seen in the example of taking meeting minutes. Although an application that can autonomously take accurate meeting minutes is out of reach with the current technology at hand, in a hybrid scenario with a GUI that supports the collaboration with a software cobot, tedious tasks such as STT transcription, which is one of the mature components of NLP, could easily be delegated to an AI actor. Even more, the interactions on the GUI could be used as supervised learning for speaker recognition and built into the application itself. Rule-based systems could be established that help to distinguish the different formats of meeting minutes that could be necessary for different meeting types, such as decision meetings, brainstorming sessions or even specialized meetings for a scrum team. In cases where the quality of the speaker recognition is currently insufficient, the system could learn it over time if appropriate AI actors are incorporated at design-time in order to learn during run-time.

===Conclusion and Outlook===
Compared to blue-collar work, where the majority of repetitive manufacturing tasks are delegated to robots, many white-collar workers still lack appropriate tool support
where at least some of the tasks can be delegated to machines and a more optimized cooperative workplace with humans and AI-enabled system actors can be designed. As demonstrated with the example of taking meeting minutes, humans have difficulties with the speed of the utterance, while currently available AI-based STT transcription systems are much more accurate. On the other hand, speech separation is easy for humans, but none of the two speech recognizers came even close to the actual speech segments, which confirms the well-known “cocktail party” problem. In conclusion, more efficient systems could be designed that take the particular strengths and weaknesses of humans and AI-based software components into account. By starting the system design with use cases in which the different AI-based components are taken into account as system actors, the capabilities of rule-based AI as well as machine learning can be considered explicitly and up-front. Later during the system design, the activities can be allocated to the most appropriate system actors and lanes, and the activity diagram can be used as a design methodology to optimize the interactive workplace of humans and cobots.

In some application areas such as autonomous cars, AI already has the role of an assistive technology. Although AI has the potential to replace human drivers in the long run, the current application scenario is hybrid and consists of a mutual collaboration between human and AI actors. Other than in the Turing test, where the focus is on building a machine that cannot be distinguished from a human being, the goal should be to build cobots with capabilities that complement those of humans. Building systems that combine the capabilities of traditional software components, knowledge engineering and machine-learning components more seamlessly can help reshape traditional HCI in a way where humans benefit the most. Humans can focus on valuable activities they can accomplish better than machines and benefit from delegating to machines the tedious tasks where machines are superior.

===References===
Beinerts, L. (2018). The Expert: Progress Meeting (Short Comedy Sketch). YouTube. Retrieved June 5, 2018, from https://www.youtube.com/watch?v=u8Kt7fRa2Wc

Fast-Berglund, A., Palmkvist, F., Nyqvist, P., Ekered, S. and Akerman, M. (2016). Evaluating Cobots for Final Assembly. In 6th CIRP Conference on Assembly Technologies and Systems, p. 175-180.

Goyal, A., Gupta, V. and Kumar, M. (2018). Recent Named Entity Recognition and Classification techniques: A systematic review. PG Department of Information Technology, GGDSD College, Chandigarh, India.

Hofer, A. (2018). Towards a Meeting Minutes Assistant. Master thesis, Master of Science in Business Information Systems, School of Business, University of Applied Sciences Northwestern Switzerland, Olten, Switzerland.

IBM. (2018). Terminuter - automated cognitive meeting minutes. Retrieved May 3, 2018, from https://terminuter-demo.eu-gb.mybluemix.net/

IBM Watson. (2018). Speech to Text Demo. Retrieved June 11, 2018, from https://speech-to-text-demo.ng.bluemix.net/

International Federation of Robotics. (2018). Retrieved October 30, 2018, from https://ifr.org/ifr-press-releases/news/robot-density-rises-globally

Jüngling, S., Lutz, J., Korkut, S. and Jäger, J. (2018). Innovation Potential for Human Computer Interaction Domains in the Digital Enterprise. In Dornberger, R. (ed.), Business Information Systems and Technology 4.0. Studies in Systems, Decision and Control, vol. 141. Springer, Cham.

McCann, D. (2016). Robots, robots everywhere. In CFO Magazine (September 15), from http://ww2.cfo.com/applications/2016/09/robots-robots-everywhere/

Michalos, G., Spiliotopoulos, J., Makris, S. and Chryssolouris, G. (2018). A method for planning human robot shared tasks. In CIRP Journal of Manufacturing Science and Technology, https://doi.org/10.1016/J.CIRPJ.2018.05.003

Mohan, M. and Bhat, A. (2018). Joint Goal Human Robot collaboration from Remembering to Inferring. 8th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2017.

Otter.ai. (2018). Otter Voice Notes. Retrieved June 11, 2018, from https://otter.ai/login

Ratner, A. and Ré, C. (2018). Knowledge Base Construction in the Machine-learning Era. Queue – Machine Learning. https://doi.org/10.1145/3236386.3243045

Reason8.ai. (2018). reason8 with your meeting partners – for project managers, assistants, business analysts and everyone making notes & follow-ups. Retrieved October 29, 2018, from https://reason8.ai/

Villani, V. (2018). Mechatronics. https://doi.org/10.1016/j.mechatronics.2018.02.009
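
The transcript comparison described under "Current state of work" (automatically created STT transcripts measured against a reference transcript) can be sketched in code. Note that the paper itself used a plagiarism checker (Copyleaks, 2018) for this comparison; the word error rate (WER) shown below is a common alternative metric and is given only as an illustrative sketch, with invented example sentences, not as the authors' method.

```python
# Minimal sketch: word-level Levenshtein distance between a reference
# transcript and an STT hypothesis, normalized by reference length (WER).

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between the first i words of ref
    # and the first j words of hyp
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

reference = "please send the updated budget to all participants by friday"
hypothesis = "please send the updated budget to all participants on friday"
wer = word_error_rate(reference, hypothesis)   # 1 substitution in 10 words
accuracy = 1.0 - wer
```

Under this metric, the 87–95% figure reported above would correspond to a WER of roughly 5–13%, assuming the plagiarism checker's similarity score behaves comparably, which is an approximation rather than an equivalence.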
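
The rule-based extraction mentioned in the Discussion (complementing the NER approach considered under RQ5) could, in its simplest form, look like the following heuristic: flag transcript sentences in which a capitalized subject is followed by a commitment verb. The pattern and the example sentences are invented for illustration and fall far short of the robustness an actual meeting-minutes assistant would need.

```python
import re

# Heuristic sketch: treat "<Name> will/should <task>" sentences as
# candidate action items. Pattern and examples are illustrative only.
ACTION_PATTERN = re.compile(r"^(?P<who>[A-Z][a-z]+) (?:will|should) (?P<what>.+)$")

def extract_action_items(sentences):
    """Return (owner, task) pairs for sentences matching the pattern."""
    items = []
    for s in sentences:
        m = ACTION_PATTERN.match(s.strip().rstrip("."))
        if m:
            items.append((m.group("who"), m.group("what")))
    return items

transcript = [
    "Anderson will draw seven red lines by Friday",
    "That is clearly impossible.",
    "Walter should check the requirements with the client",
]
items = extract_action_items(transcript)
# items → [("Anderson", "draw seven red lines by Friday"),
#          ("Walter", "check the requirements with the client")]
```

A production system would need coreference resolution ("he will…"), speaker attribution from the diarized transcript, and learned rather than hand-written patterns, which is precisely where the hybrid human-AI supervision proposed in the paper would come in.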