

 A Domain Specific Modeling Language for Model-Based
           Design of Voice User Interfaces

                          Claudia Steinberger and Christian Kop
                       Universität Klagenfurt, Klagenfurt, AUSTRIA
                 {claudia.steinberger,christian.kop}@aau.at

        Abstract. Designing a voice user interface (VUI) can become more challenging than
        designing a graphical user interface (GUI). Without visual interaction elements, a user
        can be less bound to predefined interaction regulations and restrictions. Concrete user
        requests and system responses in a dialog strongly depend on the initial intention of the
        user and the user’s utterances during the dialog, to which the voice-based system has to
        respond. The aim of this paper is to present an intention-oriented approach to the design
        of VUIs. This is achieved in particular by defining and applying RIML, a domain specific
        modeling language, which enables VUI designers to create platform-independent inten-
        tion models. A RIML model specifies which requests a VUI should respond to, what
        intentions are involved, how the system handles user requests and responses, and what to
        do in case of misunderstanding or failure. Based on a RIML model, a voice-based system
        is able to communicate flexibly with the user via its VUI. We show this using the example
        of AYUDO Voice, a voice assistant for personal health management.

        Keywords: Domain Specific Modeling Language · Intention Modeling · Active
        Assistance · Voice User Interface · Voice-Based System


1       Introduction

Nowadays, voice-based system technologies are advancing rapidly. Their voice user
interfaces (VUIs) allow a user to interact with a system in the form of natural language
inquiries. This enables users to interact with the system in a hands-free, eyes-free and
intuitive way. However, a positive user experience is necessary
for VUIs to be accepted. Thus, the design of usable VUIs plays a major role in the
system development process.
  In contrast to graphical user interfaces (GUIs), VUIs have no visible interface. Instead
of clicking buttons and selecting options from dialog boxes, users make their voice
requests and respond to questions by voice [20]. Their concrete requests and answers
depend on their intentions [24]. By an intention we understand the purpose a user
wants to achieve through the interaction with a voice-based system. From the point of
view of a voice-based system that is supposed to fulfill such intentions, we call them
request intentions.
   Without visual interaction elements, a user is less bound to predefined interaction
rules and restrictions. The interaction must therefore be as natural as possible and
flexible rather than strictly sequential, because there are many ways in which a user
can articulate his or her request intention. Existing interaction-element-oriented approaches

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


for designing GUIs (e.g. [1][2][3]) are not well suited for designing VUIs. We believe
that VUIs must be designed intention-oriented to ensure a high level of usability.
    The aim of this paper is to present an intention-oriented approach to the design of
VUIs. Therefore, we present RIML (Request Intention Modeling Language), a domain
specific modeling language, which is dedicated to creating intention models. A RIML
model specifies which requests a voice-based system should respond to, what intentions
are involved, how the system deals with the users' requests and responses, and what has
to happen in case of misunderstanding or failure. Thus, it conceptually represents those
user intentions that the VUI should support, together with the corresponding features
needed to do so. With the help of RIML, an Intention Designer can specify the request
intentions expected for a future voice-based system, independent of a specific platform
or technology. In this paper, we also sketch RIML-Modeler, a tool that supports the
creation of RIML models. Based on a RIML model, a voice-based system with a model-
driven architecture [18] is able to communicate flexibly with the user via its VUI, to
request required inputs from the user, to interpret answers, and to recognize and give
feedback on voice inputs that are not understandable or are incorrect. As a case study throughout the
paper, we use the VUI that we are working on in the AYUDO project [19].
    The paper is further structured as follows: Chapter 2 deals with challenges in de-
signing VUIs and presents a simplified voice-based system architecture model. Chapter
3 sketches AYUDO, a voice-based system for personal health management, used as a
case study in this paper. Chapter 4 presents our approach in the MOF metamodeling
hierarchy and introduces the metamodel of RIML. It then exemplifies the approach with an
excerpt of the AYUDO RIML model. Chapter 5 summarizes the results of the paper and
provides an outlook on further research work.


2      Challenges of Modeling Voice User Interfaces

One of the main reasons VUIs are so fascinating is that verbal conversation is a
natural form of communication for people [20]. Thus, VUIs can particularly reduce
barriers for elderly or impaired people. As a result, there is a trend away from screen-
first systems towards voice-based systems [6][25].
   The architecture of a voice-based system [21] contains several modules, as presented in
Figure 1: a speech recognition module interprets an oral utterance of the user and trans-
forms it into a textual representation (STT - Speech to Text), which is handed over to
the interaction management. There, the intention recognition module first analyzes this
text to find a match with an appropriate request intention. Once the intention is recog-
nized, the dialog management module together with the action execution module checks
the current context memory. The context memory contains the information the user has
already communicated to the system. The intention model represents the knowledge
that the interaction management has about the intentions of its users. If more infor-
mation is needed from the user to fulfill the request intention, the system prompts the
user to communicate it in a loop. The action execution module hands over the prompt
to the response generation module, which again verbalizes it into a natural language


result-prompt. This text is then handed over to the speech synthesis module, which pro-
duces an acoustic output to the user. Finally, if all information to fulfill a request inten-
tion is available, the intended command and its associated parameters are sent via the
API to a web service endpoint for processing and the result is communicated again to
the user.
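   To illustrate how these modules interact, the following minimal Python sketch runs one
request through the pipeline of Figure 1. All names are our own illustrative assumptions
rather than part of any concrete platform, and speech recognition and speech synthesis are
stubbed with plain text so that the control flow of the interaction management becomes visible.

def speech_to_text(audio: str) -> str:           # STT stub: the "audio" is already text here
    return audio

def text_to_speech(prompt: str) -> None:         # speech synthesis stub
    print("SYSTEM:", prompt)

def recognize_intention(text: str, model: dict) -> str | None:
    # naive intention recognition: first intention whose keywords all occur in the text
    for name, spec in model.items():
        if all(keyword in text.lower() for keyword in spec["keywords"]):
            return name
    return None

def handle_request(audio: str, model: dict, context: dict, ask_user) -> None:
    text = speech_to_text(audio)
    intention = recognize_intention(text, model)
    if intention is None:
        text_to_speech("Sorry, I did not understand that.")         # error prompt
        return
    spec = model[intention]
    # dialog management: prompt for every argument still missing in the context memory
    for argument, prompt in spec["arguments"].items():
        while argument not in context:
            text_to_speech(prompt)                                   # filling prompt
            answer = ask_user()                                      # user's filling utterance
            if answer:
                context[argument] = answer
    # action execution would now call a web service endpoint; here we only confirm
    text_to_speech(spec["end_prompt"].format(**context))             # result prompt

model = {"document_blood_sugar": {
    "keywords": ["blood sugar"],
    "arguments": {"BS_Value": "What value for your blood sugar did you measure?"},
    "end_prompt": "Thanks, I stored the blood sugar value {BS_Value}."}}

handle_request("I have measured my blood sugar now", model, {}, ask_user=lambda: "95")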




Fig. 1. Simplified architecture model of a voice-based system

    While GUIs are tied to the screen and keyboard of a device such as a desktop, a
tablet or a smartphone and to their visible interaction elements, voice is ubiquitous. A
VUI is surface independent and not tied to visible interaction elements. This requires
a new design approach that considers the situation and context the user is in at that
moment. From the perspective of a designer, there are subtle but strong differences
between VUIs and GUIs. VUIs face the following four challenges [4][5][6]:

1. Users do not know what they can or cannot ask the system: A VUI user does
   not want to learn a load of different commands, s/he just wants to say whatever
   naturally comes to mind. VUIs need to understand different ways users might say
   the same thing. Language is diverse, complex and nuanced. When designing a suc-
   cessful voice experience, it is important to consider the many variations that we
   humans use to say the same thing. This is completely different from how one would
   handle this in a GUI.
2. Once the machine says a response, it stays only in the short-term memory of
   the user and has no persistence: For users, VUIs exist only in their minds. Hence,
   users need to concentrate more to listen to what the VUI is saying. Therefore, a
   voice experience must be designed to reduce cognitive load as much as possible [8].
   VUIs are also not always better suited than GUIs: it is much quicker to use voice to
   ask a question or to input information than it is to type a request on a keyboard, while
   GUIs are more efficient for outputting information. Sometimes it makes perfect
   sense to combine both types.
3. Unnecessary information in VUIs is much costlier, in terms of cognitive load,
   than it is in GUIs: Designers have to be much more ruthless about the infor-
   mation they exclude from the interaction. The dialogue with a voice-based system
   should please the user instead of frustrating him or her.


4. VUIs enable flat navigation: An important difference when designing VUIs is the need
   to rethink navigation and information architecture. In GUIs, users follow a click
   path to get through the individual screens. Voice enables "flat navigation", where
   users can go directly to where they want to go. This allows the user to find things
   more quickly, providing a more efficient experience than with a GUI.

   But how to design for this? When designing for GUIs, you often begin by mapping
out the logical flow of the pages and steps a user can go through based on interactive
interaction elements (i.e., a graph-based UI model [1][2][3]). Consequently, existing
GUI modeling approaches are not well suited to model VUIs. Designing for voice is
different from designing for web and mobile [20]. Voice lets users get right down to
what they want. So, designers must abstract and think about the interactions and the
user intentions as a whole.
    Commercial voice technology platforms use their own cloud services for the inter-
action management of their VUIs. Amazon Alexa skills use skill interaction models
that define the intents a skill can handle and the utterances that users should say to
invoke them. For the design of Alexa skills, Amazon recommends using a frame-based
UI model [5], which helps to manage a dialogue in such a way that users can ‘jump around’ to
get the information they need. However, this approach is integrated into the Alexa
Skills Kit [9]. Alexa’s skill interaction models are completely platform-dependent, and
the voice services as well as the intention recognition services are available only via
the Amazon Cloud. Thus, Alexa’s compliance with data privacy is controversial.
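   For illustration, an Alexa skill interaction model is essentially a JSON document that
enumerates the intents, their slots and the sample utterances of a skill. The abbreviated
Python literal below sketches its rough shape; the intent and slot names are our own
illustrative choices.

# Rough shape of an Alexa skill interaction model (abbreviated; names are illustrative).
alexa_interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "my health diary",
            "intents": [{
                "name": "DocumentBloodSugarIntent",
                "slots": [{"name": "value", "type": "AMAZON.NUMBER"}],
                "samples": ["I have measured my blood sugar",
                            "I measured the value {value}"],
            }],
            "types": [],
        }
    }
}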

The goal of this paper is to overcome these platform-dependent approaches. We present
a domain-specific modeling language that enables the design of platform-independent
intention models, which can be used, after a transformation, for both commercial
platforms and private-by-design VUIs.


3        AYUDO Use Case

As a use case for our approach, we refer to a voice-based system we are currently work-
ing on. This chapter therefore introduces the AYUDO project and presents the
particular challenges we faced when designing its VUI.

3.1      AYUDO Project
AYUDO aims to develop an active assisted living (AAL) system to help elderly or
chronically ill people to improve their personal health and well-being and to support
them in remaining independent in their familiar environment as long as possible. The
AYUDO project is financed by the FFG1 and runs from 2019 to 2022 [19]. AYUDO
supports the user in documenting and monitoring his or her own state of health. Figure
2 sketches the overall architecture of the AYUDO system: it consists of AYUDO Voice,
a mobile application with graphical display functionality that the user can operate

1 FFG: Österreichische Forschungsförderungsgesellschaft (https://www.ffg.at)


mainly by voice, AYUDO Admin, a GUI to set up user preferences and to configure the
system, the AYUDO Core System (MCA architecture) and the Integration Interface to
couple context-based middleware systems and external health records (ELGA2) to
AYUDO. Altogether, the AYUDO system aims to motivate the target group to behave
in a health-conscious manner based on the captured and documented context data.




Fig. 2. AYUDO architecture

    In the following, we focus on AYUDO Voice. The design of AYUDO Voice is used
as a use case in Chapter 4. For example, users should be able to document measured
vital parameters, their medication, their nutritional behavior and subjective vital param-
eters such as their daily mood by voice. Analyses of the documented data
and motivations for health-conscious behavior are also carried out verbally in dialogue
form. An example of such a verbal interaction is: “AYUDO, I have measured my blood
sugar now” with the intention to automatically document the measured value with all
the data that goes with it. AYUDO Voice then starts a natural dialogue to fulfill the
user's wishes, considering the challenges described in Chapter 2.


3.2    Challenges of AYUDO Voice Design
Documentation and monitoring of health data in this domain are themselves challenging,
since elderly people suffer from age-related limitations [10]. For instance, elderly people
have issues with hearing, their visual abilities decrease, their touch perception diminishes
and it becomes hard for them to follow and process complex information due to
issues with memory. In addition, this target group does not belong to the group of so-
called “digital natives”. So, a good mix of interaction possibilities with the AYUDO
system is required so that these persons can easily document and retrieve their personal
health data. Since elderly people also suffer from visual impairments, we also try to

2 ELGA: Elektronische Gesundheitsakte (https://www.gesundheit.gv.at/elga/inhalt)


focus on this target group. However, this focus also has its limits. If the visual impair-
ment is too far advanced, and for the target group of blind users, voice user interfaces can
raise further challenges according to [14], since these users have additional require-
ments.
    Health data is also very sensitive data with respect to Article 9 of GDPR [11]. There-
fore, it has to be ensured that processing the data is based on informed and explicit
consent, and the transmission is secured (authenticated and encrypted). Furthermore,
both the natural language services and the AYUDO knowledge base should not be lo-
cated somewhere in a cloud, but locally on the user's clients or on a separate project
server. Our initial analysis has shown that most providers of voice-based systems use a
cloud solution where the user has little control over where the data is stored [23]. For
that reason, big commercial players like Amazon Alexa or Google Assistant could not
be considered as platforms for AYUDO Voice. We even had to discard our first choice
“Snips” [12], a software solution powering private-by-design voice assistants3. The Snips
Console is no longer available due to its acquisition by Sonos.




Fig. 3. AYUDO Voice Architecture

   We therefore decided to develop our own private-by-design solution based on the
architecture shown in Figure 3. To protect the user’s privacy with regard to his or her ut-
terances to AYUDO Voice, we keep the STT and TTS synthesis locally on the user’s
client device (e.g. a tablet computer). We use PocketSphinx [16] and Android Speech
TTS [15] as open source toolkits for speech recognition/generation, but these compo-
nents are modularly interchangeable. The interaction management is located on our
own AYUDO server. The transmission between AYUDO Voice and the Voice Interface
API of the AYUDO server is secured. Business logic is kept apart from the interaction man-
agement in a separate domain knowledge module. Currently, we are developing the
interaction management module for AYUDO. However, in order to be able to stay flex-
ible and extendable in the future, we wanted to separate the design of the AYUDO intention



3
    Snips Homepage (https://snips.ai/, last accessed 2020/10/08).


model from the concrete realization of the interaction management. Therefore, we de-
veloped a domain specific modeling language (DSML) for the model-based design of
a VUI with the goal of specifying the intention model in a descriptive form, including
the request intentions, the user utterances (i.e. what a user can say), existing constraints
and corresponding system responses (i.e. what the system can return as output) for our
domain. This DSML can be used beyond AYUDO to create intention models inde-
pendently of (commercial) platforms.
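   Before turning to RIML itself, the following sketch illustrates what this split means on
the client side under our assumptions: speech recognition runs locally, only the recognized
text is sent over a secured connection to the Voice Interface API, and the returned prompt
is verbalized by the local TTS. The endpoint URL and the JSON payload are hypothetical.

import requests

VOICE_API = "https://ayudo.example.org/voice-api/utterance"     # hypothetical endpoint

def send_utterance(text: str, session_id: str) -> str:
    """Send locally recognized text to the interaction management on the AYUDO server
    and return the next prompt, which the local TTS then verbalizes."""
    response = requests.post(VOICE_API,
                             json={"session": session_id, "text": text},
                             timeout=5)     # HTTPS transport, certificates verified by default
    response.raise_for_status()
    return response.json()["prompt"]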


4      Model-Based Design of Voice User Interfaces

A DSML is designed for exclusive use in a certain domain and for specific purposes
[26]. Using the MOF 4-level metamodel hierarchy as a basis [7][17][18], a DSML is
an extension of M3 and a metamodel for M1. That means that the DSML is defined on
level M2 using a metamodeling language provided on M3. On level M1, the DSML is
used to create concrete models that are instantiated on level M0. Figure 4 shows the
metamodel, which defines RIML at level M2 and the AYUDO intention model as an
instantiation of RIML. This AYUDO intention model is used on level M0 for the de-
tailed interaction management. In the next subchapters, we explain RIML, the design of
a RIML intention model, and the use of the intention model at runtime.




Fig. 4. RIML in the MOF metamodel hierarchy


4.1      The RIML-Metamodel
    A RIML model has the purpose of conceptualizing which requests a voice-based
system should respond to, what intentions are involved, how the system deals with the
users' requests and responses, and what has to happen in case of misunderstanding or
failure. Figure 5 presents the metamodel of RIML: The core classes in RIML (some
of them are colored in Figure 5) are Intention, Argument, Intention Utterance, Filling
Utterance, Confirmation Utterance and Prompt. The class Intention makes it possible to model
an expected request intention of a user (e.g., document the blood sugar measurement)
and represents a specific purpose a user wants to achieve through the interaction with
the voice-based system.




      Fig. 5. RIML-Metamodel

    Each Intention has an Intention Type, e.g., ‘ask’, to get an answer to a question or
‘do’, to execute a certain transaction. Each Intention can have several instances of the
class Argument. Arguments are variables with which the user specifies the details of an
intention. An instance of Argument can itself be related to several Intentions. For instance,
the above-mentioned example intention “document my blood sugar” has the following arguments:
blood sugar value, date of measuring, time of measuring, and a flag indicating whether the user was on an empty


stomach or not. An Argument can have additional features (e.g., the Argument Type,
the Order of an Argument, a Default value, or the Validation Rules of an Argument). The
metamodel represents these features as attributes or related classes.
    Utterances represent the expected statements of the user in a dialogue to fulfill an
intention. The class Utterance has the following subclasses: Intention Utterance, Fill-
ing Utterance and Confirmation Utterance. Intention Utterance abstracts the statements
the user is expected to use to request an intention in the first place. Instances of Filling
Utterance represent the responses the user is expected to give to the voice-based
system's request regarding a certain argument of an intention. Instances of Con-
firmation Utterance represent accepted user responses that affirm or deny a certain
previously given message of the voice-based system.
    Prompts represent the expected statements of the voice-based system in a dialogue
to fulfill an intention. It is possible to model variations of Prompts to make a dialogue
more diverse later. Prompt has several subclasses: First, there are prompts related to
an argument (Filling Prompt and Confirmation Prompt). Filling Prompts are requests
for information about a certain argument. Instances of Confirmation Prompt either con-
firm information received from the user or represent follow-up requests for information
about an argument. In addition, there exist prompts related to an Intention instance itself
(Summary Prompt, Intermediate Confirmation Prompt and END Prompt). The inten-
tion designer can model instances of these classes to design the overall dialog of a spe-
cific intention instance.
    Careful error handling is important for the acceptance of voice-based systems. Thus,
the class Error Prompt is of a special nature. An error in a dialogue can be caused, e.g.,
by wrong input for a requested argument, the violation of permitted value ranges,
or the request of an unsupported user intention. Instances of Error Prompt are
related to instances of the class Argument Type or to instances of Parameter that be-
long to instances of Validation Rule, respectively. Additionally, if a request of a user
cannot be resolved at all, the designer can define prompts for that purpose too and relate
an Error Prompt instance to an Intention instance.
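   To make the metamodel more tangible, the following Python sketch approximates the core
classes of Figure 5 as data classes. The attribute selection and the simplification of
utterances and prompts to plain template strings are our own assumptions, not a normative
encoding of RIML.

from dataclasses import dataclass, field
from enum import Enum

class IntentionType(Enum):
    ASK = "ask"     # get an answer to a question
    DO = "do"       # execute a certain transaction

@dataclass
class Argument:
    name: str
    argument_type: str                                               # e.g. "number", "date", "boolean"
    order: int = 0
    default: str | None = None
    validation_rules: list[str] = field(default_factory=list)
    filling_utterances: list[str] = field(default_factory=list)      # expected user answers
    filling_prompts: list[str] = field(default_factory=list)         # system requests for the argument
    confirmation_prompts: list[str] = field(default_factory=list)
    error_prompts: list[str] = field(default_factory=list)           # e.g. violated validation rule

@dataclass
class Intention:
    name: str
    intention_type: IntentionType
    intention_utterances: list[str] = field(default_factory=list)    # how users request the intention
    arguments: list[Argument] = field(default_factory=list)
    summary_prompts: list[str] = field(default_factory=list)
    end_prompts: list[str] = field(default_factory=list)
    error_prompts: list[str] = field(default_factory=list)           # e.g. unresolvable request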


4.2    The AYUDO Intention Model
One of the AYUDO project partners4 has developed the form-based RIML-Modeler to
create intention models with RIML. Figure 6 and Figure 7 show an excerpt from the
AYUDO intention model which was modeled with the RIML-Modeler:
   The instantiation of Application on level M1 is named “AYUDO”. AYUDO includes
several instances of Intention e.g., “document the blood sugar measurement” (see Fig-
ure 6). We will use this intention to show how it is modeled with RIML. This inten-
tion has the Intention Type “Do”, because data has to be collected and stored perma-
nently in the personal health record of the user. For the considered intention, the fol-
lowing instances of Argument exist: “BS_Date” (date of measurement), “BS_Time”
(time of measurement), “BS_Value” (blood sugar value) and “BS_FBS” (the flag that
indicates if the person was on empty-stomach). Figure 7 shows the specification for the


4 Groiss Informatics (https://www.groiss.com/en/)


BS_Value. Instances of the classes Intention Utterance, Filling Utterance and Confir-
mation Utterance are sentence templates with references to Argument instances that
have to be filled. These references are specified in the form of curly brackets, e.g. “I have
measured my blood sugar {BS_Time}” or “At {BS_Time}, I had a blood sugar value
of {BS_Value}”.
   The template style is also used for the instances of the different Prompt subclasses.
Examples of Confirmation Prompts are “Thank you, I understood {BS_Value} as your
blood sugar value” or variants like “Ok, your blood sugar value is {BS_Value}”. A
Confirmation Prompt can also be expressed as a question for feedback (e.g., “You
measured {BS_Value}?”). Typical instances of the class Filling Prompt are “Ok, what
value for your blood sugar did you measure?” or “At which time did you measure the blood
sugar value {BS_Value}?”. It is even possible to model such instances of a Prompt
class with SSML [22].
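   Expressed with the data classes sketched in Section 4.1, the excerpt could look roughly
as follows. The template strings are taken from the text above, whereas the validation rule,
the SSML variant and the omission of the remaining arguments are illustrative simplifications.

# Uses Argument, Intention and IntentionType from the sketch in Section 4.1.
bs_value = Argument(
    name="BS_Value", argument_type="number", order=1,
    validation_rules=["20 <= BS_Value <= 600"],               # illustrative range check
    filling_utterances=["I measured the value {BS_Value}",
                        "At {BS_Time}, I had a blood sugar value of {BS_Value}"],
    filling_prompts=["Ok, what value for your blood sugar did you measure?"],
    confirmation_prompts=["Thank you, I understood {BS_Value} as your blood sugar value",
                          "Ok, your blood sugar value is {BS_Value}",
                          "You measured {BS_Value}?",
                          # prompts may also be written in SSML [22]:
                          "<speak>You measured <emphasis>{BS_Value}</emphasis>?</speak>"])

document_blood_sugar = Intention(
    name="document the blood sugar measurement",
    intention_type=IntentionType.DO,
    intention_utterances=["I have measured my blood sugar {BS_Time}"],
    arguments=[bs_value],                                      # BS_Date, BS_Time, BS_FBS omitted
    end_prompts=["Thanks, I stored this now. Let me know when I can help you again."])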




Fig. 6. The Intention “document the blood sugar measurement” modeled with the RIML-
Modeler (excerpt)




Fig. 7. The Argument “Blood Sugar Value (BS_Value)” modeled with the RIML-Modeler (ex-
cerpt)


4.3     AYUDO Voice and interaction management at runtime
Based on the AYUDO intention model presented in Chapter 4.2, a typical dialog be-
tween a user, in the following called Mary, and AYUDO Voice could look as sketched
in Table 1. The colors of the rows in Table 1 refer to the corresponding classes of the
RIML metamodel in Figure 5. The instances of these classes come from the AYUDO
intention model (see Figure 6 and Figure 7) and are used by the AYUDO interaction
management to keep the dialogue flexible.

          Table 1. Dialogue Scenario between User Mary and AYUDO Voice

 Role        Conversation

 Mary        AYUDO, I have measured my blood sugar now

 AYUDO       Ok, what value for your blood sugar did you measure?

 Mary        I measured the value 90

 AYUDO       You measured 90?

 Mary        Oh no, sorry, I measured 95!


 AYUDO       Ok, is 95 ok?

 Mary        Oh, yes. This is ok!

 AYUDO       Did you measure before meals?

 Mary        Yes, exactly!

 AYUDO       You measured today at 10.30 am the blood sugar value 95 on empty stomach,
             ok?

 Mary        Ok

 AYUDO       Thanks, I stored this now. Let me know when I can help you again.


    Mary starts the dialogue and speaks to AYUDO Voice. The AYUDO Voice intention
recognition component analyses her utterance based on its intention model with the
result that it best matches the Intention Utterance instance “I have measured my blood
sugar {BS_Time}” and concludes the corresponding intention “document the blood
sugar measurement”. From the word “now” the system concludes that Mary meant the
current time. Therefore, the dialogue management tries to find out the missing argument
values for this intention. Assuming, that the argument {BS_Value} (= blood sugar
value) has the highest priority, it prompts Mary about it. Mary answers with the utter-
ance “I measured the value 90”. The interaction management tries to match this re-
sponse with an instance of Filling Utterance and finds the best match (“I measured
the value {BS_Value}”). Now, the system knows that Mary meant the blood sugar
value. The interaction management now uses the Confirmation Prompt “You measured
{BS_Value}?” to check Mary’s input. Mary answers and the system tries to find a
match of her answer to an instance of Confirmation Utterance (i.e., “Oh no, sorry, I
measured {BS_Value}”).
    Afterwards, the interaction management varies its response using once again an in-
stance of Confirmation Prompt. This time, Mary confirms and the interaction manage-
ment uses an instance of Filling Prompt to ask for information about Mary’s nutrition
state when she measured her blood sugar value (”Did you measure before meals?”).
Mary agrees and the interaction management matches her utterance with an instance of
Filling Utterance for this argument (BS_FBS). Using an instance of Summary Prompt,
the interaction management summarizes what Mary has said and waits for Mary’s con-
firmation. The dialog ends using the instance of the modeled END Prompt “Thanks, I
stored this now. Let me know when I can help you again”. With this prompt, the system
tells Mary that the data was stored successfully. Afterwards, the system goes into the
idle state.
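   A minimal sketch of the kind of template matching this interaction management performs:
every utterance template with {Argument} placeholders is compiled into a regular expression
with named groups, and the first matching template yields the filled argument values. The
regex-based realization is our own illustrative assumption, not necessarily the actual
AYUDO implementation.

import re

def compile_template(template: str) -> re.Pattern:
    """Turn a template such as 'I measured the value {BS_Value}' into a case-insensitive
    regular expression with one named group per {Argument} placeholder."""
    parts = re.split(r"\{(\w+)\}", template)        # literal, argument name, literal, ...
    regex = "".join(re.escape(part) if i % 2 == 0 else rf"(?P<{part}>.+?)"
                    for i, part in enumerate(parts))
    return re.compile(rf"^{regex}\s*[.!?]?$", re.IGNORECASE)

def match_utterance(text: str, templates: list[str]) -> dict[str, str] | None:
    """Return the argument values of the first matching template, or None if none matches."""
    for template in templates:
        if (match := compile_template(template).match(text)):
            return match.groupdict()
    return None

filling_utterances = ["I measured the value {BS_Value}", "Oh no, sorry, I measured {BS_Value}"]
print(match_utterance("I measured the value 90", filling_utterances))   # {'BS_Value': '90'}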


5      Conclusion and future work

Designing VUIs differs from designing GUIs. In particular, flexible and natural VUIs
have to be designed in an intention-oriented way. Big players in voice-based system technol-
ogies like Amazon and Google have created platform-specific cloud-based solutions
and services to design interactions. However, these are not usable in a platform-inde-
pendent way. Data protection and data privacy are often also critical aspects in the realiza-
tion of voice-based systems with these technologies.
   In this paper, we presented the domain-specific modeling language RIML, which
enables a platform-independent design of intention models for voice-based systems.
With RIML and our RIML-Modeler, intentions and corresponding dialogues can be
described in a declarative manner. Based on a RIML model, model-centered voice-
based systems can also keep the intention knowledge local and protected. They can
react flexibly and request the relevant information from the user in a natural way. In
this paper, we used the domain of the AYUDO project as a use case.
   The first AYUDO intention model has already been created with the RIML-Modeler
(20 considered intentions have been modeled) and we are currently working on the
development of the interaction management module of AYUDO (see Figure 3). Within
the next months, we will evaluate the user experience of AYUDO Voice with 10 to 15
test users and adapt our AYUDO intention model to their requirements with regard to
the planned AYUDO functions.
   In the future, we also plan to extend the RIML-Modeler to transform RIML models into
platform-specific formats (e.g. skill interaction models for Alexa) and to extend
RIML with more context information to make dialogues more fluent.


References

1.   Paternò, F., Santoro, C., Spano, L. D. (2009). MARIA: A universal, declarative, multiple
     abstraction-level language for service-oriented applications in ubiquitous environments.
     ACM Transactions on Computer-Human Interaction (TOCHI), 16(4), 1-30.
2.   Brambilla, M., Fraternali, P. (2014). Interaction flow modeling language: Model-driven
     UI engineering of web and mobile apps with IFML. Morgan Kaufmann.
3.   Da Silva, P. P., Paton, N. W. (2003). User interface modeling in UMLi. IEEE software,
     20(4), 62-69.
4.   Dimaculangan, J. (2019). Will Voice Interactions Replace Screens? https://careerfoundry.com/en/blog/ux-design/will-voice-replace-screens/, last accessed 2020/06/25.
5.   Amazon Webinar, How Building for Voice Differs from Building for the Screen,
      https://build.amazonalexadev.com/how-building-for-voice-differs-from-screen-on-demand-webinar-registration-ww.html, last accessed 2020/10/08.
6.   Cutsinger, P. (2018). Situational Design: How to shift from Screen First to Voice First
      Design. https://build.amazonalexadev.com/vui-vs-gui-guide-ww.html, last accessed
     2020/10/08.
7.    Object Management Group: Meta Object Facility (MOF) Specification.
     www.omg.org/cgi-bin/doc/?formal/02-04-03.pdf, last accessed 2020/10/08.

8.    Alvarez, I., Martin, A., Dunbar, J., Taiber, J., Wilson, D. M., Gilbert, J. E. (2011). De-
      signing driver-centric natural voice user interfaces. In Adjunct Proceedings of the 3rd In-
      ternational Conference on Automotive User Interfaces and Interactive Vehicular Applica-
      tions (pp. 156-159).
9.    Alexa Skills Kit, https://developer.amazon.com/en-US/alexa/alexa-skills-kit/get-
      deeper/sdk, last accessed 2020/10/08.
10.   Farage M. A., Miller K. W., Ajayi F., Hutchins D. (2012). Design Principles to Accom-
      modate Older Adults. Global Journal of Health Science, Vol. 4, No. 2 (March 2012), 2-25.
      DOI: 10.5539/gjhs.v4n2p2
11.   Regulation (EU) 2016/679 - General Data Protection Regulation. https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679, last accessed
      2020/10/08.
12.   Coucke, A., Saade, A., Ball, A., Bluche, T., Caulier, A., Leroy, D., Primet, M. (2018).
      Snips voice platform: an embedded spoken language understanding system for private-
      by-design voice interfaces. arXiv preprint arXiv:1805.10190.
13.   Cavoukian, A. (2013). Privacy by design: leadership, methods, and results. In European
      Data Protection: Coming of Age (pp. 175-202). Springer, Dordrecht.
14.   Branham, S. M., Rishin Mukkath Roy, A. (2019). Reading Between the Guidelines: How
      Commercial Voice Assistant Guidelines Hinder Accessibility for Blind Users. The 21st
      International ACM SIGACCESS Conference on Computers and Accessibility. ACM, 446 – 468,
      https://doi.org/10.1145/3308561.3353797.
15.   Android Speech TTS overview, https://developer.android.com/reference/android/speech/tts/package-summary, last accessed 2020/10/08.
16.   CMUSphinx homepage. https://cmusphinx.github.io/, last accessed 2020/10/08.
17.    Meta-Modeling and the OMG Meta Object Facility (MOF), white paper (2017).
      https://www.omg.org/ocup-2/documents/Meta-ModelingAndtheMOF.pdf, last accessed
      2020/10/08.
18.   Mayr H. C., Al Machot F., Michael J., Morak G., Ranasinghe S., Shekhovtsov V., Stein-
      berger C. (2016). HCM-L: domain-specific modeling for active and assisted living. In
      Domain-Specific Conceptual Modeling (pp. 527-552). Springer, Cham.
19.   FFG Projektdatenbank - AYUDO, https://projekte.ffg.at/projekt/3311832, last accessed
      2020/10/08.
20.   Pearl, C. (2016). Designing voice user interfaces: principles of conversational experi-
      ences. O'Reilly Media, Inc.
21.   Bellegarda, J. R. (2013). Large-scale personal assistant technology deployment: The Siri
      experience. In: Proceedings of the Annual Conference of the International Speech Com-
      munication Association, INTERSPEECH 2013.
22.   Speech Synthesis Markup Language (SSML) Version 1.1,
      https://www.w3.org/TR/speech-synthesis11/, last accessed 2020/10/08
23.   Jesse, Mathias Wolfgang (2019). Analysis of voice assistants in eHealth. Master Thesis,
      July 2019, Universität Klagenfurt.
24.   Amunwa, J. (2017). The UX of Voice: The Invisible Interface. Digital Telepathy.
      https://www.dtelepathy.com/blog/design/the-ux-of-voice-the-invisible-interface, last
      accessed 2020/10/08.
25.   Holoubek, S., Bowling, E. (2019). Voice technology isn’t just a trend; it’s a paradigm
      shift, https://www.luminary-labs.com/insight/voice-technology-paradigm-shift/, last ac-
      cessed 2020/10/08.
26.   Frank, U. (2013). Domain-specific modeling languages: requirements analysis and design
      guidelines. In Domain Engineering (pp. 133-157). Springer, Berlin, Heidelberg.