<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Domain Specific Modeling Language for Model-Based Design of Voice User Interfaces</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Claudia</forename><surname>Steinberger</surname></persName>
							<email>claudia.steinberger@aau.at</email>
							<affiliation key="aff0">
								<orgName type="institution">Universität Klagenfurt</orgName>
								<address>
									<settlement>Klagenfurt</settlement>
									<country key="AT">AUSTRIA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christian</forename><surname>Kop</surname></persName>
							<email>christian.kop@aau.at</email>
							<affiliation key="aff0">
								<orgName type="institution">Universität Klagenfurt</orgName>
								<address>
									<settlement>Klagenfurt</settlement>
									<country key="AT">AUSTRIA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Domain Specific Modeling Language for Model-Based Design of Voice User Interfaces</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">971EB99F00B9FA228761A2636C762F04</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Domain Specific Modeling Language</term>
					<term>Intention Modeling</term>
					<term>Active Assistance</term>
					<term>Voice User Interface</term>
					<term>Voice-Based System</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Designing a voice user interface (VUI) can become more challenging than designing a graphical user interface (GUI). Without visual interaction elements, a user is less bound to predefined interaction regulations and restrictions. Concrete user requests and system responses in a dialog strongly depend on the initial intention of the user and on the user's utterances during the dialog, to which the voice-based system has to respond. The aim of this paper is to present an intention-oriented approach to the design of VUIs. This is achieved in particular by defining and applying RIML, a domain specific modeling language that enables VUI designers to create platform-independent intention models. A RIML model specifies which requests a VUI should respond to, which intentions are involved, how the system handles user requests and responses, and what to do in case of misunderstanding or failure. Based on a RIML model, a voice-based system is able to communicate flexibly with the user via its VUI. We show this using the example of AYUDO Voice, a language assistant for personal health management.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Voice-based system technologies are advancing rapidly these days. Their voice user interfaces (VUIs) allow a user to interact with a system in the form of natural language inquiries. Users can thus interact with the system hands-free, eyes-free and in an intuitive way. However, a positive user experience is necessary for VUIs to be accepted. Thus, the design of usable VUIs plays a major role in the system development process.</p><p>In contrast to graphical user interfaces (GUIs), VUIs have no visible interface. Instead of clicking buttons and selecting options from dialog boxes, users make their requests and respond to questions by voice <ref type="bibr" target="#b19">[20]</ref>. Their concrete requests and answers depend on their intentions <ref type="bibr" target="#b23">[24]</ref>. By an intention we mean the purpose a user wants to achieve through the interaction with a voice-based system. From the point of view of a voice-based system, which is expected to fulfill such intentions, we call them request intentions.</p><p>Without visual interaction elements, a user is less bound to predefined interaction regulations and restrictions. The interaction must therefore be as natural, flexible and non-sequential as possible, because there are many ways in which a user can articulate his or her request intention. Existing interaction-element-oriented approaches for designing GUIs (e.g. <ref type="bibr" target="#b0">[1]</ref>[2] <ref type="bibr" target="#b2">[3]</ref>) are not well suited for designing VUIs. We believe that VUIs must be designed intention-oriented to ensure a high level of usability.</p><p>The aim of this paper is to present an intention-oriented approach to the design of VUIs. To this end, we present RIML (Request Intention Modeling Language), a domain specific modeling language dedicated to creating intention models. 
A RIML model specifies which requests a voice-based system should respond to, which intentions are involved, how the system deals with the users' requests and responses, and what has to happen in case of misunderstanding or failure. Thus, it conceptually represents the user intentions that the VUI should support, together with the features required to do so. With the help of RIML, an Intention Designer can specify the request intentions expected of a future voice-based system, independent of a specific platform or technology. In this paper, we also sketch RIML-Modeler, a tool to support the creation of RIML models. Based on a RIML model, a voice-based system with a model driven architecture <ref type="bibr" target="#b17">[18]</ref> is able to communicate flexibly with the user via its VUI, to request required inputs, to interpret answers, and to recognize incomprehensible or incorrect voice inputs and give feedback on them. As a case study throughout the paper, we use the VUI that we are working on in the AYUDO project <ref type="bibr" target="#b18">[19]</ref>.</p><p>The paper is structured as follows: Chapter 2 deals with challenges in designing VUIs and presents a simplified voice-based system architecture model. Chapter 3 sketches AYUDO, a voice-based system for personal health management, which is used as a case study in this paper. Chapter 4 positions our approach in the MOF metamodeling hierarchy and introduces the metamodel of RIML. It then exemplifies an excerpt of the AYUDO RIML model as a use case. Chapter 5 summarizes the results of the paper and provides an outlook on further research work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Challenges of Modeling Voice User Interfaces</head><p>One of the main reasons VUIs are so fascinating is that verbal conversation is a natural form of communication for people <ref type="bibr" target="#b19">[20]</ref>. Thus, VUIs can particularly reduce barriers for elderly or impaired people. As a result, there is a trend away from screen-first systems toward voice-based systems <ref type="bibr">[6][25]</ref>.</p><p>The architecture of a voice-based system <ref type="bibr" target="#b20">[21]</ref> contains several modules, as presented in Figure <ref type="figure" target="#fig_0">1</ref>: a speech recognition module interprets an oral utterance of the user and transforms it into a textual representation (STT, Speech to Text), which is handed over to the interaction management. There, the intention recognition module first analyzes this text to find a match with an appropriate request intention. Once the intention is recognized, the dialog management module together with the action execution module checks the current context memory. The context memory contains the information the user has already communicated to the system. The intention model represents the knowledge that the interaction management has about the intentions of its users. If more information is needed from the user to fulfill the request intention, the system prompts the user for it in a loop. The action execution module hands over the prompt to the response generation module, which verbalizes it into a natural language result prompt. This text is then handed over to the speech synthesis module, which produces an acoustic output for the user. Finally, once all information needed to fulfill a request intention is available, the intended command and its associated parameters are sent via the API to a web service endpoint for processing, and the result is communicated to the user. 
While GUIs are tied to the screen and keyboard of a device, such as a desktop, tablet or smartphone, and to their visible interaction elements, voice is ubiquitous. A VUI is surface-independent and not tied to visible interaction elements. This requires a new design approach that considers the situation and context the user is in at that moment. From the perspective of a designer, there are subtle but strong differences between VUIs and GUIs. VUIs face the following four challenges <ref type="bibr" target="#b3">[4]</ref>[5][6]:</p><p>1. Users do not know what they can or cannot ask the system: A VUI user does not want to learn a load of different commands; s/he just wants to say whatever naturally comes to mind. VUIs need to understand the different ways users might say the same thing. Language is diverse, complex and nuanced. When designing a successful voice experience, it is important to consider the many variations that we humans use to say the same thing. This is completely different from how one would handle this in a GUI. 2. Once the machine utters a response, it stays only in the user's short-term memory and has no persistence: For users, VUIs exist only in their minds. Hence, users need to concentrate more to listen to what the VUI is saying. Therefore, a voice experience must be designed to reduce cognitive load as much as possible <ref type="bibr" target="#b7">[8]</ref>. VUIs are also not always better suited than GUIs: it is much quicker to use voice to ask a question or input information than it is to type a request on a keyboard, while GUIs are more efficient for outputting information. Sometimes it makes perfect sense to combine both types. 3. Unnecessary information in VUIs is much costlier, in terms of cognitive load, than it is in GUIs: Designers have to be much more ruthless about which information they exclude from the interaction. The dialogue with a voice-based system should please the user instead of frustrating him or her.</p><p>4. 
VUIs enable flat navigation: An important difference when designing VUIs is the need to rethink navigation and information architecture. In GUIs, users follow a click path to get through the individual screens. Voice enables "flat navigation", where users can go directly to where they want to go. This allows the user to find things more quickly, providing a more efficient experience than with a GUI.</p><p>But how to design for this? When designing for GUIs, you often begin by mapping out the logical flow of the pages and steps a user can go through based on interactive elements (i.e., a graph-based UI model <ref type="bibr" target="#b0">[1]</ref>[2] <ref type="bibr" target="#b2">[3]</ref>). Consequently, existing GUI modeling approaches are not well suited to model VUIs. Designing for voice is different from designing for web and mobile <ref type="bibr" target="#b19">[20]</ref>. Voice lets users get right down to what they want. So, designers must abstract and think about the interactions and the user intentions as a whole.</p><p>Commercial voice technology platforms use their own cloud services for the interaction management of their VUIs. Amazon Alexa skills use skill interaction models that define the intents a skill can handle and the utterances users should say to invoke them. For the design of Alexa skills, Amazon recommends using a frame-based UI model <ref type="bibr" target="#b4">[5]</ref>, which helps to manage a dialogue in a way that lets users 'jump around' to get the information they need. However, this approach is integrated into the Alexa Skills Kit <ref type="bibr" target="#b8">[9]</ref>. Alexa's skill interaction models are completely platform-dependent, and their voice services as well as their intention recognition services are available only via the Amazon Cloud. Thus, Alexa's compliance with data privacy is controversial.</p><p>The goal of this paper is to overcome these platform-dependent approaches. 
We present a domain-specific modeling language that enables the design of platform-independent intention models, which, after a transformation, can be used for both commercial platforms and private-by-design VUIs.</p></div>
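The module pipeline described in this chapter can be sketched in code. The following Python sketch is purely illustrative: the function names, the dictionary-based intention model and the substring matching are our assumptions, not part of RIML or any concrete voice platform.

```python
# Illustrative sketch of one pass through the simplified architecture of
# Fig. 1: STT output comes in as text, the intention recognition matches it
# against the intention model, and the dialog management either prompts for
# a missing argument or triggers the action execution.

def recognize_intention(text, intention_model):
    """Return the name of the first intention whose trigger phrase occurs
    in the transcribed utterance, or None if nothing matches."""
    for name, spec in intention_model.items():
        if spec["trigger"] in text.lower():
            return name
    return None

def handle_turn(utterance_text, intention_model, context_memory):
    """One dialog turn of the interaction management."""
    intention = recognize_intention(utterance_text, intention_model)
    if intention is None:
        return "Sorry, I did not understand that."           # error prompt
    required = intention_model[intention]["arguments"]
    missing = [a for a in required if a not in context_memory]
    if missing:
        return f"Please tell me the {missing[0]}."           # filling prompt
    return f"Executing '{intention}' with {context_memory}"  # action execution

model = {"document blood sugar": {"trigger": "measured my blood sugar",
                                  "arguments": ["BS_Value", "BS_Time"]}}
print(handle_turn("I have measured my blood sugar now", model, {}))
```

A real intention recognition module would of course use more robust matching than substring search; the sketch only shows the control flow between the modules.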
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">AYUDO Use Case</head><p>As a use case for our approach, we refer to a voice-based system we are currently working on. This chapter introduces the AYUDO project and presents the particular challenges we faced when designing its VUI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">AYUDO Project</head><p>AYUDO aims to develop an active assisted living (AAL) system to help elderly or chronically ill people to improve their personal health and well-being and to support them in remaining independent in their familiar environment as long as possible. The AYUDO project is financed by the FFG <ref type="foot" target="#foot_0">1</ref> and runs from 2019 to 2022 <ref type="bibr" target="#b18">[19]</ref>. AYUDO supports the user in documenting and monitoring his or her own state of health. Figure <ref type="figure">2</ref> sketches the overall architecture of the AYUDO system: it consists of AYUDO Voice, a mobile application with graphical display functionality that the user can operate mainly by voice, AYUDO Admin, a GUI to set up user preferences and to configure the system, the AYUDO Core System (MCA architecture) and the Integration Interface to couple context-based middleware systems and external health records (ELGA<ref type="foot" target="#foot_1">2</ref> ) to AYUDO. Altogether, the AYUDO system aims to motivate the target group to behave health consciously based on the captured and documented context data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 2. AYUDO architecture</head><p>In the following, we focus on AYUDO Voice. The design of AYUDO Voice serves as the use case in Chapter 4. For example, users should be able to document measured vital parameters, their medication, their nutritional behavior and subjective vital parameters such as their daily mood by voice. Analyses of the documented data and motivations for health-conscious behavior are also carried out verbally in dialogue form. An example of such a verbal interaction is: "AYUDO, I have measured my blood sugar now", with the intention of automatically documenting the measured value together with all the data that goes with it. AYUDO Voice then starts a natural dialogue to fulfill the user's wishes, considering the challenges described in Chapter 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Challenges of AYUDO Voice Design</head><p>Documentation and monitoring of health data in this domain are challenging in themselves, since elderly people suffer from age-related limitations <ref type="bibr" target="#b9">[10]</ref>. For instance, their hearing and visual abilities decline, their touch perception diminishes, and memory issues make it hard for them to follow and process complex information. In addition, this target group does not belong to the group of so-called "digital natives". So, a good mix of interaction possibilities with the AYUDO system is required so that these persons can easily document and retrieve their personal health data. Since elderly people often suffer from visual impairments, we also address this target group. However, this focus has its limits: if the visual impairment is too advanced, and for the target group of blind users, voice user interfaces can raise further challenges according to <ref type="bibr" target="#b13">[14]</ref>, since these users have additional requirements.</p><p>Health data is also very sensitive data with respect to Article 9 of the GDPR <ref type="bibr" target="#b10">[11]</ref>. Therefore, it has to be ensured that processing the data is based on informed and explicit consent and that the transmission is secured (authenticated and encrypted). Furthermore, both the natural language services and the AYUDO knowledge base should not be located somewhere in a cloud, but locally on the user's client or on a separate project server. Our initial analysis has shown that most providers of voice-based systems use a cloud solution where the user has little control over where the data is stored <ref type="bibr" target="#b22">[23]</ref>. For that reason, big commercial players like Amazon Alexa or Google Assistant could not be considered as platforms for AYUDO Voice. 
We even had to discard our first choice, "SNIPS", a software solution powering private-by-design voice assistants<ref type="foot" target="#foot_2">3</ref>: the Snips Console is no longer available after the company's acquisition by SONOS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 3. AYUDO Voice Architecture</head><p>We therefore decided to develop our own private-by-design solution based on the architecture shown in Figure <ref type="figure">3</ref>. To protect the user's privacy with regard to his or her utterances to AYUDO Voice, we keep STT and TTS synthesis locally on the user's client device (e.g. a tablet computer). We use PocketSphinx <ref type="bibr" target="#b15">[16]</ref> and Android Speech TTS <ref type="bibr" target="#b14">[15]</ref> as open-source toolkits for speech recognition and generation, but these components are modularly interchangeable. The interaction management is located on our own AYUDO server. The transmission between AYUDO Voice and the Voice Interface API of the AYUDO server is secured. Business logic is kept in a separate domain knowledge module, apart from the interaction management. Currently, we are developing the interaction management module for AYUDO. However, in order to stay flexible and extendable in the future, we wanted to separate the design of the AYUDO intention model from the concrete realization of the interaction management. Therefore, we developed a domain specific modeling language (DSML) for the model-based design of a VUI with the goal of specifying the intention model in descriptive form, including the request intentions, the user utterances (i.e. what a user can say), existing constraints and corresponding system responses (i.e. what the system can return as output) for our domain. Beyond AYUDO, this DSML can be used to create intention models independently of (commercial) platforms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Model-Based Design of Voice User Interfaces</head><p>A DSML is designed for exclusive use in a certain domain and for specific purposes <ref type="bibr" target="#b25">[26]</ref>. Using the MOF 4-level metamodel hierarchy as a basis <ref type="bibr" target="#b6">[7]</ref>[17] <ref type="bibr" target="#b17">[18]</ref>, a DSML is an extension of M3 and a metamodel for M1. That means that the DSML is defined on level M2 using a metamodeling language provided on M3. On level M1, the DSML is used to create concrete models that are instantiated on level M0. Figure <ref type="figure" target="#fig_1">4</ref> shows the metamodel, which defines RIML at level M2, and the AYUDO intention model as an instantiation of RIML. This AYUDO intention model is used on level M0 for the detailed interaction management. In the next subchapters, we explain RIML, the design of a RIML intention model, and the use of the intention model at runtime.</p><p>Each Intention has an Intention Type, e.g., 'ask', to get an answer to a question, or 'do', to execute a certain transaction. Each Intention can have several instances of the class Argument. Arguments are variables with which the user specifies details of a request. Instances of Argument can themselves be related to several Intentions. For instance, the above-mentioned example intention "document my blood sugar" has the following arguments: blood sugar value, date of measuring, time of measuring, and a flag indicating whether the user had an empty stomach. An Argument can have additional features (e.g., an Argument Type, an Order, a Default value, and Validation Rules). The metamodel represents these features as attributes or related classes.</p><p>Utterances represent the expected statements of the user in a dialogue to fulfill an intention. The class Utterance has the following subclasses: Intention Utterance, Filling Utterance and Confirmation Utterance. 
Intention Utterance abstracts the statements a user is expected to use to express a request intention in the first place. Instances of Filling Utterance represent the expected user responses to the voice-based system's request regarding a certain argument of an intention. Instances of Confirmation Utterance represent accepted user responses, which affirm or deny a previous message of the voice-based system.</p><p>Prompts represent the expected statements of the voice-based system in a dialogue to fulfill an intention. It is possible to model variations of prompts to make a dialogue more diverse later. Prompt has several subclasses: First, there are prompts related to an argument (Filling Prompt and Confirmation Prompt). Filling Prompts are requests for information about a certain argument. Instances of Confirmation Prompt either confirm information received from the user or represent follow-up requests for information about an argument. In addition, there are prompts related to an Intention instance itself (Summary Prompt, Intermediate Confirmation Prompt and END Prompt). The intention designer can model instances of these classes to design the overall dialog of a specific intention instance.</p><p>Careful error handling is important for the acceptance of voice-based systems. Thus, the class Error Prompt is of a special nature. An error in a dialogue can be caused, e.g., by wrong input for a requested argument, the violation of permitted value ranges, or the request of an unsupported user intention. Instances of Error Prompt are related to instances of the class Argument Type or to instances of Parameter belonging to instances of Validation Rule, respectively. Additionally, if a request of a user cannot be resolved at all, the designer can define prompts for that purpose too and relate an Error Prompt instance to an Intention instance.</p></div>
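The metamodel classes described above can be rendered, purely for illustration, as simple data structures. The attribute sets below are assumptions derived from the text, not the exact RIML metamodel of Fig. 5.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative rendering of the RIML metamodel classes. The subclasses of
# Utterance and Prompt are modeled here via a "kind" attribute for brevity.

@dataclass
class Argument:
    name: str
    argument_type: str            # e.g. "number", "date", "boolean"
    order: int = 0
    default: Optional[str] = None
    validation_rules: List[str] = field(default_factory=list)

@dataclass
class Utterance:                  # kinds: "intention", "filling", "confirmation"
    kind: str
    template: str                 # e.g. "I measured the value {BS_Value}"

@dataclass
class Prompt:                     # kinds: "filling", "confirmation", "summary",
    kind: str                     # "intermediate_confirmation", "end", "error"
    template: str

@dataclass
class Intention:
    name: str
    intention_type: str           # "ask" or "do"
    arguments: List[Argument] = field(default_factory=list)
    utterances: List[Utterance] = field(default_factory=list)
    prompts: List[Prompt] = field(default_factory=list)

doc_bs = Intention("document the blood sugar measurement", "do",
                   arguments=[Argument("BS_Value", "number")])
```

An intention model in this rendering is then simply a collection of Intention instances belonging to one Application.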
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">The AYUDO Intention Model</head><p>One of the AYUDO project partners <ref type="foot" target="#foot_3">4</ref> has developed the form-based RIML-Modeler to create intention models with RIML. Figure <ref type="figure">6</ref> and Figure <ref type="figure">7</ref> show an excerpt from the AYUDO intention model, which was modeled with the RIML-Modeler:</p><p>The instantiation of Application on level M1 is named "AYUDO". AYUDO includes several instances of Intention, e.g., "document the blood sugar measurement" (see Figure <ref type="figure">6</ref>). We will use this intention to show how to model it based on RIML. This intention has the Intention Type "Do", because data has to be collected and stored permanently in the personal health record of the user. For the considered intention, the following instances of Argument exist: "BS_Date" (date of measurement), "BS_Time" (time of measurement), "BS_Value" (blood sugar value) and "BS_FBS" (a flag that indicates whether the person had an empty stomach). Figure <ref type="figure">7</ref> shows the specification of BS_Value. Instances of the classes Intention Utterance, Filling Utterance and Confirmation Utterance are sentence templates with references to Argument instances that have to be filled. These references are specified in the form of curly brackets, e.g. "I have measured my blood sugar {BS_Time}" or "At {BS_Time}, I had a blood sugar value of {BS_Value}".</p><p>The template style is also used for the instances of the different Prompt subclasses. Examples of Confirmation Prompts are "Thank you, I understood {BS_Value} as your blood sugar value" or variants like "Ok, your blood sugar value is {BS_Value}". A Confirmation Prompt can also be expressed as a question for feedback (e.g., "You measured {BS_Value}?"). 
Typical instances of the class Filling Prompt are "Ok, what value for your blood sugar did you measure?" or "At which time did you measure the blood sugar value {BS_Value}?". It is even possible to model such instances of a Prompt class with SSML <ref type="bibr" target="#b21">[22]</ref>. Fig. <ref type="figure">6</ref>. The Intention "document the blood sugar measurement" modeled with the RIML-Modeler (excerpt) Fig. <ref type="figure">7</ref>. The Argument "Blood Sugar Value (BS_Value)" modeled with the RIML-Modeler (excerpt)</p></div>
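The curly-bracket template style lends itself to straightforward pattern matching. The following sketch shows one assumed way to turn such utterance templates into regular expressions that extract argument values; it is illustrative only, not the actual RIML-Modeler or AYUDO implementation.

```python
import re

# Convert a RIML-style utterance template with {Argument} placeholders into
# a regex with one capture group per placeholder, and use it to extract the
# argument values from a user utterance.

def parse_template(template):
    """Return (compiled regex, ordered list of placeholder argument names)."""
    names = re.findall(r"\{(\w+)\}", template)
    pattern = re.escape(template)
    # each escaped \{Name\} placeholder becomes a lazy unnamed capture group
    pattern = re.sub(r"\\\{\w+\\\}", r"(.+?)", pattern)
    return re.compile("^" + pattern + "$", re.IGNORECASE), names

def match_utterance(utterance, templates):
    """Return (matching template, extracted argument values) or (None, {})."""
    for template in templates:
        regex, names = parse_template(template)
        m = regex.match(utterance)
        if m:
            return template, dict(zip(names, m.groups()))
    return None, {}

templates = ["I measured the value {BS_Value}",
             "At {BS_Time}, I had a blood sugar value of {BS_Value}"]
print(match_utterance("I measured the value 90", templates))
```

Values extracted this way would still have to pass the validation rules modeled for the respective Argument (e.g. a permitted value range for BS_Value).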
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">AYUDO Voice and interaction management at runtime</head><p>Based on the AYUDO intention model presented in Chapter 4.2, a typical dialog between a user, in the following called Mary, and AYUDO Voice could look as sketched in Table <ref type="table" target="#tab_0">1</ref>. The colors of the rows in Table <ref type="table" target="#tab_0">1</ref> refer to the corresponding classes of the RIML metamodel in Figure <ref type="figure" target="#fig_2">5</ref>. The instances of these classes come from the AYUDO intention model (see Figure <ref type="figure">6</ref> and Figure <ref type="figure">7</ref>) and are used by the AYUDO interaction management to keep the dialogue flexible. Mary starts the dialogue and speaks to AYUDO Voice. The AYUDO Voice intention recognition component analyses her utterance based on its intention model, with the result that it best matches the Intention Utterance instance "I have measured my blood sugar {BS_Time}", and concludes the corresponding intention "document the blood sugar measurement". From the word "now" the system concludes that Mary meant the current time. The dialogue management therefore tries to find out the missing argument values for this intention. Assuming that the argument {BS_Value} (the blood sugar value) has the highest priority, it prompts Mary for it. Mary answers with the utterance "I measured the value 90". The interaction management tries to match this response with an instance of Filling Utterance and concludes the best match ("I measured the value {BS_Value}"). Now the system knows that Mary meant the blood sugar value. The interaction management then uses the Confirmation Prompt "You measured {BS_Value}?" to check Mary's input. 
Mary answers, and the system tries to match her answer to an instance of Confirmation Utterance (i.e., "Oh no, sorry, I measured {BS_Value}").</p><p>Afterwards, the interaction management varies its response using once again an instance of Confirmation Prompt. This time, Mary confirms, and the interaction management uses an instance of Filling Prompt to ask for information about Mary's nutrition state when she measured her blood sugar value ("Did you measure before meals?"). Mary agrees, and the interaction management matches her utterance with an instance of Filling Utterance for this argument (BS_FBS). Using an instance of Summary Prompt, the interaction management summarizes what Mary has said and waits for Mary's confirmation. The dialog ends using the instance of the modeled END Prompt "Thanks, I stored this now. Let me know when I can help you again". With this prompt, the system tells Mary that the data was stored successfully. Afterwards, the system goes into the idle state.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>Designing VUIs differs from designing GUIs. In particular, flexible and natural VUIs have to be designed intention-oriented. Big players in language-based system technologies like Amazon and Google have created platform-specific cloud-based solutions and services to design interactions. However, these cannot be used platform-independently. Data protection and data privacy are often also critical aspects in the realization of voice-based systems with these technologies.</p><p>In this paper, we presented the domain-specific modeling language RIML, which enables a platform-independent design of intention models for voice-based systems. With RIML and our RIML-Modeler, intentions and corresponding dialogues can be described in a declarative manner. Based on a RIML model, model-centered voice-based systems can also keep the intention knowledge local and protected. They can react flexibly and request the relevant information from the user in a natural way. 
In this paper, we used the domain of the AYUDO project as a use case.</p><p>The first AYUDO intention model has already been created with the RIML-Modeler (20 considered intentions have been modeled), and we are currently working on the development of the interaction management module of AYUDO (see Figure <ref type="figure">3</ref>). Within the next months, we will evaluate the user experience of AYUDO Voice with 10 to 15 test users and adapt our AYUDO intention model to their requirements with regard to the planned AYUDO functions.</p><p>In the future, we also plan to extend the RIML-Modeler to transform RIML models into platform-specific formats (e.g. skill interaction models for Alexa) and to extend RIML with more context information to make a dialogue more fluent.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Simplified architecture model of a voice-based system</figDesc><graphic coords="3,124.80,207.48,345.84,147.84" type="bitmap" /></figure>
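The runtime behavior sketched in Chapter 4.3 and Table 1 can be condensed into a small slot-filling loop. Everything below is an illustrative assumption: the function names, the scripted answers standing in for Mary's utterances, and the simplified confirmation step are ours, not generated from an actual RIML model.

```python
# Hypothetical sketch of the slot-filling loop behind the Table 1 dialogue:
# the interaction management asks a filling prompt per missing argument,
# confirms the heard value, and accepts a correction before storing.

def confirm(value, correction=None):
    """Simulate the confirmation sub-dialogue: if the user corrects the
    value (as Mary corrects 90 to 95), the corrected value wins."""
    return correction if correction is not None else value

def run_dialog(required_args, scripted_answers, corrections=None):
    """Fill every required argument from the scripted user answers, apply
    corrections from the confirmation step, then emit the END prompt."""
    corrections = corrections or {}
    context = {}
    for arg in required_args:
        heard = scripted_answers[arg]             # reply to the filling prompt
        context[arg] = confirm(heard, corrections.get(arg))
    # summary prompt would be verbalized here, followed by the END prompt
    end_prompt = "Thanks, I stored this now. Let me know when I can help you again"
    return context, end_prompt

context, end = run_dialog(["BS_Value", "BS_FBS"],
                          {"BS_Value": "90", "BS_FBS": "before meals"},
                          corrections={"BS_Value": "95"})
print(context)   # {'BS_Value': '95', 'BS_FBS': 'before meals'}
```

In the real interaction management, the user answers are of course not scripted but matched against Filling and Confirmation Utterance instances as described above.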
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 4 :</head><label>4</label><figDesc>Fig. 4: RIML in the MOF metamodel hierarchy</figDesc><graphic coords="7,124.80,384.48,345.84,276.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. RIML-Metamodel</figDesc><graphic coords="8,138.96,287.40,345.84,307.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="5,124.80,207.48,345.84,184.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="10,124.56,315.12,347.52,293.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="11,124.80,147.48,345.84,240.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Dialogue Scenario between User Mary and AYUDO Voice</figDesc><table><row><cell>Role</cell><cell>Conversation</cell></row><row><cell>Mary</cell><cell>AYUDO, I have measured my blood sugar now</cell></row><row><cell>AYUDO</cell><cell>Ok, what value for your blood sugar did you measure?</cell></row><row><cell>Mary</cell><cell>I measured the value 90</cell></row><row><cell>AYUDO</cell><cell>You measured 90?</cell></row><row><cell>Mary</cell><cell>Oh no, sorry, I measured 95!</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">FFG: Österreichische Forschungsförderungsgesellschaft (https://www.ffg.at)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">ELGA: Elektronische Gesundheitsakte (https://www.gesundheit.gv.at/elga/inhalt)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Snips Homepage (https://snips.ai/, last accessed 2020/10/08).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Groiss Informatics (https://www.groiss.com/en/)</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">MARIA: A universal, declarative, multiple abstraction-level language for service-oriented applications in ubiquitous environments</title>
		<author>
			<persName><forename type="first">F</forename><surname>Paternò</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Santoro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Spano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Computer-Human Interaction (TOCHI)</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Interaction flow modeling language: Model-driven UI engineering of web and mobile apps with IFML</title>
		<author>
			<persName><forename type="first">M</forename><surname>Brambilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fraternali</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>Morgan Kaufmann</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">User interface modeling in UMLi</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">P</forename><surname>Da Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">W</forename><surname>Paton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE software</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="62" to="69" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Will Voice Interactions Replace Screens?</title>
		<author>
			<persName><forename type="first">J</forename><surname>Dimaculangan</surname></persName>
		</author>
		<ptr target="https://careerfoundry.com/en/blog/ux-design/will-voice-replace-screens/" />
		<imprint>
			<date type="published" when="2019">2019. 2020/06/25</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="https://build.amazonalexadev.com/how-building-for-voice-differs-from-screen-on-demand-webinar-registration-ww.html" />
		<title level="m">How Building for Voice Differs from Building for the Screen</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
	<note>Amazon Webinar</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Situational Design: How to shift from Screen First to Voice First Design</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cutsinger</surname></persName>
		</author>
		<ptr target="https://build.amazonalexadev.com/vui-vs-gui-guide-ww.html" />
		<imprint>
			<date type="published" when="2018">2018. 2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="www.omg.org/cgi-bin/doc/?formal/02-04-03.pdf" />
		<title level="m">Object Management Group: Meta Object Facility (MOF) Specification</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Designing driver-centric natural voice user interfaces</title>
		<author>
			<persName><forename type="first">I</forename><surname>Alvarez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dunbar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Taiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Gilbert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Adjunct Proceedings of the 3rd International Conference on Automotive User Interfaces and Interactive Vehicular Applications</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="156" to="159" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<ptr target="https://developer.amazon.com/en-US/alexa/alexa-skills-kit/get-deeper/sdk" />
		<title level="m">Alexa Skills Kit</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Design Principles to Accommodate Older Adults</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Farage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ajayi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hutchins</surname></persName>
		</author>
		<idno type="DOI">10.5539/gjhs.v4n2p2</idno>
	</analytic>
	<monogr>
		<title level="j">Global Journal of Health Science</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="2" to="25" />
			<date type="published" when="2012-03">March 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<ptr target="https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679" />
		<title level="m">Regulation (EU) 2016/679 -General Data Protection Regulation</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces</title>
		<author>
			<persName><forename type="first">A</forename><surname>Coucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ball</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bluche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Caulier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Leroy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Primet</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.10190</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Privacy by design: leadership, methods, and results</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cavoukian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Data Protection: Coming of Age</title>
				<meeting><address><addrLine>Dordrecht</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="175" to="202" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Reading Between the Guidelines: How Commercial Voice Assistant Guidelines Hinder Accessibility for Blind Users</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Branham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rishin Mukkath Roy</surname></persName>
		</author>
		<idno type="DOI">10.1145/3308561.3353797</idno>
		<ptr target="https://doi.org/10.1145/3308561.3353797" />
	</analytic>
	<monogr>
		<title level="m">The 21st International ACM SIGACCESS Conference on Computers and Accessibility</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="446" to="468" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<ptr target="https://developer.android.com/reference/android/speech/tts/package-summary" />
		<title level="m">Android Speech TTS overview</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<ptr target="https://cmusphinx.github.io/" />
		<title level="m">CMUSphinx homepage</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<ptr target="https://www.omg.org/ocup-2/documents/Meta-ModelingAndtheMOF.pdf" />
		<title level="m">Meta-Modeling and the OMG Meta Object Facility (MOF)</title>
				<imprint>
			<date type="published" when="2017">2017. 2020/10/08</date>
		</imprint>
	</monogr>
	<note>white paper</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">HCM-L: domain-specific modeling for active and assisted living</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">C</forename><surname>Mayr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Al Machot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Morak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Shekhovtsov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Steinberger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Domain-Specific Conceptual Modeling</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="527" to="552" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="https://projekte.ffg.at/projekt/3311832" />
		<title level="m">AYUDO (FFG Projektdatenbank)</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Designing voice user interfaces: principles of conversational experiences</title>
		<author>
			<persName><forename type="first">C</forename><surname>Pearl</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>O&apos;Reilly Media, Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Large-scale personal assistant technology deployment: The siri experience</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Bellegarda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Annual Conference of the International Speech Communication Association</title>
				<meeting>the Annual Conference of the International Speech Communication Association</meeting>
		<imprint>
			<publisher>INTERSPEECH</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<ptr target="https://www.w3.org/TR/speech-synthesis11/" />
		<title level="m">Speech Synthesis Markup Language (SSML) Version 1.1</title>
				<imprint>
			<date type="published" when="2020-10-08">2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Analysis of voice assistants in eHealth</title>
		<author>
			<persName><forename type="first">Mathias</forename><surname>Jesse</surname></persName>
		</author>
		<author>
			<persName><surname>Wolfgang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019-07">July 2019</date>
		</imprint>
		<respStmt>
			<orgName>Universität Klagenfurt</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master Thesis</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">The UX of Voice: The Invisible Interface</title>
		<author>
			<persName><forename type="first">J</forename><surname>Amunwa</surname></persName>
		</author>
		<ptr target="https://www.dtelepathy.com/blog/design/the-ux-of-voice-the-invisible-interface" />
	</analytic>
	<monogr>
		<title level="m">Digital Telepathy</title>
				<imprint>
			<date type="published" when="2017">2017. 2020/10/08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Voice technology isn&apos;t just a trend; it&apos;s a paradigm shift</title>
		<author>
			<persName><forename type="first">S</forename><surname>Holoubek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bowling</surname></persName>
		</author>
		<ptr target="https://www.luminary-labs.com/insight/voice-technology-paradigm-shift/" />
		<imprint>
			<date type="published" when="2019-08">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Domain-specific modeling languages: requirements analysis and design guidelines</title>
		<author>
			<persName><forename type="first">U</forename><surname>Frank</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Domain Engineering</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="133" to="157" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
