<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ModelByVoice - towards a general purpose model editor for blind people</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>João Lopes</string-name>
          <email>jr.lopes@campus.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Cambeiro</string-name>
          <email>jmc12976@campus.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasco Amaral</string-name>
          <email>vma@fct.unl.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DI FCT, Universidade NOVA de Lisboa</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NOVA LINCS, DI FCT, Universidade NOVA de Lisboa</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Context: Current modelling technologies, supported by modelling frameworks, underpin the current adoption of Model-Driven Software Development (MDD) and support the phases of Software Engineering. Problem: These tools focus solely on graphical support and visual models; the concrete syntax of the chosen modelling language is graphical, textual, or both. This approach discards the use of other senses for modelling and, for instance, prevents blind software engineers from taking advantage of modelling and dealing with the abstractions that models capture. It is necessary to improve the productivity of people with limitations or disabilities while modelling. They should not be excluded from the modelling activity. These accessibility barriers start as early as modelling education. Method: In this paper, we present a prototype of a tool that takes advantage of current voice recognition and speech synthesis to edit models in diverse modelling languages. The elegance of this work is that it is not only meant to make MDD accessible to a broader spectrum of practitioners, but is also developed with an MDD approach. Results: A prototype named ModelByVoice was built. The tool is not bound to a particular modelling language, as long as the language is meta-modelled. ModelByVoice is the basis for a new tool that will enable MDD while highlighting the relevant human factor of accessibility to models via voice and audio. Ultimately, it aims to let blind people work with MDD and Domain Specific (Modelling) Languages (DS(M)Ls) in the same way that sighted users already work with diagrammatic languages in current modelling workbenches. Index Terms: Model-Driven Software Development, Modelling Workbenches, Accessibility, Speech Generation and Synthesis, Audio Models</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        A research study carried out by IBM in 2004 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] emphasises
the idea that modelling software is, and will continue to be,
from the perspective of many engineers, the foundation for
dealing with the complexity of systems. The key is
abstraction. The modelling activity is therefore fundamental:
it not only allows a better understanding and clarification
of what one intends to develop, but also helps to define a plan
with the requirements and functionality of the system
to be implemented. Ultimately, the abstractions captured by
the models are represented by visual (mostly
diagrammatic) languages. Both general purpose and Domain Specific
(Modelling) Languages (DS(M)Ls) have a visual nature and
assume that, in general, a software engineer has no
physical limitations that prevent them from using
computers. However, visually impaired
people who need these modelling languages find them very
hard to use, because software adapted
to their limitations is lacking. Typically, language developers
delegate the responsibility for supporting disabilities to
third-party general purpose software that focuses on reading
text aloud. This may not be sufficient, because the integration
with the modelling platforms can be cumbersome and does not deliver
the productivity required in the modelling effort.
      </p>
      <p>
        According to Stack Overflow's 2018 survey [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], 1.4% of the more than
100,000 software developers who participated
are blind or have difficulty seeing. The small percentage of
these professionals makes it economically non-viable to build
dedicated software, which explains the lack of products not
only for daily life but also to support these software
engineers in their professional activity. Software modelling,
as one of the phases of the software development life-cycle,
is perhaps one of the most affected by this lack of support.
      </p>
      <p>
        Modelling workbenches for diagrammatic languages, such
as Eclipse GMF / EMF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], DSL Tools [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], AToMPM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
GME [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], MetaEdit+ [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], or their textual counterparts such
as Xtext [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for Eclipse or the Meta Programming System (MPS)
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], do not support blind people in modelling
diagrams through resources such as voice or touch. The
concrete syntax of these languages focuses only on visual
aspects, that is, it requires visual analysis and ignores other
senses such as hearing and touch, which makes the modelling
activity extremely difficult for blind or visually
impaired people. We argue that audio/voice is yet another aspect
of human interaction that should be properly
supported in modelling. Therefore, we propose a tool
that uses voice synthesis and recognition in a more structured
way than the one-dimensional approach of mapping a textual
syntax one-to-one to speech, exploiting instead the natural 2D
structure (mostly graphs) of the models' abstract syntax. This
approach should make models easier to manipulate for
the end-user. It must support editing operations equivalent
to those already found in textual and graphical editors,
such as CRUD's create, update and delete, and others such as select,
navigate or query. The MDD (Model-Driven Development)
approach was adopted for the creation of the
platform, since it promotes the systematic reuse of
components. Beydeda et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] describe MDD
as a technique based on models, which are considered the
main elements of the development process. These models help
engineers reason about the problem domain and design
a solution in the solution domain, providing abstractions of a physical
system that allow them to focus on its critical
aspects.
      </p>
      <p>In this paper, we present our tool prototype, named
ModelByVoice, which allows visually impaired people to
model diagrams or systems with any modelling language
(it is not hard-wired to a specific language) by voice, thanks to
a speech recognition system built into the tool. The tool
was created under the assumption that the supported
domain-specific languages (DSLs) are meta-modelled, as the current
focus of the approach is on the language’s abstract syntax
rather than its concrete syntax.</p>
      <p>The rest of this paper is organised as follows. In Section II,
we discuss the current efforts that, to the best of our
knowledge, contribute to supporting
modelling in software engineering for blind people. In Section
III, we present an overview of our prototype. In Section IV, we
discuss a preliminary assessment of the prototype. Finally,
in Section V, we conclude and discuss future challenges.</p>
    </sec>
    <sec id="sec-2">
      <title>II. STATE-OF-THE-ART</title>
      <p>
        The current state of the art does not provide tools that
allow the visually impaired to model diagrams
satisfactorily. Some tools were developed to circumvent this
problem, but in most cases with limited success.
One example is Technical Diagram Understanding
for the Blind, also known as TeDUB [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This project was
developed to provide a UML modelling tool accessible to
visually impaired individuals. It was mainly a visualisation tool, in which
users created and explored the diagrams through a joystick
or a keyboard [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which means that its creators
adopted a haptic communication approach for the domain
users. Models had to be persisted in XMI format so that
the tool could work with models exported from tools such as Rational
Rose or Poseidon UML. As the tool was a model navigator and
not a proper model editor, the project did not achieve the expected
success: adoption by the domain users was poor, and
the tool frequently failed when it had to handle a significant amount
of data. The project stopped in 2005 with a claim from the
authors that this sort of solution [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] would be more successful
as a mere transformation tool that exports the models
to HTML, maintaining the links and using plain text,
as this would be more interoperable with current screen readers,
which already read web pages.
      </p>
      <p>
        Another approach to this problem, named PRISCA [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
tried to circumvent this obstacle through 3D printing. Users
were presented with 3D-printed diagrams and tried to
interpret them by touch, but the solution proved to
be expensive, slow and ineffective. Given these
limitations, developing ModelByVoice was a
challenging and motivating task that required
new concepts and paradigms better adjusted to
the demands of today’s world.
      </p>
      <p>
        According to our literature review, the
tool that most closely resembles the features and resources of
ModelByVoice is VoiceToModel [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This tool
was designed to enable visually impaired requirements
engineers to derive KAOS models, conceptual models, and
feature models. It uses speech recognition and
synthesis. The user speaks the
commands of the platform to create the desired type of
model, and an audio response then
gives feedback on the voice
commands. Its main limitation compared to ModelByVoice is
that it only allows the creation and editing of diagrams
in one of the languages listed
above, that is, it is limited to three languages, whereas
ModelByVoice can, in theory, be used with any
language as long as its metamodel is defined.
      </p>
    </sec>
    <sec id="sec-3">
      <title>III. SOLUTION OVERVIEW</title>
      <sec id="sec-3-1">
        <title>A. Architecture</title>
        <p>At the architecture level, the ModelByVoice platform is
implemented in three layers. The first layer handles
voice recognition. The platform continuously
captures the sound of the surrounding environment.
When a speech utterance is detected, the sound is converted
to text, and the result is delivered to the second layer. The
second layer then tries to match the received user command
to an operation defined by the modelling language. If a valid
operation is detected, this layer is responsible for executing
it. The feedback mechanism is implemented
in the third layer, where the platform uses speech synthesis to
communicate the results of the input commands back to the user. When
an invalid command is detected, the system
alerts the user and prompts for a new attempt.</p>
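        <p>The three layers can be pictured as a small pipeline. The following Python sketch is only an illustration and not the ModelByVoice implementation: the recognised text is passed in as a plain string (standing in for the output of the first layer), the command table and all function names are hypothetical, and the returned string stands in for what the third layer would send to the speech synthesiser.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, List

# Hypothetical in-memory diagram: just a list of node names.
diagram: List[str] = []


def create_node(name: str) -> str:
    """Second-layer operation: add a node and report the result."""
    diagram.append(name)
    return f"node {name} created"


def remove_node(name: str) -> str:
    """Second-layer operation: remove a node if it exists."""
    if name not in diagram:
        return f"node {name} does not exist"
    diagram.remove(name)
    return f"node {name} removed"


# Hypothetical command table; the real command set is defined by the platform.
COMMANDS: Dict[str, Callable[[str], str]] = {
    "create node": create_node,
    "remove node": remove_node,
}


def handle_utterance(text: str) -> str:
    """Match the recognised text to an operation, execute it, build feedback."""
    normalised = " ".join(text.lower().split())
    for phrase, operation in COMMANDS.items():
        if normalised.startswith(phrase + " "):
            argument = normalised[len(phrase) + 1:]
            return operation(argument)
    # Invalid command: alert the user and prompt for a new attempt.
    return "command not recognised, please try again"
```
]]></preformat>
        <p>In this sketch, an utterance that matches no entry of the table yields the retry prompt, mirroring the behaviour described for invalid commands.</p>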
      </sec>
      <sec id="sec-3-2">
        <title>B. Technology</title>
        <p>
          To create the basis for developing
the domain-specific language, we used Epsilon [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
which is an interoperable and consistent set of languages
for model-driven engineering. This toolset is built on
the Eclipse Modelling Framework, or EMF [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which is an
open source code generation framework in Eclipse based on a
structured data model called Ecore. Models in EMF are
represented using the Ecore
metamodelling language (a "de facto" implementation of MOF) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
EMF proved to be fundamental for the development
of ModelByVoice, because the metamodel of the modelling language is
defined precisely with
Ecore. ModelByVoice uses EMF to
read any language as long as its metamodel is defined
with Ecore. This is a key capability of
our platform and an advantage
over other approaches.
        </p>
        <p>
          We used the Epsilon Transformation Language (ETL)
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and Epsilon Generation Language (EGL) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Both
languages belong to Epsilon. ETL was used to specify,
via model transformation, which elements of the
input metamodel (the metamodel of the language with which one
intends to model) are nodes, links and/or compartments (as
explained in subsection III-C), the elements that compose the
final metamodel. For example, to model with the
state machine language (formed by states and transitions), we
associate, by equivalence, the notion of state with a
node and the notion of transition with a link. Each time we want
a modelling environment for a particular language,
the platform code must be generated with EGL from
its metamodel (the result of composing the previously mentioned
elements). Here, EGL was used mainly to embed the
types of the language elements that can be modelled.
This information is stored in
two lists, one for the nodes and another for the links.
The generated program code is thus adapted to that modelling
language and restricts the user to the language domain.
        </p>
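        <p>The association between language elements and the standard notions, and the two lists derived from it, can be illustrated with a short Python sketch. It is not ETL or EGL code and does not reproduce the generated platform code; the dictionary, the function names and the metaclass names State and Transition are assumptions made for the state machine example.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# Assumed association for the state machine example:
# each metaclass of the input metamodel is mapped to a standard notion.
STATE_MACHINE_MAPPING: Dict[str, str] = {
    "State": "node",
    "Transition": "link",
}


def split_element_types(mapping: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Build the two lists (node types, link types) from a mapping."""
    node_types = sorted(t for t, notion in mapping.items() if notion == "node")
    link_types = sorted(t for t, notion in mapping.items() if notion == "link")
    return node_types, link_types


def is_allowed(element_type: str, notion: str, mapping: Dict[str, str]) -> bool:
    """Restrict the user to the language domain: only mapped types pass."""
    return mapping.get(element_type) == notion
```
]]></preformat>
        <p>With this mapping, a request to create a node of a type that the language does not define (for instance a hypothetical Class type) would be rejected, which is one way to read the restriction to the language domain.</p>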
        <p>
          As said before, the ModelByVoice platform comprises
a speech recognition system and a speech synthesis system. The
speech recognition system uses Sphinx-4
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] together with Google Cloud Speech-to-Text. Sphinx-4
is used for the generic application commands,
such as the create, remove and save commands, while
Google Cloud Speech-to-Text is used to recognise the
variable parts of the elements that compose the diagram, such
as the name and the type of the objects that the user can create or
edit. The interaction between the user and ModelByVoice
takes place in English, because it is a widely used
language and may reach a larger number of potential
users around the world. It is standard practice among
programmers to compose the names of variables and classes
from more than one common word. For
example, getListOfClients could be used to name a function.
Google Cloud Speech-to-Text only supports words that are
present in the English dictionary, and as such we restrict
variable names to words that belong to the English
dictionary.
        </p>
        <p>
          The FreeTTS tool [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] is used to implement the speech
synthesis component. FreeTTS is an open source tool that
converts text to audio by voice synthesis. It
gives the user feedback about the
execution of the operations and the current state of the model
during modelling. The MBROLA voice
system [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] was also integrated into the tool. MBROLA
provides several synthesised voices in different languages and
genders. We chose the English male voice; MBROLA
voices are less robotic than the standard FreeTTS
voice and closer to a recording of a human voice.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>C. Functionality</title>
        <p>An intended key aspect of this platform is that it lets
the user model with any modelling language
as long as its metamodel exists. By
reading the metamodel of that language, the platform allows the
elements of the language to be associated with the standard
notions described below.</p>
        <p>At the operation level, the user can model
diagrams based on the notions of node, link, compartment and
attribute. A node represents a particular entity or element; a
link connects nodes and/or
compartments; a compartment represents a hierarchical set of
nodes, that is, it is a node composed of other nodes; and,
finally, an attribute can be associated with one or more nodes.</p>
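        <p>These four notions can be written down as a small data model. The Python sketch below is one possible reading of the definitions above, not the metamodel shown in the paper's figures; all class and field names are assumptions.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    type_name: str


@dataclass
class Node:
    name: str
    type_name: str
    # The same Attribute object may be attached to several nodes.
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Compartment(Node):
    # A compartment is a node composed of other nodes.
    children: List[Node] = field(default_factory=list)


@dataclass
class Link:
    name: str
    type_name: str
    source: Node  # may also be a Compartment, since it is a Node
    target: Node
```
]]></preformat>
        <p>Modelling Compartment as a subclass of Node reflects the statement that a compartment is itself a node, so a link can connect nodes and compartments without a separate case.</p>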
        <p>Figure 2 shows the standard language
metamodel, which covers, by association, most elements of the input languages.
For most input languages, it is
the output metamodel obtained after converting the
original input metamodel. As explained before, we
convert the metamodel of the language with which we want to
model by associating its elements with the notions
represented in this metamodel, which is in
most cases the final metamodel.</p>
        <p>ModelByVoice functionality is based on operations that
are triggered by associated voice commands. The edited
languages must support voice commands for CRUD
(create, read, update, delete) operations on the diagram and its
elements, giving the user the ability to create, listen to ("read"),
edit and remove the variables that compose the elements of
each model. Figure 3 lists the commands
supported by the platform.</p>
        <p>The primary challenge in the implementation
was how the user would navigate the diagrams
and perform the desired operations. The notion of "navigation
element" was introduced to overcome this problem. This element,
which may be a node or a compartment (which is also a node),
acts as a reference and guidance point for the user to explore
the diagrams. The navigation element is chosen in one
of three ways: i) when a new node/compartment is
created, it becomes the navigation element; ii) the user,
by speaking the command "change navigation element", can
designate the new navigation element
by saying its name; iii)
when the navigation element is deleted, or when the user loads
a previously created diagram for editing, the oldest element
in that diagram becomes the navigation element.
This design was chosen to prevent the user from
getting lost during the creation, exploration, and editing of the
diagrams.</p>
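        <p>The three ways of choosing the navigation element can be sketched as follows. This Python fragment is illustrative only: the class and method names are hypothetical, elements are reduced to their names, and the third rule is read here as covering two separate events (deleting the current navigation element, and loading a saved diagram), which is an interpretation of the text rather than a confirmed detail of the tool.</p>
        <preformat><![CDATA[
```python
from typing import List, Optional


class Diagram:
    """Keeps node names in creation order plus the current navigation element."""

    def __init__(self) -> None:
        self.nodes: List[str] = []  # oldest element first
        self.navigation: Optional[str] = None

    def _oldest(self) -> Optional[str]:
        return self.nodes[0] if self.nodes else None

    def create_node(self, name: str) -> None:
        # Rule i: a newly created node/compartment becomes the navigation element.
        self.nodes.append(name)
        self.navigation = name

    def change_navigation_element(self, name: str) -> bool:
        # Rule ii: the user names the new navigation element explicitly.
        if name not in self.nodes:
            return False
        self.navigation = name
        return True

    def delete_node(self, name: str) -> bool:
        # Rule iii (first case, as read here): if the navigation element is
        # deleted, fall back to the oldest remaining element.
        if name not in self.nodes:
            return False
        self.nodes.remove(name)
        if self.navigation == name:
            self.navigation = self._oldest()
        return True

    @classmethod
    def load(cls, names: List[str]) -> "Diagram":
        # Rule iii (second case, as read here): when a saved diagram is
        # loaded, its oldest element becomes the navigation element.
        loaded = cls()
        loaded.nodes = list(names)
        loaded.navigation = loaded._oldest()
        return loaded
```
]]></preformat>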
        <p>The diagrams are presented to the user through the list
diagram command. This command lists all the diagram
content through voice synthesis, or part of it, in which case the user
has to state the range of elements to be listed by the
platform.</p>
        <p>Another essential function is the help mode
command. This mode guides users during
navigation by calculating the commands that the user can
speak given the current state of the diagram.</p>
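        <p>One way to picture the help mode is as a function from the diagram state to the list of commands that currently make sense. The Python sketch below uses invented rules and partly invented command names (only list diagram, save diagram and change navigation element are named in the text), so it should be read as an assumption about how such a calculation could look, not as the tool's actual logic.</p>
        <preformat><![CDATA[
```python
from typing import List


def possible_commands(node_count: int, has_navigation_element: bool) -> List[str]:
    """Return the commands that make sense in the current diagram state."""
    # Assumed rule: creating a node and saving are always possible.
    commands = ["create node", "save diagram"]
    if node_count >= 1:
        # Assumed rule: listing and removing need at least one element.
        commands += ["list diagram", "remove node"]
    if node_count >= 2:
        # Assumed rule: a link needs two endpoints, and switching the
        # navigation element needs an alternative to switch to.
        commands += ["create link", "change navigation element"]
    if has_navigation_element:
        # Assumed rule: attributes are attached to the navigation element.
        commands.append("create attribute")
    return commands
```
]]></preformat>
        <p>The result would then be read out by the speech synthesiser so that the user hears only the commands that apply at that point.</p>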
        <p>Each time the user speaks a voice command, the
platform executes the associated operation,
and the voice synthesis then issues a response
so that the user knows whether the operation
succeeded. This feedback
keeps the user from becoming lost while
operations are executed.</p>
        <p>Once the user has executed all the desired operations, the
save diagram command must be called. This command
saves the diagram in XMI (XML Metadata Interchange)
format, in the directory of the machine where the platform is
located.</p>
        <p>Figure 4 shows, as an activity
diagram, the possible executions of ModelByVoice and
its interaction with the user. All the possible operations are
represented in this activity diagram, as well as all the options
that the user can take while the platform is running.</p>
        <p>In turn, Figure 5 shows the internal representation
of the help mode available to the user. This resource
calculates the operations that can be performed from a
given element and informs the user about which commands
can be executed. Figures 6 and 7 are session logs that illustrate
how an operator creates and edits models with
our tool prototype.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. PRELIMINARY ASSESSMENT</title>
      <p>With the collaboration of ACAPO (Associação dos Cegos e
Amblíopes de Portugal), the Portuguese association for
visual impairment, including blindness and amblyopia, two blind
users were involved in a preliminary assessment session in which
they had the chance to use the prototype and answer a
questionnaire.</p>
      <p>
        The subjects who performed the usability tests
of the tool fell into two categories: blind or visually
impaired users, and users without any visual limitation.
In the first group, one subject held an MSc in Software
Engineering and had previous modelling experience with
Graphviz [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and with TeDUB using keyboard and joystick. The
second subject had no formal education in Computer
Science and Engineering. The second group, users
without visual limitations, consisted of three MSc students of Software
Engineering.
      </p>
      <p>The selection of participants began by defining a
set of prerequisites that the users had to satisfy:
basic knowledge of the English language; no prior
familiarity with the platform; and basic
theoretical and practical notions of computing.</p>
      <p>Before the experiment, it was established that the first
step would be to interview the users. The
interviews were meant to verify that the subjects
actually fulfilled the defined prerequisites and to collect
personal information such as age, experience in the area, and whether
they had previous experience with modelling tools that
integrate speech recognition and/or synthesis mechanisms.
For the usability tests, two similar tasks were outlined,
each for a different language. The first task
involved modelling with the state machine language (states
and transitions), because it is a simple language with
few concepts. The proposed problem was to create a state
machine representing all the possible states and transitions
of a traffic light. First, the subjects had to produce a diagram
named Traffic Light. They had to create three state-type nodes
and name them red, yellow, and green. Subsequently,
they created three transitions: one from the red state to the
yellow state, another from the yellow to the green state, and
finally one from the green to the red state. After that, the subjects
had to list the diagram to verify that all elements had been created
correctly. If so, they were
asked to create an attribute named seconds, of
integer type, to represent the time for which each of the lights
would be active. Figure 8 shows a diagrammatic
representation of the process for the first task performed by
the subjects.</p>
      <p>This first task was chosen so that
the users could get to know the platform, its operation, and
the commands they can execute. The first blind subject
took about 7 minutes to complete the task, the second blind
subject took around 10 minutes, and the non-blind
subjects took, on average, 5 minutes.
A possible explanation for the difference
within the first group is that the first blind subject
had academic qualifications
and was more familiar with the modelling activity. The
difference between groups, with the
non-blind users taking less time than the blind users, may be explained
by the non-blind users having more, and more recent, practice in
diagram modelling. Once this task had been performed, the subjects were given
the opportunity to model with a modelling language of their
choice, which was the second and last task.</p>
      <p>The first blind user opted to model with a language that
he had learned in his computing course, the UML class diagram
language. He explored all the commands of
the platform, which took him about 10 minutes.
In general, all users
were comfortable with the tasks involved, except the second
blind subject, who was not confident about the second
task and preferred not to execute it: not being familiar
with Software Engineering, he did not know
which modelling language to choose.</p>
      <p>After the experiment, all users answered a questionnaire.
The questionnaire assessed, among other aspects, whether they liked the
experiment, whether ModelByVoice was difficult to use, and whether they would
recommend the platform to other users.
One of the open questions was about the strong and
weak points of the platform. As strong points, all users highlighted the concept
of the navigation element, the simplicity
of the commands (not hard to remember), the existence of
a help mode, and the quality of the feedback given by the
platform. As for the weak points, the majority of users
reported some glitches in the speech recognition technology,
which in their opinion becomes quite annoying after some
time spent modelling. Another weak point was
that the feedback cannot be interrupted during
execution, that is, the user must wait for the end of the
speech synthesiser's response even if he has already heard the
information he wanted.</p>
      <p>They suggested using
the keyboard instead of voice recognition as the input
source, and being able to start and pause the voice recognition
at any time, also with the keyboard.</p>
      <p>The evaluation included only two blind people, which is
not statistically significant. The small number of
participating subjects is a threat to the validity of the preliminary
assessment. However, this reflects the difficulty of contacting
software engineers who are severely sight impaired. It was only thanks
to ACAPO that we managed to contact the two users involved
in the experiment. The number of modelling tasks performed
in the exercises, and their simplicity, can also be a threat.</p>
      <p>However, as this preliminary assessment was meant to guide us in
future iterations of the tool, we opted for simple tasks because the users
had never used a voice recognition modelling application
and because one of them had no previous experience of the
modelling activity. Despite the small number of
subjects, the tests showed promising
results and provided valuable feedback for future
improvements, which are underway.</p>
    </sec>
    <sec id="sec-5">
      <title>V. CONCLUSIONS AND FUTURE CHALLENGES</title>
      <sec id="sec-5-1">
        <title>A. Contributions</title>
        <p>This work presents a prototype tool that allows people with
visual impairment to create and edit diagrams for any type of
modelling language as long as this language is meta-modelled.</p>
        <p>The goal is to make the modelling activity accessible
both to blind people (and people with vision
problems) and to users without physical limitations. With
this tool, it is possible to edit and query (navigate) diagrams
by voice. We expect that this will be an enabler for the
full integration of blind people into group projects involving
modelling.</p>
        <p>The preliminary usability tests with blind
and non-blind subjects, although with anecdotal figures,
gave us first feedback on the previously mentioned
characteristics: the users approved of the platform and
left good suggestions for change, which gives us confidence for
continued development work.</p>
      </sec>
      <sec id="sec-5-2">
        <title>B. Limitations and challenges</title>
        <p>
          As mentioned in Section IV, the main limitations pointed out
by the subjects in our preliminary experiment were related to
technical failures (error rate) of the underlying speech
recognition technology while running ModelByVoice. The
expert users (with extensive practice with software technology
and used to the keyboard) suggested using the
keyboard as an input component instead of
speech recognition, as it may prove to be faster
and more efficient when executing commands. It should be noted that this limitation in
speech recognition stems from the tools used: an
open source voice tool (Sphinx) and a voice tool (Google
Cloud Speech-to-Text) that, despite being more efficient than
the former, is also not 100% reliable in converting
voice to text. Another interesting observation, besides the need
for improved interaction mechanisms (such as pause), is the
request for audio signals instead of pure voice
synthesis feedback. We foresee that an interesting approach
to deciding on an adequate concrete syntax, in this case,
is to adapt the design principles of
PoN (Physics of Notations) [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] to audio. Those principles are
currently used to evaluate, compare and enhance the
communication properties a given software modelling language when
designing its visual notations (concrete syntax).
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>C. Future Work</title>
        <p>Taking into consideration the feedback and suggestions
given by the users about ModelByVoice, some features
may prove to be interesting to implement in the future.</p>
        <p>Among them, we should consider letting users choose the
input resource they intend to use for the modelling activity,
that is, allowing them to select the voice recognition
mechanism or the keyboard as an input resource, or even to
combine both input sources at the same time.</p>
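        <p>A minimal sketch of how such a choice of input resources could be structured is given below, in Java. All names (CommandSource, QueuedSource, CombinedSource) are hypothetical and are not part of ModelByVoice; the queued source merely stands in for a real keyboard or voice channel, and giving priority to the first source in the list is an assumption made only for illustration.</p>
        <preformat><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical contract: any input channel that can yield textual commands. */
interface CommandSource {
    /** Returns the next pending command, or empty if none is waiting. */
    Optional<String> poll();
}

/** Stand-in for a keyboard or voice channel; commands are queued by the caller. */
final class QueuedSource implements CommandSource {
    private final Queue<String> pending = new ArrayDeque<>();

    void offer(String command) {
        pending.add(command);
    }

    @Override
    public Optional<String> poll() {
        return Optional.ofNullable(pending.poll());
    }
}

/** Combines several channels; earlier sources in the list take priority (an assumption). */
final class CombinedSource implements CommandSource {
    private final List<CommandSource> sources;

    CombinedSource(List<? extends CommandSource> sources) {
        this.sources = new ArrayList<>(sources);
    }

    @Override
    public Optional<String> poll() {
        for (CommandSource source : sources) {
            Optional<String> command = source.poll();
            if (command.isPresent()) {
                return command;
            }
        }
        return Optional.empty();
    }
}
```
]]></preformat>
        <p>Under these assumptions, selecting a single input resource amounts to passing a one-element list, while combining both amounts to listing the keyboard and the voice channel together; the rest of the editor would only depend on CommandSource.</p>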
        <p>Another interesting upgrade to the tool’s architecture is the
introduction of an intermediate, platform-independent speech
recognition layer that would support any speech recognition
API. This way, the tool could easily be reconfigured to work
with any speech recognition tool.</p>
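        <p>One possible shape for such a layer is sketched below in Java, as an illustration only. The interface SpeechRecognizer, the stand-in ScriptedRecognizer and the RecognizerRegistry are hypothetical names introduced here; a real adapter would wrap an engine such as Sphinx or a cloud service behind the same interface, and no actual engine API is called in this sketch.</p>
        <preformat><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical engine-neutral contract: audio of one utterance in, text out. */
interface SpeechRecognizer {
    /** Returns the transcribed text, or empty if nothing was understood. */
    Optional<String> recognize(byte[] audio);
}

/** Stand-in engine for this sketch; it returns a fixed reply for any non-empty audio. */
final class ScriptedRecognizer implements SpeechRecognizer {
    private final String reply;

    ScriptedRecognizer(String reply) {
        this.reply = reply;
    }

    @Override
    public Optional<String> recognize(byte[] audio) {
        return audio.length == 0 ? Optional.empty() : Optional.of(reply);
    }
}

/** Lets the editor switch engines by name, for example from a configuration entry. */
final class RecognizerRegistry {
    private final Map<String, SpeechRecognizer> engines = new LinkedHashMap<>();

    void register(String name, SpeechRecognizer engine) {
        engines.put(name, engine);
    }

    SpeechRecognizer select(String name) {
        SpeechRecognizer engine = engines.get(name);
        if (engine == null) {
            throw new IllegalArgumentException("Unknown engine: " + name);
        }
        return engine;
    }
}
```
]]></preformat>
        <p>With this arrangement, reconfiguring the tool for another engine would mean registering one more adapter and changing the selected name, leaving the command interpreter untouched.</p>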
        <p>Additionally, it would be interesting, as an immediate next
step, to create an editor that converts the created diagrams
(which are stored in XMI) into a graphical representation. This
would allow the diagrams to be analysed or evaluated by other
entities or people.</p>
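        <p>As a rough illustration of the generation step only, the following Java sketch emits Graphviz DOT text from a diagram that is assumed to have already been loaded from XMI into memory. The class DiagramToDot and its methods are hypothetical, the choice of DOT as the target format is an assumption of this sketch, and XMI parsing is deliberately left out.</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory diagram that can be rendered as Graphviz DOT text. */
final class DiagramToDot {
    /** A labelled, directed relation between two named elements. */
    private static final class Edge {
        final String from;
        final String to;
        final String label;

        Edge(String from, String to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    private final List<String> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    void addNode(String name) {
        nodes.add(name);
    }

    void addEdge(String from, String to, String label) {
        edges.add(new Edge(from, to, label));
    }

    /** Wraps a name in double quotes, escaping backslashes and quotes for DOT. */
    private static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Produces a DOT digraph with one box per element and one arrow per relation. */
    String toDot() {
        StringBuilder dot = new StringBuilder("digraph model {\n");
        for (String node : nodes) {
            dot.append("  ").append(quote(node)).append(" [shape=box];\n");
        }
        for (Edge edge : edges) {
            dot.append("  ").append(quote(edge.from)).append(" -> ").append(quote(edge.to))
               .append(" [label=").append(quote(edge.label)).append("];\n");
        }
        return dot.append("}\n").toString();
    }
}
```
]]></preformat>
        <p>The resulting text could then be handed to a graph layout tool to obtain a picture that sighted colleagues can inspect.</p>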
        <p>Finally, as more challenging future work, this project
raises the question of what a systematic approach to assessing
the usability of an audio concrete syntax and interaction model
should be. We argue that it should be a framework similar to
PoN. In our view, this is an interesting line of research to
pursue.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENT</title>
      <p>The authors would like to thank NOVA LINCS
Research Laboratory (Grant: FCT/MCTES PEst
UID/CEC/04516/2013) and DSML4MAS Project (Grant:
FCT/MCTES TUBITAK/0008/2014).</p>
      <p>The authors would also like to thank ACAPO (Associação
dos Cegos e Amblíopes de Portugal) for putting us in
contact with software professionals, and for their availability to
test our tool.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cernosek</surname>
          </string-name>
          and E. Naiburg, “
          <article-title>The value of modeling</article-title>
          ,”
          <source>IBM White Paper</source>
          ,
          <year>2004</year>
          . Retrieved July 31, 2008.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Stackoverflow</surname>
          </string-name>
          , “
          <article-title>Stats and analysis</article-title>
          ,” Mar.
          <year>2018</year>
          . [Online]. Available: https://insights.stackoverflow.com/survey/2018/#overview
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Steinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Budinsky</surname>
          </string-name>
          , E. Merks, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Paternostro</surname>
          </string-name>
          ,
          <source>EMF: Eclipse Modeling Framework</source>
          . Pearson Education
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Abid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Paige</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Polack</surname>
          </string-name>
          , and G. Botterweck, “
          <article-title>Taming EMF and GMF Using Model Transformation</article-title>
          ,” in
          <source>Model Driven Engineering Languages and Systems</source>
          , ser. Lecture Notes in Computer Science, D. C. Petriu,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rouquette</surname>
          </string-name>
          , and Ø. Haugen, Eds. Springer Berlin Heidelberg,
          <year>2010</year>
          , vol.
          <volume>6394</volume>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>225</lpage>
          . [Online]. Available: http://dx.doi.org/10.1007/978-3-642-16145-2_15
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cook</surname>
          </string-name>
          , G. Jones,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kent</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Wills</surname>
          </string-name>
          ,
          <source>Domain-Specific Development with Visual Studio DSL Tools</source>
          , 1st ed. Addison-Wesley Professional
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>de Lara</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Vangheluwe</surname>
          </string-name>
          , “
          <article-title>Atom3: A tool for multi-formalism and meta-modelling</article-title>
          ,” in
          <source>Fundamental Approaches to Software Engineering, 5th International Conference, FASE 2002, held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2002, Grenoble, France, April 8-12, 2002, Proceedings</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>188</lpage>
          . [Online]. Available: https://doi.org/10.1007/3-540-45923-5_12
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] Vanderbilt University, “GME: Generic modeling environment,”
          <year>2007</year>
          . [Online]. Available: http://www.isis.vanderbilt.edu/Projects/gme/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lyytinen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rossi</surname>
          </string-name>
          , “
          <article-title>MetaEdit+: a fully configurable multi-user and multi-tool CASE and CAME environment</article-title>
          ,” in
          <source>8th International Conference on Advanced Information Systems Engineering, CAiSE'96</source>
          , vol.
          <volume>1080</volume>
          /
          <year>1996</year>
          . Heraklion, Crete, Greece: Springer Berlin / Heidelberg,
          <year>1996</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eysholdt</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Behrens</surname>
          </string-name>
          , “
          <article-title>Xtext: Implement your language faster than the quick and dirty way</article-title>
          ,” in
          <source>Proceedings of the ACM International Conference Companion on Object Oriented Programming Systems Languages and Applications Companion</source>
          , ser.
          <source>OOPSLA '10</source>
          . New York, NY, USA: ACM,
          <year>2010</year>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>309</lpage>
          . [Online]. Available: http://doi.acm.org/10.1145/1869542.1869625
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Dmitriev</surname>
          </string-name>
          , “
          <article-title>Language oriented programming: The next programming paradigm</article-title>
          ,” Onboard, Jetbrains,
          <source>Tech. Rep.</source>
          ,
          <year>2004</year>
          . [Online]. Available: http://www.onboard.jetbrains.com
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Beydeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Book</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gruhn</surname>
          </string-name>
          et al.,
          <source>Model-driven software development</source>
          . Springer,
          <year>2005</year>
          , vol.
          <volume>15</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Petrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schlieder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blenkhorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>O'Neill</surname>
          </string-name>
          , G. Ioannidis,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Crombie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mager</surname>
          </string-name>
          et al., “
          <article-title>Tedub: A system for presenting and exploring technical drawings for blind people</article-title>
          ,”
          <source>Computers helping people with special needs</source>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blenkhorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Crombie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dijkstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Evans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wood</surname>
          </string-name>
          , “
          <article-title>Presenting UML software engineering diagrams to blind people</article-title>
          ,” in International Conference on Computers for Handicapped Persons. Springer,
          <year>2004</year>
          , pp.
          <fpage>522</fpage>
          -
          <lpage>529</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] University of Manchester, “Tedub and accessible UML,”
          <year>2005</year>
          . [Online]. Available: http://www.alasdairking.me.uk/tedub/index.htm
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Doherty</surname>
          </string-name>
          and B. Cheng, “
          <article-title>UML modeling for visually-impaired persons</article-title>
          ,” in
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1522</volume>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2015</year>
          , pp.
          <fpage>4</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Soares</surname>
          </string-name>
          , “
          <article-title>Uma abordagem para derivar modelos de requisitos a partir de mecanismos de reconhecimento de voz</article-title>
          ,” Master's thesis, Faculdade de Ciências e Tecnologia da Universidade de Lisboa,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Paige</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Polack</surname>
          </string-name>
          , “
          <article-title>Eclipse development tools for Epsilon</article-title>
          ,” in
          <source>Eclipse Summit Europe, Eclipse Modeling Symposium</source>
          , vol.
          <volume>20062</volume>
          ,
          <year>2006</year>
          , p.
          <fpage>200</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Paige</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Polack</surname>
          </string-name>
          , “
          <article-title>The Epsilon transformation language</article-title>
          ,” in
          <source>International Conference on Theory and Practice of Model Transformations</source>
          . Springer,
          <year>2008</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Paige</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Polack</surname>
          </string-name>
          , “
          <article-title>The Epsilon generation language</article-title>
          ,” in
          <source>European Conference on Model Driven Architecture-Foundations and Applications</source>
          . Springer,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>W.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kwok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gouvea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wolf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Woelfel</surname>
          </string-name>
          , “
          <article-title>Sphinx-4: A flexible open source framework for speech recognition</article-title>
          ,” Sun Microsystems, Inc., Tech. Rep. SMLI TR-2004-139,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamere</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kwok</surname>
          </string-name>
          , “
          <article-title>FreeTTS: a performance case study</article-title>
          ,” Sun Microsystems, Inc., Tech. Rep. SMLI TR-2002-114,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Mbrola</surname>
          </string-name>
          . [Online]. Available: http://tcts.fpms.ac.be/synthesis/mbrola.html
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ellson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gansner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Koutsofios</surname>
          </string-name>
          , S. C. North, and G. Woodhull, “
          <article-title>Graphviz - open source graph drawing tools</article-title>
          ,” in
          <source>International Symposium on Graph Drawing</source>
          . Springer,
          <year>2001</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24] D. Moody, “
          <article-title>The physics of notations: toward a scientific basis for constructing visual notations in software engineering</article-title>
          ,”
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>35</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>756</fpage>
          -
          <lpage>779</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>