Usability evaluation of a Conversational Artificial
                     Intelligence Interface

           Ligia Maza-Jimenez1 and Pablo Torres-Carrión1[0000-0002-7606-0582]
         1 Universidad Técnica Particular de Loja, San Cayetano Alto-Loja, Ecuador

                         {lemaza, pvtorres}@utpl.edu.ec


       Abstract. This research evaluates the usability of a Conversational Artificial
       Intelligence environment using the metrics of effectiveness, efficiency and
       satisfaction, taking as a case study the virtual assistant "Max" of the Universidad
       Técnica Particular de Loja (UTPL). The evaluation was carried out virtually, with
       the participation of 45 members of the university community (15 students, 15
       parents, and 15 external persons). An initial informative test was applied before
       the interaction, and the PSSUQ usability questionnaire (Spanish adaptation of the
       CSUQ) after the interaction. The research has a quantitative approach, with a
       concurrent and quasi-experimental design of a cross-sectional (transectional) type,
       and a descriptive scope. The results show that the levels of effectiveness,
       efficiency and satisfaction are low, so it is concluded that the interaction
       strategy of the UTPL virtual assistant "Max" should be reviewed.

       Keywords: Usability, Conversational Artificial Intelligence, Chatbot, University


1      Introduction

Human-Computer Interaction (HCI) is the emerging area of computer science closest to
the user; it studies the communication that occurs between computers and the people
who use them [1], [2]. In this context, User Experience (UX) refers to what a person
feels when interacting with a computer system [3] and covers mainly three dimensions
or experiences: aesthetic, meaningful and affective. One of the key factors for a good
user experience is usability, which the international standard ISO 9241-11 defines as
the "degree to which a product can be used by certain users to achieve specific
objectives with effectiveness, efficiency and satisfaction in a context of specific
use" [4]. According to this definition, usability is made up of three main elements,
which are related to the characteristics and objectives of the users and the context
of use: a) Effectiveness: the precision and integrity with which users achieve
specific objectives; b) Efficiency: the resources spent in relation to the precision
and integrity with which users achieve these objectives; and c) Satisfaction: "being
free of discomfort and positive attitudes towards the use of the product (system,
service or environment)".


Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).




Research in usability, like other sciences, has been enriched by Artificial
Intelligence (AI), which has allowed the establishment of intelligent interaction
environments [5]. According to García [6], AI tries to explain mental functioning
through the development of algorithms, combining several fields, such as robotics and
expert systems, that share the objective of creating machines that can think for
themselves. AI can be divided into three broad scenarios [7]: a) narrow or weak AI,
which is limited to one functional area; b) general AI, which has the power of
reasoning, problem solving and abstract thinking; and c) superintelligence, the
maximum level, in which AI exceeds human intelligence. Conversational AI falls within
weak AI and is strongly related to HCI; it allows simulated conversations to be held
with a computer [8], opening the space to develop empathetic behaviors between the
machine and the user.

Conversational Artificial Intelligence (IAC, by its Spanish acronym), according to
Nieves [9], is responsible for the logic behind the bots; that is, it is the brain and
soul of the chatbot. Without IAC, a bot is just a set of questions and answers.
Additionally, IAC is powered by Natural Language Processing (NLP), which focuses on
interpreting human language, while developers provide the basic framework of how a
conversation can develop. Simply put, IAC and humans work together to create a virtual
conversational experience.

Chatbots are programs that use natural language processing (NLP) in question answering
(QA) systems [10]. Their purpose is to simulate an intelligent dialogue with a human
interlocutor, through text messages in a console or through voice. Their origin is
linked to the Turing Test [11], proposed in 1950 by Alan Mathison Turing. Since then
chatbots have evolved considerably: current ones range from those hosted on websites
to personal virtual assistants on mobile devices, with features that make them one of
the most attractive tools for a company or institution. As a result of the health
emergency caused by the spread of the COVID-19 coronavirus, chatbots have gained
ground in online assistance, in order to keep society informed on different topics.
For this reason, this study evaluates the usability of the UTPL "Max" virtual
assistant, based on the effectiveness, efficiency and satisfaction metrics.

This study adopts a quantitative approach, with a concurrent and quasi-experimental
design of a cross-sectional (transectional) type, and a descriptive and correlational
scope. The results indicate a low level of usability due to factors such as the
presence of errors, difficulty in using the system, lack of a friendly interface,
imprecise information and a preference for interaction with human beings.




2         Methodology

It is proposed to evaluate the usability of an active IAC environment in a higher
education institution. The UTPL “Max” virtual assistant has been selected as the
evaluation interface. Following the indications of ISO 9241-11, the effectiveness,
efficiency and satisfaction metrics have been selected.


2.1       Research design
Following the methodology described in [12], this research has the design shown in
Fig. 1:


    Quantitative approach
      • It integrates numerical data for its later statistical analysis.

    Concurrent and quasi-experimental design
      • Confirm results through cross-validation between variables.
      • The sample of this study is not random.

    Transectional type
      • Collect data in a single moment.
      • Describe the variables and analyze their incidence and interaction at any
        given time.

    Descriptive scope
      • Specify characteristics of the participating sample and investigate the
        incidence of the variables in the population.


                                          Fig. 1. Research design


2.2       Sample

                                     Table 1. Sample of participants

      TYPE OF USER   DESCRIPTION                                              n      %
      Students       Undergraduate degree at the Universidad Técnica          15     33.33
                     Particular de Loja (English, Biochemistry, Accounting
                     and Auditing, Medicine, Psychology, Economics and
                     Architecture)
      Parents        Postgraduate at the Universidad Técnica Particular       15     33.33
                     de Loja (Communication Sciences and Technologies)
      Externs        Face-to-face and distance                                15     33.33
      TOTAL                                                                   45     100




The sample (see Table 1) is non-probabilistic, selected using the accidental or
snowball technique, which draws on the people available at a given time for the
purpose of the study [13]. It should be noted that the type of sample was conditioned
by the national health emergency established on March 17, 2020 by Presidential Decree
1017, Agreement No. 00126-2020 [14].


2.3    Instruments

The instruments used for the present study are the following:

 • Initial test: an ad-hoc survey that collects personal data and identifies the
   emotional state of the participants. It consists of 10 items, covering informed
   consent, personal data, classification of emotions, reason for emotions, and
   effectiveness, efficiency and satisfaction in relation to the interaction with the
   Virtual Assistant "MAX" of the UTPL.
 • Post-Study System Usability Questionnaire (PSSUQ; Spanish adaptation of the CSUQ):
   its objective is to evaluate users' general satisfaction with an interface. It
   consists of 16 items answered on a 7-point Likert-type scale (from one to seven),
   grouped into three variables: a) system utility (items 1-6); b) quality of
   information (items 7-12); c) quality of the interface (items 13-16). The higher the
   score, the higher the general level of satisfaction. Its reported reliability is
   Cronbach's alpha α = .96 [15].
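
As an illustration of how the questionnaire described above could be scored, the following Python sketch groups the 16 items into the three variables named in the text and computes Cronbach's alpha for a set of responses. It is a minimal sketch under stated assumptions: averaging the item responses within each variable is an assumed aggregation rule (the text does not specify one), the function names are hypothetical, and the sample responses are invented for the example.

```python
from statistics import mean, variance

# Item ranges per variable as stated in the text (1-based, inclusive).
SUBSCALES = {
    "system_utility": range(1, 7),        # items 1-6
    "information_quality": range(7, 13),  # items 7-12
    "interface_quality": range(13, 17),   # items 13-16
}


def score_pssuq(responses):
    """Return mean scores per variable and overall for one participant.

    `responses` is a list of 16 integers on the 1-7 Likert scale.
    Averaging is an assumption; the paper does not state the rule used.
    """
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    if any(not 1 <= r <= 7 for r in responses):
        raise ValueError("responses must be on the 1-7 scale")
    scores = {
        name: mean(responses[i - 1] for i in items)
        for name, items in SUBSCALES.items()
    }
    scores["overall"] = mean(responses)
    return scores


def cronbach_alpha(matrix):
    """Cronbach's alpha for a participants x items matrix of responses.

    Needs at least two participants, two items and non-zero variance of
    the participants' total scores.
    """
    k = len(matrix[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*matrix)]  # per-item variance
    total_var = variance([sum(row) for row in matrix])   # variance of totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical participant: high utility, medium information, low interface.
example = [7] * 6 + [4] * 6 + [1] * 4
print(score_pssuq(example))
```

Keeping the item ranges in a single dictionary makes the grouping explicit and easy to adjust if a different version of the questionnaire is used.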


2.4    Procedure

Stage 1. Interaction planning: This stage specifies the technology used to capture the
information related to the characteristics of user behavior during the evaluation with
IAC.

 • First, the evaluation instruments were selected: the initial ad-hoc test, developed
   with the Microsoft Forms tool, and the Post-Study System Usability Questionnaire
   (PSSUQ; Spanish adaptation of the CSUQ). The UTPL virtual assistant "MAX" was used
   as the IAC environment.
 • Next, the sample was determined and divided into three groups of 15 participants
   each: Students (postgraduate students in Information Sciences and Technologies and
   undergraduate students of English, Biochemistry, Accounting and Auditing, Medicine,
   Psychology, Economics and Architecture at the UTPL), Parents, and External Persons.
   Participants were contacted through social networks, WhatsApp and phone calls.
 • To end the first stage, a pilot application of the questionnaires was carried out
   following an application protocol guide, in order to clarify, specify and refine
   the method of applying the instruments.

Stage 2. Execution: The execution phase consists of three moments:




 • In the first moment, the ad-hoc emotion identification questionnaire, which also
   measures the participants' perception of the usability of the UTPL virtual
   assistant "MAX", was applied online. This process lasted approximately 5 min.
 • In the second moment, the participants shared their screen through the Zoom
   videoconferencing platform and interacted with the UTPL virtual assistant "MAX".
   This process was recorded for a period of 2 min.
 • Finally, in the third moment, each participant answered the Post-Study System
   Usability Questionnaire (PSSUQ; Spanish adaptation of the CSUQ) online, giving
   their assessment of the experience in order to verify the validity and reliability
   of the chatbot used (Virtual Assistant "MAX" of the UTPL); the test lasted
   approximately 3 min. At that time, the metrics of effectiveness (average of
   completed tasks and usability problems), efficiency (task time and errors) and
   satisfaction (satisfaction scales) were evaluated [16].
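
The metrics named above (completed tasks for effectiveness, task time and errors for efficiency) could be computed from the recorded sessions along the following lines. This is only a sketch: the `Session` record, its field names, the formulas (completion rate, mean time per completed task, mean errors per session) and the sample values are all assumptions introduced for illustration, not the procedure actually followed in the study.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One recorded interaction (hypothetical structure)."""
    tasks_attempted: int
    tasks_completed: int
    errors: int
    seconds: float


def effectiveness(sessions):
    """Percentage of attempted tasks that were completed (assumed formula)."""
    attempted = sum(s.tasks_attempted for s in sessions)
    completed = sum(s.tasks_completed for s in sessions)
    return 100.0 * completed / attempted if attempted else 0.0


def efficiency(sessions):
    """Mean time per completed task and mean errors per session (assumed)."""
    if not sessions:
        raise ValueError("at least one session is required")
    completed = sum(s.tasks_completed for s in sessions)
    total_time = sum(s.seconds for s in sessions)
    return {
        "seconds_per_completed_task": (
            total_time / completed if completed else float("inf")
        ),
        "errors_per_session": sum(s.errors for s in sessions) / len(sessions),
    }


# Invented sessions: two-minute recordings, as in the procedure above.
sessions = [
    Session(tasks_attempted=4, tasks_completed=3, errors=1, seconds=120.0),
    Session(tasks_attempted=4, tasks_completed=1, errors=3, seconds=120.0),
]
print(effectiveness(sessions))
print(efficiency(sessions))
```

Satisfaction is left out of this sketch because it is measured through the questionnaire scales rather than through the session recordings.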

Stage 3. Analysis of results: The data obtained in the initial test and in the
Post-Study System Usability Questionnaire (PSSUQ) were analyzed using descriptive
tables in Excel.
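
A descriptive table of this kind is essentially a percentage distribution over response categories. The sketch below shows the same computation in Python rather than Excel; the function name is hypothetical, and the response counts are not the authors' raw data but values chosen to be consistent with a sample of 45 and with the percentages reported for weekly use in Fig. 2.

```python
from collections import Counter


def percentage_table(responses, categories):
    """Percentage of responses per category, rounded to one decimal."""
    if not responses:
        raise ValueError("no responses to tabulate")
    counts = Counter(responses)
    n = len(responses)
    # Categories without any response are reported as 0.0.
    return {c: round(100 * counts[c] / n, 1) for c in categories}


CATEGORIES = [
    "Never",
    "0 to 30 minutes",
    "30 to 60 minutes",
    "60 to 90 minutes",
    "90 to 120 minutes",
    "More than 120 minutes",
]

# Illustrative counts for n = 45 (assumed, not taken from the raw data).
responses = (
    ["Never"] * 32
    + ["0 to 30 minutes"] * 10
    + ["30 to 60 minutes"] * 1
    + ["60 to 90 minutes"] * 2
)

for category, pct in percentage_table(responses, CATEGORIES).items():
    print(f"{category:<24}{pct:>6}")
```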


3      Results

This section presents the results according to the stated objective, based on the
initial test and the Post-Study System Usability Questionnaire (PSSUQ), which measure
the usability of the UTPL virtual assistant "MAX" (see Fig. 2).


           Weekly use                   %
           Never                        71.1
           0 to 30 minutes              22.2
           30 to 60 minutes             2.2
           60 to 90 minutes             4.4
           90 to 120 minutes            0
           More than 120 minutes        0


    Fig. 2. On average, how long a week do you use the UTPL Virtual Assistant "MAX"?

71.1% of users report never using the UTPL "MAX" virtual assistant in a week.




      A. Frequency of errors when using the virtual assistant
         Categories: Frequently, Occasionally, Rarely, Never
         Values shown: 6%, 13%, 31%, 50%

      B. Service satisfaction with respect to the virtual assistant
         Categories: Totally satisfied, Satisfied, Not satisfied
         Values shown: 7%, 40%, 53%


                     Fig. 3. Usability parameters according to the initial test

As part of the research, the Post-Study System Usability Questionnaire (PSSUQ) was
applied after the interaction (see Figs. 3 and 4):


          Variable (% of participants)    Low      Medium    High
          System utility                  64.4     11.1      24.4
          Quality of information          51.1     26.7      22.2
          Interface quality               60       11.1      28.9
          General satisfaction            60       13.3      26.7


                     Fig. 4. Variables of the PSSUQ usability questionnaire.


4      Discussion of results

In this study, 71.1% of the users had not used the UTPL virtual assistant "MAX"; among
the participants who had, 50% rarely found errors and 53% were satisfied with the
service at the time of interaction. For this reason, according to the initial test,
the effectiveness, efficiency and satisfaction metrics show a medium level of
usability. Regarding the Post-Study System Usability Questionnaire (PSSUQ), the
variables of system utility, information quality, interface quality and general
satisfaction were rated low by more than 50% of participants, which means that the
interaction strategy of the UTPL virtual assistant "Max" should be reviewed. A similar
result was obtained by Valle [17] in Mexico, in a study that used a conversational
assistant to help resolve the doubts of university students in their degree process:
the people who used the virtual assistant rated their interaction lower than people
who interacted with a human to obtain academic information, and indicated that the
design and the responses that do not give enough information could be improved. Along
the same lines, Peralta [18], in a study carried out in Peru to measure the level of
improvement in personalized assistance in the process of obtaining a university degree
through the use of a chatbot, found that the effectiveness of the virtual assistant
depends on the use of an agile methodology and on the simplicity of use and operation
of the program. The aforementioned studies are thus similar to this one with regard to
the level of usability.


5      Conclusions

It is concluded that the low levels of usability obtained in this study are due to
different factors, such as the presence of errors, difficulty in using the system,
lack of a friendly interface, imprecise information and a preference for interaction
with human beings. In this sense, it is important to take into account the observation
of Sánchez [19] that the usability of a product is the quality attribute given to an
interface based on its ease of interaction and the satisfaction it produces when used.
Therefore, it is recommended that the virtual assistant be widely disseminated and
that it offer an acceptable service that guarantees its usability.


References

 1. Marcos M.-C., “HCI (human computer interaction): concepto y desarrollo,” El Prof. la Inf.,
    vol. 10, no. 6, pp. 4–16, 2001.
 2. Sears A. and Jacko J. A., Human-Computer Interaction, 2nd ed., 2007.
 3. Boada N., “¿Por qué es tan importante el User Experience o Experiencia del Usuario?,”
    2017. [Online]. Available: https://www.cyberclick.es/numerical-blog/por-que-user-
    experience-o-experiencia-del-usuario. [Accessed: 13-Mar-2019].
 4. International Organization for Standardization and ISO, “INTERNATIONAL STANDARD
    ISO 9241-11,” 1998.
 5. Mira J., “Inteligencia artificial, emoción y neurociencia,” Arbor, vol. 162, no. 640, pp. 473–
    506, Apr. 1999.
 6. García A., Inteligencia artificial : fundamentos, práctica y aplicaciones. RC Libros, 2012.
 7. Gomes C. C. and Preto S., “Artificial intelligence and interaction design for a positive
    emotional user experience,” vol. 722. Springer Verlag, The Research Centre for
    Architecture, Urbanism and Design, Universidade de Lisboa, Faculdade de Arquitetura, Rua
    Sá Nogueira, Polo Universitário, Alto da Ajuda, Lisbon, 1349-055, Portugal, pp. 62–68,
    2018.
 8. Brinquis C., “IA conversacional: conversaciones reales con un ordenador.,” 2019. [Online].
    Available: https://www.incentro.com/es-es/blog/stories/ai-conversacional-conversaciones-
    reales-con-un-ordenador/. [Accessed: 13-Apr-2019].




 9. Nieves B., “IA Conversacional: definición y conceptos básicos.,” 2018. [Online]. Available:
    https://planetachatbot.com/ia-conversacional-conceptos-basicos-y-la-definicion-
    107529e213c1. [Accessed: 04-Dec-2019].
10. Fitrianie S., “My_Eliza, A Multimodal Communication System,” in Proceedings of
    Euromedia, 2002, pp. 1–187.
11. Turing A. M., “Computing Machinery and Intelligence,” 1950.
12. Hernández R., Fernández C., and Baptista P., Metodología de la investigación, 6th ed.
    México, 2014.
13. Espinosa P., Hernández H., López R., and Lozano S., “Muestreo de Bola de Nieve,”
    Departamento de Probabilidad y Estadística UNAM, 2018. [Online]. Available:
    https://es.scribd.com/document/379661920/Proyectofinal-Bola-de-Nieve. [Accessed: 15-
    Jul-2020].
14. C. E. Constitucional, “Registro Oficial - Órgano de la República del Ecuador,” Ecuador,
    2020.
15. Hedlefs M. I., González A. de la G., Sánchez M. P., and Garza A. A., “Adaptación al español
    del Cuestionario de Usabilidad de Sistemas Informáticos CSUQ / Spanish language
    adaptation of the Computer Systems Usability Questionnaire CSUQ.,” Revista
    Iberoamericana de las Ciencias Computacionales e Informática, 2016. [Online]. Available:
    https://www.reci.org.mx/index.php/reci/article/view/35. [Accessed: 15-Jul-2020].
16. Omar Mohamed H., Yusoff R., and Jaafar A., “Quantitive analysis in a heuristic evaluation
    for usability of educational computer game (UsaECG),” in Proceedings - 2012 International
    Conference on Information Retrieval and Knowledge Management, CAMP’12, 2012, pp.
    187–192.
17. Valle L., López J., and Garcia M., “Desarrollo e implementación de un bot conversacional
    como apoyo a los estudiantes en su proceso de titulación,” Int. Conf. Robot. Comput. 2013,
    no. May, pp. 361–365, 2013.
18. Peralta A. G., “Chatbot para la asistencia personalizada en el proceso de obtención de título
    en la modalidad de tesis para los bachilleres de la escuela profesional de ingeniería de
    computación y sistemas de la UPAO,” Universidad Privada Antenor Orrego, 2018.
19. Sánchez W., “La usabilidad en Ingeniería de Software : definición y características,” Ing-
    novación. Rep. Investig., no. 2, pp. 7–21, 2011.

