Cloud Services for Social Robots and Artificial Agents

Lucrezia Grassi1, Carmine Tommaso Recchiuto1 and Antonio Sgorbissa1
1 Laboratorium - DIBRIS, Università di Genova, via all'Opera Pia 13, 16145, Genova, Italy

Abstract
This work presents the design and implementation of CAIR: a cloud system for knowledge-based interaction devised for Social Robots and other conversational agents. The system is structured so that it can be easily expanded by adding new services that improve the capabilities of the clients connected to the Cloud. Another key feature of the system is that it has been designed to make the development of its clients straightforward: in this way, multiple devices (e.g., robots, computers, smartphones) can be easily endowed with the capability of autonomously interacting with the user, understanding when to perform specific actions, and exploiting all the information provided by the services on the Cloud.

Keywords
Cloud Robotics, REST API, Client-Server Architecture, Socially Assistive Robots, Human-Robot Interaction

The 8th Italian Workshop on Artificial Intelligence and Robotics – AIRO 2021
lucrezia.grassi@edu.unige.it (L. Grassi); carmine.recchiuto@dibris.unige.it (C. T. Recchiuto); antonio.sgorbissa@unige.it (A. Sgorbissa)
https://www.researchgate.net/profile/Lucrezia-Grassi (L. Grassi); https://www.researchgate.net/profile/Carmine-Recchiuto (C. T. Recchiuto); https://www.researchgate.net/profile/Antonio-Sgorbissa (A. Sgorbissa)
0000-0001-6363-3962 (L. Grassi); 0000-0001-9550-3740 (C. T. Recchiuto); 0000-0001-7789-4311 (A. Sgorbissa)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

CARESSES1 is an international, multidisciplinary project whose goal is to design the first robots that can assist older people and adapt to the culture of the individual they are taking care of [1, 2]. The robots can help the users in many ways, including reminding them to take their medication, encouraging them to keep active, and helping them keep in touch with family and friends. Each action is performed with attention to the older person's customs, cultural practices and individual preferences. In CARESSES, the ability of the companion robot to naturally converse with the user has been achieved by creating a framework for cultural knowledge representation that relies on an Ontology [3, 4].

A major limitation of the CARESSES system is that only one device at a time can connect to the server and exploit the capabilities provided by the system. Moreover, its server stores the information related to a single client, and it manages all connections, even those coming from different devices, as if they came from the same user. This kind of implementation also prevents the knowledge base from being expanded through the contributions of multiple users.

Recently, it has become increasingly common to exploit cloud technologies to improve the efficiency of many devices and systems. In the robotics field, this practice is known as cloud robotics, i.e., the use of remote computing resources to enable greater memory, computational power, collective learning and inter-connectivity for robotics applications [5, 6].

1 www.caressesrobot.org
This paper describes the cloud system that has been designed based on the underlying principles of the CARESSES system, while completely redesigning its architecture to offer a set of web services to multiple robots and devices. The CAIR system has been developed by taking advantage of the rich knowledge base and the dialogue mechanism developed during the CARESSES project, with the aim of creating a system that is much easier to use and able to manage concurrent connections from multiple users. The system is based on the use of REST APIs [7], which provide many advantages such as scalability, flexibility, portability and independence (see Section 2). CAIR web services allow the connected clients to manage a rich conversation with the users and to receive Plans to be executed, if possible, on the client device. Moreover, such an architecture makes it possible to effectively exploit the already implemented mechanism for knowledge expansion [8], which will be integrated into future versions of the system.

The system for knowledge-based autonomous interaction described in this work can easily be used by most devices with Internet connectivity that are able to acquire an input through a keyboard or a microphone and provide an output through a screen or a speaker (e.g., robots, computers, smartphones, smartwatches).

2. System Architecture

The system is based on a client-server architecture and has been designed so that new functionalities can easily be added to improve its performance, and so that it can easily be used by clients. Figure 1 depicts the proposed architecture.

The server is composed of two web services developed in Python: (1) the Dialogue Manager service, which manages the dialogue and analyzes the user sentence to recognize the intention of talking about a specific topic, and (2) the Plan Manager service, which recognizes the intention of the user to make the agent execute a specific action. To provide appropriate answers and plans, the server exploits an Ontology containing all the topics, keywords, sentences and plans used during the interaction with the user. The Flask-RESTful2 framework has been used to develop the web services.

The client can perform requests to the server using REST APIs. REST is a set of rules that should be followed when creating an API. One of these rules states that the client should be able to retrieve a piece of data (called a resource) by addressing a specific URI. Each call to a URI is a request, while the data sent back to the client is called a response. Any web service that obeys the REST constraints is informally described as RESTful. Thanks to the separation between client and server, the different parts of a project can be developed independently. In addition, REST APIs are independent of the platform and programming language used, which makes it possible to develop and test clients and services in different environments.

The client acquires the user sentence, sends it to the server, parses the response, executes the received Plan and/or replies with the dialogue sentence returned by the Dialogue Manager.

2 https://flask-restful.readthedocs.io/en/latest/

Figure 1: CAIR system architecture

2.1. Dialogue Manager service

The Dialogue Manager service is in charge of managing the dialogue. To provide the appropriate response to the user, the Dialogue Manager service requires as input the user sentence and the client state.
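As an illustration of how such a service can be exposed with Flask-RESTful, the following is a minimal sketch and not the actual CAIR implementation: the resource name, URI and JSON fields (user_sentence, client_state, dialogue_sentence, plan_sentence, plan) are hypothetical and only mirror the request/response pattern described above.

# Minimal sketch of a Flask-RESTful web service exposing a Dialogue Manager
# resource. The endpoint name, URI and JSON field names are hypothetical and
# only illustrate the request/response pattern described in the text.
from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

class DialogueManager(Resource):
    def put(self):
        # First contact: return the initial client state and the first
        # sentence of the conversation.
        return {"client_state": {"topic": None, "likeliness": {}},
                "dialogue_sentence": "Hello! What would you like to talk about?"}

    def get(self):
        # Subsequent requests: the client sends the user sentence and its
        # current state; a fixed reply is returned in this sketch, whereas
        # the real service would query the Ontology/Dialogue Tree and the
        # Plan Manager service.
        data = request.get_json(force=True)
        user_sentence = data.get("user_sentence", "")
        client_state = data.get("client_state", {})
        return {"client_state": client_state,
                "dialogue_sentence": "You said: " + user_sentence,
                "plan_sentence": None,
                "plan": []}

api.add_resource(DialogueManager, "/cair_dialogue")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

In the actual system, the PUT and GET handlers would of course consult the Ontology, the Dialogue Tree and the Plan Manager service instead of returning fixed values.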
As mentioned before, the ability of the system to naturally converse with the user has been achieved by creating a framework for cultural knowledge representation that relies on an Ontology [3, 4]. However, to deal with representations of the world that may vary across different cultures [9], the Ontology is organized into three layers, as explained in more detail in [10, 4]. The Ontology structure is used to build the Dialogue Tree (DT) and some additional files, namely the Topics keywords, Topics likeliness and Topics sentences, which are ultimately fed to the conversation system to chit-chat with the user.

Based on the Dialogue Tree, the key ideas for knowledge-driven conversation can be briefly summarized as follows (the whole process [10] is more complex). Each time a user sentence is acquired:

1. A Dialogue Management algorithm (either keyword-based or based on more advanced topic classification techniques) is applied to check whether the user's sentence may trigger one of the topics in the DT by jumping to the corresponding node;
2. If no topic is triggered, the conversation follows one of the branches of the DT (according to policies that take into account the user's cultural background and personal preferences).

The system continues in this way, proposing sentences corresponding to a node and acquiring the user's feedback, which can be used to update the user's preferences and/or determine the next node to move to.

2.2. Plan Manager Service

As shown in Figure 1, in addition to providing the dialogue reply and the updated client state, the Dialogue Manager service also calls the Plan Manager service, thus working as an intermediary between the client and the latter. However, the client can call the Plan Manager service directly if it is not interested in managing the dialogue.

The Plan Manager service receives as input the user sentence. Its purpose is to find a match between that sentence and one of the trigger sentences of a specific Intent. An Intent is defined by (i) a set of trigger sentences (built using regexes), (ii) one or more plan-specific sentences (if any), (iii) a KBPlan (if any), and (iv) a Plan (if any) (Figure 2).

Trigger sentences (i) are used to check whether the user sentence matches the corresponding Intent. Such sentences are currently modeled with regexes, and allow the extraction of parameters from the matched sentence that can be used to dynamically compose the plan sentences, the plan and the KBPlan. Besides trigger sentences, plan-specific sentences (ii) have also been defined, which are used by the system to reply when an Intent is detected.

A KBPlan, where KB stands for Knowledge Base, is a sequence of actions meant to affect the knowledge base and/or the flow of the dialogue. For instance, if the user says "I love music", this sentence will match the trigger sentences of the Appreciation Intent (Figure 2), which is meant to recognize the user's appreciation for something and to extract the loved thing as a parameter. The KBPlan of this Intent is composed of two actions: the first one increases the probability that the user wants to talk about the extracted parameter, while the second one tells the system to jump to that conversation topic (if present in the Ontology).

A Plan is a sequence of actions that should be executed on the client, as it affects neither the knowledge base nor the flow of the dialogue. For instance, if the user says "Play the song Yesterday", this sentence will match one of the trigger sentences of the Music Intent (Figure 2), which recognizes the user's intention to listen to some music. The Plan of this Intent is composed of a single action carrying the information that the client should play the song whose title is contained in the parameter field (see Section 2.3).
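To make the matching mechanism more concrete, the following is a minimal, hypothetical sketch of regex-based Intent matching with parameter extraction; the Intent definitions, regexes and action names (increase_likeliness, jump_to_topic, play_song) are illustrative placeholders mirroring the Appreciation and Music examples above, not the actual trigger sentences and Plans stored in the system.

# Minimal sketch of regex-based Intent matching with parameter extraction.
# The Intent definitions below are hypothetical and only mirror the
# "Appreciation" and "Music" examples discussed in the text.
import re

INTENTS = [
    {
        "name": "Appreciation",
        "triggers": [re.compile(r"i (?:love|like|enjoy) (?P<thing>.+)", re.I)],
        "plan_sentence": "I am glad you like {thing}!",
        # KBPlan: actions affecting the knowledge base and the dialogue flow.
        "kb_plan": ["increase_likeliness({thing})", "jump_to_topic({thing})"],
        "plan": [],
    },
    {
        "name": "Music",
        "triggers": [re.compile(r"play (?:the song )?(?P<title>.+)", re.I)],
        "plan_sentence": "Playing {title} for you.",
        "kb_plan": [],
        # Plan: actions to be executed on the client device.
        "plan": ["play_song({title})"],
    },
]

def match_intent(user_sentence):
    """Return the matched Intent with its parameters filled in, or None."""
    for intent in INTENTS:
        for trigger in intent["triggers"]:
            m = trigger.search(user_sentence)
            if m:
                params = m.groupdict()
                return {
                    "name": intent["name"],
                    "plan_sentence": intent["plan_sentence"].format(**params),
                    "kb_plan": [a.format(**params) for a in intent["kb_plan"]],
                    "plan": [a.format(**params) for a in intent["plan"]],
                }
    return None

print(match_intent("I love music"))
print(match_intent("Play the song Yesterday"))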
If the Plan Manager service finds a match with an Intent, a response containing the KBPlan, the plan sentence, and the plan is returned to the Dialogue Manager service (see Figure 1). The KBPlan is managed by the Dialogue Manager, as it directly affects the continuation of the dialogue, while the plan sentence and the plan are not used by this service. Once the Dialogue Manager service has chosen the most appropriate answer for the user, based on the information contained in the client state, the previous user sentence and, if present, the KBPlan, it returns a response to the client. Such a response contains the updated client state, the plan sentence, the plan, and the dialogue sentence (i.e., the actual continuation of the dialogue).

Figure 2: Examples of Intents recognized by the CAIR system

2.3. Client

A client for CAIR can be easily developed for most devices with Internet connectivity such as robots, computers, smartphones, smartwatches, etc. The first thing that the client should do is perform a PUT request to the server, in particular to the Dialogue Manager service, to obtain the initial client state and the first sentence of the conversation. The client state should be stored locally and retrieved before each of the following requests. Afterwards, the client should acquire the user input, send it to the server along with the client state through a GET request, and then retrieve and handle the response. This last operation requires the client to communicate the Intent reply to the user, perform the actions contained in the plan field of the response, and finally continue the dialogue by communicating the sentence reply. If the client is not able to execute the actions, it can ignore them and consider only the sentence reply; a minimal sketch of this request loop is given at the end of this section.

An example of a simple client for PC can be found in this GitHub repository3. The repository also contains a PDF guide with a detailed explanation of the code and of all Plans that the server can return to the client based on the Intent that has been matched. Another documented example of a full client for the SoftBank Robotics robots Pepper and Nao, which manages all the Plans returned by the server, can be found here4. A video showing some extracts of the interaction is available on YouTube5.

3 https://github.com/lucregrassi/CAIRclient_instructions.git
4 https://github.com/lucregrassi/CAIRclient/tree/develop
5 https://www.youtube.com/watch?v=UdbYGytd07w
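The following is a minimal sketch of such a client loop written in Python with the requests library; the server URL, endpoint and JSON field names are hypothetical placeholders for the actual CAIR API used by the clients linked above, and the execution of Plan actions is only simulated by printing them.

# Minimal sketch of a CAIR client loop. The server URL, endpoint and JSON
# field names are hypothetical placeholders for the actual CAIR API; plan
# execution is only simulated by printing the received actions.
import requests

SERVER = "http://localhost:5000/cair_dialogue"  # hypothetical endpoint

def main():
    # First contact: PUT request to obtain the initial client state and
    # the first sentence of the conversation.
    resp = requests.put(SERVER).json()
    client_state = resp["client_state"]          # stored locally by the client
    print("Robot:", resp["dialogue_sentence"])

    while True:
        user_sentence = input("You: ")
        # Send the user input together with the stored client state.
        resp = requests.get(SERVER, json={"user_sentence": user_sentence,
                                          "client_state": client_state}).json()
        client_state = resp["client_state"]       # update the local state

        # 1) Communicate the Intent reply (plan sentence), if any.
        if resp.get("plan_sentence"):
            print("Robot:", resp["plan_sentence"])
        # 2) Execute (here: simply display) the actions contained in the plan.
        for action in resp.get("plan", []):
            print("[executing action]", action)
        # 3) Continue the dialogue with the sentence returned by the server.
        print("Robot:", resp["dialogue_sentence"])

if __name__ == "__main__":
    main()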
References

[1] C. Papadopoulos, N. Castro, A. Nigath, R. Davidson, N. Faulkes, R. Menicatti, A. A. Khaliq, C. Recchiuto, L. Battistuzzi, G. Randhawa, et al., The CARESSES randomised controlled trial: Exploring the health-related impact of culturally competent artificial intelligence embedded into socially assistive robots and tested in older adult care homes, International Journal of Social Robotics (2021) 1–12.
[2] A. Khaliq, U. Kockemann, F. Pecora, A. Saffiotti, B. Bruno, C. Recchiuto, A. Sgorbissa, H.-D. Bui, N. Chong, Culturally aware planning and execution of robot actions, 2018, pp. 326–332.
[3] N. Guarino, Formal ontology and information systems, FOIS'98 Conf (1998) 81–97.
[4] B. Bruno, C. T. Recchiuto, I. Papadopoulos, A. Saffiotti, C. Koulouglioti, R. Menicatti, F. Mastrogiovanni, R. Zaccaria, A. Sgorbissa, Knowledge representation for culturally competent personal robots: requirements, design principles, implementation, and assessment, International Journal of Social Robotics 11 (2019) 515–538.
[5] J. Wan, S. Tang, H. Yan, D. Li, S. Wang, A. V. Vasilakos, Cloud robotics: Current status and open issues, IEEE Access 4 (2016) 2797–2807.
[6] C. Recchiuto, L. Gava, L. Grassi, A. Grillo, M. Lagomarsino, D. Lanza, Z. Liu, C. Papadopoulos, I. Papadopoulos, A. Scalmato, et al., Cloud services for culture aware conversation: Socially assistive robots and virtual assistants, in: 2020 17th International Conference on Ubiquitous Robots (UR), IEEE, 2020, pp. 270–277.
[7] M. Masse, REST API design rulebook: Designing consistent RESTful web service interfaces, O'Reilly (2011).
[8] L. Grassi, C. Recchiuto, A. Sgorbissa, Knowledge triggering, extraction and storage via human–robot verbal interaction, Robotics and Autonomous Systems 148 (2022).
[9] M. Carrithers, M. Candea, K. Sykes, M. Holbraad, S. Venkatesan, Ontology is just another word for culture: Motion tabled at the 2008 meeting of the group for debates in anthropological theory, University of Manchester, Critique of Anthropology 30 (2010) 152–200.
[10] C. T. Recchiuto, A. Sgorbissa, A feasibility study of culture-aware cloud services for conversational robots, IEEE Robotics and Automation Letters 5 (2020) 6559–6566.