Workshop "From Objects to Agents" (WOA 2019)

Towards a Logic-Based Approach for Multi-Modal Fusion and Decision Making during Motor Rehabilitation Sessions

Fabio Aurelio D'Asaro, CRDC Tecnologie, University of Naples Federico II, Napoli, Italy
Antonio Origlia, URBANECO, University of Naples Federico II, Napoli, Italy, antonio.origlia@unina.it
Silvia Rossi, DIETI, University of Naples Federico II, Napoli, Italy, silvia.rossi@unina.it

Abstract

We introduce a general approach which aims at combining machine learning and logic-based techniques in order to model the user's cognitive and motor abilities. In the context of motor rehabilitation, hybrid systems are a convenient option as they allow both for the representation of formal constraints needed to implement a clinically valid exercise, and for the statistical modelling of intrinsically noisy data sources. Moreover, logic-based systems offer a transparent way to look at the decisions taken by an automated system. This is particularly useful when an AI system needs to interact with a therapist in order to assist therapeutic intervention, e.g. by explaining why a given decision is sound. This methodology is currently being developed within the context of the AVATEA project.

Index Terms: Multimodal Fusion, Epistemic Probabilistic Event Calculus, Motor Rehabilitation

I. INTRODUCTION

In this work, we introduce some of the ongoing activities in the context of the AVATEA project (Advanced Virtual Adaptive Technologies e-hEAlth). The project aims at developing an intelligent system to support the rehabilitation process of children with neuro-motor disorders. More specifically, AVATEA aims at creating an integrated system consisting of: (i) an adjustable seat, (ii) different types of sensors, and (iii) an interactive visual interface to perform rehabilitation exercises in the form of games. Such games/exercises are going to be specifically targeted at supporting therapeutic sessions for Developmental Coordination Disorder (DCD).

Although a significant amount of work has been done in the general area of motor rehabilitation with promising results [1], there is still a need for developing personalised therapeutic scenarios. Adaptation techniques typically only focus on maximising effort during the rehabilitation session [2]. However, it is also necessary to take into account parameters such as the individual subject's capabilities [3] and the child's emotional response, e.g. in terms of motivation and engagement [2]. In this direction, AVATEA aims to assist the activity of a therapist through the use of data acquired from its sensors. Machine learning techniques can process this data to profile the user's motor abilities and psychophysiological state, and to monitor the child's response to the exercise being performed. The resulting user model can then be used e.g. to monitor the user's performance and decide which rehabilitation strategy is the most appropriate.

However, handling data coming from different sources requires a complex system able to integrate them and take decisions accordingly, i.e. a multimodal system [4]. Moreover, in existing rehabilitation games, the patient's motivational state has been considered to evaluate the game effectiveness [5], without providing the possibility of taking decisions with respect to the online adaptation of the rehabilitation exercise.

Alongside machine learning techniques, we also intend to employ a logic-based system. Indeed, in a complex domain such as that of AVATEA, much of the experts' knowledge would have to be re-learnt from scratch (thus requiring a considerable amount of data) if we were to use machine learning techniques exclusively. Logic-based systems, on the other hand, are progressively becoming able to handle uncertain knowledge (e.g., using Probability Theory or Fuzzy Logic). This provides the opportunity to retain important parts of this knowledge even when it comes with a degree of uncertainty. Moreover, logic-based systems offer a transparent way to look at the information in AI systems. For example, a therapist might want not only to see what decisions were taken by the system but also why they were taken. A logic-based system is able to reconstruct the rationale behind the decisions taken by considering the chain of rules that were applied starting from the facts in the knowledge base. This advantage does not extend to machine learning algorithms, which generally cannot provide explanations in human-readable terms. It is also worth noting that an expert can use these systems to sketch the causal relationships of a domain, and then use other techniques to learn the appropriate parameters when they are not available to the expert.

II. BACKGROUND AND RELATED WORK

Gamification strategies have proven to be extremely successful in engaging young children in diagnostic or therapeutic exercises, even before the advent of digital gaming. By leveraging Self-Determination Theory [6], the concept of intrinsic motivation has been applied to engage children in activities designed to provide therapists with reports about their competence levels in either cognitive or physical tasks. While games to test cognitive capabilities (see e.g. [7]) do not form a sharply defined class, games designed to test and improve motor skills are usually referred to as exergames. The effects of exergames have been found to be generally positive [8]. Therefore, combining Computerised Adaptive Tests (CATs) with gamification strategies results in systems that are able to engage young users in playful activities, while adapting the current challenge according to the user's level of competence. Furthermore, sessions are typically logged in order to provide detailed feedback to therapists. On the cognitive side, these adaptive systems have been designed to evaluate subjective well-being [9] and phonological acquisition [10], among others. Adaptive exergames have been used e.g. in the context of children with spinal impairments [11], and to test gross motor skills [12]. These works, however, typically use a single modality to implement CATs and do not consider social feedback as a part of the adaptation process to recover engagement. Considering the challenge posed both by multimodal fusion and by adaptation strategies, an integrated system for both fusion and decision making represents an ambitious goal, with potentially broad impact on the field of adaptive diagnosis and treatment of both cognitive and physical impairments.

Moreover, in the specific case of CATs, the domain knowledge is known to the developers, as the experimental procedure must remain safe and informative. It is necessary to keep records of the decisions taken by the system, in order to reconstruct and explain how the session was managed by the system. In this specific situation, statistical modelling alone is not advantageous because: (i) it would require a lot of data to discover elements that are already known, and (ii) it would make it difficult to provide human-readable feedback to the therapists.

In this context, hybrid systems are a convenient option as they allow both for the representation of formal constraints needed to implement a clinically valid exercise, and for the statistical modelling of intrinsically noisy data sources. Recently, logic-based approaches have been successfully applied to several fields of Artificial Intelligence, including (but not limited to): event recognition from security cameras [13], [14], robot location estimation [15], understanding of tenses [16] and natural language processing [17]. Due to the increasing relevance of Machine Learning and Probability Theory in AI, these frameworks and languages have gradually started employing probabilistic semantics (see e.g. [18]) to incorporate and deal with uncertainty. This has given rise to the field of Probabilistic Logic Programming (see e.g. [19]). For example, the framework in [15], which is based on the Situation Calculus ontology, can model imperfect sensors and effectors. The Situation Calculus' branching structure makes these frameworks mostly suitable for planning under partial states of information. On the other hand, MLN-EC and ProbEC [13], [14] extend the semantics of the Event Calculus using Markov Logic Networks [20] and ProbLog [21] to perform event recognition from security cameras. In the proposed case study, the logical part of the architecture receives time-stamped events as inputs and processes them in order to detect complex long-term activities (e.g., detect that two people are fighting from the fact that they have been close to each other and moving abruptly during the last few seconds). Given their semi-probabilistic nature, these frameworks are able to handle uncertainty in the input events (ProbEC) or in the causal rules linking events and fluents (MLN-EC). We envisage that similar systems, especially the Event Calculus-based ones, could be employed as a way to perform fusion between different modalities.

III. THE AVATEA ARCHITECTURE

The proposed architecture is essentially a multimodal system. Multimodal systems were first formally defined in [4] as systems that "[...] process two or more combined user input modes such as speech, pen, touch, manual gestures, gaze, and head and body movements in a coordinated manner with multimedia system output". The possibility to handle multiple communication channels is expected to simplify interaction with the user and to result in a more natural way to control an automated system. Available modalities may be used in an exclusive or concurrent way, with no integration between them [22]. However, it is more often the case that a multimodal system processes multiple channels in a parallel and integrated way [23]. Moreover, whether it is more advantageous to adopt an early or late model for data fusion strongly depends on the amount of available knowledge about the domain. From a system design perspective, it is better to develop separate, more specialised, approaches to analyse each single data source and then fuse the results using a subsequent layer. However, when the domain knowledge is limited, key interactions among input modalities may be overlooked: in this case, early fusion is more adequate. Generally, the problem of deciding when to apply fusion is one of the main issues when designing multimodal systems (see e.g. [24]). In our domain, the amount of available knowledge about training exercises supports the adoption of a late fusion approach.

Figure 1 shows the envisaged architecture for the AVATEA project, whose main modules we discuss in the following paragraphs.

Fig. 1. The AVATEA architecture

A. Sensors

We are going to use different types of sensors, including: (i) pressure sensors, (ii) 2D and 3D cameras, (iii) motion detectors, and (iv) an EEG sensor. Pressure sensors, motion detectors and cameras are going to be used to make sure that exercises are being executed correctly by detecting front and back posture, head-pose, movement speed, balance and relative feet position. In some cases, the cameras may be used to let the user interact with the system, e.g. by pointing at the screen. Moreover, data from cameras and EEG data will help to check the user's current level of stress and engagement and e.g. to decide whether the difficulty level of the exercise should be changed.

B. Input Trackers

Adaptive games require the ability to dynamically track children's movements. For example, movements must be taken into account if we want the system to automatically adapt the game speed to the children's physical capabilities. To this aim, pressure sensors, head pose and skeleton data from video images will be processed and then given as input to the game. Particular emphasis will be put on tracking children's posture: for instance, the head-pose will be a triggering event for some games. We are going to employ one of the several available skeleton detection algorithms. Existing methods include those using RGB-D or 2D cameras (e.g. the OpenPose library [25]), which can identify various positions, even those of ambiguous interpretation (sitting, three-quarter backward perspective, etc.).

C. Classifiers

In AVATEA, the focus is on human activities. These are very difficult to classify due to the diversity of individual conditions. Leveraging the expressive power of deep networks as feature extractors, and exploiting feature modelling techniques of the human body, we will research and design novel algorithms for Social Signal Processing [26]. Video and EEG data will be used to monitor the attention of the children during the exercise, as well as their emotional state and engagement. The Modalities Recognisers classify the features extracted from the sensor data, and create a list of possible interpretations (N-Hypothesis) for the AI engine.

D. User Models

We will use personalised machine learning techniques to learn a model of the children's abilities and interaction preferences [27]. In turn, these profiles will make the system able to recognise anomalies with respect to such a model [28], so that it is possible to track improvements in the user's performance. Moreover, the user's performance over time will be correlated with how they felt about the exercise (or similar exercises) in the past. This can be used to create a personalised exercise plan (e.g., exercise type, modality of execution of the exercise, etc.).

E. AI Engine

The system we envisage implies the use of noisy data sources coming from multiple sensors and trained classifiers. These drive a decision layer conducting adaptive rehabilitation exercises. If, on one hand, this makes it necessary to handle classification estimates with probabilistic reasoning, on the other hand, one needs to keep a rule-based structure to ensure clinical effectiveness and human-readable session summarisation. Hybrid systems, typically consisting of probabilistic rules, combine the best of the two approaches by allowing the definition of strict rules. These rules can be used to model the structure of the clinical procedure, and adapt to the information from the classifiers (i.e. confidence and probability distributions over classes). A user model based on probabilistic estimates can be used by a rule system to estimate the best course of action using the expert knowledge it encodes. This user model can then be further processed in order to customise the therapeutic intervention, and therefore to raise the quality of the children's experience, by also taking into account the behaviour of the child during the rehabilitation process. Indeed, the visual interface will be used to offer children the exercises as part of recreational activities that make use of detected social signals. This will offer a rehabilitation process based on games whose behaviour automatically adapts to the child.

1) EPEC: We propose the use of the language EPEC (short for Epistemic Probabilistic Event Calculus) as a foundation for our methodology. Similarly to MLN-EC and ProbEC, EPEC is a language in the style of the Event Calculus for reasoning about actions, but goes beyond these languages in that it allows for the modelling of noisy sensors. Its foundations were laid in [29], and it has since been extended in [30] to also include sensing actions and propositions conditioned on belief. We briefly introduce its syntax in the following.

In the tradition of languages for reasoning about actions, EPEC models a given domain using fluents (which represent properties of the world), instants (which represent time points at which events may occur) and actions (which represent actions under the control of the agent being modelled or the environment itself). The causal interactions between fluents and actions are captured by the specialised propositions below:

• the v-proposition
$$F\ \textbf{takes-values}\ \langle V_1, \dots, V_n \rangle$$
states that fluent $F$ can take values $V_1, \dots, V_n$.

• the i-proposition
$$\textbf{initially-one-of}\ \{(\psi_1, P_1), \dots, (\psi_n, P_n)\}$$
states that the environment is initially in one of the states $\psi_1, \dots, \psi_n$ with probabilities $P_1, \dots, P_n$.

• the c-proposition
$$\theta\ \textbf{causes-one-of}\ \{(\psi_1, P_1), \dots, (\psi_n, P_n)\}$$
states that $\theta$, a formula encoding one or more actions and some fluent preconditions, has the effect of causing exactly one of the fluent conjunctions $\psi_1, \dots, \psi_n$ with probabilities $P_1, \dots, P_n$ respectively.

• the s-proposition
$$\theta\ \textbf{senses}\ F\ \textbf{with-accuracies}\ M$$
states that $\theta$ has the effect of sensing fluent $F$ with an accuracy given by the confusion matrix $M$.

• the o-proposition
$$A\ \textbf{occurs-at}\ I\ \textbf{with-prob}\ P\ \textbf{if-holds}\ \theta$$
states that action $A$ is known to be occurring at instant $I$ with probability $P$, but only if its preconditions encoded in $\theta$ are satisfied.

• the p-proposition
$$A\ \textbf{performed-at}\ I\ \textbf{if-believes}\ (\theta, \bar{P})$$
states that action $A$ is performed by the agent at instant $I$ if its state of belief in formula $\theta$ at instant $I$ falls in the (open, half-open or closed) interval $\bar{P}$.

A domain description in EPEC is a collection of these propositions satisfying some integrity constraints (e.g. exactly one i-proposition must belong to any domain description). We are not going to describe these constraints formally here, but the interested reader can find them in [29], [30].

EPEC has a possible-worlds semantics where each world represents a possible evolution of the environment from the initial state and is weighted according to the propositions in the domain description. Four implementations of EPEC are available and/or under active development, and can answer queries regarding what is true (and with what probability) in a given domain. Two of them are optimised for the non-epistemic fragment of EPEC, called PEC+. While the exact implementation (written in clingo [31]) exhaustively works out all the possible worlds and their associated weights, the approximate implementation (written in the probabilistic programming language Anglican [32]) samples a user-defined number of worlds using Anglican's built-in Markov Chain Monte Carlo sampling capabilities, and uses the obtained sample to approximate the probability of a query. Similarly, there are two implementations of EPEC (including the epistemic fragment) to deal with exact and approximate inference.

2) Knowledge Base: The following simple domain description demonstrates some features of EPEC:

$$\text{Engagement}\ \textbf{takes-values}\ \langle \text{false}, \text{true} \rangle \tag{1}$$

$$\textbf{initially-one-of}\ \{(\{\text{Engagement}\}, 1)\} \tag{2}$$

$$\text{EEG}\ \textbf{senses}\ \text{Engagement}\ \textbf{with-accuracies}\ \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} \tag{3}$$

$$\text{Watching}\ \textbf{senses}\ \text{Engagement}\ \textbf{with-accuracies}\ \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \end{pmatrix} \tag{4}$$

$$\text{Cutscene}\ \textbf{causes-one-of}\ \{(\{\text{Engagement}\}, 0.9), (\emptyset, 0.1)\} \tag{5}$$

$$\forall I,\ \text{EEG}\ \textbf{performed-at}\ I \tag{6}$$

$$\forall I,\ \text{Watching}\ \textbf{performed-at}\ I\ \textbf{if-believes}\ (\text{Engagement}, [0, 0.7]) \tag{7}$$

$$\forall I,\ \text{Cutscene}\ \textbf{performed-at}\ I\ \textbf{if-believes}\ (\text{Engagement}, [0, 0.5]) \tag{8}$$

These propositions aim at describing an automated system used to detect the degree of engagement of the patient, which can sound a dedicated alarm to raise the patient's level of engagement if this falls below an appropriate threshold. In this example, Engagement is a boolean-valued fluent which at every instant can take values true or false (proposition (1)), and initially the patient is known to be fully engaged (proposition (2)). Propositions (3) and (4) specify the confusion matrices associated with the actions EEG and Watching, performed through the system's sensors, while proposition (5) defines the expected effect of playing the Cutscene, i.e. raising the patient's level of engagement in 90% of the cases. EEG is continually performed (proposition (6)), whereas Watching is only performed if belief in Engagement falls below the 0.7 threshold (proposition (7)). Finally, the Cutscene is only played if Engagement is believed to have fallen below the 0.5 threshold (proposition (8)).

The AI engine therefore establishes whether it is more appropriate to select more exercises or whether it is necessary to apply an attention recovery strategy. Indeed, the modality of execution of an exercise (e.g., its speed) can also be adjusted with respect to the child's profile.

In this simple case, EEG and Watching are assumed to be independent. In EPEC it is also possible to model dependency between actions. For instance, consider a case in which a high-res camera is also employed to perform engagement detection, and consider its associated action HiResWatching. This could be modelled by appropriately reworking proposition (4)'s precondition and adding the two propositions:

$$\text{HiResWatching} \wedge \neg\text{Watching}\ \textbf{senses}\ \text{Engagement}\ \textbf{with-accuracies}\ \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix} \tag{10}$$

$$\text{HiResWatching} \wedge \text{Watching}\ \textbf{senses}\ \text{Engagement}\ \textbf{with-accuracies}\ \begin{pmatrix} 0.91 & 0.09 \\ 0.07 & 0.93 \end{pmatrix} \tag{11}$$

Notice that, although HiResWatching is more accurate than Watching (compare the matrices of propositions (4) and (10)), these two actions are correlated, and the confusion matrix in proposition (11) reflects this.

IV. CONCLUSIONS

We have presented the system architecture designed for the AVATEA project to manage adaptive rehabilitation exercises and provide therapists with interpretable feedback about the session. We have also presented the hybrid approach that combines the use of explicit rules with probabilistic management of noisy data sources, such as automated classifiers working on streamed sensor data. The system will autonomously manage rehabilitation exercises and will react to social feedback coming from young users during a gamified experience. After the end of the session, the system will provide a detailed report about the session to support therapists in evaluating children's improvement and designing further interventions.

ACKNOWLEDGEMENTS

This work has been partially supported by MIUR within the POR Campania FESR 2014-2020 AVATEA "Advanced Virtual Adaptive Technologies e-hEAlth" research project.

REFERENCES

[1] P. Wilson, D. Green, K. Caeyenberghs, B. Steenbergen, and J. Duckworth, "Integrating new technologies into the treatment of CP and DCD," Current Developmental Disorders Reports, vol. 3, no. 2, pp. 138–151, Jun 2016.
[2] N. Hocine, A. Gouaïch, and S. A. Cerri, "Dynamic difficulty adaptation in serious games for motor rehabilitation," in Games for Training, Education, Health and Sports, S. Göbel and J. Wiemeyer, Eds. Cham: Springer International Publishing, 2014, pp. 115–128.
[3] M. S. Cameirao, S. Bermudez i Badia, E. D. Oller, and P. F. M. J. Verschure, "Using a multi-task adaptive VR system for upper limb rehabilitation in the acute phase of stroke," in Virtual Rehabilitation, Aug 2008, pp. 2–7.
[4] S. L. Oviatt and P. Cohen, "Perceptual user interfaces: multimodal interfaces that process what comes naturally," Communications of the ACM, vol. 43, no. 3, pp. 45–53, 2000.
[5] M. Pirovano, R. Mainetti, G. Baud-Bovy, P. L. Lanzi, and N. A. Borghese, "Self-adaptive games for rehabilitation at home," in 2012 IEEE Conference on Computational Intelligence and Games (CIG), Sep. 2012, pp. 179–186.
[6] R. M. Ryan and E. L. Deci, "Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being," American Psychologist, vol. 55, no. 1, p. 68, 2000.
[7] T. Belpaeme, P. E. Baxter, R. Read, R. Wood, H. Cuayáhuitl, B. Kiefer, S. Racioppa, I. Kruijff-Korbayová, G. Athanasopoulos, V. Enescu et al., "Multimodal child-robot interaction: Building social bonds," Journal of Human-Robot Interaction, vol. 1, no. 2, pp. 33–53, 2012.
[8] N. Vernadakis, M. Papastergiou, E. Zetou, and P. Antoniou, "The impact of an exergame-based intervention on children's fundamental motor skills," Computers & Education, vol. 83, pp. 90–102, 2015.
[9] Y. Wu, Y. Cai, and D. Tu, "A computerized adaptive testing advancing the measurement of subjective well-being," Journal of Pacific Rim Psychology, vol. 13, 2019.
[10] A. Origlia, P. Cosi, A. Rodà, and C. Zmarich, "A dialogue-based software architecture for gamified discrimination tests," in GHITALY@CHItaly, 2017.
[11] M. Mulcahey, S. M. Haley, T. Duffy, P. Ni, and R. R. Betz, "Measuring physical functioning in children with spinal impairments with computerized adaptive testing," Journal of Pediatric Orthopedics, vol. 28, no. 3, p. 330, 2008.
[12] C.-Y. Huang, L.-C. Tung, Y.-T. Chou, H.-M. Wu, K.-L. Chen, and C.-L. Hsieh, "Development of a computerized adaptive test of children's gross motor skills," Archives of Physical Medicine and Rehabilitation, vol. 99, no. 3, pp. 512–520, 2018.
[13] A. Skarlatidis, A. Artikis, J. Filippou, and G. Paliouras, "A Probabilistic Logic Programming Event Calculus," Theory and Practice of Logic Programming, vol. 15, pp. 213–245, Mar 2015.
[14] A. Skarlatidis, G. Paliouras, A. Artikis, and G. A. Vouros, "Probabilistic Event Calculus for Event Recognition," ACM Transactions on Computational Logic (TOCL), vol. 16, no. 2, p. 11, 2015.
[15] V. Belle and H. J. Levesque, "Reasoning about Discrete and Continuous Noisy Sensors and Effectors in Dynamical Systems," Artif. Intell., vol. 262, pp. 189–221, 2018.
[16] M. Van Lambalgen and F. Hamm, The Proper Treatment of Events. John Wiley & Sons, 2008, vol. 6.
[17] P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, "Natural language processing: an introduction," Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 544–551, 2011.
[18] T. Sato, "A statistical learning method for logic programs with distribution semantics," in Proceedings of the 12th International Conference on Logic Programming (ICLP95). Citeseer, 1995.
[19] F. Riguzzi, Foundations of Probabilistic Logic Programming, ser. River Publishers Series in Software Engineering. River Publishers, 2018.
[20] M. Richardson and P. Domingos, "Markov logic networks," Machine Learning, vol. 62, no. 1, pp. 107–136, Feb 2006.
[21] L. De Raedt, A. Kimmig, and H. Toivonen, "Problog: A probabilistic prolog and its application in link discovery," in IJCAI, vol. 7, 2007, pp. 2462–2467.
[22] L. Nigay and J. Coutaz, "A design space for multimodal systems: concurrent processing and data fusion," in Proceedings of the INTERACT'93 and CHI'93 Conference on Human Factors in Computing Systems. ACM, 1993, pp. 172–178.
[23] M. Turk, "Multimodal interaction: A review," Pattern Recognition Letters, vol. 36, pp. 189–195, 2014.
[24] S. Rossi, E. Leone, M. Fiore, A. Finzi, and F. Cutugno, "An extensible architecture for robust multimodal human-robot communication," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Nov 2013, pp. 2208–2213.
[25] Z. Cao, G. Hidalgo, T. Simon, S. Wei, and Y. Sheikh, "Openpose: Realtime multi-person 2d pose estimation using part affinity fields," CoRR, vol. abs/1812.08008, 2018.
[26] N. Jaques, O. O. Rudovic, S. Taylor, A. Sano, and R. Picard, "Predicting tomorrow's mood, health, and stress level using personalized multitask learning and domain adaptation," in Proceedings of IJCAI 2017 Workshop on Artificial Intelligence in Affective Computing, ser. Proceedings of Machine Learning Research, N. Lawrence and M. Reid, Eds., vol. 66, 20 Aug 2017, pp. 17–33.
[27] S. Rossi, A. Caso, and F. Barile, Trends in Practical Applications of Agents, Multi-Agent Systems and Sustainability: The PAAMS Collection. Cham: Springer International Publishing, 2015, ch. Combining Users and Items Rankings for Group Decision Support, pp. 151–158.
[28] S. Rossi, L. Bove, S. Di Martino, and G. Ercolano, "A two-step framework for novelty detection in activities of daily living," in Social Robotics. Cham: Springer International Publishing, 2018, pp. 329–339.
[29] F. A. D'Asaro, A. Bikakis, L. Dickens, and R. Miller, "Foundations for a Probabilistic Event Calculus," in LPNMR, 2017, pp. 57–63.
[30] F. A. D'Asaro, "Probabilistic Epistemic Reasoning About Actions," Ph.D. dissertation, University College London, February 2019.
[31] M. Gebser, R. Kaminski, B. Kaufmann, and T. Schaub, "Clingo = ASP + control: Preliminary report," arXiv preprint arXiv:1405.3694v1, 2014.
[32] D. Tolpin, J. W. van de Meent, H. Yang, and F. Wood, "Design and Implementation of Probabilistic Programming Language Anglican," arXiv preprint arXiv:1608.05263, 2016.
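
As an informal illustration of the knowledge base in Section III-E.2, the following Python sketch tracks belief in Engagement by applying Bayes' rule to the confusion matrices of propositions (3) and (4), and then applies the belief thresholds of propositions (7) and (8) and the effect of proposition (5). It is not one of the EPEC implementations mentioned in the paper. The function names, the reading of the matrices (row = actual value, column = sensed value, both ordered as false, true), the sequential evaluation of the three actions within a single step, and the prior of 0.6 in the usage lines are all assumptions made for illustration.

```python
# Hypothetical sketch: belief tracking for propositions (1)-(8).
# Assumption: matrix[i][j] = P(sensed value j | actual value i),
# with index 0 = false and index 1 = true.

EEG = [[0.7, 0.3], [0.4, 0.6]]        # matrix of proposition (3)
WATCHING = [[0.8, 0.2], [0.1, 0.9]]   # matrix of proposition (4)


def sense_update(belief_true, matrix, reading):
    """Bayes update of P(Engagement = true) after one boolean sensor reading."""
    j = 1 if reading else 0
    numerator = matrix[1][j] * belief_true
    denominator = numerator + matrix[0][j] * (1.0 - belief_true)
    # A zero denominator means the reading was impossible under the model;
    # in that case the belief is left unchanged.
    return numerator / denominator if denominator > 0 else belief_true


def cutscene_update(belief_true, p_success=0.9):
    """Proposition (5): Engagement becomes true w.p. 0.9, else is unchanged."""
    return p_success + (1.0 - p_success) * belief_true


def step(belief_true, eeg_reading, watching_reading):
    """One simplified time step; returns (new belief, actions performed).

    Simplification (not EPEC semantics): the three actions are evaluated
    one after the other, each on the belief left by the previous one.
    """
    actions = ["EEG"]                                  # proposition (6)
    belief = sense_update(belief_true, EEG, eeg_reading)
    if belief <= 0.7:                                  # proposition (7)
        actions.append("Watching")
        belief = sense_update(belief, WATCHING, watching_reading)
    if belief <= 0.5:                                  # proposition (8)
        actions.append("Cutscene")
        belief = cutscene_update(belief)
    return belief, actions


if __name__ == "__main__":
    # Hypothetical prior of 0.6 instead of the certainty of proposition (2).
    print(step(0.6, eeg_reading=False, watching_reading=False))
    print(step(0.6, eeg_reading=True, watching_reading=True))
```

In this sketch, starting from the initial condition of proposition (2) (belief equal to 1) leaves the belief at 1 whatever the readings, since `sense_update(1.0, ...)` returns 1.0; this is why the usage lines start from a lower, hypothetical prior.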