<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SmartLobby: A 24/7 Human-Machine-Interaction Space within an Office Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefan Fuchs</string-name>
          <email>stefan.fuchs@honda-ri.de</email>
          <email>stefan.fuchs@honda-ri.de Manuel Muhlig Honda Research Institute Europe O enbach, Germany</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nils Einecke</string-name>
          <email>nils.einecke@honda-ri.de</email>
          <email>nils.einecke@honda-ri.de Bram Bolder Honda Research Institute Europe O enbach, Germany</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabian Eisele</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Honda Research Institute Europe</institution>
          ,
          <addr-line>Offenbach</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Darmstadt</institution>
          ,
          <addr-line>Darmstadt</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>83</volume>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this work, we present the SmartLobby, an intelligent environment system integrated into the lobby of a research institute. The SmartLobby runs 24/7, i.e. it can be used at any time by anyone without any preparation. The goal of the system is to conduct research in the domain of human-machine cooperation. One important first step towards this goal is detailed human state modeling and estimation. As the system is built into the lobby area, people without a scientific background also interact with it, which strongly reduces the bias in the collected data. On the software side, the SmartLobby mainly integrates state-of-the-art algorithms that enable a thorough analysis of human behavior and state. These algorithms constitute the fundamental basis for the development of higher-level system components. Here, we present our system with its various hardware and software components. Thereby, we focus on the realization of non-intrusive head-eye-tracking as a key component to continuously observe persons using the system. A first multi-week experiment utilizes this capability to investigate a method for content personalization, which demonstrates the effectiveness of the whole system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>This paper presents the SmartLobby, an intelligent environment system built into the entrance area of a research institute. It allows presenting the institute's ongoing research and performing research at the same time. The SmartLobby is designed for the institute's major research topic: cooperative intelligence, as the next level of artificial intelligence (AI). That is, AI systems that cooperate with humans in an adaptive fashion and have a sense for responsibilities in the cooperation. To achieve this goal, it is necessary to research anthropomorphic interaction mechanisms as well as innovative interactions specialized to specific tasks. It is also important to equip intelligent systems with competences that adapt the interaction with the human depending on the situation. An essential prerequisite is the ability to estimate the internal and observable human states. Although this topic list is not comprehensive, it highlights that this kind of research cannot be conducted in isolation inside a lab. Especially online learning, adaptation, and personalization rely on a continuous feed of data from human interaction.</p>
      <p>The SmartLobby presented in this paper comes with an unobtrusive appearance, which makes the ongoing research a tangible and natural experience for visitors and employees. At the same time, the SmartLobby features properties that support versatility and flexibility, as these are requirements that come along with this very agile research domain.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED</title>
    </sec>
    <sec id="sec-3">
      <title>WORK</title>
      <p>
        The SmartLobby follows the concept of a living lab, which was coined at MIT by Mitchell, Larson, and Pentland in 2003 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The first living labs were realized in smart environments to observe humans in their usage of emerging technologies (microscopic level). Especially with the transformation from ubiquitous computing [37] towards intelligent environments [
        <xref ref-type="bibr" rid="ref17 ref5">5, 17, 34</xref>
        ], the importance of living labs has grown. Intelligent environments are meant to support and collaborate with humans in a non-intrusive, intelligent way and to provide a satisfying experience in the environment. Hence, the development of such systems requires a user-centered research methodology in order to continuously take into account the needs and wishes of potential users as well as the evolving real-life context. Meanwhile, the concept of living labs has been broadened (macroscopic level) to enhance innovation and usability of information-technology-driven applications in society. This trend has intensified even more with the recent progress in the field of artificial intelligence, where seeing the potential users as competent partners in transparent development processes may reduce qualms regarding this technology.
      </p>
      <p>
        We see intelligent cyber-physical systems (ICPS) [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] at the intersection of cooperative intelligence, smart environments, and robotics. Augusto et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] describe intelligent environments as a synthesis of ubiquitous computing, smart environments, and artificial intelligence. Saffiotti et al. [34] went a step further and introduced physically embedded intelligent systems (PEIS ecology) as the intersection of ubiquitous computing, AI, and robotics. In this context, artificial intelligence is an essential prerequisite to obtain autonomous, self-directed agents that can follow their own goals and make related decisions. We, however, aim at enriching the system with a sense for the human state as well as a sense for responsibilities in the cooperation with the human. Thus, we refer to an ICPS as an intelligent environment that adheres to the principles of cooperation (see [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]).
      </p>
      <p>
        ICPS target applications in various domains, ranging from health care and hospitals over manufacturing and emergency services to education. A very common application is the smart home equipped with health features, also called ambient assisted living (AAL) [36, 32]. AAL mainly focuses on helping elderly people to keep their self-determination. The basic idea is to augment the home environment of elderly people with intelligent devices and to offer remote care services that assist them in their daily lives. This field has grown substantially due to advancements in smart sensors and devices. A survey [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] of AAL papers between the years 2004 and 2013 found that most work concentrates on activity recognition [
        <xref ref-type="bibr" rid="ref10 ref23 ref27">10, 23, 27</xref>
        ] and monitoring of the vital status [
        <xref ref-type="bibr" rid="ref3 ref4">4, 3, 33</xref>
        ]. Further reviews [
        <xref ref-type="bibr" rid="ref9">30, 9</xref>
        ] show that the most promising sensors in current AAL projects are cameras, in particular RGB-D cameras. Cameras allow for analyses that are difficult to achieve with other sensors, such as human behavior analysis, fall detection, or emotion recognition. On the downside, the information richness of cameras also constitutes a critical privacy issue. In particular, cameras installed in sensitive areas like bathrooms are very critical and result in low acceptance. Hence, privacy preservation is also an important focus of research. For example, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] tackles the privacy issue by replacing the real camera images with a pre-stored background image plus stick-figures of the people currently visible. Furthermore, all data necessary for behavior and situation analysis is stored locally.
      </p>
      <p>
        Another area close to our work is research on improving the quality of life in working environments. However, the setup is often restricted to sensors directly installed at each desktop [
        <xref ref-type="bibr" rid="ref20 ref24 ref3">20, 24, 3</xref>
        ] rather than ambient, as in AAL. Going more in the direction of ambient assisted working is the idea of a virtual secretary presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Several offices were equipped with cameras and microphones in order to recognize people and their activities. In case a meeting situation is detected, visitors trying to enter the door or callers trying to phone someone in the room are automatically informed. Unfortunately, false detection rates are high because of the simple heuristics used. Furthermore, the system was mainly tailored around detecting presence and availability of persons in the offices.
      </p>
      <p>
        Finally, [31] analyzed two prominent AAL middleware platforms and derived from them a set of recommendations for future AAL systems. Two major insights are that AAL systems should favor GUIs over text-based interfaces and that the system should strike a good balance between standardization and customizability. Our system [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] adheres to these suggestions by having a graphical touch interface and by using ROS [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] as the underlying middleware platform.
      </p>
    </sec>
    <sec id="sec-4">
      <title>SMARTLOBBY SYSTEM</title>
      <p>The following section describes the requirements and the realization of the overall system with its structural concept, selected sensors, effectors, and software.</p>
      <sec id="sec-4-1">
        <title>General Requirements</title>
        <p>Our goal was to design a room that has both a representative character and an intelligent behavior according to the concept of ICPS. The intelligence of the SmartLobby is not meant to be a static demonstration of research results but rather to be continuously developed further through interaction. That is, there are challenges arising from the users with their heterogeneous backgrounds and privacy considerations, as well as regarding robustness, operability, and versatility. The daily usage by expert and non-expert users (e.g. administration or guests) in a standard office environment provides the necessary insights to actually build intelligent systems with a natural and intuitive human-machine interaction.</p>
        <p>The overall impression of the room should avoid the usual lab look with cables, extensive sensor rigs, or prototypical electronics. Presentation software has to present the institute with its structure, employees, connections to other institutes, and research projects. Additionally, the presentation software should welcome visitors or announce talks embedded into a weekly press review. The presentation software should be intuitive and self-explanatory in order to encourage usage by guests and novices.</p>
        <p>One major target is the 24/7 operation of the system's basic functionalities, i.e. the presentation software, people tracking, live feedback, and head-eye-tracking have to run continuously. Furthermore, a fundamental requirement is versatility and extensibility of the hardware installation to accommodate new research approaches.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Structural Concept</title>
        <p>The location we selected for the SmartLobby is an essential junction of the institute's traveling paths, because it connects two building parts with the employee offices and laboratories. With administration and management offices, a coffee kitchen, and two meeting rooms directly connected, this area is frequently visited by employees and visitors alike. Fig. 1 shows the office lobby.</p>
        <p>The room has a size of 6 m × 5.5 m and features five doors and two open gateways. A large bar table extends from the coffee kitchen's service hatch into the room. The bar table invites people to meet, to interact with the system, and to operate the touch screens that are embedded into it.</p>
        <p>The wooden material of the walls generates a comfortable atmosphere while the lamella design maintains a more technical touch. In the upper part, the lamella distance is increased to let more light through from the adjacent rooms.</p>
        <p>Sensors and effectors are either directly built into the wooden wall or attached to it. Power and network cables are hidden inside the wall to support a living room atmosphere. In order to keep the room flexible, the walls feature various hidden extension slots that can house additional sensors, effectors, or computation hardware. However, the major computing hardware is installed remotely in a server room.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Sensors and Effectors</title>
        <p>Fig. 3a shows a top view of the SmartLobby and its sensors. The room is equipped with seven imaging devices. There are three Kinects (Xbox One) providing depth measurements, three ceiling-mounted RGB cameras (IDS UI-5250CP-C-HQ, 6 mm lens) in the corners for a complete room overview, and a pan-tilt-zoom (PTZ) camera (Axis Q6128, lens 3.9 mm to 46.8 mm) for high-fidelity scans. At a distance of 6 m, the PTZ camera still captures images with a spatial resolution of 5 pix/mm. The room is also equipped with two microphone arrays (depicted by small green circles in Fig. 3a), each consisting of eight pressure zone microphones (AKG C 562 CM, Beyerdynamic Classis BM32W). Their arrangement enables sound localization and highly sensitive speech recognition.</p>
        <p>The most straightforward interaction with the system is the usage of the displays. At the heart of the interaction is a multi-touch-table (MTT) that comprises two touch displays (scape tangible 55), each having UHD resolution (3840 pix × 2160 pix) at a 55 inch screen diagonal. The left touch display runs proprietary software for interactively presenting the research institute. It can be operated not only by finger touches but also by physical passive and active tokens that trigger certain information. There is one wall-mounted display with an 84 inch screen diagonal close to the MTT. The wall along the main passage features two ultra-wide displays, each with a size of 2.16 m × 0.35 m at a resolution of 3840 pix × 600 pix. This installation enables the interactive control of the displays as feedback for persons walking by. Two speakers beside the large display allow for playing back audio and video data as well as performing speech-based interaction.</p>
        <p>A strong focus was also put on the light installation. The lamellas in the walls are partially replaced by multi-color LED stripes. These LED stripes are grouped into eight segments (two for each wall), which can be controlled independently. The ceiling lights provide white light with controllable intensity. The lighting is driven via the DMX protocol and accessible over Ethernet through a LANBox-CLX device.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Mobile Robot</title>
        <p>
          The SmartLobby covers only a small area of the institute. As a straightforward extension, further research projects involve mobile robots that are permanently active in the whole office area, e.g. for guiding people. Together with these robots, the SmartLobby forms a multi-entity system that shares information among the different entities. For example, the robots act as mobile extended sensors for the SmartLobby. In return, the SmartLobby feeds user information gathered in user interactions back to the robots and potentially other entities. The mobile robots and their usage are described in more detail in [
          <xref ref-type="bibr" rid="ref13 ref16">16, 13</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>Software</title>
        <p>
          The large variety of sensors and the generated data cannot be processed on a single computer. Thus, there is the need for a versatile middleware that allows for distributed processing as well as fast integration of new sensors and algorithms. We decided to use ROS [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], which comes with many readily available sensor packages (e.g. iai_kinect2, ueye_cam) and deployable applications (e.g. OpenPTrack). ROS also provides means for web-based usage that is independent of the operating systems involved (ros_bridge). Hence, ROS can be easily integrated with any kind of hardware as long as an HTML5 renderer is available.
        </p>
        <p>
          [Figure 2: (b) screen patterns; (e) desaturation; (f) rig detection; (g) daily calibration; (h) fused clouds.]
        </p>
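        <p>As an illustration of how lightweight such a ROS integration is, the following minimal sketch republishes incoming tracker messages as display events. It is only a toy example; the topic names and message types are placeholders, not the actual SmartLobby interfaces.</p>
        <preformat>
#!/usr/bin/env python
# Minimal rospy node sketch; topic names and message types are
# hypothetical placeholders, not the actual SmartLobby interfaces.
import rospy
from std_msgs.msg import String

def on_track(msg):
    # Forward a simple event whenever a track update arrives.
    pub.publish(String(data='person_update: %s' % msg.data))

if __name__ == '__main__':
    rospy.init_node('smartlobby_bridge_demo')
    pub = rospy.Publisher('/smartlobby/events', String, queue_size=10)
    rospy.Subscriber('/tracker/tracks', String, on_track)
    rospy.spin()
        </preformat>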
        <p>
          In order to assure robustness of our multi-entity system, the mobile robots and the SmartLobby are organized in separate ROS systems with separate ROS masters. This ensures that the robots can disconnect and reconnect at any time without restarting the overall system. To share specific topics and services across the multiple ROS systems, the multimaster_fkie package [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is used.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>SMARTLOBBY</title>
    </sec>
    <sec id="sec-6">
      <title>MODULES</title>
      <p>The variety of ROS software modules deployed in the SmartLobby can be categorized into: infrastructure modules to control the sensors and effectors, maintenance modules to guarantee the 24/7 operation, and application modules. The following sections describe selected software modules from the latter two categories. Thereby, we focus on modules that enable and foster accurate human monitoring to realize a personalization application.</p>
      <sec id="sec-6-1">
        <title>Automatic Camera Calibration</title>
        <p>After the installation of the cameras, an extrinsic calibration is necessary to unify the coordinate systems, in order to merge data (see Fig. 2h) or to transfer information amongst the sensors. In a first attempt, we calibrated the cameras with a standard checkerboard calibration rig (see Fig. 2a). Unfortunately, the cameras are subject to slight movements due to ball joints in the camera mounts and lightweight dry-walls, which vibrate slightly when a door is closed. A regular recalibration would be necessary but is very time-consuming when using a manually placed rig, especially because multiple rigs are needed to cover all cameras. Hence, we devised an automatic calibration, which makes use of the installed monitors.</p>
        <p>The basic idea is to display the calibration patterns on the screens. Since the screen positions are very stable, their exact position needs to be measured only once. In order to easily distinguish the screens, we use different pattern colors as shown in Fig. 2b. There are, however, some details that need to be considered. Firstly, the calibration patterns should not be occluded. For this reason, we decided to trigger the calibration at 1 am, when people are rarely present. Secondly, there is a problem with the monitors' reflective properties. The cameras might see a reflection from the lighting which overcasts the calibration pattern (see Fig. 2c) and thus prevents a clean calibration.</p>
        <p>Automatically switching off the light before calibration prevents reflections but introduces another problem. As the room is dark and the screens are bright, the automatic camera exposure will be high and lead to overexposed pixels (see Fig. 2d). To account for that, we reduce the screen brightness to 15 % during calibration. Finally, we observed some desaturation effects when a camera is at a steep angle relative to the calibration screens. As Fig. 2e shows, this desaturation is not homogeneous: the calibration pattern is green on the left (far) side of the display but almost white on the right (close) side. We counteract this mainly by reducing the pattern extent on the ultra-wide screens.</p>
        <p>[Figure 3: (a) cameras and tracks; (b) histogram of walking speeds (velocity in m/s).]</p>
        <p>
          The actual calibration procedure is done with OpenCV [
          <xref ref-type="bibr" rid="ref6">6</xref>
        ] (see Fig. 2f). Fig. 2g shows the extrinsic parameter measurement of the center Kinect over the course of 296 days. In this plot, a notable drift in orientation is observable. Furthermore, the plot shows several strong perturbation events. These were caused by maintenance work on the pan-tilt-zoom unit, which is located on the ceiling above the center Kinect.
        </p>
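        <p>A minimal sketch of this calibration step is given below, assuming the pattern corners' 3D positions on the screen were measured once in room coordinates and that the intrinsics (camera matrix and distortion) are known; the pattern size is a placeholder.</p>
        <preformat>
import cv2
import numpy as np

# Sketch of the extrinsic estimation from an on-screen pattern.
# screen_pts_3d: Nx3 corner positions in room coordinates (measured once);
# K, dist: intrinsic camera matrix and distortion coefficients.
def estimate_extrinsics(img, screen_pts_3d, K, dist):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (7, 5))  # placeholder size
    if not found:
        return None
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01))
    ok, rvec, tvec = cv2.solvePnP(screen_pts_3d, corners, K, dist)
    return (rvec, tvec) if ok else None
        </preformat>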
        <p>In order to measure the accuracy of the automatic calibration, we generated a manual ground truth calibration by manually annotating the corners of the screen calibration patterns in the camera images. We found the automatic calibration error to be between 0.36° and 0.64° for the camera orientations and between 0.1 m and 0.4 m for the camera positions. The eligibility of the manual ground truth was judged with a simple experiment: We assumed that humans are able to localize corner points with an accuracy of at least 1 pix. By jittering the human screen corner point annotations by 1 pix, we found that the average accuracy of the manual calibration is about 0.01 m in position and 0.13° in orientation. Since the average error caused by 1 pix jitter is much smaller than the difference between automatic and manual calibration, we can regard the manual calibration as a reasonable ground truth.</p>
      </sec>
      <sec id="sec-6-2">
        <title>People Tracking</title>
        <p>
          For tracking people in the SmartLobby, we employ OpenPTrack [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. One important aspect of OpenPTrack is its support for distributed computing. In our setup, we use one ZBox (Zotac Magnus-1070k, i5-7500T CPU @ 2.70 GHz quad-core) for each Kinect. OpenPTrack allows running point cloud generation and detection on the ZBox, while fusion and tracking are done on one of the central SmartLobby computers. Hence, the computational load is highly distributed and data bandwidth is reduced, as the data-heavy point clouds are not sent over the network.
        </p>
        <p>Fig. 3a shows that the arrangement of Kinects allows tracking humans in the lobby on all reasonable paths. During one morning, about 120 persons passed the SmartLobby. The histogram of average walking speeds agrees with the typical human walking speed of 1.4 m/s (see red line in Fig. 3b).</p>
        <sec id="sec-6-2-1">
          <title>OpenPTrack</title>
        </sec>
        <sec id="sec-6-2-2">
          <title>Head-Eye-Tracking</title>
          <p>gaze,
head pose,
ROI</p>
        </sec>
        <sec id="sec-6-2-3">
          <title>Face</title>
        </sec>
        <sec id="sec-6-2-4">
          <title>Recognition</title>
          <p>id
gaze,
head pose</p>
        </sec>
        <sec id="sec-6-2-5">
          <title>Gaze</title>
        </sec>
        <sec id="sec-6-2-6">
          <title>Calibration</title>
          <p>calibrated</p>
          <p>gaze,
head pose
id AOI</p>
        </sec>
        <sec id="sec-6-2-7">
          <title>Analysis</title>
          <p>AoI
Fixations</p>
        </sec>
        <sec id="sec-6-2-8">
          <title>Slide Control / Statistics</title>
          <p>tracks, track ids
stathee,atrdacpkosied
image, intrinsics,
extrinsics</p>
          <p>gaze,
head pose</p>
        </sec>
        <sec id="sec-6-2-9">
          <title>Image</title>
        </sec>
        <sec id="sec-6-2-10">
          <title>Server</title>
          <p>track from</p>
          <p>OpenPTrack
Adapt Content</p>
          <p>
            velocities
pan, tilt, zoom
pan, tilt, zoom,
image
Head pose and eye gaze are regarded as a major cue to infer the internal human state [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. In the context of smart environments, vision-based remote eye tracking systems are usually applied because of their non-intrusiveness and the possibility to promptly interact with the system. Gaze estimation relies on high-quality images, which results in a confined tracking box. Enlarging this tracking box is an integral challenge of such systems. There are systems with movable cameras (e.g. on rails or drones), systems that try to cope with lower resolution, and pan-tilt-zoom cameras.
          </p>
          <p>We decided to integrate an Axis PTZ camera, which is placed above the large display, as this is the most prominent information hub in the room (see Fig. 4). Thus, it is possible to track all humans that enter the area and capture their glances while they watch the screen. Two threaded processes retrieve the images and forward them as ROS messages at 40 Hz with a latency of 75 ms. We developed a simple kinematic model of the PTZ camera under the assumption that the image plane moves on a sphere centered at the camera center. Here, we used a common checkerboard pattern to retrieve the anchor point of the camera, the radius of the sphere, and the pan offset. Given the possibility to change the camera's focal length from 3.9 mm to 46.8 mm with a resolution of 10000 steps, there could be just as many intrinsic calibrations. The high-quality optics allows us to neglect the distortion, and we set the principal point to the center of the image. Furthermore, we estimated the actual focal length (camera scale) for five operating points and generated a look-up table.</p>
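          <p>A minimal sketch of this kinematic model and the focal-length look-up is given below; the LUT values are placeholders, not our measured calibration.</p>
          <preformat>
import numpy as np

# Sketch of the simple PTZ kinematic model: the optical axis rotates
# about a fixed anchor point, with a calibrated pan offset; the focal
# length is interpolated from a LUT measured at five operating points.
ZOOM_LUT = [(0, 1400.0), (2500, 3200.0), (5000, 6800.0),
            (7500, 11500.0), (10000, 16800.0)]  # (zoom step, focal px), placeholders

def viewing_ray(pan_deg, tilt_deg, pan_offset_deg):
    p = np.radians(pan_deg + pan_offset_deg)
    t = np.radians(tilt_deg)
    # Unit vector of the optical axis in the room frame.
    return np.array([np.cos(t) * np.cos(p), np.cos(t) * np.sin(p), np.sin(t)])

def focal_length(zoom_step):
    steps, focals = zip(*ZOOM_LUT)
    return np.interp(zoom_step, steps, focals)
          </preformat>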
          <p>The PTZ camera provides an extensive REST API (VAPIX) to monitor, parametrize, and control the device. We implemented a closed-loop PI controller, which controls the camera orientation and its zoom by adjusting the velocities of pan, tilt, and zoom. The controller runs at 20 Hz, and its major parameters (Kp and Tn) have been optimized to capture an entering human within at most 1.5 s. The maximum speed of the PTZ camera is set to 90°/s.</p>
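          <p>The control loop can be sketched as follows; the gains are illustrative rather than our tuned values, and the VAPIX continuous-move request shown here is a simplified usage of the camera's HTTP interface.</p>
          <preformat>
import requests

KP, TN, DT = 0.05, 0.8, 0.05  # placeholder gain and reset time; 20 Hz period

class PI:
    """Simple PI controller on the pixel error of the tracked head."""
    def __init__(self):
        self.integral = 0.0
    def step(self, error):
        self.integral += error * DT
        return KP * (error + self.integral / TN)

def command_velocity(cam_ip, v_pan, v_tilt):
    # VAPIX continuous pan/tilt move; values are fractions of max speed.
    requests.get('http://%s/axis-cgi/com/ptz.cgi' % cam_ip,
                 params={'continuouspantiltmove': '%.0f,%.0f' % (v_pan, v_tilt)},
                 timeout=0.2)
          </preformat>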
          <p>
            The head-eye-tracking (HET) uses tracking messages from OpenPTrack as an initial guess for a head position in order to follow people's faces with the PTZ camera. Head pose and eye gaze are estimated by means of the dlib library [
            <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
            ] and GazeML [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] with an average accuracy of 10°. This accuracy is sufficient to estimate grid-wise areas-of-interest (AoIs), which are used by other applications, such as the personalization (see Fig. 7c).
          </p>
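          <p>A sketch of such a grid-wise AoI estimation is given below: the gaze ray is intersected with the display plane and the hit point is mapped to a coarse grid cell. The screen geometry parameters and the grid size are assumptions for illustration.</p>
          <preformat>
import numpy as np

# Intersect the gaze ray (head position plus gaze direction, both in
# room coordinates) with the display plane and map the hit point to a
# grid cell. screen_u/screen_v span the display from screen_origin.
def aoi_cell(head_pos, gaze_dir, screen_origin, screen_u, screen_v, grid=(4, 3)):
    n = np.cross(screen_u, screen_v)        # display plane normal
    denom = gaze_dir.dot(n)
    if denom == 0.0:
        return None                         # gaze parallel to the display
    s = (screen_origin - head_pos).dot(n) / denom
    hit = head_pos + s * gaze_dir
    u = (hit - screen_origin).dot(screen_u) / screen_u.dot(screen_u)
    v = (hit - screen_origin).dot(screen_v) / screen_v.dot(screen_v)
    if min(u, v, 1.0 - u, 1.0 - v) >= 0.0:  # hit lies on the display
        return (min(int(u * grid[0]), grid[0] - 1),
                min(int(v * grid[1]), grid[1] - 1))
    return None                             # gaze off-screen
          </preformat>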
          <p>[Figure 5: (a) the automatic enrollment process based on face encodings; (b) unknown faces vs. the unfolding of the face-ID database over twelve weeks (people per week day).]</p>
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>Face Recognition</title>
        <p>Recognizing individuals entering the SmartLobby is an essential capability for implementing personalized behavior. The most common and non-intrusive approach is face recognition. Usually, face recognition is accompanied by an enrollment process wherein a user deliberately provides one or more images of their face together with their name. In the spirit of tighter data protection regulations and their uncertain interpretation, we decided to implement an anonymized face recognition. The system learns new faces in an unsupervised fashion without associating the actual names of the users, while still providing unique ID numbers.</p>
        <p>
          Our face recognition system is based on the dlib-library [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and the outcomes of the HET module. Two processes run in parallel (see Fig. 5a): First, every time the HET module delivers a head pose, the system tries to recognize a face. To this end, it passes an image cutout at the head pose to the face recognition of the dlib library, which transforms it into a 128-dimensional face feature vector. This measured feature vector is used to search for the most similar face model in the database: the system computes the L2 norm between the measured face feature and the stored ones. A match is declared for the pair with the minimal norm, provided the similarity (1 − L2 norm) is above a certain threshold. Second, all face RoIs of a coherent head tracking are filtered according to particular requirements (brightness, orientation, sharpness). The remaining RoIs are assumed to belong to a single individual, and the facial feature vector is computed for each RoI. Using the aforementioned search method, a matching face feature is added to the database (augmentation) or, if no match is found, a new face model is created (enrollment).
        </p>
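        <p>A compact sketch of this matching step is shown below, assuming 128-dimensional descriptors from the dlib face recognition network; the database layout (ID mapped to a list of encodings) is an illustrative choice.</p>
        <preformat>
import numpy as np

# Match a query face descriptor against the stored face models.
# 'database' maps a face ID to the list of its 128-d encodings.
def match_face(query, database, threshold=0.475):
    best_id, best_sim = None, -1.0
    for face_id, encodings in database.items():
        dists = np.linalg.norm(np.array(encodings) - query, axis=1)
        sim = 1.0 - dists.min()         # similarity = 1 - L2 distance
        if sim > best_sim:
            best_id, best_sim = face_id, sim
    if best_sim >= threshold:
        return best_id                  # recognized
    return None                         # unknown: candidate for enrollment
        </preformat>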
        <p>
          These two processes rely on different confidence thresholds that have been decided on the basis of face recognition performance measures. According to [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and [35], we identified the false non-match rate (FNMR), false match rate (FMR), false acceptance rate (FAR), and false rejection rate (FRR). Fig. 6a compares the dlib face recognition performance with the best approach from a recent NIST evaluation. At an FMR of 0.1 %, genuine facial features are classified as unknown for 27 % of the queries (FNMR). This is an order of magnitude higher than the state of the art (2.5 % FNMR). It is representative only for our setup and potentially caused by imperfect lighting conditions and unfavorable viewing angles. Fig. 6b is used to decide on the confidence thresholds for the recognition and unsupervised augmentation processes. Usually, a threshold is selected that gives equal FAR and FRR. The developers of the dlib face recognition software recommend a threshold of 0.4 (light green dots). In our setup, we used a threshold of 0.475 for the regular face recognition and the initial enrollment, which yields an FRR and FAR of 9 %. For the unsupervised augmentation, we used a stricter threshold of 0.6. Thus, we prevent the face model database from absorbing incorrect facial features and mixing up individuals.
        </p>
        <p>Fig. 5b shows the unfolding of the face database over a period of 12 weeks. Currently, approximately 100 researchers are with the institute and should regularly pass by the SmartLobby. The automatic enrollment preferentially enriches the database with persons that are captured by the HET system, i.e. persons that either are present in the lobby or interact with the screens. Over time, the number of enrolled faces continuously increases, whereas the percentage of unknown faces continuously decreases. It can be observed that the system generates more IDs than the expected 100 researchers. There are two main reasons: First, the system also captures guests and maintenance staff. Second, the system sometimes creates more than one instance for a person, which needs to be improved in the future.</p>
      </sec>
      <sec id="sec-6-5">
        <title>Presentation Software</title>
        <p>A major task of the SmartLobby is to represent the institute and to visualize ongoing research activities with their achievements. We aimed for a system that can express the creativity and innovative strength of the institute by using state-of-the-art devices and recent technology. As a result, we decided to use multi-touch displays (MTD) as interaction hubs, which not only react to touch gestures but also recognize physical tokens placed on the displays.</p>
        <p>The latter capability is a basic feature of the presentation software easire (developed by Interactive Scape), which governs the left MTD and the large display. For example, easire can present the structure of the institute, CVs of the people working at the institute, or information about running and finished projects. Fig. 7a depicts the view when standing in front of the left MTD. The large, upright screen can either show a duplicate of the content of the left MTD or additionally display videos, documents, and charts. This feature is especially useful when presenting information to a larger audience, because the horizontal touch screen is hardly visible from farther away due to an unfavorable viewing perspective. We found that people enjoy this kind of presentation much more than dusty old slides. Furthermore, the token concept allows them to easily explore the topics they are interested in. The right MTD is reserved for RViz, which visualizes ROS topics such as point clouds (see Fig. 2h), robot trajectories, or results of our algorithms.</p>
      </sec>
      <sec id="sec-6-4">
        <title>Personalization Experiment</title>
        <p>During idle times, a slide show welcomes visitors, announces talks, and comprises a weekly updated press review. Typically, a dozen pages are shown repeatedly, each for 2 min. Given the face recognition and the AoI analysis, the system can memorize the slides a person has already seen. A first multi-week experiment investigates a basic content personalization method targeted at all kinds of visitors to the SmartLobby. We divided the people into two groups, one where the system intervenes in the repetitive slide process and one where it does not. The groups are split randomly by intervening only for people with an even ID. For the intervention group, the system skips the slide show to pages that the person has not regarded yet. This means that if the current slide, when entering the room, has not been seen by the person yet, the system will not switch to another slide even for people in the intervention group. Fig. 7b shows the impact of the slide-skipping intervention. The analysis of the intervention group is split into events where no intervention happened (slide not yet known anyway) and actual skipping events. As one can expect, there is no significant difference between the control group (no intervention) and the events of the intervention group where no intervention happened (slide unknown yet). The median viewing time is similar for both groups at about 1 s. This seems rather short, but it also includes people passing by without reading the screen. In contrast, skipping the pages attracts the attention of the visitors and increases the mean viewing time tremendously to 4 s.</p>
        <p>[Figure 7: (a) tokens trigger the presentation of various contents; (b) personalization results for the control group, the experimental group without intervention, and the experimental group with intervention; (c) glance statistics of two particular slides.]</p>
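        <p>The intervention logic itself is simple and can be sketched as follows; the data structures (a set of seen slide IDs per person) are an assumption for this illustration.</p>
        <preformat>
# Sketch of the slide-skipping intervention. People with an even ID form
# the intervention group; 'seen' maps a person ID to the set of slide IDs
# this person has already looked at (from face recognition + AoI analysis).
def next_slide(person_id, current_slide, slides, seen):
    if person_id % 2 == 1:
        return current_slide        # control group: never intervene
    known = seen.get(person_id, set())
    if current_slide not in known:
        return current_slide        # current slide is new: keep it
    for slide in slides:
        if slide not in known:
            return slide            # skip to the first unseen slide
    return current_slide            # everything seen: leave as is
        </preformat>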
        <p>
          Furthermore, we investigated the reading behavior for different content categories: tech news and publication announcements. Fig. 7c shows the glance frequency distribution for the publication slides. The gaze transition entropy [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] is a promising criterion to measure interest in content: Though the median entropy is similar for the two categories at 2.8, the entropy variance for the news slides is more than twice as large. This demonstrates that different news topics cause a larger spread in the gaze transition entropy values of the readers.
        </p>
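        <p>For reference, the gaze transition entropy of [21] can be computed from the sequence of fixated AoI cells along the following lines; approximating the stationary distribution by the empirical AoI visit frequencies is a simplification of this sketch.</p>
        <preformat>
import numpy as np

# Gaze transition entropy: H = -sum_i pi_i sum_j p_ij log2(p_ij), with
# p_ij the AoI transition probabilities and pi_i approximated here by
# the empirical visit frequency of AoI i.
def transition_entropy(aoi_seq, n_aoi):
    counts = np.zeros((n_aoi, n_aoi))
    for a, b in zip(aoi_seq[:-1], aoi_seq[1:]):
        counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_sums,
                  out=np.zeros_like(counts), where=row_sums > 0)
    pi = np.bincount(aoi_seq, minlength=n_aoi) / float(len(aoi_seq))
    logp = np.zeros_like(p)
    np.log2(p, out=logp, where=p > 0)
    return -np.sum(pi[:, None] * p * logp)
        </preformat>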
      </sec>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION AND OUTLOOK</title>
      <p>We have presented an intelligent environment integrated into the lobby of a research institute. We described how to equip a specialized public space with ubiquitous technologies, so as to showcase research results as well as to serve as a workbench for collecting data for research in the domain of human-machine cooperation. This so-called SmartLobby is available 24/7. The interior design and the hardware and software infrastructure allow for versatility and extensibility. Thus, we can quickly set up experiments and conduct unobtrusive user studies in an everyday environment.</p>
      <p>For a start, we developed a head-eye-tracking system that covers the whole lobby and an anonymous face recognition to enable long-term user monitoring in the SmartLobby. On this basis, we equipped the lobby with a simple intelligent behavior and personalized the news feed shown in the lobby. Our results show the positive impact of personalization through an increased median viewing time. Furthermore, we investigated gaze transition entropy as a measure of interest in the news feed slides. As a next step, we will enhance the personalization by also considering personal interests.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGMENT</title>
      <p>We would like to thank studio klv GmbH for designing and managing the reconstruction of the lobby and
Interactive Scape GmbH for implementing the presentation software as well as the token and touch user-interface.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] http://wiki.ros.org/multimaster_fkie.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Rahayfeh</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Faezipour</surname>
          </string-name>
          .
          <article-title>Eye tracking and head movement detection: A state-of-art survey</article-title>
          .
          <source>IEEE Journal of Translational Engineering in Health and Medicine</source>
          ,
          <volume>1</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alberdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aztiria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Basarab</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Cook</surname>
          </string-name>
          .
          <article-title>Using smart offices to predict occupational stress</article-title>
          . In
          <source>International Journal of Industrial Ergonomics</source>
          , volume
          <volume>67</volume>
          , pages
          <fpage>60</fpage>–<lpage>66</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andreu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiarugi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Colantonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannakaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giorgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Henriquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kazantzaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Manousos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Maria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Matuszewskia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Pascali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pediaditis</surname>
          </string-name>
          , G. Raccichini, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tsiknakis</surname>
          </string-name>
          .
          <article-title>Wize Mirror - a smart, multisensory cardio-metabolic risk monitoring system</article-title>
          .
          <source>In CVIU</source>
          , pages
          <fpage>3</fpage>–<lpage>22</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Augusto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Callaghan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kameas</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Satoh</surname>
          </string-name>
          .
          <article-title>Intelligent Environments: a manifesto</article-title>
          .
          <source>Human-centric Computing and Information Sciences</source>
          ,
          <volume>3</volume>
          :
          <fpage>12</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bradski</surname>
          </string-name>
          .
          <article-title>The OpenCV Library</article-title>
          .
          <source>Dr. Dobb's Journal of Software Tools</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvaresi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cesarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sernani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marinoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sturm</surname>
          </string-name>
          .
          <article-title>Exploring the ambient assisted living domain: a systematic review</article-title>
          .
          <source>Journal of Ambient Intelligence and Humanized Computing</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>239</fpage>–<lpage>257</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] CherryHome.
          <source>Cherryhome.ai website</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Colantonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Coppini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giorgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Morales</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Pascali</surname>
          </string-name>
          .
          <article-title>Computer vision for ambient assisted living: Monitoring systems for personalized healthcare and wellness that are robust in the real world and accepted by users, carers, and society</article-title>
          . In Computer Vision for Assistive Healthcare, pages
          <fpage>319</fpage>–<lpage>336</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cucchiara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Vezzani</surname>
          </string-name>
          .
          <article-title>A multi-camera vision system for fall detection and alarm generation</article-title>
          .
          <source>Expert Systems</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>334</fpage>–<lpage>345</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Danninger</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Stiefelhagen</surname>
          </string-name>
          .
          <article-title>A context-aware virtual secretary in a smart office environment</article-title>
          .
          <source>In Proceedings of the 16th ACM International Conference on Multimedia</source>
          , pages
          <fpage>529</fpage>–<lpage>538</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eriksson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Niitamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kulkki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hribernik</surname>
          </string-name>
          .
          <article-title>Living labs as a multi-contextual R&amp;D methodology</article-title>
          .
          <source>The 12th ITM Conference</source>
          ,
          <year>January 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hasler</surname>
          </string-name>
          , J. Deigmoller, T. Schnurer, M. Redert,
          <string-name>
            <given-names>U.</given-names>
            <surname>Pluntke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nagel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Senzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Richter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Eggert</surname>
          </string-name>
          .
          <article-title>Where is the tool? - grounded reasoning in everyday environment with a robot</article-title>
          .
          <source>In International Cognitive Robotics Workshop (CogRob)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Einecke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Eisele</surname>
          </string-name>
          .
          <article-title>Smart-lobby: Using a 24/7 remote head-eye-tracking for content personalization</article-title>
          .
          <source>In Proceedings of UbiComp/ISWC</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Grother</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ngan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hanaoka</surname>
          </string-name>
          .
          <article-title>Ongoing face recognition vendor test (FRVT)</article-title>
          .
          <source>Technical report, NIST National Institute of Standards and Technology</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hasler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kreger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Bauer-Wersing</surname>
          </string-name>
          .
          <article-title>Interactive incremental online learning of objects onboard of a cooperative autonomous mobile robot</article-title>
          .
          <source>In Neural Information Processing</source>
          , pages
          <fpage>279</fpage>–<lpage>290</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>M. B. James Dooley</surname>
            and
            <given-names>M. R.</given-names>
          </string-name>
          <string-name>
            <surname>Al-Mulla</surname>
          </string-name>
          .
          <article-title>Beyond four walls: Towards large-scale intelligent environments</article-title>
          .
          <source>pages 287–294</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          .
          <article-title>One millisecond face alignment with an ensemble of regression trees</article-title>
          .
          <source>In CVPR</source>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>King</surname>
          </string-name>
          .
          <article-title>Dlib-ml: A machine learning toolkit</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>10</volume>
          :
          <fpage>1755</fpage>–<lpage>1758</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiyokawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hatanaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hosoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shigeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ishihara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ooshita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kakugawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kurihara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Moriyama</surname>
          </string-name>
          .
          <article-title>Owens Luis: a context-aware multi-modal smart office chair in an ambient environment</article-title>
          .
          <source>In IEEE VR Workshops (VRW)</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>4</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krejtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Duchowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Villalobos</surname>
          </string-name>
          .
          <article-title>Gaze transition entropy</article-title>
          .
          <source>ACM Trans. Appl. Percept.</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          December
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Krüger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Wiebel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wersing</surname>
          </string-name>
          .
          <article-title>From tools towards cooperative assistants</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Human Agent Interaction</source>
          , pages
          <fpage>287</fpage>
          –
          <lpage>294</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lymberopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bamis</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Savvides</surname>
          </string-name>
          .
          <article-title>Extracting spatiotemporal human activity patterns in assisted living using a home sensor network</article-title>
          .
          <source>Universal Access in the Information Society</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>125</fpage>
          –
          <lpage>138</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>McDuff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kapoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roseway</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          .
          <article-title>AffectAura: An intelligent system for emotional memory</article-title>
          .
          <source>In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          , pages
          <fpage>849</fpage>
          –
          <lpage>858</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>The rise of intelligent cyber-physical systems</article-title>
          .
          <source>Computer</source>
          ,
          <volume>50</volume>
          (
          <issue>12</issue>
          ):
          <fpage>7</fpage>
          –
          <lpage>9</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Munaro</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Menegatti</surname>
          </string-name>
          .
          <article-title>Fast RGB-D people tracking for service robots</article-title>
          .
          <source>Autonomous Robots</source>
          ,
          <volume>37</volume>
          (
          <issue>3</issue>
          ):
          <fpage>227</fpage>
          –
          <lpage>242</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pandya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ghayvat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kotecha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Awais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akbarzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Mukhopadhyay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Smart Home Anti-Theft System: A Novel Approach for Near Real-Time Monitoring and Smart Home Security for Wellness Protocol</article-title>
          .
          <source>Applied System Innovation</source>
          , volume
          <volume>1</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spurr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Hilliges</surname>
          </string-name>
          .
          <article-title>Deep pictorial gaze estimation</article-title>
          .
          <source>In European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Quigley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gerkey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Conley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Faust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Foote</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leibs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wheeler</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>ROS: An open-source Robot Operating System</article-title>
          .
          <source>ICRA Workshop on Open Source Software</source>
          ,
          <volume>3</volume>
          :
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>