<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">RoboTrio2: Annotated Interactions of a Teleoperated Robot and Human Dyads for Data-Driven Behavioral Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Frédéric</forename><surname>Elisei</surname></persName>
							<email>frederic.elisei@gipsa-lab.grenoble-inp.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Univ. Grenoble Alpes</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">Grenoble INP</orgName>
								<orgName type="institution" key="instit4">GIPSA-lab</orgName>
								<address>
									<postCode>F-38000</postCode>
									<settlement>Grenoble</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Léa</forename><surname>Haefflinger</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Univ. Grenoble Alpes</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">Grenoble INP</orgName>
								<orgName type="institution" key="instit4">GIPSA-lab</orgName>
								<address>
									<postCode>F-38000</postCode>
									<settlement>Grenoble</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Atos</orgName>
								<address>
									<settlement>Échirolles</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gérard</forename><surname>Bailly</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Univ. Grenoble Alpes</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">Grenoble INP</orgName>
								<orgName type="institution" key="instit4">GIPSA-lab</orgName>
								<address>
									<postCode>F-38000</postCode>
									<settlement>Grenoble</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">RoboTrio2: Annotated Interactions of a Teleoperated Robot and Human Dyads for Data-Driven Behavioral Models</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1F9A188D1DB6DFFAA68093D7BB9899F7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>human-robot interaction</term>
					<term>social robotics</term>
					<term>multi-party</term>
					<term>cooperative game</term>
					<term>head and gaze orientation</term>
					<term>immersive teleoperation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present RoboTrio2, an annotated multimodal corpus of interactions between an autonomous-looking social robot and two humans, and the original way it was recorded: immersive teleoperation of the robot, which makes it behave naturally and efficiently while capturing many signals (gaze including vergence, head and neck movements, the exact subjective stereo views that motivate the decisions in the interaction, binaural audio...). With this high level of embodiment, the pilot provides the robot with demonstrations of the conversational skills needed to conduct a natural interaction with humans and successfully perform the intended task (social interactions in a gaming scenario, with gaze and speech turnovers). The behaviors of the robot's two human partners are also recorded through static HD cameras and headset microphones to ease annotation. Training autonomous behavioral models for our social robot is the main goal of this 8-hour corpus, but the study of elicited human behaviors is also possible with the corpus and annotations we released.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Machine learning now depends on adequate training data. What data do social robots need to learn social interaction with naive humans? Would imitating human signals, just as children do, be successful? The generation of verbal and non-verbal behaviors for robots is frequently based on human-human interaction datasets <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3]</ref>. But robots - even humanoid ones - have different bodies and capabilities, will not grow up in a human body, and must be prepared to be alternately considered by users as agents or objects, or even ignored when their behavior is no longer appropriate. Some studies have already highlighted the differences between Human-Human Interaction (HHI) and Human-Robot Interaction (HRI): children appear to be more expressive when playing with another child than with a robot <ref type="bibr" target="#b3">[4]</ref>, the position of a human's head during turn-taking varies depending on whether the change occurs between two humans or between a human and a robot <ref type="bibr" target="#b4">[5]</ref>, gaze fixations on a robot face last longer than on a human face <ref type="bibr" target="#b5">[6]</ref>, and humans modify their prosody when addressing a machine <ref type="bibr" target="#b6">[7]</ref>. To study human-robot interactions despite this gap, the Wizard of Oz method is often used <ref type="bibr" target="#b7">[8]</ref>, where the robot is remotely controlled by a human using buttons and predefined actions <ref type="bibr" target="#b8">[9]</ref>. A second possible method is a robot controlled by rules <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>. 
However, both options restrict the possible actions to those that are predefined; both introduce a lack of naturalness and fluidity in the interaction, and also modify the transitions. In addition, they often limit the experiment to the study of a single aspect of the interaction.</p><p>If robots are to learn by imitation, they should learn from other robots that have the same sensors/actuators (body) and already engage successfully in fluid, natural, ecological interactions with humans ... while humans interact with this yet-to-be autonomous robot!</p><p>We describe here an original method for collecting such corpora with fluid human/robot interactions: immersive teleoperation of a humanoid robot can collect multimodal signals intrinsically adapted to the specific robot's sensors/actuators and its specific reaction times, while bringing the social know-how, language understanding, and decision-making of a human tutor into the sensory-motor coupling.</p><p>The 8-hour RoboTrio2 corpus delivered here <ref type="bibr" target="#b11">[12]</ref> is such an example, with many annotations. It consists of a task-oriented interaction of a robot with 23 human pairs, collected in French by a single pilot. This dataset was successfully used to train a machine learning model for robot gaze control <ref type="bibr" target="#b12">[13]</ref>. It can also be used to study conversational modes, gaze and head behavior, or to compare with behavior from HH data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Immersive teleoperation</head><p>Figure <ref type="figure" target="#fig_0">1</ref> shows our setup for RoboTrio2. The human pilot who immersively drives the robot sits in a remote room and wears an HTC Vive VR headset that embeds two SMI eye-trackers. He also uses stereo earbuds and a microphone.</p><p>His chin and lip corners carry motion-capture markers that drive the robot face in real time with his articulation. The robot is a modified iCub <ref type="bibr" target="#b13">[14]</ref> with an articulated mouth (hiding a speaker), mobile eyes (each embedding a VGA camera) that move like human ones (with 3 degrees of freedom, including vergence), and a microphone in each ear (with human-shaped ear pinnae). The robot's neck allows human-like head orientations (with 3 degrees of freedom).</p><p>Logged streams: With our original immersive teleoperation system <ref type="bibr" target="#b14">[15]</ref>, all these real-time streams are synchronized, recorded, and used to control the iCub robot in real time. The pilot's gaze and head movements are 60 Hz streams. We also record how the robot executes these motor instructions across its 6 equivalent degrees of freedom (3 for the neck, 3 for the eyes, including vergence). The pilot generates his interactive behaviors (where to look, what to say, head movements...) from what he hears and sees through the robot's sensors: its ears and the pair of cameras embedded in its mobile eyes. The pilot uses no joystick or button press; he simply directs his own face and gaze, which the VR headset and its dual eye-tracker track. 
The pilot's jaw and lip movements are tracked by a Qualisys motion-capture system and drive the robot face to shape its mouth, which relays the pilot's speech through a speaker.</p></div>
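As an illustration of how such 60 Hz logged streams could be aligned offline, here is a minimal, stdlib-only Python sketch of nearest-timestamp matching between two recorded streams. The stream contents, field names, and timestamps below are hypothetical illustrations, not the actual RoboTrio2 log format.

```python
from bisect import bisect_left

def align_streams(reference, other):
    """For each (t, value) sample in `reference`, pick the sample of
    `other` whose timestamp is nearest in time (both lists sorted by t).
    Returns a list of (t, ref_value, other_value) triples."""
    times = [t for t, _ in other]
    aligned = []
    for t, v in reference:
        i = bisect_left(times, t)
        # candidates: the insertion point and its left neighbour, if valid
        candidates = [j for j in (i - 1, i) if j in range(len(times))]
        j = min(candidates, key=lambda k: abs(times[k] - t))
        aligned.append((t, v, other[j][1]))
    return aligned

# Hypothetical 60 Hz head stream and a slightly offset gaze stream
head = [(k / 60.0, ("yaw", k)) for k in range(5)]
gaze = [(k / 60.0 + 0.004, ("gx", k)) for k in range(5)]
merged = align_streams(head, gaze)
```

Because the offset (4 ms) is much smaller than the 16.7 ms frame period, each head sample pairs with the gaze sample of the same index.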
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The RoboTrio2 corpus</head><p>The corpus involves a cooperative game played by two humans. They sit in front of a social robot that acts as game animator and referee. This robot is teleoperated as described above, resulting in a high level of embodiment. What the pilot demonstrates is a viable solution, with the specific robot sensors/actuators, for conducting a natural real-time interaction with humans (decoding and generating meaningful gaze and gaze aversion, speech turnovers...) and successfully performing the social interaction required by the gaming scenario. What the human players experience is an autonomous-looking robot that uses natural language and ecological head and gaze patterns to perform the joint task. This 8-hour corpus logs data streams and events linking perception and action, making it ideal for building autonomous behavior models for our social robot, Nina, a modified iCub. But with all the provided annotations, it can also be used to study all the humans who interacted with this robot: we recorded 23 interactions with different human pairs (either male or female), while the robot was always teleoperated by the same human pilot (to help build a coherent one-to-many robot behavior model).</p><p>The game: It is played by a team of two humans trying to find the words most commonly associated with a given theme (previously played online by other human players). For example, for the "eat" theme, the words that would score the most are "drink", "food", "lunch", "dinner", "swallow" and "feed". The same 9 theme words are played in all the games, and 5 answers are collected per theme. During the game, our players collaborate to find the best answers and look at or question the robot at will. This scenario generates a lot of interaction and social cues: thinking about the theme, brainstorming and debating potential answers, etc. 
The robot guides them as its human pilot would, and frequently takes part in the conversation. The corpus is thus complex and rich in verbal and non-verbal content for both the players and the robot (mutual gaze, gaze aversion, speech overlap, backchannels...). Videos, annotations and extra details can be seen online <ref type="bibr" target="#b15">[16]</ref>.</p><p>To ease post-recording annotation, the two human players are also recorded by two fixed HD cameras (synchronized with the Qualisys capture system). These are used by neither the robot nor the pilot, but were helpful in annotating the interaction signals and meaningful events (gaze directed to the robot/other player, prephonatory gestures, thinking attitudes...), whereas the first-person cameras are low-resolution and exhibit motion blur.</p><p>We also ran OpenFace <ref type="bibr" target="#b16">[17]</ref> to extract head rotations and eye movements as well as FACS Action Units for every player (as seen by the HD cameras), giving access to higher-level events (e.g. lip opening for prephonatory gestures).</p></div>
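Deriving higher-level events such as lip opening from an Action Unit intensity trace can be sketched as simple thresholding with hysteresis. This is only an illustrative sketch: the threshold values, the choice of a single AU channel, and the synthetic trace are assumptions, not the annotation pipeline actually used on the OpenFace output.

```python
def detect_onsets(series, on_thresh, off_thresh):
    """Return frame indices where the signal rises above `on_thresh`.
    A lower `off_thresh` must be crossed before re-arming, so a noisy
    intensity hovering around one threshold does not fire repeatedly."""
    onsets, armed = [], True
    for i, x in enumerate(series):
        if armed and x >= on_thresh:
            onsets.append(i)
            armed = False
        elif not armed and off_thresh >= x:
            armed = True
    return onsets

# Synthetic lip-opening intensity trace: quiet, one burst, quiet, second burst
trace = [0.1, 0.2, 1.4, 1.6, 1.2, 0.3, 0.2, 1.5, 0.1]
events = detect_onsets(trace, on_thresh=1.0, off_thresh=0.5)
```

Here the two bursts yield exactly two onset frames, at indices 2 and 7.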
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">What has been annotated</head><p>We use ELAN from MPI <ref type="bibr" target="#b17">[18]</ref> to gather all the multi-channel audio and video streams in parallel tracks, plus the annotations of the robot streams/motion capture that form the corpus. Figure <ref type="figure" target="#fig_1">2</ref> shows some of the hierarchical annotations of the verbal content. Speech transcriptions: Speech has been transcribed for the pilot as well as for the human players. In our specific scenario, some spot-words play specific roles and have been annotated separately: the themes that the referee gives to the players (known beforehand), and all the words that the players may give as a proposition, or discuss together before making a formal proposition.</p><p>Speech acts of the robot/pilot: We listed 23 different classes of pilot speech intents, including: ask for a proposition (or a validation), repeat a proposition or the theme, give the score/the theme/an explanation or feedback, and wait for the players after each round.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Gazes of the robot/pilot:</head><p>The pilot's gaze focal point is computed from the 60 Hz recordings of the pilot's head and eye movements. After detecting ocular saccades, these points were classified with Gaussian Mixture Models (GMM) into 4 targets: LeftUser (leftmost player), RightUser (rightmost player), Tablet (live game info), and Elsewhere.</p><p>Gaze of the users: By combining the players' head and eye positions provided by OpenFace, their gaze was classified using GMM (after detection of ocular saccades) into three targets: Robot, other User, and Elsewhere.</p></div>
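The saccade-then-classify step can be illustrated with a minimal, stdlib-only sketch: saccades are flagged by an angular-velocity threshold, and each remaining fixation point is assigned to the target whose Gaussian gives the highest likelihood. Using fixed per-target Gaussians is a simplified stand-in for a fitted GMM, and all cluster parameters and thresholds below are invented for illustration.

```python
import math

def detect_saccades(points, dt, vel_thresh):
    """Flag samples whose angular velocity (deg/s) exceeds `vel_thresh`."""
    flags = [False]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        v = math.hypot(x1 - x0, y1 - y0) / dt
        flags.append(v > vel_thresh)
    return flags

def classify_fixation(point, targets):
    """Assign a fixation point to the target whose diagonal Gaussian
    gives the highest log-likelihood. `targets` maps a label to
    ((mean_x, mean_y), (var_x, var_y))."""
    def loglik(p, mean, var):
        return sum(-0.5 * ((pi - mi) ** 2 / vi + math.log(2 * math.pi * vi))
                   for pi, mi, vi in zip(p, mean, var))
    return max(targets, key=lambda k: loglik(point, *targets[k]))

# Hypothetical gaze-angle clusters (degrees): two players and the tablet
targets = {
    "LeftUser":  ((-20.0, 0.0), (9.0, 9.0)),
    "RightUser": ((20.0, 0.0), (9.0, 9.0)),
    "Tablet":    ((0.0, -25.0), (16.0, 16.0)),
}
label = classify_fixation((-18.0, 1.5), targets)
```

A point near the left cluster is labeled LeftUser; in the real pipeline the means and covariances would come from GMM fitting on the recorded fixations.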
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Statistics on the corpus</head><p>Of the 23 recorded sequences, 11 (nearly 4 hours) are fully annotated, both verbally and non-verbally. To illustrate the richness and interest of this corpus, this section presents some statistics on the behavior of the pilot and the users, as well as some findings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Verbal statistics:</head><p>As the roles in this corpus are asymmetrical, the verbal behavior of the pilot/robot and of the players differs. As seen in Table <ref type="table" target="#tab_0">1</ref>, the number of utterances is equivalent across participants, but their average duration, and therefore the total speaking time, differs significantly. Indeed, users produce a lot of backchannels (a few hundred) and positive/negative feedback (almost 2,000 instances) to share their reactions to the proposals or scores given, resulting in very short utterances, unlike the pilot, who animates the game and may have to use longer sentences when announcing the theme or scores. Concerning the pilot's intentions, 10 of the 23 classes occurred in at least 90 utterances. The 3 most frequent are "ask for the validation of a proposal", "give score" and "ask for a proposal", with 574, 367 and 363 utterances respectively. Furthermore, as this corpus is multi-party, the pilot can address one player or both at the same time. This allows the study of the different participant roles in a conversation (speaker, listener, side participant...) and their regulation, called "footing" in <ref type="bibr" target="#b18">[19]</ref>. To obtain an indication of who the pilot's addressees are, we detected the use of French pronouns in his sentences: "Tu" (you, singular) for one addressee vs. "Vous" (you, plural) for both. As a result, 172 utterances contain the "Tu" pronoun and 632 the "Vous" pronoun. Players' first names were also used in 139 utterances. This corpus has already been used successfully to compare the teleoperated robot's head behaviors depending on whether the pilot was addressing one or both players <ref type="bibr" target="#b19">[20]</ref>.</p></div>
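The pronoun-based addressee indication can be approximated with a simple regular-expression scan over the transcribed utterances. This is a minimal sketch: it ignores elided forms (such as the contracted second person), and the sample French sentences below are invented, not corpus excerpts.

```python
import re

def count_address_pronouns(utterances):
    """Count utterances containing the singular 'tu' vs. the plural
    'vous' (case-insensitive, whole words only)."""
    tu = re.compile(r"\btu\b", re.IGNORECASE)
    vous = re.compile(r"\bvous\b", re.IGNORECASE)
    n_tu = sum(1 for u in utterances if tu.search(u))
    n_vous = sum(1 for u in utterances if vous.search(u))
    return n_tu, n_vous

sample = [
    "Tu valides cette proposition ?",
    "Vous avez trouvé trois mots.",
    "On passe au thème suivant.",
    "Est-ce que tu veux répéter ?",
]
counts = count_address_pronouns(sample)
```

On this sample, two utterances address one player ("tu") and one addresses both ("vous").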
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Gaze statistics:</head><p>As gaze is one of the most important non-verbal cues in human and HRI conversations <ref type="bibr" target="#b20">[21]</ref>, this section presents some general statistics on it. First, Figure <ref type="figure" target="#fig_2">3</ref> shows the gaze distributions of the pilot/robot and of the users over the 11 sequences. Both users are looked at equally by the pilot (there can be some variation within sequences, but not globally). We can also see the significant use of the tablet, which the robot/pilot consults to retrieve needed information. The players looked at the robot a lot, indicating that they did not set the robot aside but included it in the conversation. The Elsewhere class is also well represented, owing to the many moments when players need to think.</p><p>As the pilot's behavior can be affected by his role in the conversation, speaker or listener, Table <ref type="table" target="#tab_1">2</ref> shows in more detail the distribution of his gaze according to his activity. When he is speaking, he looks more at the tablet that displays information about the game in progress. When one of the players is speaking, the pilot often looks at him/her (green cells); however, the proportion of his gaze directed at the other player is not low (yellow cells). The role of game facilitator requires the pilot to observe the reactions of the players and motivate their collaboration, making his behavior a complex one (best generated with machine learning and ad hoc data).</p></div>
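A conditional distribution of this kind can be computed from frame-aligned (speaker, gaze target) label pairs. The sketch below uses invented labels and counts purely for illustration; it is not the script that produced the reported percentages.

```python
from collections import Counter, defaultdict

def gaze_by_activity(frames):
    """`frames` is a sequence of (who_speaks, gaze_target) labels, one
    per video frame. Returns {who_speaks: {gaze_target: proportion}},
    i.e. the gaze distribution conditioned on who is speaking."""
    counts = defaultdict(Counter)
    for speaker, target in frames:
        counts[speaker][target] += 1
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}

# Invented frame labels: the pilot speaking, then the left player speaking
frames = ([("Self", "Tablet")] * 4 + [("Self", "LeftUser")] * 6 +
          [("LeftUser", "LeftUser")] * 5 + [("LeftUser", "RightUser")] * 5)
dist = gaze_by_activity(frames)
```

Each row of the resulting table sums to 1 within a speaking condition, matching the column-wise normalization used in the reported distribution.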
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>We have shown here how the immersive teleoperation of a robot can produce a valuable corpus, especially for training social robotics models. In contrast to human-human corpora <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>, our data may capture the change of expectations and behavior in front of robotic bodies. In addition, this setup provides data from a fluid HRI, where the robot's behaviors are less constrained than with the usual Wizard of Oz or rule-based methods <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref>. The recorded behaviors are also suitable for studying conversational modes and gaze and head behavior in natural HRI. As an example, we provide RoboTrio2, an 8-hour annotated multimodal corpus of a multi-party game (available online <ref type="bibr" target="#b11">[12]</ref>), and outline here both some of its contents and first results from its analysis.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Immersive teleoperation of a robot, to collect natural interaction data of 2 human users in front of an autonomous-looking social robot (and its tablet, using mixed-reality to show in-game help). The table sides also support two HD cameras, directed towards the humans to ease the automatic/manual annotation process.</figDesc><graphic coords="2,173.59,475.47,248.10,134.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Hierarchical annotation of speech intents in Elan. Highlighted instant is scoring the proposition "chimique" (chemical), ranked fifth. Top image pair corresponds to the first-person view of the pilot through the two eyes/cameras of the robot, and shows the virtual tablet with the theme "formule" (formula) being played currently, and the eight best answers (magic, one -as in formula one -, expression, mathematical, chemical, recipe, polite, method). Grid on top right lists the previous/next intents of the robot/pilot in this dialog.</figDesc><graphic coords="4,127.56,336.42,340.15,237.47" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Distributions of the pilot &amp; players gazes</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Verbal statistics of the participants</figDesc><table><row><cell>Stats</cell><cell>Pilot</cell><cell>LeftUser</cell><cell>RightUser</cell></row><row><cell>#Utterances</cell><cell>2812</cell><cell>2714</cell><cell>2663</cell></row><row><cell>Mean duration</cell><cell>1.84s</cell><cell>0.91s</cell><cell>0.99s</cell></row><row><cell>Speaking time</cell><cell>86min</cell><cell>41min</cell><cell>44min</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Distribution of the pilot's gaze depending on his activity as speaker or as listener.</figDesc><table><row><cell>Pilot's target</cell><cell cols="3">Who speaks?</cell></row><row><cell></cell><cell>Self</cell><cell>LeftUser</cell><cell>RightUser</cell></row><row><cell>LeftUser</cell><cell>23.6%</cell><cell>50.8%</cell><cell>31.6%</cell></row><row><cell>RightUser</cell><cell>24.0%</cell><cell>31.4%</cell><cell>51.6%</cell></row><row><cell>Tablet</cell><cell>39.0%</cell><cell>8.6%</cell><cell>8.8%</cell></row><row><cell>Elsewhere</cell><cell>6.3%</cell><cell>4.0%</cell><cell>3.9%</cell></row><row><cell>Saccade</cell><cell>7.2%</cell><cell>5.0%</cell><cell>5.1%</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This data collection was funded by a CNRS S2IH PEPS project, involving GIPSA-lab, LPL and INT. We are grateful to our subjects and to the people who contributed to the immersive teleoperation platform (M. Sauze, R. Cambuzat, and C. Plasson) and to the RoboTrio2 recording/annotation (N. Loudjani, O. Granier, and J. Rengot). Part of this work is funded by ANR 19-P3IA-0003 MIAI and a PhD granted by ANRT (2021/0836).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Analysis of role-based gaze behaviors and gaze aversions, and implementation of robot&apos;s gaze control for multi-party dialogue</title>
		<author>
			<persName><forename type="first">T</forename><surname>Shintani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Ishi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ishiguro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Conference on Human-Agent Interaction, HAI &apos;21</title>
				<meeting>the 9th International Conference on Human-Agent Interaction, HAI &apos;21<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="332" to="336" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Towards an engagement-aware attentive artificial listener for multi-party interactions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Oertel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kontogiorgos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">F</forename><surname>Mora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-M</forename><surname>Odobez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gustafson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Robotics and AI</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">555913</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Optimization and improvement of a robotics gaze control system using lstm networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Domingo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gómez-García-Bermejo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zalama</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="page" from="1" to="18" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Child-robot interaction across cultures: How does playing a game with a social robot compare to playing a game alone or with a friend?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Shahid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Krahmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Swerts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers in Human Behavior</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="86" to="100" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Head pose patterns in multiparty human-robot team-building interactions</title>
		<author>
			<persName><forename type="first">M</forename><surname>Johansson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Skantze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gustafson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Social Robotics: 5th International Conference, ICSR 2013</title>
		<title level="s">Proceedings</title>
		<meeting><address><addrLine>Bristol, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">October 27-29, 2013. 2013</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="351" to="360" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Adaptive eye gaze patterns in interactions with human and artificial agents</title>
		<author>
			<persName><forename type="first">C</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Schermerhorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Scheutz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Interactive Intelligent Systems (TiiS)</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1" to="25" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Learning when to listen: Detecting system-addressed speech in human-human-computer dialog</title>
		<author>
			<persName><forename type="first">E</forename><surname>Shriberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stolcke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hakkani-Tür</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Heck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">INTERSPEECH</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="334" to="337" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Wizard of oz studies in hri: A systematic review and new reporting guidelines</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Riek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Hum.-Robot Interact</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="119" to="136" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Wizard of oz studies: Why and how</title>
		<author>
			<persName><forename type="first">N</forename><surname>Dahlbäck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jönsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ahrenberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Conference on Intelligent User Interfaces, IUI &apos;93</title>
				<meeting>the 1st International Conference on Intelligent User Interfaces, IUI &apos;93<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="193" to="200" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Exploring turn-taking cues in multi-party humanrobot discussions about objects</title>
		<author>
			<persName><forename type="first">G</forename><surname>Skantze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Johansson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Beskow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI &apos;15</title>
				<meeting>the 2015 ACM on International Conference on Multimodal Interaction, ICMI &apos;15<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="67" to="74" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Multi-party interaction with a robot receptionist</title>
		<author>
			<persName><forename type="first">M</forename><surname>Moujahid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hastie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lemon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, HRI &apos;22</title>
				<meeting>the 2022 ACM/IEEE International Conference on Human-Robot Interaction, HRI &apos;22</meeting>
		<imprint>
			<publisher>IEEE Press</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="927" to="931" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">The RoboTrio2 corpus</title>
		<author>
			<persName><forename type="first">F</forename><surname>Elisei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Haefflinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Prévot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bailly</surname></persName>
		</author>
		<ptr target="https://www.ortolang.fr" />
		<note>ORTOLANG (Open Resources and TOols for LANGuage)</note>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Data-driven generation of eyes and head movements of a social robot in multiparty conversation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Haefflinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Elisei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bouchot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Varini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bailly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Social Robotics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="191" to="203" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">An articulated talking face for the iCub</title>
		<author>
			<persName><forename type="first">A</forename><surname>Parmiggiani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Randazzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggiali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Elisei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bailly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Metta</surname></persName>
		</author>
		<idno type="DOI">10.1109/HUMANOIDS.2014.7041309</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE-RAS International Conference on Humanoid Robots</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Immersive Teleoperation of the Eye Gaze of Social Robots: Assessing Gaze-Contingent Control of Vergence, Yaw and Pitch of Robotic Eyes</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cambuzat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Elisei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bailly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Simonin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Spalanzani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISR 2018 - 50th International Symposium on Robotics</title>
				<meeting><address><addrLine>Munich, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>VDE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="232" to="239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Elisei</surname></persName>
		</author>
		<ptr target="https://www.gipsa-lab.grenoble-inp.fr/~frederic.elisei/RoboTrio" />
		<title level="m">Presentation of the RoboTrio corpus</title>
				<imprint>
			<date type="published" when="2024-01">2024 (accessed 1 April 2024)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">OpenFace: A general-purpose face recognition library with mobile applications</title>
		<author>
			<persName><forename type="first">B</forename><surname>Amos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ludwiczuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Satyanarayanan</surname></persName>
		</author>
		<idno>CMU-CS-16-118</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
		<respStmt>
			<orgName>CMU School of Computer Science</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Annotating multi-media/multi-modal resources with ELAN</title>
		<author>
			<persName><forename type="first">H</forename><surname>Brugman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Russel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="2065" to="2068" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Footing</title>
		<author>
			<persName><forename type="first">E</forename><surname>Goffman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semiotica</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">On the benefit of independent control of head and eye movements of a social robot for multiparty human-robot interaction</title>
		<author>
			<persName><forename type="first">L</forename><surname>Haefflinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Elisei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bouchot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-P</forename><surname>Vigne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bailly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human-Computer Interaction</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Kurosu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Hashizume</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="450" to="466" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Social eye gaze in human-robot interaction: A review</title>
		<author>
			<persName><forename type="first">H</forename><surname>Admoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Scassellati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Hum.-Robot Interact</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="25" to="63" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">4D Cardiff Conversation Database (4D CCDB): A 4D database of natural, dyadic conversations</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Marshall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">L</forename><surname>Rosin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vandeventer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Aubrey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Auditory-Visual Speech Processing (AVSP)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">2015</biblScope>
			<biblScope unit="page" from="157" to="162" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Multimodal human-human-robot interactions (MHHRI) dataset for studying personality and engagement</title>
		<author>
			<persName><forename type="first">O</forename><surname>Celiktutan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Skordos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gunes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Affective Computing</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="484" to="497" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">The Vernissage corpus: A conversational human-robot-interaction dataset</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">B</forename><surname>Jayagopi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sheiki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klotz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wienke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-M</forename><surname>Odobez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wrede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Khalidov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wrede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gatica-Perez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), IEEE</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="149" to="150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">UE-HRI: A new dataset for the study of user engagement in spontaneous human-robot interactions</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ben-Youssef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clavel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Essid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bilac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chamoux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th ACM International Conference on Multimodal Interaction, ICMI &apos;17</title>
				<meeting>the 19th ACM International Conference on Multimodal Interaction, ICMI &apos;17<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="464" to="472" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">The eHRI database: a multimodal database of engagement in human-robot interactions</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kesim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Numanoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Bayramoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">B</forename><surname>Turker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Hussain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sezgin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yemez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Erzin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Resources and Evaluation</title>
		<imprint>
			<biblScope unit="page" from="1" to="25" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
