=Paper= {{Paper |id=Vol-2676/paper8 |storemode=property |title=Rebo Junior: Analysis of Dialogue Structure Quality for a Reflection Guidance Chatbot |pdfUrl=https://ceur-ws.org/Vol-2676/paper8.pdf |volume=Vol-2676 |authors=Irmtraud Wolfbauer,Viktoria Pammer-Schindler,Carolyn Rose |dblpUrl=https://dblp.org/rec/conf/ectel/WolfbauerPR20 }} ==Rebo Junior: Analysis of Dialogue Structure Quality for a Reflection Guidance Chatbot== https://ceur-ws.org/Vol-2676/paper8.pdf
                    Rebo Junior: Analysis of Dialogue Structure Quality for
                               a Reflection Guidance Chatbot

                       Irmtraud Wolfbauer1, Viktoria Pammer-Schindler1,2 and Carolyn P. Rose3

                       1 Know-Center GmbH, Inffeldgasse 13, 8010 Graz, Austria
                       iwolfbauer@know-center.at    https://orcid.org/0000-0002-2973-9680
                       2 Graz University of Technology, 8010 Graz, Austria
                       viktoria.pammer@tugraz.at    https://orcid.org/0000-0001-7061-8947
                       3 Carnegie Mellon University, Pittsburgh PA 15213, USA
                       cprose@cs.cmu.edu    https://orcid.org/0000-0003-1128-5155



                           Abstract. Conversational user interfaces open up new opportunities for reflection
                           guidance. This paper presents Rebo Junior, a computer-mediated dialogue structure
                           for reflecting on learning tasks, and its evaluation in the context of apprenticeship
                           training. We answer three research questions: first, how apprentices react to Rebo
                           Junior; second, whether Rebo Junior’s dialogue structure is apt to lead apprentices
                           in reflective conversations; and third, how user engagement with Rebo Junior
                           develops over time. Over three months, 17 apprentices held 153 reflective
                           conversations with Rebo Junior in the context of a training workshop, 117 in phase
                           one and 36 in phase two of the study (five to thirteen interactions per apprentice).
                           We coded the interactions manually for coherence, level of reflectivity, and user
                           engagement. Our results show that apprentices reacted well to the intervention and
                           that the dialogue structure succeeded in leading apprentices through different levels
                           of reflection (114 of 153 conversations showed observable reflection on the learning
                           experience; 133 of 153 expressed learning or planned behaviour change for future
                           tasks). Furthermore, the interactions between the apprentices and Rebo Junior
                           resulted in coherent conversations (149 of 153 were coherent). Contrary to
                           expectations, engagement did not decrease over time in either phase. With this
                           paper, we publish a dialogue structure for reflecting on learning tasks that worked
                           very well even though the conversational interface has no adaptivity. Overall, we
                           interpret our results as underscoring the importance of dialogue structure quality
                           in conversational agents.

                           Keywords: learning technology; reflection guidance; dialogue structure; levels
                           of reflection; reflection guidance chatbot; proof of concept evaluation


                   1       Introduction and Learning Context

                   One-on-one reflection with trainers and teachers remains essential and cannot be
                   replaced by conversational agents. However, time with instructors is limited and
                   expensive. In remote locations, or during quarantine when schools, workshops and
                   factories are closed, it can also be difficult to arrange.




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                       In this paper we report on the development and evaluation of a computer-mediated
                   dialogue structure for reflecting on learning tasks. The dialogue structure was designed
                   as a precursor to an adaptive conversational agent, for which it will act as the default
                   dialogue path. We call it “Rebo Junior” and understand it as the junior version of the
                   future reflection chatbot “Rebo”, because it asks its predefined questions in a fixed
                   order and does not react to the user’s responses.
                       We developed and evaluated Rebo Junior in the context of a training workshop for
                   apprentices in electrical engineering, metal and mechatronics. In Austria (similarly to
                   Germany and Switzerland), apprenticeship training in these vocational fields is
                   structured as four years of dual education. Apprentices learn their craft in training
                   companies under the supervision of dedicated apprenticeship supervisors, and they
                   receive theoretical education at vocational school for a minimum of five weeks each
                   year. The training workshop we collaborate with is a learning site financed by the
                   participating organisations in addition to the obligatory vocational school. Its goal
                   is to teach apprentices the fundamental practical knowledge and skills they will need
                   in their workplaces, as well as fundamental theoretical knowledge, forging links
                   between theory and application. In each year of apprenticeship training, a predefined
                   period is spent in the workshop. The field study described in this paper was
                   conducted with first-year apprentices, who complete three months of training at the
                   training workshop before starting work at their respective companies.
                       Within this training workshop, apprentices receive learning tasks from their trainers.
                   These learning tasks are designed to match the apprentices’ currently expected level of
                   skill and to resemble future workplace tasks. In this learning context, the role of Rebo
                   Junior is to reflect with each apprentice individually after each practical learning task:
                   on how the task went, and on insights gained and lessons learned for the future. The
                   goals of reflection are to support learning in the domain and to help students improve
                   their ability to reflect, which is considered an important competence in lifelong
                   learning. An example of a learning task is the following: “Produce the workpiece
                   according to the plan. Pay attention to measurements and timing. All measurements in
                   the plan are assessed according to given general tolerance and deviation thereof.
                   (Plans of how to cut materials and assemble them into a pyramid attached)”.
                       In the field study described here, we evaluate the concept of Rebo, the reflection
                   guidance chatbot, and study user engagement and dialogue structure quality. This paper
                   contributes two things to the existing body of literature: evidence about user
                   engagement with a non-adaptive computational dialogue structure over 12 weeks (5 to 13
                   interactions per apprentice), and a dialogue structure for reflecting on learning tasks.


                   2       Related Work

                   2.1     Reflection

                   By reflection we mean the systematic review of past experiences with the goal of
                   learning [1]. Reflection works on different levels (Table 1): learners remember an
                   experience and think about it carefully. Perceived emotions are attended to [1], the
                   learning experience is pondered and evaluated, and eventually, the focus shifts from
                   the past to the future. Learners identify the implications of the experience for future
                   planning and gain new perspectives that, in some cases, affect personal concepts and
                   goals [2–4]. In formal learning environments, reflection helps students to monitor and
                   direct their own learning [1]. In informal learning environments, such as working
                   environments, reflection helps learners to learn from and in relation to ongoing
                   experience without a dedicated teacher, which underlines the importance of reflection
                   for professional learning [5]. Constructive, goal-driven reflection is a deliberate
                   action [1] that can be facilitated by reflection guidance technologies [6–8].

                                                    Table 1. Levels of Reflection

                         Level      Name                 Description
                           0        Revisiting           Returning to learning experience
                           1        Description          Describing learning experience
                           2        Judgement            Did it go well? Why/Why not?
                           3        Emotions             What did it feel like? Did you enjoy it?
                           4        Learning             Evaluate experience - What did you learn?
                           5        Planning             Behaviour change – And next time?

                   In the context of our use case, reflection serves as a means for apprentices to engage
                   in a guided manner with past experiences, such as their theoretical and practical
                   lessons and their implementation of learning tasks. We want to improve these learning
                   experiences through reflection. Additionally, guided reflection is intended to train
                   apprentices in reflection itself, as an important mechanism for lifelong professional
                   learning.


                   2.2     Conversational Agents for Learning and Reflection Guidance

                   Compared with existing literature on reflection guidance, the dialogue structure
                   presented here is new in that it guides learners through different levels of
                   reflection, whereas prior literature has focussed on studying isolated reflection
                   prompts [3, 6, 8]. Furthermore, apprentices are barely represented in current literature
                   on technologies for learning. Apprenticeship training is situated in the overlap between
                   the informal learning environment of workplace learning and the formal, educational
                   setting of vocational school and training courses. Existing studies on computer-mediated
                   reflection focus on school students [9], university students [7] and professionals [8].
                       Conversational agents, in turn, have so far been shown to foster the acquisition of
                   factual knowledge (e.g. [10, 11]), to improve text comprehension by scaffolding
                   self-explanation (e.g. [10, 12]), and to facilitate collaborative learning based on
                   collaboration scripts [13]. However, they have not yet been used to mediate reflection,
                   and they are typically not used in repeated interactions.
                       In principle, we expect high motivation to reflect with a chatbot because it gives the
                   illusion of a listener [14], and relationships play a critical role in learning [15].
                   Prior research has also observed that people prefer interacting with chatbots over
                   other forms of computer-mediated learning interventions. For example, Ruan et al. [11]
                   showed that students who interacted with a dialogue-based agent to acquire factual
                   knowledge displayed more motivation than students interacting with a more traditional
                   computer-mediated learning app. This increased motivation also led to better learning
                   results.
                       There are, however, very few studies on long-term interactions with conversational
                   agents. Lee et al. [16] created a chatbot to foster self-compassion, with which
                   participants interacted daily over two weeks, i.e. each user had 14 interactions with
                   the agent. Upholding user engagement is a key point of interest in chatbot research
                   [17]. On the one hand, it is essential to keep the user interested during an interaction
                   with the agent, as reflected by competitions such as the Alexa Prize [18], where the
                   goal is to keep users engaged and interested. On the other hand, user engagement has
                   to be upheld over longer timespans when repeated interactions with the agent are
                   planned. With the research presented here, we contribute a field study with repeated
                   chatbot interactions over three months to the existing body of knowledge.


                   3       Research Questions

                   We address the following research questions:


                   RQ1. How do apprentices react to and accept Rebo Junior as reflection guidance?
                      We understand a positive reaction to a learning intervention as a prerequisite for
                   learning [1, 19].


                   RQ2. How apt is Rebo Junior’s dialogue structure to lead reflective conversations with
                   apprentices?
                      Due to the novelty of conversational reflection guidance, this is a major research
                   question. We understand the suitability of the default dialogue structure to be a prereq-
                   uisite baseline for an adaptive conversational agent. This needs to be explored in real-
                   world learning contexts within specific and situated reflective conversations.


                   RQ3. How does apprentices’ engagement with Rebo Junior develop over time?
                      Repeated and long-term interactions with conversational agents are understudied,
                   while user engagement is crucial for learning. Our initial assumption was that
                   engagement would decrease over time [20].




                   4       Designing Rebo Junior




                   Fig. 1. Rebo – the reflection guidance chatbot: design of Rebo’s visual appearance.
                   Rebo Junior, the computer-mediated dialogue structure described in this paper, has the
                   same visual appearance as Rebo, since Rebo Junior will gradually evolve into adaptive,
                   more intelligent versions of Rebo.


                   4.1     Designing Rebo’s Appearance

                   Rebo’s appearance evolved in an iterative design process of three cycles that aimed to
                   make Rebo engaging and likeable, qualities understood to be prerequisites for users
                   wanting to talk to an agent [21, 22].
                      Cycle 1. Based on a literature survey, we defined the following initial requirements
                   for Rebo. Rebo needs to look friendly and likeable, so that people want to talk to him.
                   Since social cues were found to be important for motivating users to engage with
                   conversational agents [23], and users tend to prefer visual appearances of chatbots
                   that correspond to the gender stereotypically associated with the task at hand [24],
                   Rebo is referred to as “he”. He needs to look like he is able to communicate (listen,
                   see, talk), but he should not express emotions because, for instance, a happy face is
                   not suitable for leading a reflective conversation on a bad learning experience. Given
                   the target audience, Rebo should look cool to young people interested in metal and
                   electronics. Literature suggests that he should not try to appear too human, because
                   that could trigger the so-called ‘uncanny valley effect’ in users and make Rebo seem
                   eerie [21]. Based on these requirements, ten initial design ideas for Rebo were sketched.
                      Cycle 2. In the second cycle, these ideas were shown to eight people. We settled on the
                   one design to which nobody objected, as rejection caused by negative feelings outweighs
                   acceptance caused by positive reactions [25].
                      Cycle 3. The starting point was, once again, a literature survey, which found that people
                   feel more inclined to talk to chatbots if they perceive them as high-quality artefacts
                   [22]. Accordingly, we adapted the design to make it appear high-tech and added some
                   shine and sparkle (Figure 1). The target user group unanimously reacted positively and
                   called Rebo “cool”, so the design was kept.




                   4.2     Dialogue Design

                   We have synthesized Boud et al.’s conceptual understanding of reflection as a learning
                   mechanism that relates past experiences to future, different experiences [1] with
                   Fleck & Fitzpatrick’s model of different levels of reflection [26] into a hierarchical
                   view of moving through different perspectives on the past towards learning for the
                   future (Table 1). This hierarchical view underlies our design of the dialogue structure.
                      The dialogue structure is intended to actively guide learners from one level of
                   reflection to the next (Table 2), so that they work through the levels one after the
                   other. Since lower levels of reflection were found to be prerequisites for higher
                   levels in some cases [26], the structure is designed not to skip a level.
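                   The rule of not skipping a level can be expressed as a simple check on the order in
                   which a dialogue addresses the levels of Table 1. The following Python sketch is
                   illustrative only: it assumes the levels are encoded as the integers 0 to 5 of Table 1,
                   and the names `ReflectionLevel` and `skips_no_level` are hypothetical, not part of
                   Rebo Junior.

```python
from enum import IntEnum
from typing import Sequence


class ReflectionLevel(IntEnum):
    """Levels of reflection as numbered in Table 1."""
    REVISITING = 0
    DESCRIPTION = 1
    JUDGEMENT = 2
    EMOTIONS = 3
    LEARNING = 4
    PLANNING = 5


def skips_no_level(path: Sequence[ReflectionLevel]) -> bool:
    """Return True if the path starts at Revisiting and climbs one level at a time."""
    if not path or path[0] != ReflectionLevel.REVISITING:
        return False
    # Any step that is not exactly +1 means a level was skipped or repeated.
    return all(nxt - cur == 1 for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    print(skips_no_level(list(ReflectionLevel)))  # True: 0, 1, 2, 3, 4, 5
    jump = [ReflectionLevel.REVISITING, ReflectionLevel.DESCRIPTION,
            ReflectionLevel.LEARNING]
    print(skips_no_level(jump))  # False: Judgement and Emotions are skipped
```

                   The check only concerns the order in which levels are addressed; whether a learner
                   actually reaches a level is a separate question, which the study answers by manual
                   coding.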

                                  Table 2. Rebo Junior Addresses Subsequent Levels of Reflection

                      Level             Rebo Junior’s Reflective Question
                    0 Revisiting        Achieved outside Rebo Junior via upload of task documentation
                    1 Description       Achieved outside Rebo Junior via upload of task documentation
                    2 Judgement         How was this task for you? Did everything go well?
                    3 Emotions          Did you have fun with this task? Why/Why not?
                    4 Learning          What tip could you give to a younger apprentice who performs
                                        a task like that for the first time?
                    5 Planning          What will you pay special attention to when you perform a
                                        similar task again?

                   The dialogue structure works through the levels of reflection as follows. Apprentices
                   return to the experience by accessing the learning platform and viewing the task
                   descriptions before uploading their solutions. The uploaded solutions include a
                   description of the performed work or documentation in the form of a photograph or
                   video. Levels 0 and 1 are therefore attended to by uploading the solution to the
                   assigned task, and Rebo Junior addresses levels 2 to 5 through predefined questions
                   (Table 2).
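                   Because Rebo Junior does not react to what the apprentice writes, its side of the
                   dialogue amounts to asking the Table 2 questions in a fixed order. A minimal Python
                   sketch under that assumption: the question texts are taken from Table 2, while
                   `QUESTIONS`, `run_dialogue` and the answer callback are hypothetical stand-ins for the
                   Moodle-integrated chat, whose implementation is not described here.

```python
from typing import Callable, List, Tuple

# Reflective questions for levels 2 to 5, taken from Table 2.
# Levels 0 and 1 are covered by the task upload, so they have no question here.
QUESTIONS: List[Tuple[int, str]] = [
    (2, "How was this task for you? Did everything go well?"),
    (3, "Did you have fun with this task? Why/Why not?"),
    (4, "What tip could you give to a younger apprentice "
        "who performs a task like that for the first time?"),
    (5, "What will you pay special attention to when you "
        "perform a similar task again?"),
]


def run_dialogue(get_answer: Callable[[str], str]) -> List[Tuple[int, str, str]]:
    """Ask every question in order and record (level, question, answer).

    The answer is stored but never inspected, so the next question does not
    depend on it: this is what makes the dialogue non-adaptive.
    """
    transcript = []
    for level, question in QUESTIONS:
        answer = get_answer(question)
        transcript.append((level, question, answer))
    return transcript


if __name__ == "__main__":
    # Usage: a canned apprentice who always gives the same short answer.
    for level, question, answer in run_dialogue(lambda question: "ok"):
        print(level, question, "->", answer)
```

                   An adaptive successor would replace the fixed loop with logic that inspects each
                   answer before choosing the next question; this sketch deliberately never reads it.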


                   5       Method - Evaluation in a Field Study

                   5.1     Study Participants

                   Rebo Junior was evaluated in a field study with all 18 apprentices in the first-year
                   cohort of the training workshop.


                   5.2     Procedure – Using Rebo Junior in the Context of a Practical Learning
                           Task

                   Learning tasks set by their trainers are an essential part of the apprentices’ practical
                   education in the training workshop: apprentices have to produce a workpiece largely
                   independently and document it digitally (e.g. photograph, video, written documentation).
                   They upload this documentation to a Moodle-based learning platform1. Subsequently,
                   the apprentices are directed to Rebo Junior, which is integrated within Moodle, and are
                   guided in reflection on their learning experience.
                      The apprentices’ first interaction with Rebo Junior took place in a workshop setting
                   with one of the authors of this paper. The learning platform was introduced, apprentices
                   worked on their first tasks, documented the completion of their tasks, uploaded task
                   descriptions, and then interacted with Rebo Junior. Directly afterwards, apprentices
                   gave their first reaction and feedback in a focus group as a first measure of reaction
                   (RQ1).


                   5.3     Repeated Interactions in Two Field Study Phases

                   The first phase, consisting of the first four weeks of the field study, was characterised
                   by tightly spaced, static interactions with Rebo Junior. All apprentices were present at
                   the training workshop and had daily training in which they regularly received practical
                   learning tasks and reflected on them with Rebo Junior. The apprentices had five to nine
                   interactions with Rebo Junior in this phase, 117 altogether.
                      Phase two, the following eight weeks, was more differentiated and explorative.
                   Interactions with Rebo Junior were more widely spaced because the apprentices received
                   their training in subgroups according to their different professions. Some of these
                   training sequences took place at locations other than the training workshop, where
                   apprentices did not work with the learning platform or with Rebo Junior. In this
                   phase, apprentices also used a version of Rebo Junior that (randomly) varies the
                   wording of the question for each reflection level (levels shown in Table 1). Our
                   initial assumption was that engagement would decline towards the end of phase one
                   and even more so in phase two, because repeated interaction with an agent has been
                   found to produce that effect [20]. It has been hypothesised that varying
                   verbalisations helps to keep up engagement [27]; therefore, we assembled pools of
                   reflective questions for each reflection level and randomly picked a different
                   question for each conversation. These question pools were generated in two
                   workshops, one with colleagues within the research team and one with trainers of
                   the training workshop.
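                   The phase-two variation can be sketched as drawing one question per level from a pool
                   for each new conversation, while keeping the level order fixed. This Python sketch
                   assumes uniform random selection; only the first entry of each pool is a Table 2
                   question, and the remaining entries are hypothetical placeholders, since the
                   workshop-generated questions are not listed here. `QUESTION_POOLS` and
                   `pick_questions` are illustrative names.

```python
import random
from typing import Dict, List, Optional

# Hypothetical question pools per reflection level. Only the first entry of each
# pool comes from Table 2; the others stand in for the workshop-generated variants.
QUESTION_POOLS: Dict[int, List[str]] = {
    2: ["How was this task for you? Did everything go well?",
        "<level 2 variant (placeholder)>"],
    3: ["Did you have fun with this task? Why/Why not?",
        "<level 3 variant (placeholder)>"],
    4: ["What tip could you give to a younger apprentice "
        "who performs a task like that for the first time?",
        "<level 4 variant (placeholder)>"],
    5: ["What will you pay special attention to when you "
        "perform a similar task again?",
        "<level 5 variant (placeholder)>"],
}


def pick_questions(pools: Dict[int, List[str]],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Draw one question per level, in ascending level order, for one conversation."""
    rng = rng or random.Random()
    return [rng.choice(pools[level]) for level in sorted(pools)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make this demo reproducible
    for conversation in range(3):
        print(conversation, pick_questions(QUESTION_POOLS, rng))
```

                   Uniform choice can repeat the wording used in the previous conversation; if
                   "different" is meant strictly, one option would be to exclude the last-used question
                   per level before drawing.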


                   5.4     Data Collection

                   We collected data by observing the first learning task, including the apprentices’
                   interactions with Rebo Junior, and in the focus group held directly afterwards.
                   Furthermore, we analysed the content of all interactions between the apprentices and
                   Rebo Junior.


                   5.5     Analysis
                   All interactions with Rebo Junior were coded for three aspects: coherence, reflection
                   depth and engagement. We used coherence, a semantic property of discourse [28],
                   because interactions with Rebo Junior are intended to be conversations and therefore
                   have to be coherent. Coherence was coded as either 0 (not coherent) or 1 (coherent).
                   Reflection depth (Table 3) was coded for all dialogues according to [29], as 1
                   (provision and description of experience), 2 (reflection on experiences) and 3
                   (learning or change). Two researchers coded coherence and reflectivity, with an
                   inter-rater reliability of 100% for coherence and 97% for reflectivity. We used the
                   coherence and reflective depth of the recorded conversations as operational measures
                   of how apt Rebo Junior’s dialogue structure is to lead reflective conversations (RQ2).
                   For engagement (RQ3), we distinguish between engaged conversations (2), conversations
                   with low engagement (1), in which apprentices reacted to Rebo Junior but showed no
                   inclination to cooperate, and conversations with missing engagement (0), in which
                   Rebo Junior was ignored.

                   1 https://abvdigital.know-center.tugraz.at
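                   The coding scheme can be made concrete as a small record per interaction, together
                   with an agreement measure. The reliability statistic is reported only as a percentage,
                   so this Python sketch assumes simple percent agreement (the share of items both coders
                   labelled identically). It also assumes that each reflection level is coded separately,
                   since the results report more phase-one dialogues at level 3 (109) than at level 2
                   (89). `InteractionCode`, `percent_agreement` and the example labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence

REFLECTION_LEVELS = {1, 2, 3}   # 1 description, 2 reflection, 3 learning or change
ENGAGEMENT_VALUES = {0, 1, 2}   # 0 missing, 1 low, 2 engaged


@dataclass(frozen=True)
class InteractionCode:
    """Codes assigned to one interaction with Rebo Junior (hypothetical record)."""
    coherent: bool
    # Assumption: levels are coded separately, so a dialogue can reach
    # level 3 without level 2; hence a set rather than a single depth value.
    levels_reached: FrozenSet[int]
    engagement: int

    def __post_init__(self) -> None:
        # Reject values that the coding scheme does not define.
        if not self.levels_reached <= REFLECTION_LEVELS:
            raise ValueError(f"unknown reflection level in {set(self.levels_reached)}")
        if self.engagement not in ENGAGEMENT_VALUES:
            raise ValueError(f"engagement must be one of {sorted(ENGAGEMENT_VALUES)}")


def percent_agreement(coder_a: Sequence, coder_b: Sequence) -> float:
    """Share of items, in percent, that both coders labelled identically."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


if __name__ == "__main__":
    code = InteractionCode(coherent=True, levels_reached=frozenset({1, 3}), engagement=2)
    print(code)
    # Hypothetical labels: whether level 3 was reached, per coder, for five dialogues.
    print(percent_agreement([True, True, False, True, True],
                            [True, True, False, True, False]))  # 80.0
```

                   Percent agreement does not correct for chance agreement; a chance-corrected statistic
                   such as Cohen's kappa would be an alternative, but which measure was used is not stated.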


                   6       Results

                   The apprentices’ feedback after interacting with Rebo Junior for the first time was
                   captured in a focus group. The results are very positive: 17 out of 18 apprentices
                   (94%) liked interacting with the dialogue structure, and 7 of the 10 apprentices who
                   commented on personal gain (70%) saw a benefit in the guided reflection. Apprentices
                   found that interacting with Rebo Junior “was almost like a real talk”2. They commented
                   that it was “really cool that Rebo had a real conversation with you”3. Some apprentices
                   also compared the conversational agent with traditional reflection prompts, such as an
                   empty text box to fill in or a sheet of paper handed out by a teacher for writing down
                   one’s thoughts, and preferred Rebo Junior. The quality of the subsequent interactions
                   over three months, in terms of reflectivity, engagement and conversational tone,
                   further indicates a positive reaction towards and overall acceptance of Rebo Junior.
                       In the course of our three-month field study, 153 reflective dialogues between the
                   apprentices and Rebo Junior were coded for reflectivity, coherence and engagement.
                   One of the apprentices quit apprenticeship training after their first interaction with
                   Rebo Junior, so we excluded this apprentice from the analysis of the reflective
                   dialogues. The remaining 17 apprentices had 164 interactions with Rebo Junior (between
                   five and thirteen per apprentice). Eleven interactions had to be removed because of
                   technical problems, so the total number of valid interactions is 153: 117 in phase 1
                   and 36 in phase 2 of the field study.




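                   2   Verbatim quote: Ja, fast wie so ein Gespräch mit dir geführt, er hat dich auch so Sachen
                       gefragt. Ja, das war gut! (Literally: “Yes, almost like having a conversation with you;
                       he asked you things, too. Yes, that was good!”)
                   3   Verbatim quote: Ich habe extrem cool gefunden, dass er so einen richtigen Dialog mit einem
                       geführt hat. (Literally: “I found it extremely cool that he had such a real dialogue
                       with you.”)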





                   Fig. 2. Coherent and reflective dialogue with Rebo Junior, translated from German. In this dia-
                   logue with Rebo Junior, the apprentice successfully reflected on their learning experience.

                   Figure 2 shows a dialogue that was coded as coherent and highly reflective (levels 2
                   and 3): the apprentice engages in the conversation, thinks about their learning
                   experience and gives adequate answers. Figure 3 shows a dialogue that was coded as
                   coherent but not reflective (neither level 2 nor level 3 was reached): the apprentice
                   does not really engage in conversation with Rebo Junior and gives very short,
                   non-reflective answers. Furthermore, we observed that over time the apprentices’
                   answers to Rebo Junior’s questions generally became shorter, sometimes consisting
                   only of keywords instead of full sentences, and thus resembled human dialogue less
                   and less.






                   Fig. 3. Dialogue showing missing engagement, translated from German. In this dialogue with
                   Rebo Junior, the apprentice did not successfully reflect on their learning experience because they
                   did not engage in the conversation.

                   In phase one, nearly all interactions (116 out of 117) were coherent conversations, in
                   the sense of a meaningful sequence of question, answer, and follow-up question. This
                   is despite the fact that Rebo Junior, being merely a computational interface to a
                   static dialogue structure, does not adapt its responses to user statements. The first
                   level of reflection, description, was reached in all interactions, because all
                   apprentices had to upload a description of their learning task before reflecting with
                   Rebo Junior. Level two, reflection, was reached in 89 interactions (76%) and level
                   three, learning or change, in 109 interactions (93%) (Table 3). Four interactions (3%)
                   reached only level one because of missing user engagement. Three of these four
                   interactions, in which apprentices did not engage in reflection, were still coherent
                   conversations.
                      Of the 36 valid interactions with Rebo Junior in phase two, in which the question for
                   each level of reflection was picked randomly, 33 were coherent conversations. In the
                   three cases where the resulting dialogue was not coherent, missing engagement was the
                   reason. The first reflection level, description, was reached in all interactions, as
                   explained above. Level two, analysis of the learning experience, was reached in 25
                   interactions (69%) and level three, learning or change, in 24 interactions (67%)
                   (Table 3). Seven interactions (19%) reached only level one, six of them due to missing
                   user engagement.

                               Table 3. Coding Interactions between the apprentices and Rebo Junior

                    Concept       Code: description                             Phase 1      Phase 2     Overall
                                                                                (117 int.)   (36 int.)   (153 int.)
                    Coherence*    0: Incoherent, sequence makes no sense        1 (1%)       3 (8%)      4 (3%)
                                  1: Coherent, given answers and following      116 (99%)    33 (92%)    149 (97%)
                                     questions match
                    Stage of      1: Provision and description of experience    117 (100%)   36 (100%)   153 (100%)
                    reflection**  2: Reflection on experiences, including       89 (76%)     25 (69%)    114 (75%)
                                     analysis and potential solutions
                                  3: Learning or change                         109 (93%)    24 (67%)    133 (87%)
                    Engagement    0: Missing engagement                         3 (3%)       4 (11%)     7 (5%)
                                  1: Low engagement                             3 (3%)       6 (17%)     9 (6%)
                                  2: Engaged                                    111 (95%)    26 (72%)    137 (90%)
                    * Inter-coder agreement: 100%; ** Inter-coder agreement: 97%
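                   Each percentage in Table 3 is a count divided by the number of coded interactions in its
                   column (117 for phase 1, 36 for phase 2, 153 overall). The following is a minimal Python
                   sketch that recomputes four rows from the counts. The dictionary layout, the row labels,
                   and the round-half-up rule are illustrative assumptions; the paper does not state how
                   percentages were rounded.

```python
# Counts transcribed from Table 3; totals are the number of coded interactions per phase.
PHASE_TOTALS = {"Phase 1": 117, "Phase 2": 36}

# Hypothetical row labels chosen for this sketch.
COUNTS = {
    "Coherence 1: coherent": {"Phase 1": 116, "Phase 2": 33},
    "Stage 2: reflection": {"Phase 1": 89, "Phase 2": 25},
    "Stage 3: learning or change": {"Phase 1": 109, "Phase 2": 24},
    "Engagement 2: engaged": {"Phase 1": 111, "Phase 2": 26},
}


def percent(count: int, total: int) -> int:
    """Share of interactions in percent, rounded half up (assumed rounding rule)."""
    return int(100 * count / total + 0.5)


def table_row(label: str) -> list[str]:
    """Format one row as 'count (pct%)' for each phase plus the overall column."""
    counts = COUNTS[label]
    cells = [f"{counts[p]} ({percent(counts[p], total)}%)" for p, total in PHASE_TOTALS.items()]
    overall_count = sum(counts.values())
    overall_total = sum(PHASE_TOTALS.values())
    cells.append(f"{overall_count} ({percent(overall_count, overall_total)}%)")
    return cells


for label in COUNTS:
    print(f"{label:30} " + "  ".join(table_row(label)))
```

                   Summing the phase counts for the overall column (for example 109 + 24 = 133) also serves
                   as a consistency check between the three columns.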

                   For both reflectivity and user engagement, a χ² test shows a significant drop in phase
                   two compared to phase one. The effect is moderate for reflectivity (χ² = 7.680,
                   p = 0.021, Cramér's V = 0.232) and considerable for engagement (χ² = 15.28, p < 0.001,
                   Cramér's V = 0.316).
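                   The engagement statistic can be recomputed from the counts in Table 3. The following is a
                   minimal Python sketch using only the standard library. It assumes a Pearson χ² test
                   without continuity correction on the 2×3 table of engagement codes by phase, and that
                   Cramér's V is computed with the total of 153 interactions. Under these assumptions it
                   reproduces the reported χ² = 15.28 and V = 0.316. The reflectivity test is not
                   reproduced because its contingency table is not given here.

```python
import math

# Engagement codes per phase, transcribed from Table 3.
# Columns: 0 = missing engagement, 1 = low engagement, 2 = engaged.
ENGAGEMENT = [
    [3, 3, 111],  # phase 1 (117 interactions)
    [4, 6, 26],   # phase 2 (36 interactions)
]


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table.

    Returns the statistic, the degrees of freedom, and the total count n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, n


def p_value_dof2(stat):
    """Upper-tail p-value; the closed form exp(-x/2) is exact only for 2 degrees of freedom."""
    return math.exp(-stat / 2)


def cramers_v(stat, n, table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))


stat, dof, n = chi_square(ENGAGEMENT)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value_dof2(stat):.4f}, "
      f"V = {cramers_v(stat, n, ENGAGEMENT):.3f}")
```

                   For the 2×3 engagement table, min(rows, cols) − 1 = 1, so V reduces to √(χ²/n). This is
                   why the value stays comparable across tests with the same number of phases.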


                   7       Discussion

                   Rebo Junior proved to be a very successful intervention: it was well received, and
                   apprentices to a great extent conducted reflective conversations with him. In almost all
                   cases, apprentices stayed engaged with Rebo Junior throughout repeated interactions (5 to
                   13 interactions per apprentice over three months), despite the non-adaptive dialogue
                   structure. This is encouraging for ongoing research on conversational agents for
                   learning, given that a positive disposition towards the intervention and continuous
                   engagement are important for learning [1, 17, 19].
                      Our results also show that the dialogue structure encoded in Rebo Junior successfully
                   facilitates and guides reflection. Apprentices who engage in conversation with Rebo
                   Junior are able to reflect on multiple levels, and the resulting dialogues consistently
                   show successful reflection. This validates the quality of the dialogue structure and the
                   initial assumption that engaged learners can be guided by Rebo Junior towards higher
                   levels of reflection.
                      Contrary to our initial assumptions, we did not observe engagement decreasing over
                   time when considering each of the two phases of the field study separately. One factor
                   positively influencing this continued engagement may be that, across repeated
                   interactions, the learning task on which apprentices reflect is different every time,
                   designed by their workshop trainers to match the apprentices' current knowledge and
                   skills. It could also be that the dialogue structure, working through the levels of
                   reflection in the same order every time, was more comforting in its familiarity than
                   boring in its repetition. However, we saw less engagement in interactions in which Rebo
                   Junior used varied verbalisations.
                      It is difficult to isolate the reasons for the drop in engagement and the lower
                   reflectivity of the resulting dialogues in phase two (varied verbalisations), as numerous
                   factors were at play. Firstly, for a considerable number of apprentices the training
                   setting differed from that in phase one and became altogether more fragmented: practical
                   learning tasks were scarcer, locations of education varied, and instructors changed for
                   periods spent outside the training workshop. Secondly, Rebo Junior's question pools were
                   introduced, so each conversation varied. Contrary to an assumption voiced in earlier work
                   [27], such varied verbalisation may have hindered more, by introducing unpredictability,
                   than it helped by introducing welcome change. Concerning reflectivity, it should also be
                   taken into account that the default dialogue structure was very elaborate and had been
                   developed over weeks, whereas the alternative questions for phase two were generated in a
                   workshop setting with the aim of providing variability. It is therefore also possible
                   that not all alternative questions targeted a specific reflection level as clearly while
                   remaining open enough to discourage single-word answers; in other words, the concrete
                   dialogue structures in phase-two interactions may simply not have been as good as in
                   phase one. Overall, we interpret the drop in reflectivity as emphasising the importance
                   of careful wording for reflection prompts, especially in conversational reflection
                   guidance.
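                   The difference between the two phases can be illustrated with a small Python sketch. This
                   is not Rebo Junior's implementation, and the following are all assumptions:
                   - The pools hold one default and one alternative question per reflection level, with
                     invented wording.
                   - The default is taken to be the phase-one question.
                   - Phase-two questions are drawn with random.Random.choice.

                   In both variants the next question is fixed in advance regardless of the apprentice's
                   answer. This mirrors the non-adaptive dialogue structure described above.

```python
import random

# Hypothetical question pools keyed by reflection level
# (1 = description, 2 = reflection/analysis, 3 = learning or change).
# The wording is invented for illustration and is NOT Rebo Junior's actual prompts.
# Assumption: index 0 of each pool is the default question used in phase one.
QUESTION_POOLS = {
    1: ["What was your learning task about?",
        "Can you describe what you did in this task?"],
    2: ["What went well, and what was difficult?",
        "Why do you think the task turned out the way it did?"],
    3: ["What will you do differently next time?",
        "What did you learn from this task?"],
}


def phase_one_questions():
    """Static structure: the default question of every level, always in level order."""
    return [QUESTION_POOLS[level][0] for level in sorted(QUESTION_POOLS)]


def phase_two_questions(rng):
    """Same level order, but one question per level drawn at random from its pool."""
    return [rng.choice(QUESTION_POOLS[level]) for level in sorted(QUESTION_POOLS)]


def run_dialogue(questions, answer_fn):
    """Non-adaptive dialogue: each next question is fixed in advance and
    does not depend on the previous answer."""
    return [(question, answer_fn(question)) for question in questions]


# Usage: simulate a phase-two conversation with a canned (hypothetical) user.
transcript = run_dialogue(phase_two_questions(random.Random(7)), lambda q: "...")
for question, answer in transcript:
    print(f"Rebo: {question}\nApprentice: {answer}")
```

                   Keeping the level order fixed while varying only the wording isolates verbalisation as
                   the changed factor. That is the comparison the two phases were meant to support.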


                   8       Conclusion

                   We do not envision conversational reflection guidance replacing human teachers. On the
                   contrary, we fully expect that, in principle, human teachers will remain unchallenged by
                   technology. However, conditions for human teachers are not always optimal: time is
                   scarce, there are often more students per teacher than would be ideal, and circumstances
                   can prevent teachers and students from getting together. In the kind of vocational
                   settings studied here, the supervisors who train apprentices in their respective
                   companies also have to attend to work performance in parallel with apprentices' learning.
                   In all these cases, variants of intelligent tutoring systems may be helpful. Our overall
                   research goal is therefore to develop conversational reflection guidance that helps
                   apprentices learn how to reflect, by pre-structuring reflections, and learn better within
                   their domain through reflection. As existing computational reflection guidance is mainly
                   based on single prompts or essay writing, conversational guidance is a valuable
                   contribution to the field. Such conversational guidance would, in principle, be expected
                   to succeed, as natural language conversation is the way humans interact with each other,
                   and, especially in reflection, conversation is the traditional way a human teacher
                   instructs a student. With the present paper, we publish and positively evaluate a
                   dialogue structure for reflecting on learning tasks. Further, we interpret the results of
                   our exploration of varied verbalisation as underscoring the importance of exact phrasing
                   for fully exploiting the quality of a dialogue structure.






                       As a limitation of the present work, and a direction for future work, we note that
                   our analysis of dialogue quality has so far been limited to the dialogic level, without
                   considering the content level. In other words, our analysis focusses on what the
                   conversational agent is capable of by design: structuring reflection. This focus is in
                   line with existing research on reflection analytics [30, 31]. We plan a follow-up study
                   to investigate the depth of reflection with respect to the correctness and
                   appropriateness of insights within the learning domain, and in relation to the
                   apprentices' expected competence levels. We are especially interested in complementing
                   automatic analyses of reflectivity (reflection analytics) with such a domain-specific
                   dimension. This would extend existing research on reflection analytics [30, 31] in two
                   regards: firstly, from analysing reflective essays and statements towards analysing
                   conversations, and secondly, from a structural assessment of reflectivity towards
                   including a content-related assessment.


                   Acknowledgements

                   This work has been partially funded by the WKO, and within the Austrian COMET Program
                   (Competence Centers for Excellent Technologies) under the auspices of the Austrian
                   Federal Ministry of Transport, Innovation and Technology and the Austrian Federal
                   Ministry of Economy, Family and Youth, and by the State of Styria. COMET is managed by
                   the Austrian Research Promotion Agency FFG. This work was also funded in part by NSF
                   grant IIS 1822831.


                   References
                    1. Boud, D., Keogh, R., Walker, D.: Reflection: Turning Experience into Learning. London,
                       New York (1985).
                    2. Carroll, M.: Levels of reflection: on learning reflection. Psychotherapy in Australia 16
                       (2010).
                    3. Renner, B., Prilla, M., Cress, U., Kimmerle, J.: Effects of Prompting in Reflective Learning
                       Tools: Findings from Experimental Field, Lab, and Online Studies. Frontiers in Psychology
                       (2016).
                    4. Wood, D.: Learning from experience through reflection. Organizational Dynamics (1996).
                    5. Pammer, V., Krogstie, B., Prilla, M.: Let's Talk About Reflection at Work. International
                       Journal of Technology Enhanced Learning (2015).
                    6. Ifenthaler, D.: Determining the effectiveness of prompts for self-regulated learning in
                       problem-solving scenarios. Educational Technology & Society 15, 38–52 (2012).
                    7. Verpoorten, D., Westera, W., Specht, M.: Reflection amplifiers in online courses: a
                       classification framework. Journal of Interactive Learning Research 22, 167–190 (2011).
                    8. Fessl, A., Wesiak, G., Rivera-Pelayo, V., Feyertag, S., Pammer, V.: In-App Reflection
                       Guidance: Lessons Learned Across Four Field Trials at the Workplace. IEEE Trans. Learning
                       Technol. (2017).
                    9. Kovanović, V., Joksimović, S., Mirriahi, N., Blaine, E., Gašević, D., Siemens, G., Dawson,
                       S.: Understand students' self-reflections through learning analytics. In: Proceedings of
                       the 8th Int. Conf. on Learning Analytics & Knowledge, Sydney. ACM, New York (2018).






                   10. Graesser, A.C., VanLehn, K., Rosé, C., Jordan, P., Harter, D.: Intelligent Tutoring Systems
                       with Conversational Dialogue. AI Magazine 22, 39 ff. (2001).
                   11. Ruan, S., Jiang, L., Xu, J., Tham, B.J.-K., Qiu, Z., Zhu, Y., Murnane, E., Brunskill, E.,
                       Landay, J.: QuizBot: A Dialogue-based Adaptive Learning System for Factual Knowledge.
                       In: CHI 2019, May 4–9, Glasgow, Scotland, UK (2019).
                   12. Graesser, A.C., McNamara, D.S., VanLehn, K.: Scaffolding Deep Comprehension Strategies
                       Through Point&Query, AutoTutor, and iSTART. Educational Psychologist (2005).
                   13. Adamson, D., Dyke, G., Rosé, C.: Towards an Agile Approach to Adapting Dynamic
                       Collaboration Support to Student Needs. International Journal of AI in Education (2014).
                   14. Knights, S.: Reflection and Learning: The Importance of a Listener. In: Boud, D., Keogh, R.,
                       Walker, D. (eds.) Reflection: Turning Experience into Learning, pp. 85–90. RoutledgeFalmer,
                       London, New York (1985).
                   15. Eraut, M.: Informal learning in the workplace. Studies in Continuing Education (2004).
                   16. Lee, M., Ackermans, S., van As, N., Chang, H., Lucas, E., IJsselsteijn, W.: Caring for
                       Vincent. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems,
                       Glasgow, UK. ACM, New York (2019).
                   17. Shum, H.-Y., He, X.-D., Li, D.: From Eliza to XiaoIce: challenges and opportunities with
                       social chatbots. Frontiers of Information Technology & Electronic Engineering 19(1), 10–26
                       (2018).
                   18. Alexa Prize, https://developer.amazon.com/alexaprize (2020), last accessed 27 April 2020.
                   19. Kirkpatrick, D.L., Kirkpatrick, J.D.: Evaluating Training Programs: The Four Levels, 3rd edn.
                       Berrett-Koehler, San Francisco (2010).
                   20. Kocielnik, R., Xiao, L., Avrahami, D., Hsieh, G.: Reflection Companion. Proc. ACM Interact.
                       Mob. Wearable Ubiquitous Technol. (2018).
                   21. Ciechanowski, L., Przegalinska, A., Magnuski, M., Gloor, P.: In the shades of the uncanny
                       valley: An experimental study of human–chatbot interaction. Future Generation Computer
                       Systems (2019).
                   22. Zamora, J.: I'm Sorry, Dave, I'm Afraid I Can't Do That. In: Proceedings of the 5th Int. Conf.
                       on Human Agent Interaction, Bielefeld, pp. 253–260. ACM Press, New York (2017).
                   23. Feine, J., Morana, S., Maedche, A.: Designing a Chatbot Social Cue Configuration System.
                       In: 40th International Conference on Information Systems (2019).
                   24. Zimmerman, J., Forlizzi, J., Evenson, S.: Research through design as a method for
                       interaction design research in HCI. In: Proceedings of the SIGCHI Conference on Human
                       Factors in Computing Systems, pp. 493–502. ACM, New York (2007).
                   25. Baumeister, R.F., Bratslavsky, E., Finkenauer, C., Vohs, K.D.: Bad is stronger than good.
                       Review of General Psychology (2001).
                   26. Fleck, R., Fitzpatrick, G.: Reflecting on reflection. In: Proceedings of the 22nd Conf. of the
                       Computer-Human Interaction Special Interest Group of Australia. ACM, New York (2010).
                   27. Kocielnik, R., Hsieh, G.: Send Me a Different Message: Utilizing Cognitive Space to Create
                       Engaging Message Triggers. In: Proceedings of the 2017 ACM Conference on Computer
                       Supported Cooperative Work and Social Computing (2017).
                   28. van Dijk, T.: Text and Context: Explorations in the Semantics and Pragmatics of Discourse.
                       Longman Linguistics Library (1977).
                   29. Prilla, M., Renner, B.: Supporting Collaborative Reflection at Work. In: Proceedings of the
                       18th ACM Int. Conf. on Supporting Group Work, pp. 182–193. ACM, New York (2014).
                   30. Cui, Y., Wise, A.F., Allen, K.L.: Developing reflection analytics for health professions
                       education: A multi-dimensional framework to align critical concepts with data features.
                       Computers in Human Behavior (2019).
                   31. Ullmann, T.D.: Automated Analysis of Reflection in Writing: Validating Machine Learning
                       Approaches. International Journal of Artificial Intelligence in Education (2019).



