<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Colton Botta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avi Segal</string-name>
          <email>avise@post.bgu.ac.il</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kobi Gal</string-name>
          <email>kobig@bgu.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Edinburgh</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Online systems use user data, such as demographics, past performance, preferences, and skill sets, to construct an accurate model of each user and maximize personalization. Some of these user features are “shallow” traits that seldom change (e.g., age, race, gender), while others are “deep” traits that are more volatile (e.g., performance, goals, interests). In this work, we explore how reasoning about this diversity of user features can enhance the performance of personalized systems. By modeling the personalization process as a Reinforcement Learning (RL) problem, we introduce Diversity Aware Bandits for Intervention Personalization (DABIP), a novel contextual multi-armed bandit algorithm that leverages the dynamics within user features to cluster users while maximizing outcomes. We demonstrate the efficacy of this approach using two real-world datasets from different domains.</p>
      </abstract>
      <kwd-group>
        <kwd>Contextual Multi-Armed Bandit</kwd>
        <kwd>Interventions</kwd>
        <kwd>Incentives</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
bandit algorithm, as a baseline in two domains. Our results show that DABIP achieves a higher
average reward than LOCB in each domain when predicting intervention outcomes per user.
2. Background
We give an overview of CMAB algorithms and diversity.
2.1. Contextual Multi-Armed Bandits
The Contextual Multi-Armed Bandit (CMAB) problem is an extension of the Multi-Armed Bandit (MAB)
problem in which, at each timestep, the agent is presented with a list of arms (actions) and a
context vector (additional data) about the environment. The agent selects and performs
a single action, then receives a reward for that arm only. Over time, the agent learns
the underlying reward distribution of each arm and how that distribution is influenced by the
context, and endeavors to maximize the total reward received over time [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. One recent work
introduced the Local Clustering in Bandits (LOCB) algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which implements a “soft”
clustering approach in which users are clustered together if their preferences are within a
certain threshold of each other.
2.2. Diversity
The existence of differences between humans in a group is one notion of diversity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], with these
differences often falling into two distinct categories: surface-level differences and deep-level
differences [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Surface-level differences include, for example, age, sex, ethnicity, and race,
and are generally defined by their low dynamics and their ability to be observed immediately [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Deep-level differences, on the other hand, may include skills, values, preferences, and desires.
These are more volatile and can only be observed through prolonged interaction between people
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. One example of the importance of this classification is highlighted by the WeNet project,
which places human diversity at the center of a new machine-mediated paradigm of social
interactions [
        <xref ref-type="bibr" rid="ref6 ref9">9, 6</xref>
        ].
3. DABIP
We now describe the Diversity Aware Bandits for Intervention Personalization (DABIP) algorithm.
3.1. Problem Definition
Let $N = \{1, \dots, n\}$ represent a set of $n$ total users and $t = 1, \dots, T$ represent a sequence of timesteps.
At timestep $t$, a user, $i_t$, is drawn such that $i_t \in N$. Alongside $i_t$, the agent receives the context,
$X_t = \{x_{1,t}, x_{2,t}, \dots, x_{k,t}\}$, with one context vector for each of $k$ arms and each context vector having
dimension $d$ such that $x_{a,t} \in \mathbb{R}^d$. The agent chooses one arm, $x_{a,t}$, to recommend to $i_t$ and receives
reward $r_t$ in return. We assume that each user is associated with an unknown bandit parameter
$\theta_{i,t}$ that describes how $i_t$ interacts with the environment and can be thought of as a representation
of how user $i_t$ behaves [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. As in previous bandit settings [
        <xref ref-type="bibr" rid="ref10 ref11 ref4">10, 4, 11</xref>
        ], the goal is to minimize the
total regret, $R_T$, given by:
      </p>
      <p>$$R_T = \sum_{t=1}^{T} \left[ \theta_{i,t}^{\top} \left( \arg\max_{x_{a,t} \in X_t} \theta_{i,t}^{\top} x_{a,t} \right) - \theta_{i,t}^{\top} x_t \right] \qquad (1)$$</p>
      <p>
where, at each round, $t$, we compute the regret by taking the reward achieved from the best
possible arm choice, $x_{a,t}$, and subtracting the reward achieved from the agent’s chosen arm, $x_t$.
We also assume that each user, $i$, has a set of features, $F$, of length $q$ such that at any time, $t$,
there exists $F_{i,t} = \{f_{i,1,t}, f_{i,2,t}, \dots, f_{i,q,t}\}$.
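As a minimal sketch of how the regret in Equation (1) can be evaluated in simulation, assuming the true bandit parameter is known (which is never the case on real data), the following computes the cumulative regret of a sequence of arm choices. The helper names `dot` and `cumulative_regret` and the toy numbers are illustrative assumptions, not part of the original work.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cumulative_regret(
    thetas: List[Vector],          # true parameter of the user drawn at each round
    contexts: List[List[Vector]],  # one context vector per arm, for each round
    chosen: List[int],             # index of the arm the agent pulled at each round
) -> float:
    """Sum over rounds of (best expected reward - expected reward of chosen arm)."""
    total = 0.0
    for theta, arms, a in zip(thetas, contexts, chosen):
        best = max(dot(theta, x) for x in arms)
        total += best - dot(theta, arms[a])
    return total


# Toy example: one user with theta = (1, 0), two rounds, two arms per round.
theta = (1.0, 0.0)
rounds = [[(0.2, 0.9), (0.8, 0.1)], [(0.5, 0.5), (0.4, 0.6)]]
regret = cumulative_regret([theta, theta], rounds, chosen=[0, 0])
```

In this toy run the agent loses 0.6 in the first round (it pulls arm 0 when arm 1 was best) and nothing in the second, so the cumulative regret is approximately 0.6.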
3.2. DABIP Algorithm
The algorithm has three main steps: (1) Calculate the underlying feature dynamics of all users
over time, (2) Form clusters of users with similar feature dynamics, then (3) Utilize the clusters
and past user performance to personalize interventions to users. The full details of the algorithm
are given in Appendix A.
4. DABIP Performance in Multiple Domains
We apply the DABIP algorithm to two datasets from two different domains.
4.1. Eedi Dataset
The Eedi [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] dataset includes over 17 million interactions of students answering multiple-choice
questions. It provides interaction logs containing the student ID, question ID, student answer (range a-d),
and the correct answer (range a-d). Every question has an associated list of features, including a
question ID and a list of subject IDs. Every student has an associated list of features, including
gender, date of birth, etc.
4.2. WeNet Dataset
The WeNet dataset includes 6600 interactions of users participating in WeNet’s Ask4Help
pilot [
        <xref ref-type="bibr" rid="ref13 ref14 ref9">9, 13, 14</xref>
        ]. Users participated in asking and answering questions while receiving one
of four different intervention messages that encourage their participation. The dataset provides
interaction logs containing the user ID, the intervention message ID, and the user activity level following the
intervention. Additionally, every user has an associated list of features, including location, Big Five
personality characteristics, music and sports preferences, and past activity in the app. Finally, a binary label
is computed for each intervention, denoting whether user activity after the intervention surpassed a given
threshold (the median over post-intervention activities).
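A minimal sketch of this labeling rule follows, assuming a strict comparison against the median (the text says activity "surpassed" the threshold). The function name `binary_labels` and the sample activity counts are hypothetical.

```python
from statistics import median
from typing import List


def binary_labels(post_activity: List[float]) -> List[int]:
    """Label 1 when post-intervention activity exceeds the median, else 0."""
    threshold = median(post_activity)
    return [1 if value > threshold else 0 for value in post_activity]


# Hypothetical activity counts following five interventions (median is 2).
labels = binary_labels([0, 3, 1, 7, 2])
```

With a strict comparison, an intervention whose post-intervention activity equals the median is labeled 0.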
[Figure 1: (a) Educational Dataset; (b) WeNet Dataset]
4.3. Experiments
We apply DABIP to both domains. In the educational domain, the algorithm chooses personalized
mathematics questions, based upon past student performance, that are likely to be answered
correctly by the student. In the WeNet domain, the algorithm chooses, based upon users’
past behavior, personalized interventions that are likely to increase users’ future engagement
beyond a median-based threshold. We compared DABIP to the LOCB baseline on both datasets.
LOCB is available as open source (see footnote 2); we extended and adapted it to operate on our datasets.
5. Results and Analysis
We compare the performance of DABIP and LOCB on the two datasets. As shown in Figure 1a,
DABIP outperforms the LOCB baseline by about 25% on the education dataset. The DABIP-Dyn
variant, which uses only the deep diversity features, shows results comparable to DABIP on this
dataset. On the WeNet dataset, DABIP outperforms LOCB by about 30%. Additionally, DABIP
demonstrates an improvement of more than 75% over a baseline that
chooses interventions at random.
      </p>
      <p>Our results show that identifying and extracting feature dynamics, used as a proxy for human
diversity, can improve RL algorithm performance. We argue that identifying the
highly dynamic features allows DABIP to search the space of context-reward associations
more completely and more quickly, thus leading to higher reward. This hypothesis requires further
testing, but the results of applying DABIP to real data are promising, and further research into
augmenting our clustering approach is planned for the future.
Footnote 2: https://github.com/banyikun/LOCB
6. Conclusion
In this work, we designed, implemented, and tested DABIP, a diversity-aware RL algorithm that
uses feature dynamics as a proxy for underlying human-contextual diversity. We hypothesized
that this technique could improve RL algorithms that operate in environments where user data
is highly dynamic, and our results on two different domains support this hypothesis. We
believe that extensions to DABIP can make it an ideal tool for building more performant
personalized applications.</p>
      <p>Acknowledgements
This work was supported in part by the European Union Horizon 2020 WeNet research and
innovation program under grant agreement No 823783.
[16] S. Li, W. Chen, K.-S. Leung, Improved algorithm on online clustering of bandits, arXiv
preprint arXiv:1902.09162 (2019).
[17] S. Li, A. Karatzoglou, C. Gentile, Collaborative filtering bandits, in: Proceedings of the
39th International ACM SIGIR conference on Research and Development in Information
Retrieval, 2016, pp. 539–548.</p>
      <p>
        A. The DABIP Algorithm
We now give a detailed description of the algorithm. DABIP (Algorithm 1) is initialized with the
number of clusters to maintain ($s$), the frequency with which to update the clusters ($c$), the
frequency with which to update the user feature dynamics ($\delta$), and an exploration parameter ($\alpha$).
Then, all users are initialized (Lines 2-4) and the algorithm begins iterating over all timesteps
sequentially (Line 5). In each round, $t$, a user $i_t$ is presented along with the set of context vectors
$X_t$ (Line 6). DABIP begins without any user clusters. It first checks whether there are any clusters
(Line 7), and if there are none (length($C$) ≤ 0), then the arm with the highest upper confidence
bound (UCB) is chosen. As is standard practice [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] in bandit algorithms, UCB is computed
using the estimate of user $i_t$’s unknown bandit parameter, $\hat{\theta}_{i,t}$ (Lines 14-16), where $A_{i,t-1}^{-1}$ is the
covariance matrix and $b_{i,t-1}$ is a normalizing matrix for user $i$ at timestep $t-1$, which are used
to compute the ridge regression solution of the coefficients [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. On the other hand, if a user
clustering has been established (length($C$) &gt; 0), then the cluster holding user $i_t$ is set as $C_{i,t}$
(Line 8) and DABIP calculates $\hat{\theta}_{C_{i,t}}$, which represents the unknown bandit parameter for the
entire cluster (Line 9).
      </p>
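      <p>The user-level step described above (ridge-regression estimate, UCB arm choice, and the subsequent parameter update) can be sketched as follows, in the style of LinUCB [10]. This is a minimal illustration, not the authors' implementation: the class name `LinUCBUser`, the identity initialization, and the toy reward rule are assumptions, and the inverse matrix is maintained with the Sherman-Morrison identity instead of being recomputed by explicit inversion.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


class LinUCBUser:
    """Per-user ridge-regression state: A^{-1} (starting from identity) and b."""

    def __init__(self, d: int, alpha: float = 1.0) -> None:
        self.alpha = alpha  # exploration parameter (assumed symbol)
        self.a_inv: Matrix = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
        self.b: Vector = [0.0] * d

    def theta_hat(self) -> Vector:
        """Ridge-regression estimate: theta_hat = A^{-1} b."""
        return matvec(self.a_inv, self.b)

    def ucb(self, x: Vector) -> float:
        """Estimated reward plus exploration bonus alpha * sqrt(x^T A^{-1} x)."""
        bonus = self.alpha * math.sqrt(dot(x, matvec(self.a_inv, x)))
        return dot(self.theta_hat(), x) + bonus

    def choose(self, arms: List[Vector]) -> int:
        """Index of the arm with the highest upper confidence bound."""
        return max(range(len(arms)), key=lambda a: self.ucb(arms[a]))

    def update(self, x: Vector, reward: float) -> None:
        """A += x x^T (applied to A^{-1} via Sherman-Morrison); b += reward * x."""
        ax = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        d = len(x)
        self.a_inv = [
            [self.a_inv[r][c] - ax[r] * ax[c] / denom for c in range(d)]
            for r in range(d)
        ]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


user = LinUCBUser(d=2, alpha=0.5)
arms = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(20):
    a = user.choose(arms)
    user.update(arms[a], reward=1.0 if a == 1 else 0.0)  # toy rule: only arm 1 pays off
```

In this toy run the user tries arm 0 once, receives no reward, and then settles on arm 1, whose estimated reward approaches 1 as more pulls are observed.</p>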
      <p>Finally, to choose an arm, we compare the UCB using the user’s unknown bandit parameter,
$\hat{\theta}_{i,t}$, to the UCB using the average unknown bandit parameter of all users in cluster $C_{i,t}$, $\hat{\theta}_{C_{i,t}}$ (Lines
10-12). The maximum of these two UCB values is selected (Line 13). The reasoning behind this
is that previous work has established that clustering users by unknown bandit parameter is
an effective strategy for identifying users who behave similarly in a task, thus resulting in a
collaborative filtering effect [
        <xref ref-type="bibr" rid="ref11 ref15 ref4">15, 11, 16, 17, 4</xref>
        ]. In datasets where changes in user features are
not available or considered, these past works still represent the state of the art in clustering
bandit algorithms. Our approach, by comparison, aims to gain an advantage in datasets where user
feature dynamics are available and changing. In these cases, we expect the collective bandit
parameter of the cluster where user $i_t$ resides, $\hat{\theta}_{C_{i,t}}$, to estimate expected behavior better than $\hat{\theta}_{i,t}$.
      </p>
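      <p>The comparison between the user-level and cluster-level UCB can be sketched as below. This is a hedged illustration only: the function names are hypothetical, the cluster parameter is taken as the average of the members' ridge-regression estimates as the text describes, and the use of the averaged inverse matrices for the cluster's exploration bonus is an assumption, since the exact form of that term is not recoverable from the text.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def ucb(theta: Vector, a_inv: Matrix, x: Vector, alpha: float) -> float:
    """Estimated reward plus exploration bonus alpha * sqrt(x^T A^{-1} x)."""
    return dot(theta, x) + alpha * math.sqrt(dot(x, matvec(a_inv, x)))


def cluster_estimate(a_invs: List[Matrix], bs: List[Vector]):
    """Average the members' estimates A_j^{-1} b_j and (assumption) their A_j^{-1}."""
    n, d = len(a_invs), len(bs[0])
    thetas = [matvec(a_inv, b) for a_inv, b in zip(a_invs, bs)]
    theta_c = [sum(t[k] for t in thetas) / n for k in range(d)]
    a_inv_c = [[sum(m[r][c] for m in a_invs) / n for c in range(d)] for r in range(d)]
    return theta_c, a_inv_c


def choose_arm(arms, user_a_inv, user_b, member_a_invs, member_bs, alpha=1.0) -> int:
    """Pick the arm whose larger UCB (user-level or cluster-level) is highest."""
    theta_u = matvec(user_a_inv, user_b)
    theta_c, a_inv_c = cluster_estimate(member_a_invs, member_bs)
    scores = [
        max(ucb(theta_u, user_a_inv, x, alpha), ucb(theta_c, a_inv_c, x, alpha))
        for x in arms
    ]
    return max(range(len(arms)), key=scores.__getitem__)


identity = [[1.0, 0.0], [0.0, 1.0]]
experienced = [[1.0, 0.0], [0.0, 0.1]]  # a member who has pulled arm 1 often
arms = [[1.0, 0.0], [0.0, 1.0]]
chosen = choose_arm(
    arms,
    user_a_inv=identity, user_b=[0.0, 0.0],   # a new user with no history
    member_a_invs=[identity, experienced],    # the user's cluster, user included
    member_bs=[[0.0, 0.0], [0.0, 9.0]],
    alpha=0.1,
)
```

Here the new user has no information of their own, so both of their user-level UCB values are equal; the cluster-level estimate, informed by the experienced member, tips the choice to arm 1.</p>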
      <p>With an arm chosen and pulled, we observe the reward, $r_t$, then update the user parameters and
the cluster parameters for the cluster in which user $i_t$ resides (Lines 17-22). Then, any user features, $F_{i,t}$,
are updated (Lines 23-24). This step must be tailored to the specific implementation and dataset,
as the number, type, and sophistication of the user features depend entirely on the
problem definition and setup. The count of how many times user $i_t$ has been considered is also
updated (Line 25). Finally, the most up-to-date clusters, $C_t$, are calculated and returned by the
CLUSTER function (Line 26; see Algorithm 2), which ends round $t$.</p>
      <p>The second component of DABIP is clustering users based upon the similarity of their feature
dynamics. The CLUSTER algorithm (Algorithm 2) assumes that each user has a set of features,
$F$, of length $q$ such that at any time, $t$, there exists $F_{i,t} = \{f_{i,1,t}, f_{i,2,t}, \dots, f_{i,q,t}\}$. The values of each
individual user feature, $f_{i,j,t}$, may change over time, and these changes can be tracked to cluster users based
upon the similarity of their feature dynamics. To do this, one can observe the value of a feature
at some initial timestep, then again at a later timestep, and calculate the absolute value of the
difference between them. More formally, at some initial timestep, $t_0$, we store the values of
all features for a given user, $i$: $F_{i,t_0}$. We also initialize a set $Y$ that contains one value, $Y_i$, for each
user, counting the number of times the agent has made a recommendation to that user. Thus, each time user $i_t$ is
selected by the algorithm, we can update $F_{i,t}$ based upon the observed user features at timestep $t$, and
increment $Y_i$ by 1. Once the agent has made a recommendation to a user $\delta$ times, say at time $t$, the feature
dynamics for user $i$, $D_i$, can be computed based upon how the features have changed between $t_0$ and $t$
(Algorithm 2, Line 2). The differences are summed to compute $D_i$, and $\delta$ is a hyperparameter
that controls how often user feature dynamics are updated. After this calculation, $F_{i,t_0}$ is set to $F_{i,t}$ and
$Y_i$ is set to 0. The process repeats when $Y_i = \delta$ until all timesteps are complete. By performing this
operation for every user, we constantly have access to $D_i$, which represents the current dynamics of user $i$’s
features.</p>
      <p>Algorithm 1 DABIP
Require: number of clusters $s$, cluster update frequency $c$, user feature dynamics update frequency $\delta$, exploration parameter $\alpha$
1: $C_0 \leftarrow \emptyset$
2: for each $i \in N$ do
3:   $A_{i,0} \leftarrow I$, $b_{i,0} \leftarrow 0$
4:   $Y_i \leftarrow 0$
5: for $t \leftarrow 1, 2, \dots, T$ do
6:   receive $i_t \in N$ and obtain $X_t = \{x_{1,t}, x_{2,t}, \dots, x_{k,t}\}$
7:   if length of $C_{t-1}$ &gt; 0 then
8:     $C_{i,t} \leftarrow$ cluster where $i_t$ resides at round $t$
9:     $\hat{\theta}_{C_{i,t}} \leftarrow \frac{1}{|C_{i,t}|} \sum_{j \in C_{i,t}} A_{j,t-1}^{-1} b_{j,t-1}$
10:    $\mathrm{UCB}_C \leftarrow \max_{x_{a,t} \in X_t} \hat{\theta}_{C_{i,t}}^{\top} x_{a,t} + CB_C$ where $CB_C \leftarrow \alpha \sqrt{x_{a,t}^{\top} \left( \frac{1}{|C_{i,t}|} \sum_{j \in C_{i,t}} A_{j,t-1}^{-1} \right) x_{a,t}}$
11:    $\hat{\theta}_{i,t} \leftarrow A_{i,t-1}^{-1} b_{i,t-1}$
12:    $\mathrm{UCB}_i \leftarrow \max_{x_{a,t} \in X_t} \hat{\theta}_{i,t}^{\top} x_{a,t} + CB_i$ where $CB_i \leftarrow \alpha \sqrt{x_{a,t}^{\top} A_{i,t-1}^{-1} x_{a,t}}$
13:    $x_t \leftarrow$ arm attaining $\max(\mathrm{UCB}_C, \mathrm{UCB}_i)$
14:  else
15:    $\hat{\theta}_{i,t} \leftarrow A_{i,t-1}^{-1} b_{i,t-1}$
16:    $x_t \leftarrow \arg\max_{x_{a,t} \in X_t} \hat{\theta}_{i,t}^{\top} x_{a,t} + CB_i$ where $CB_i \leftarrow \alpha \sqrt{x_{a,t}^{\top} A_{i,t-1}^{-1} x_{a,t}}$
17:  pull $x_t$ and observe reward $r_t$
18:  $A_{i,t} \leftarrow A_{i,t-1} + x_t x_t^{\top}$
19:  $b_{i,t} \leftarrow b_{i,t-1} + r_t x_t$
20:  if length of $C_{t-1}$ &gt; 0 then
21:    $A_{C_{i,t},t} \leftarrow A_{C_{i,t},t-1} + x_t x_t^{\top}$
22:    $b_{C_{i,t},t} \leftarrow b_{C_{i,t},t-1} + r_t x_t$
23:  for $f_{i,j,t} \in F_{i,t}$ do
24:    update $f_{i,j,t}$ according to information gathered from the problem setup and the observed $x_t$ and $r_t$
25:  $Y_i \leftarrow Y_i + 1$
26:  $C_t \leftarrow$ CLUSTER($\delta$, $Y$, $c$, $i_t$)</p>
      <p>
        We use the similarity between users’ $D_i$ values to cluster them together, rather than the similarity of
their estimated bandit parameters, $\hat{\theta}$, as done in previous works [
        <xref ref-type="bibr" rid="ref15 ref4">15, 16, 4</xref>
        ]. To that end, let $C_t = \{C_1, C_2, \dots, C_s\}$ denote the clusters at round $t$. For
simplicity, we assume that each user must appear in exactly one cluster and that all users are split as evenly
as possible among the $s$ clusters. See Algorithm 2 for the full clustering pseudocode.
      </p>
      <p>DABIP updates clusters after a period of $c$ timesteps has passed. This is because
calculating the dynamics of the user features requires observing changes in those features over
a period of time. Re-clustering after every timestep would not allow sufficient time to observe
any true dynamics, so we update $D_i$ for each user after every $\delta$ timesteps in which that user is
selected.</p>
      <p>Algorithm 2 CLUSTER
Require: user feature dynamics update frequency $\delta$, user update counts $Y$, cluster update frequency $c$, user $i_t$
1: if $Y_i == \delta$ then
2:   $D_i = \sum_{j=1}^{q} \{|f_{i,j,t} - f_{i,j,t_0}|\}$
3:   $F_{i,t_0} \leftarrow F_{i,t}$
4:   $Y_i \leftarrow 0$
5: if $t \,\%\, c == 0$ then
6:   $D \leftarrow$ sort $D$ in ascending order
7:   $C_t \leftarrow$ split($D$, $s$) where split($x$, $y$) splits $x$ into length($x$) % $y$ groups each of size $\lfloor$length($x$)/$y\rfloor + 1$ and the rest of size $\lfloor$length($x$)/$y\rfloor$
8: return $C_t$</p>
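      <p>As a rough sketch of the clustering step, assuming numeric features and a single scalar dynamics value per user, the following computes feature dynamics as summed absolute changes, sorts users by that value, and splits them into $s$ groups using the split rule stated in Algorithm 2. The function names, user IDs, and feature values are hypothetical.

```python
from typing import Dict, List


def feature_dynamics(initial: List[float], current: List[float]) -> float:
    """Sum of absolute changes across a user's q features between two snapshots."""
    return sum(abs(now - before) for before, now in zip(initial, current))


def split(items: List[str], s: int) -> List[List[str]]:
    """Split into s groups: the first len(items) % s groups get one extra item."""
    base, extra = divmod(len(items), s)
    groups, start = [], 0
    for g in range(s):
        size = base + 1 if extra > g else base
        groups.append(items[start:start + size])
        start += size
    return groups


def cluster(dynamics: Dict[str, float], s: int) -> List[List[str]]:
    """Sort users by dynamics (ascending) and split them into s clusters."""
    ordered = sorted(dynamics, key=dynamics.get)
    return split(ordered, s)


# Hypothetical users with two features each: (initial snapshot, current snapshot).
snapshots = {
    "u1": ([0.2, 5.0], [0.2, 5.0]),  # no change
    "u2": ([0.1, 3.0], [0.4, 3.5]),  # moderate change
    "u3": ([0.9, 1.0], [0.1, 4.0]),  # large change
    "u4": ([0.5, 2.0], [0.5, 2.1]),  # small change
    "u5": ([0.3, 2.0], [0.9, 4.0]),  # large change
}
dynamics = {u: feature_dynamics(f0, ft) for u, (f0, ft) in snapshots.items()}
clusters = cluster(dynamics, s=2)
```

With five users and two clusters, the three users whose features changed least land in the first cluster and the two most dynamic users in the second.</p>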
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Choffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Popineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bourda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-J.</given-names>
            <surname>Vie</surname>
          </string-name>
          ,
          <article-title>Das3h: modeling student learning and forgetting for optimally scheduling distributed practice of skills</article-title>
          , arXiv preprint arXiv:1905.06873 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakagawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iwasawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          ,
          <article-title>Graph-based knowledge tracing: modeling student proficiency using graph neural network</article-title>
          , in: 2019 IEEE/WIC/ACM International Conference On Web Intelligence (WI), IEEE,
          <year>2019</year>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Schelenz</surname>
          </string-name>
          , I. Bison,
          <string-name>
            <given-names>M.</given-names>
            <surname>Busso</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. De Götzen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Gatica-Perez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Giunchiglia</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Meegahapola</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ruiz-Correa</surname>
          </string-name>
          ,
          <article-title>The theory, practice, and ethical challenges of designing a diversity-aware platform for social relations</article-title>
          ,
          <source>in: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>905</fpage>
          -
          <lpage>915</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Local clustering in contextual multi-armed bandits</article-title>
          ,
          <source>in: Proceedings of the Web Conference</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>2335</fpage>
          -
          <lpage>2346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Barto</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning: An introduction</article-title>
          , MIT press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Bison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bidoglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Busso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Abente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cvajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D. R.</given-names>
            <surname>Britez</surname>
          </string-name>
          , G. Gaskell, G. Sciortino,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stares</surname>
          </string-name>
          , et al.,
          <article-title>D1.3 final model of diversity: Findings from the pre-pilots study</article-title>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <article-title>Beyond relational demography: Time and the effects of surface- and deep-level diversity on work group cohesion</article-title>
          ,
          <source>Academy of management journal 41</source>
          (
          <year>1998</year>
          )
          <fpage>96</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. E</given-names>
            .
            <surname>Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Alvarez</surname>
          </string-name>
          ,
          <article-title>Socialization amidst diversity-the impact of demographics on work team oldtimers and newcomers</article-title>
          , Research in organizational behavior
          <volume>15</volume>
          (
          <year>1992</year>
          )
          <fpage>45</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kun</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. De Götzen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bidoglia</surname>
            ,
            <given-names>N. J.</given-names>
          </string-name>
          <string-name>
            <surname>Gommesen</surname>
          </string-name>
          , G. Gaskell,
          <article-title>Exploring diversity perceptions in a community through a q&amp;a chatbot</article-title>
          ,
          <source>in: DRS2022: Bilbao, Design Research Society</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Langford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <article-title>A contextual-bandit approach to personalized news article recommendation</article-title>
          ,
          <source>in: Proceedings of the 19th international conference on World wide web</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>661</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gentile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karatzoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zappella</surname>
          </string-name>
          , E. Etrue,
          <article-title>On context-dependent clustering of bandits</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1253</fpage>
          -
          <lpage>1262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamb</surname>
          </string-name>
          , E. Saveliev,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cameron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zaykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hernández-Lobato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Baraniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Jones</surname>
          </string-name>
          , et al.,
          <article-title>Instructions and guide for diagnostic questions: The neurips 2020 education challenge</article-title>
          , arXiv preprint arXiv:2007.12061 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>A. De Götzen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Simeone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Morelli</surname>
          </string-name>
          ,
          <article-title>21 mediating social interaction through a chatbot to leverage the diversity of a community, Artistic Cartography and Design Explorations Towards the Pluriverse (</article-title>
          <year>2022</year>
          )
          <fpage>234</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          , I. Bison,
          <string-name>
            <given-names>M.</given-names>
            <surname>Busso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chenu-Abente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gunel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. De Götzen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kun</surname>
          </string-name>
          , et al.,
          <article-title>A worldwide diversity pilot on daily routines and social practices (</article-title>
          <year>2020</year>
          ) (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gentile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zappella</surname>
          </string-name>
          ,
          <article-title>Online clustering of bandits</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>757</fpage>
          -
          <lpage>765</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>