<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward Benchmarking Group Explanations: Evaluating the Effect of Aggregation Strategies versus Explanation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Barile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shabnam Najafian</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Draws</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oana Inel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alisa Rieger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rishav Hada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nava Tintarev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Maastricht University</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Delft</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>In the context of group recommendations, explanations have been claimed to be useful for finding a satisfactory choice for all group members and helping them agree on a common decision, improving perceived fairness, perceived consensus, and satisfaction. In this work, we present a preregistered evaluation of the impact of using social choice-based explanations for group recommendations (i.e., explanations that intuitively describe the strategy used to generate the recommendation). Our objective is to conceptually replicate a previous study and investigate whether a) the aggregation strategy used or b) the explanation most affected users' fairness perception, consensus perception, and satisfaction. Our results show that participants are able to discriminate between the different strategies, assigning worse evaluations to the Most Pleasure strategy (which chooses the item with the highest of the individual evaluations). In addition to a condition with no (natural language) explanation, we introduce a more detailed social choice-based explanation, evaluating whether additional information about the strategy has a positive impact on the evaluation of the group recommendation. However, we surprisingly found no effect of the level of explanation, either as a main effect or as an interaction effect with the aggregation strategy. Overall, our results suggest that users' perceptions of fairness, consensus, and satisfaction are primarily formed based on the social choice aggregation strategies for the studied group scenario. Our work also highlights the challenges of replication studies in recommender systems and discusses some of the design choices that may influence results when attempting to benchmark findings for the effectiveness of group explanations.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Choice-based Explanations</kwd>
        <kwd>Group Recommender Systems</kwd>
        <kwd>Explainable Recommender Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In many domains, such as online communities [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], music, movies or TV programs [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3, 4, 5, 6</xref>
        ],
and tourism [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], people consume recommendations in groups rather than individually.
Several approaches in the literature [
        <xref ref-type="bibr" rid="ref4 ref7 ref8">4, 8, 7</xref>
        ] propose social choice strategies, which combine the
individual preferences of all group members and predict an item that is suitable for everyone.
Each such aggregation strategy, however, has its trade-offs. As stated by Arrow’s theorem [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
the performance of an aggregation strategy depends on the evaluation context, meaning that it
is unlikely for an aggregation strategy to outperform other strategies in all situations.
Nevertheless, understanding why particular items are recommended is not a trivial task, especially
for group recommendations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In general, explanations [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] have been proposed as a
means of describing why certain items are recommended. The adoption of explanations has
proved to increase user acceptance of recommended items [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. In the context of group
recommendations, however, the role of explanations is even more challenging. Multiple functions
need to be met, besides explaining why certain items are recommended [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] — to help users
agree on a joint decision, as well as improve users’ perceived fairness, perceived consensus, and
satisfaction [
        <xref ref-type="bibr" rid="ref15 ref17 ref5">15, 5, 17</xref>
        ].
      </p>
      <p>
        To the best of our knowledge, however, only a few studies [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] have focused on generating
and evaluating explanations based on social choice aggregation strategies to increase fairness
and consensus perception of users or their satisfaction. We have identified several limitations
in the current literature on group recommendation explanations that we address in this paper.
First, social choice-based aggregation strategies and their explanations are not evaluated in
isolation. Hence, it is unclear to what extent users’ fairness perception, consensus perception, and
satisfaction evaluations depend on a) the explanations or b) simply the social choice aggregation
strategies. Furthermore, while we agree that the field of group recommendation explanations is
a young one, there is no precedent of replication studies, let alone benchmarks and baselines to
compare explanations against. The challenges that replication crises have posed in the social
sciences and medicine suggest that similar difficulties would be present in other fields involving
user studies [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ].
      </p>
      <p>In this paper, we address these limitations by taking the first steps toward
an explanation benchmark for group explanations. We conduct a preregistered between-subjects
user study with 400 participants, where each participant evaluates one aggregation strategy
and one explanation type in terms of perceived fairness, perceived consensus, and satisfaction
regarding the group recommendations.<sup>1</sup> In addition, we test for interaction effects between
aggregation strategies and explanation types. We address the following research questions:
RQ1: Are there differences between social choice-based aggregation strategies in group
recommendation settings regarding users’ fairness perception, consensus perception, or
satisfaction?
RQ2: Do explanations that are based on the group recommendation aggregation strategy at
hand increase users’ fairness perception, consensus perception, or satisfaction?
RQ3: Does the effectiveness of explanations (w.r.t. users’ fairness perception, consensus
perception, or satisfaction) vary depending on the aggregation strategies at hand?
RQ4: Are users’ levels of perceived fairness or perceived consensus related to their satisfaction
concerning the group recommendations?
<sup>1</sup>To preregister our study, we publicly determined our research questions, hypotheses, experimental setup, and
data analysis plan before any data collection. The (time-stamped) preregistration can be found at
https://osf.io/ghbsq.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work and Hypotheses</title>
      <p>In this section, we introduce the social choice-based aggregation strategies used to generate
recommendations for groups. Then, we illustrate the relevant literature on the explanations for
group recommender systems. Motivated by relevant literature, we also present the hypotheses
that we verify in our study.</p>
      <sec id="sec-2-1">
        <title>2.1. Social Choice-based Aggregation Strategies</title>
        <p>
          There are two main approaches to generate group recommendations: (i) aggregated models:
aggregate individual preferences (e.g., existing ratings) into a group model, generating then
the group recommendations based on such a group model; and (ii) aggregated predictions
or strategies: aggregate individual item-ratings predictions and recommend items with the
highest aggregated scores to the group [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Several aggregation strategies inspired by Social
Choice Theory have been proposed to aggregate individuals’ information [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. An overview of
these strategies, known as social choice-based aggregation strategies, can be found in Masthoff
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Below, we describe six of the most widely used social choice-based aggregation strategies:
(i) Additive Utilitarian (ADD) is a consensus-based strategy [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], so it takes into account the
preferences of all group members, recommending the item with the highest sum of all group
members’ ratings; (ii) Fairness (FAI) is a consensus-based strategy [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] well suited in the context of
repeated decisions, in which the items are ranked as the individuals are choosing them in turn;
(iii) Approval Voting (APP) is a majority-based strategy [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], so it focuses on the most popular
items among group members, recommending the item which has the highest number of ratings
greater than a predefined threshold; (iv) Least Misery (LMS) is a borderline strategy [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], so it
takes into account only a subset of group members’ preferences and recommends the item which
has the highest of all lowest ratings; (v) Majority (MAJ) is a borderline strategy [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] which
recommends the item with the highest number of all ratings representing the majority of
item-specific ratings; (vi) Most Pleasure (MPL) is a borderline strategy [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] which recommends the
item with the highest of all individual group members’ ratings. Social choice-based aggregation
strategies are widely used in the group recommenders literature [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In Masthoff [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], several
experiments are presented to identify the best strategy in terms of perceived group satisfaction.
The results, however, show that there is no winning strategy: different strategies perform well
in different scenarios. This consideration leads us to the following hypotheses related to RQ1:<sup>2</sup>
• H1a: There is a difference between social choice-based aggregation strategies in group
recommendation settings regarding users’ fairness perception.
• H1b: There is a difference between social choice-based aggregation strategies in group
recommendation settings regarding users’ consensus perception.
• H1c: There is a difference between social choice-based aggregation strategies in group
recommendation settings regarding user satisfaction.
        </p>
        <p><sup>2</sup>We note here that we slightly changed the preregistered hypotheses according to the change made to the
research question. The intention is to compare all five aggregation strategies and not only the ones that are
categorized as consensus-based.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explaining to Groups</title>
        <p>
          In general, explanations can be seen as additional information that is associated with the
recommendations to achieve several goals, such as increasing the transparency (explaining
how the recommendation system works), effectiveness (helping the user in making good
decisions), and usability of the system, as well as user satisfaction [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Several studies in different
domains showed the benefits of using explanations for recommendations in increasing users’
acceptance rate and satisfaction [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], or trust in the system [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. In group recommendations,
explanations can achieve further goals: fairness (consider all group members’ preference as
much as possible); consensus (help group members to agree on the decision) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ];
privacy-preserving (preserve group members’ confidential data, to avoid concerns about a possible loss
of privacy by, e.g., disclosing the preference information of individual group members in the
explanation) [
          <xref ref-type="bibr" rid="ref24 ref25 ref26">24, 25, 26</xref>
          ]. However, most of the research on explanations for recommender
systems focuses on single-user scenarios, while only a few studies investigate the problem of
generating explanations for groups. Typically, such explanations are related to the underlying
mechanism of the employed social choice-based aggregation strategy [
          <xref ref-type="bibr" rid="ref17 ref27 ref5">5, 27, 17</xref>
          ]. Natural
language explanation styles based on the underlying social choice aggregation strategies were
introduced in Najafian and Tintarev [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], while Kapcak et al. [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] extended this work using the
wisdom of the crowd to improve the quality of the initially proposed explanations.
Quijano-Sanchez et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] introduced explanations including the social factors of personality and
tie strength between group members to generate tactful explanations (e.g., explanations that
avoid damaging friendships). In a more extensive study, Tran et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] propose three types of
explanations for six social choice-based aggregation strategies (ADD, FAI, APP, LMS, MAJ, and
MPL), by considering: (1) the aggregation strategy itself - Type 1, (2) the aggregation strategy
itself and the decision history - Type 2, and (3) the aggregation strategy itself and the future
decision plan - Type 3. In a user study, they evaluated these explanations and showed that
explanations related to the ADD and MAJ strategies help the most to increase the fairness and
consensus perception, and satisfaction regarding the group recommendation. They also found
that users’ perceived fairness or consensus correlates with their satisfaction.
        </p>
        <p>Although these works present valuable ways to generate explanations for the most used
benchmark aggregation strategies in group recommender systems research, it is unclear whether
the effects attributed to the explanations might not, in fact, depend on the aggregation strategies
themselves. Starting from this consideration, we formulated a second set of hypotheses that we
intend to test, related to RQ2:
• H2a: Explanations based on the aggregation strategy at hand increase users’ fairness
perception concerning group recommendations.
• H2b: Explanations based on the aggregation strategy at hand increase users’ consensus
perception concerning group recommendations.
• H2c: Explanations based on the aggregation strategy at hand increase users’ satisfaction
concerning group recommendations.</p>
        <p>Furthermore, an aspect that has not been investigated is the level of detail that the explanation
can achieve concerning the aggregation strategy used and whether this affects the users’ fairness
perception, consensus perception, and satisfaction. To this end, we introduce a third set of
hypotheses which we intend to test, related to RQ3:
• H3a: The effect of aggregation strategy-based explanations on users’ fairness perception
concerning group recommendations is moderated by the type of aggregation strategy at
hand.
• H3b: The effect of aggregation strategy-based explanations on users’ consensus
perception concerning group recommendations is moderated by the type of aggregation strategy
at hand.
• H3c: The effect of aggregation strategy-based explanations on user satisfaction
concerning group recommendations is moderated by the type of aggregation strategy at
hand.</p>
        <p>
          Finally, we also validate the correlation between user satisfaction and perceived fairness and
consensus, cf. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]:
• H4a: Users’ perceived fairness is positively related to user satisfaction concerning group
recommendations.
• H4b: Users’ perceived consensus is positively related to user satisfaction concerning
group recommendations.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>We conducted an online between-subjects user study to test our hypotheses.<sup>3</sup> We presented
users with scenarios that reflected one of five different social choice-based aggregation strategies
for group recommender systems and that included either no explanation or one of two different
explanation types. This section outlines the materials, variables, procedure, participant sample,
and statistical analyses related to our user study.</p>
      <sec id="sec-3-1">
        <title>3.1. Materials</title>
        <sec id="sec-3-1-1">
          <title>Aggregation Strategies</title>
          <p>
            Our study considered five different social choice-based aggregation strategies for group
recommender systems, which have been evaluated in prior work [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. Each of these strategies aggregates
the preferences of several users to obtain a recommendation for the group as a whole [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ].
Unlike in [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], we do not consider the Fairness aggregation strategy because the
explanation types that we propose cannot be generated for this strategy. Each aggregation
strategy is applied to the rating scenario presented in Table 1, where each item (i.e., the three
restaurants, Rest A, Rest B, and Rest C) is rated on a 5-star rating scale (i.e., 1 - the worst and
5 - the best). Specifically, we consider the following aggregation strategies, from Section 2.1:
Additive Utilitarian (ADD); Approval Voting (APP) considering a threshold equal to 3, as in [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ];
Least Misery (LMS); Majority (MAJ); Most Pleasure (MPL).
<sup>3</sup>All material for analyzing our results and replicating our user study (i.e., document with preregistration of
all the hypotheses tested, user study materials, data gathered in the user study and the analysis scripts) is publicly
available – https://osf.io/5xbgf/.
          </p>
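          <p>The scoring rules described for four of these strategies can be sketched in Python as shown here. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the ratings are copied from Table 1, and Approval Voting counts ratings strictly greater than the threshold of 3, following the wording in Section 2.1. MAJ is left out because the text does not say how ties among an item's majority ratings are resolved.</p>
          <preformat>
```python
# Ratings copied from Table 1 (1: the worst, 5: the best).
# Order of raters in each list: Alex, Anna, Sam, Leo.
RATINGS = {
    "Rest A": [2, 2, 4, 4],
    "Rest B": [1, 4, 4, 4],
    "Rest C": [5, 1, 1, 1],
}

APP_THRESHOLD = 3  # threshold stated in the text for Approval Voting


def score(strategy, ratings):
    """Return the group score of one item under the given strategy."""
    if strategy == "ADD":  # Additive Utilitarian: sum of all ratings
        return sum(ratings)
    if strategy == "APP":  # Approval Voting: number of ratings above the threshold
        return sum(1 for r in ratings if r > APP_THRESHOLD)
    if strategy == "LMS":  # Least Misery: lowest individual rating
        return min(ratings)
    if strategy == "MPL":  # Most Pleasure: highest individual rating
        return max(ratings)
    raise ValueError(f"unknown strategy: {strategy}")


def recommend(strategy, table=RATINGS):
    """Return the item(s) with the highest group score; ties are all returned."""
    scores = {item: score(strategy, r) for item, r in table.items()}
    best = max(scores.values())
    return [item for item, s in scores.items() if s == best]


for name in ("ADD", "APP", "LMS", "MPL"):
    print(name, recommend(name))
```
          </preformat>
          <p>Under these assumptions, ADD and APP select Rest B, LMS selects Rest A, and MPL selects Rest C, the item that three of the four group members rated lowest.</p>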
        </sec>
        <sec id="sec-3-1-2">
          <title>Explanations</title>
          <p>
            Each explanation is presented after showing the scenario in Table 1 and a recommendation
generated with one of the aggregation strategies considered (see Section 3.2 for more details).
We evaluate three types of explanations (see Table 4): (i) Basic explanations, which explain the
aggregation strategy at hand. These explanations have been adopted from Tran et al. [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ],
and refer to Type 1; (ii) Detailed explanations, which explain the aggregation strategy in greater
detail by describing the specific reason why a given item has been recommended; and (iii) No
explanation, a condition in which the aggregation strategy is applied but no
explanation is given. Participants did, however, see the ratings of the other group members in
this condition.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Procedure</title>
        <p>
          Our study consisted of two consecutive steps. During the first step (after participants had given
informed consent), we introduced participants to the study and asked them for their gender
and age. The second step of our study started with the following scenario (taken from Tran
et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]):
        </p>
        <p>
          “Assume, there is a group of four friends (Alex, Anna, Sam, and Leo). Every month, a group
decision is made by these friends to decide on a restaurant to have dinner together. To select a restaurant
for the dinner next month, the group again has to take the same decision. In this decision, each
group member explicitly rated three restaurants (Rest A, Rest B, and Rest C) using a 5-star rating
scale (1: the worst, 5: the best). The ratings given by group members are shown in the table below:”
After that, Table 1 was shown. Participants saw a group recommendation either with or without
an explanation depending on which aggregation strategy and explanation type they had been
assigned to (see Table 4). We then measured perceived fairness, perceived consensus, and
satisfaction (see Section 3.3). We also included an attention check where we specifically
instructed participants on what option to select. Finally, participants could explain their
answers in a text field. The study had been approved by the ethics committee of our institution.
          <table-wrap>
            <label>Table 1</label>
            <caption>
              <p>Ratings of group members for the restaurants (1: the worst, 5: the best) from Tran et al. [
              <xref ref-type="bibr" rid="ref17">17</xref>
              ].</p>
            </caption>
            <table>
              <thead>
                <tr><th></th><th>Alex</th><th>Anna</th><th>Sam</th><th>Leo</th></tr>
              </thead>
              <tbody>
                <tr><td>Rest A</td><td>2</td><td>2</td><td>4</td><td>4</td></tr>
                <tr><td>Rest B</td><td>1</td><td>4</td><td>4</td><td>4</td></tr>
                <tr><td>Rest C</td><td>5</td><td>1</td><td>1</td><td>1</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Variables</title>
        <sec id="sec-3-3-1">
          <title>Independent Variables</title>
          <p>(i) Aggregation strategy (categorical, between-subjects). Each participant was exposed to a
scenario that reflected one of the five aggregation strategies (i.e., ADD, APP, LMS, MAJ, or MPL;
see Section 3.1). (ii) Explanation type (categorical, between-subjects). Each participant saw
either no explanation, a basic explanation, or a detailed explanation (see Section 3.1).</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>Dependent Variables</title>
          <p>We measured each of our three dependent variables by asking participants to respond to a
statement on a seven-point Likert scale ranging from “strongly agree” (scored as −3) to “strongly
disagree” (scored as 3). We have: (i) Perceived fairness (ordinal): “The group recommendation
is fair to all group members”; (ii) Perceived Consensus (ordinal): “The group members will
agree on the group recommendation”; (iii) Satisfaction (ordinal): “The group members will be
satisfied with regard to the group recommendation”.</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>Descriptive Variables</title>
          <p>We collected data on two demographic variables: (i) Age (categorical), participants could select
one of the options 18-25, 26-35, 36-45, 46-55, &gt;55; (ii) Gender (categorical). Participants could
select one of the options female, male, or other. Participants could also select a “prefer not to
say” option for these variables.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Participants</title>
        <p>
          Before data collection, we computed the required sample size for our study in a power analysis
for a between-subjects ANOVA (Fixed effects, special, main effects, and interactions; see Section
3.5) using G*Power [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Here, we specified the default effect size f = 0.25, a significance
threshold α = 0.05/11 = 0.005 (due to testing multiple hypotheses; see Section 3.5), a power of
(1 − β) = 0.8, and that we would test 5 × 3 = 15 groups (i.e., 5 different aggregation strategies for
3 different explanation scenarios). We performed this computation for each hypothesis using
their respective degrees of freedom. This resulted in a total required sample size of at least
378 participants. We thus recruited 400 participants from the online participant pool Prolific 4,
all of whom were proficient English speakers above 18 years of age. To maintain high-quality
answers, we selected only participants that had an approval rate of at least 90% and participated
in at least 10 prior studies. Each participant was allowed to participate in our study only once
and received £0.63 as a reward for participation. We excluded one participant from data analysis
because they did not pass the attention check we included in the experiment. The resulting
sample of 399 participants was composed of 61% female, 38% male, and 1% other participants.
They represented a diverse range of age groups: 28% were between 18 and 25, 29% between 26
and 35, 17% between 36 and 45, 14% between 46 and 55, and 13% were above 55 years of age. We
randomly distributed participants over the 15 conditions (i.e., exposing them to 1/5 aggregation
strategies and 1/3 explanation types).
[Figure: vertical axis “Fairness Perception”, scored from −3.0 to 3.0.]
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Statistical Analyses</title>
        <p>
          For each of the three dependent variables in our study (i.e., fairness perception, consensus
perception, and satisfaction), we conducted a two-way analysis of variance (ANOVA) using aggregation
strategy and explanation type as between-subjects factors. These three ANOVAs were used to
test nine hypotheses (i.e., H1a – H3c). Specifically, each of them tested main effects of
aggregation strategy (H1a – H1c) and explanation type (H2a – H2c) as well as the interaction between
these two variables in affecting the dependent variables (H3a – H3c). We chose this type of
analysis despite the anticipation that our data may not be normally distributed (i.e., violating an
ANOVA assumption) because ANOVAs are usually robust to Likert-type ordinal data [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. We
additionally performed two Spearman correlation analyses to test hypotheses H4a and H4b.
We thus tested 11 different hypotheses. Applying a Bonferroni correction [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], we lowered the
significance threshold to α = 0.05/11 = 0.0046. Since we found significant main effects related to
our first six hypotheses (H1a – H2c; see Section 4), we conducted Tukey posthoc analyses to
investigate specific differences between the aggregation strategies and explanation types. The
p-values from these posthoc analyses were adjusted to correct for multiple testing (i.e., written
as p<sub>adj</sub>).
        </p>
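        <p>The arithmetic behind the corrected threshold and the structure of the 5 × 3 design can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are hypothetical, and the degrees of freedom follow the standard formulas for a two-way between-subjects ANOVA rather than values stated in this section.</p>
        <preformat>
```python
# Bonferroni correction: family-wise alpha divided by the number of hypotheses.
FAMILYWISE_ALPHA = 0.05
N_HYPOTHESES = 11  # H1a-H3c (nine) plus H4a and H4b
alpha = FAMILYWISE_ALPHA / N_HYPOTHESES

# Design: 5 aggregation strategies crossed with 3 explanation types.
n_strategies, n_explanations = 5, 3
n_conditions = n_strategies * n_explanations

# Standard degrees of freedom for the effects in a two-way ANOVA.
df_strategy = n_strategies - 1
df_explanation = n_explanations - 1
df_interaction = df_strategy * df_explanation

print(f"corrected alpha = {alpha:.6f}")
print(f"conditions = {n_conditions}")
print(f"df: strategy={df_strategy}, explanation={df_explanation}, "
      f"interaction={df_interaction}")
```
        </preformat>
        <p>The quotient 0.05/11 is roughly 0.0045, which the text reports as 0.005 in Section 3.4 and as 0.0046 here.</p>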
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>Descriptive Statistics</title>
        <p>Participants’ distribution over the 15 different conditions (i.e., all possible combinations between
the five aggregation strategies and the three explanation types) was balanced: each condition
was shown to 6% to 7% of participants. On average, participants spent 2.9 (sd = 2.2; no notable
difference between conditions) minutes on the task. Qualitative feedback from participants
suggested that the scenario and task were understandable. Participants had a slight overall
tendency to perceive fairness, consensus, and satisfaction in the scenarios, as 51%, 51%, and 56%
at least somewhat agreed to these three items, respectively. Figure 1 shows participants’ mean
fairness perception, consensus perception, and satisfaction across explanation types and split by
aggregation strategies.</p>
        <p>RQ1: differences between social choice-based aggregation strategies regarding
explanation effectiveness. We found significant differences between the five aggregation
strategies concerning all three dependent variables: fairness perception, consensus perception,
and satisfaction (H1a – H1c; F = [36.19, 49.57], all p &lt; 0.001, η² = [0.27, 0.34]; see Table
2). So, overall, participants expressed different levels regarding these three variables based on
which aggregation strategy they were exposed to. Posthoc analyses revealed that MPL led to
lower levels on all three variables compared to all other aggregation strategies (all p<sub>adj</sub> &lt; 0.001).
The only other significant differences we found between aggregation strategies were that APP
(p<sub>adj</sub> = 0.004) and MAJ (p<sub>adj</sub> = 0.005) each led to lower fairness perception compared to
LMS. In sum, participants, irrespective of which explanation type they saw, viewed MPL as
significantly less fair, consensual, and satisfying compared to other strategies, and judged MAJ
as well as APP as less fair compared to LMS.</p>
      </sec>
      <sec id="sec-4-2">
        <title>RQ2: differences between explanation types (i.e., no explanation, basic explanation, or detailed explanation).</title>
        <p>We found no significant differences between the three explanation
types regarding any of the three dependent variables (H2a – H2c; F = [0.14, 0.35], p = [0.71, 0.86], all
η² = 0.00; see Table 2). So, our results contain no evidence for a difference between explanation
types concerning our three dependent variables.</p>
      </sec>
      <sec id="sec-4-3">
        <title>RQ3: interactions between aggregation strategies and explanation types regarding explanation effectiveness.</title>
        <p>There were no significant interaction effects between the five
aggregation strategies and the three explanation types (H3a – H3c; F = [0.65, 1.25], p =
[0.27, 0.71], η² = [0.01, 0.03]; see Table 2). The effect of explanation types on participants’
fairness perception, consensus perception, and satisfaction thus did not significantly differ based
on which aggregation strategy was applied.</p>
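        <p>As a rough consistency check, the effect sizes reported for H1a – H3c can be recomputed from the reported test statistics. The sketch below rests on two assumptions that this section does not state: that the reported effect size is partial eta squared, and that the error degrees of freedom equal the 399 analyzed participants minus the 15 conditions. The function name is hypothetical.</p>
        <preformat>
```python
N_PARTICIPANTS = 399  # analyzed sample reported in Section 3.4
N_CONDITIONS = 15     # 5 aggregation strategies x 3 explanation types
DF_ERROR = N_PARTICIPANTS - N_CONDITIONS  # assumption: 384 error df


def partial_eta_squared(f_value, df_effect, df_error=DF_ERROR):
    """Partial eta squared recovered from an F statistic and its df."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Effect -> (assumed effect df, reported range of the test statistic).
REPORTED = {
    "aggregation strategy (H1a-H1c)": (4, [36.19, 49.57]),
    "explanation type (H2a-H2c)": (2, [0.14, 0.35]),
    "interaction (H3a-H3c)": (8, [0.65, 1.25]),
}

for effect, (df_effect, f_values) in REPORTED.items():
    sizes = [round(partial_eta_squared(f, df_effect), 2) for f in f_values]
    print(effect, sizes)
```
        </preformat>
        <p>Under these assumptions, the recomputed ranges are [0.27, 0.34], [0.0, 0.0], and [0.01, 0.03], which agree with the effect sizes given in the text.</p>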
      </sec>
      <sec id="sec-4-4">
        <title>RQ4: associations between explanation effectiveness measures.</title>
        <p>
          In line with the findings of Tran et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], Spearman correlation analyses revealed significant positive relationships
between fairness perception and satisfaction (ρ = 0.71, p &lt; 0.001) as well as between
consensus perception and satisfaction (ρ = 0.76, p &lt; 0.001). This means that, as participants’ fairness
and consensus perception increased, satisfaction also increased.
        </p>
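        <p>As an illustration of the kind of rank correlation reported above, the following is a minimal sketch of Spearman's coefficient computed as the Pearson correlation of tie-averaged ranks. The function names and the rating vectors are hypothetical and are not taken from the study data or from the authors' analysis scripts; the sketch also does not compute a p-value.</p>
        <preformat>
```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical 7-point ratings from eight participants (NOT study data).
fairness = [2, 5, 6, 3, 7, 4, 6, 1]
satisfaction = [1, 5, 7, 4, 6, 4, 5, 2]
rho = spearman_rho(fairness, satisfaction)
print(f"Spearman rho = {rho:.2f}")
```
        </preformat>
        <p>A rank-based coefficient is a natural fit for Likert-type ratings, since it relies only on the ordering of responses and not on the distances between scale points.</p>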
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>In the following sections, we look closer at our results and their implications. We discuss the
differences between aggregation strategies, the differences between explanation levels,
and the effect of the chosen scenario. We conclude with lessons learned for future benchmarking
studies in explanations research and the limitations of our study.</p>
      <sec id="sec-5-1">
        <title>5.1. The Differences Between Aggregation Strategies</title>
        <p>
          As shown in Section 4, the aggregation strategies differ in terms of
perceived fairness, perceived consensus, and satisfaction. The MPL strategy obtains the lowest
scores, regardless of the type of explanation received. Furthermore, MAJ and APP are perceived
to be less fair than LMS. We discuss how these results may have interacted with the presented
scenario in Section 5.5. However, these results are in contrast with the findings of Tran et al.
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], where the same scenario was used. In that work, the MAJ and ADD strategies scored
better than the LMS strategy. One possible explanation for this difference is the different design of our
experiment: we implemented a between-subjects design to guarantee independence between
the conditions; in contrast, in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], each user evaluated six strategies and was exposed to
different explanation types. Although the strategies were presented in a randomized order to
reduce biases, it is possible that users took the explanation type they saw first as a reference point
for the following evaluations, which may have introduced noise in their evaluations.
Furthermore, to evaluate the effect of the aggregation strategy separately from the explanation,
we asked participants to evaluate the recommendation. In contrast, Tran et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] asked the
participants to evaluate the explanation provided; hence, the evaluation of the explanation was
influenced by the evaluation of the aggregation strategy.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. The Role of Explanations</title>
        <p>
          The results presented showed no significant difference between the different types of
explanations. Furthermore, no interaction effects between the explanations and the aggregation strategies
regarding the measured dependent variables (perceived fairness, perceived consensus, and
satisfaction) were found. However, these results are not enough to claim that explanations
are not useful for group recommender systems. First, it must be considered that the
scenario used was particularly simple to evaluate. More complex scenarios might involve a more
balanced situation between subgroups with different preferences, or a greater number of options
to choose from. Such factors might complicate the assessment, in which case an explanation of
the approach used might have an impact. Moreover, the strategies presented here represent
baselines for group recommenders. Therefore, it is necessary to formalise the explanations for
these strategies, as they serve as a reference against which more articulated strategies can be
compared. The most recent lines of research in group recommenders, however, try to integrate
into the recommendation generation process personal factors (experience in the domain [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] or
personality [
          <xref ref-type="bibr" rid="ref28 ref33 ref34">33, 28, 34</xref>
          ]), as well as social factors (tie strength [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], centrality of group members
in the group social network [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], group diversity [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]). In such cases, an explanation may have
an impact on the transparency and comprehensibility of the system and result in different
evaluations regarding fairness perception, consensus perception, and satisfaction. This, of
course, also raises privacy issues concerning which personal information of one or more
individuals can be mentioned in an explanation.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. The Link Between Fairness, Consensus, and Satisfaction</title>
        <p>
          The correlation between fairness perception (or consensus perception) and satisfaction, already
reported in Tran et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], and also shown in our results, confirms the close connection
between these concepts. A solution perceived as less fair is also perceived as less satisfactory,
and a less satisfactory solution is unlikely to be accepted by the group. This confirms that
these aspects, sometimes considered secondary, are crucial and that a group recommendation
system must take them into account, both in the generation of recommendations and in their
evaluation.
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Lessons Learned for Benchmarking</title>
        <sec id="sec-5-4-1">
          <title>Report on Participant Recruitment</title>
          <p>
            Numerous platforms can be used to outsource user studies [
            <xref ref-type="bibr" rid="ref38">38</xref>
            ], such as Prolific and Amazon
Mechanical Turk. Recruitment might also focus on particular users, such as students or staff
members. Filtering conditions, such as those for quality control, also affect which demographics
take part in a study. More generally, any selection of study participants can influence the
outcome of the evaluation, which should not be generalized outside the scope of the scenario [
            <xref ref-type="bibr" rid="ref39">39</xref>
            ].
Therefore, we recommend thorough reporting of how participants were recruited.
          </p>
        </sec>
        <sec id="sec-5-4-2">
          <title>Report Study Design and Statistical Analysis Rigorously</title>
          <p>
            The choice of quantitative study design (between-subjects, within-subjects, or mixed) also
influences the conclusions that can be drawn, as well as the statistical analysis that should
be applied. In any case, randomizing participants to conditions is of paramount importance,
regardless of study design. More personalized study designs, such as the one conducted by Tran
et al. [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], should clearly specify how each scenario has been allocated to participants, so that they
can be replicated. We, in particular, recommend more rigorous reporting of how randomization
is performed, as well as sharing scripts to support replication and comparison.
          </p>
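          <p>To make the recommendation about reporting randomization concrete, the following is a minimal sketch of a seeded, balanced assignment of participants to the cells of a between-subjects design with five aggregation strategies and three explanation types. It is not the procedure used in this study: the function name, the condition labels (in particular the assumption that the five strategies are ADD, APP, LMS, MAJ, and MPL), the participant identifiers, and the seed are all hypothetical.</p>
          <preformat>
```python
import random
from collections import Counter
from itertools import product

# Assumed condition labels (hypothetical; adapt to the actual design).
STRATEGIES = ["ADD", "APP", "LMS", "MAJ", "MPL"]
EXPLANATIONS = ["none", "basic", "detailed"]


def assign_conditions(participant_ids, seed):
    """Shuffle participants with a fixed seed, then deal them round-robin
    across all (strategy, explanation) cells so cell sizes stay balanced."""
    rng = random.Random(seed)  # local generator: reproducible, no global state
    conditions = list(product(STRATEGIES, EXPLANATIONS))
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: conditions[k % len(conditions)]
        for k, pid in enumerate(shuffled)
    }


# Hypothetical usage: 30 participants, two per cell.
participants = [f"P{k:02d}" for k in range(30)]
assignment = assign_conditions(participants, seed=2021)
cell_sizes = Counter(assignment.values())
print(cell_sizes)
```
          </preformat>
          <p>Reporting the seed together with such a script would let other researchers reproduce the exact allocation, which is one way to support the replication and comparison discussed above.</p>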
        </sec>
        <sec id="sec-5-4-3">
          <title>Ensure consistency in measurement or motivate changes well</title>
          <p>
            In separating the evaluation of explanations and aggregation strategy, we found it was no longer
feasible to ask participants to evaluate the explanations rather than the resulting
recommendation. In addition, compared to Tran et al. [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], we asked study participants to rate explanations’
effectiveness on a 7-point Likert scale, instead of a 5-point Likert scale, since this ensures greater
robustness in the use of ANOVA, according to [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ]. While these changes may not
have affected the results, such changes in the design must be described and motivated when
attempting to benchmark such user studies.
          </p>
        </sec>
        <sec id="sec-5-4-4">
          <title>Report on Completeness</title>
          <p>We found that certain aggregation strategies cannot be explained in certain instances or
scenarios. In this paper, this was the case for the Fairness strategy, which is well-suited for
repeated decisions, but less applicable to single decisions as in our case. We recommend that
future work not only describe the cases where explanations can be generated but also describe
the edge cases for which they cannot.</p>
        </sec>
        <sec id="sec-5-4-5">
          <title>Consider the Effect of the Scenario</title>
          <p>The proposed scenario in this work was selected to specifically study groups with heterogeneous
preferences. However, this choice is likely to have affected our specific results. For example, the
MPL strategy in this specific scenario recommends a solution that displeases at most three out
of four group members (Rest C, see Table 1). It is not surprising, therefore, that it is identified
as the least fair, least satisfactory strategy, and the one on which it is most difficult to reach
an agreement. The result might have been different if it displeased fewer group members. We,
therefore, recommend not only to clearly report on the scenario used, but also to discuss its
implications.</p>
        </sec>
        <sec id="sec-5-4-6">
          <title>Consider effects of the role of the participant in a group</title>
          <p>The evaluations in this paper were given by external evaluators, who may be less
biased than someone within the group. Users within the group may be influenced by their
own preferences. Furthermore, the assessment of the fairness of a scenario will likely differ
depending on whether it favors the user, e.g., if MPL displeases two users and the active
user is one of them.</p>
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Limitations</title>
        <sec id="sec-5-5-1">
          <title>Recommendations and explanations are not evaluated by group members</title>
          <p>
            As previously mentioned, in line with the evaluation approach in Tran et al. [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], our study
participants were asked to evaluate the recommendations as external evaluators. This means
that study participants were not members of the group. We hypothesize, however, that their
evaluations could be different if they were part of the group. Opting for an evaluator who is part of
the group would entail controlling for more cases, such as when the evaluator is in the majority
preference, minority preference, or a tie preference.
          </p>
        </sec>
        <sec id="sec-5-5-2">
          <title>We neither measure nor capture the reasoning process of the study participants regarding recommendations</title>
          <p>
            In the condition with no explanations, we provide a mere description of the recommendation.
However, we do not capture how study participants reflect on the recommendation or to what
extent they understand it. Prior literature [
            <xref ref-type="bibr" rid="ref14 ref40 ref41">40, 14, 41</xref>
            ], however, provides several directions
for measuring recommendation understandability, which could be investigated in future work.
Nevertheless, our descriptive analysis in Section 4 shows that participants spent a similar
amount of time completing each explanation condition. This suggests that they spent a similar
amount of effort analyzing their fairness and consensus perception, as well as satisfaction
regarding the recommended restaurant.
          </p>
        </sec>
        <sec id="sec-5-5-3">
          <title>Recommendations are provided for unnamed restaurants</title>
          <p>
            We did not want to influence participants’ decisions by providing real restaurant names as
recommendations. This helped us control for the potential bias that could have been added
while showing a real restaurant name. Such normalization, however, could potentially influence
the assessments of the study participants, compared to a customized recommendation. Another
limitation of our study is that all recommendations are in the restaurant domain. Different
recommendation domains could be perceived differently in terms of fairness, consensus, and
satisfaction. In particular, the investment related to the domain considered has been shown to
have an impact on the evaluation of the recommendations [
            <xref ref-type="bibr" rid="ref42">42</xref>
            ]; the restaurant domain is
generally perceived as a medium-low investment, compared to other domains suitable for group
recommendations, such as tourism.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>We present a preregistered evaluation of the impact of using social choice-based explanations
for group recommendations. Overall, our findings suggest that explanations are not necessarily
helpful for improving perceptions of the recommendations. Participants’ evaluations were not
influenced by the presence of an explanation, and their perceptions of fairness, consensus, and
satisfaction were primarily formed based on the social choice-based aggregation strategies.
Participants evaluated the Least Misery (LMS) strategy as fairer than the Approval Voting
(APP) and Majority (MAJ) strategies, while the Most Pleasure (MPL) strategy was considered the worst in
terms of perceived fairness, perceived consensus, and satisfaction. We also discuss some of the
challenges and decision points required to benchmark future studies of group explanations. In
particular, we highlight the importance of clarifying and motivating the recruitment process
and properly choosing the experimental design, specifying how each condition is assigned to
each participant. Furthermore, we discuss how the choice of the scenario presented for the
evaluation can influence the results, and that, therefore, the results should always be discussed
in relation to it. In future work, we plan to investigate the influence of internal vs. external
evaluators. We plan to thoroughly study the reasoning process of evaluators and measure the
level of understanding regarding the recommended item. To observe to what extent our results
generalize, we also plan to replicate our study with other scenarios and domains.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Appendix - Basic and Detailed Explanations</title>
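      <p>In this appendix, we specify how to generate the Basic and Detailed explanations used in this
work, for each of the aggregation strategies considered (see Section 3.1). Let <italic>G</italic> = {<italic>u</italic><sub>1</sub>, ..., <italic>u</italic><sub><italic>n</italic></sub>}
be a group of users, and <italic>I</italic> = {<italic>i</italic><sub>1</sub>, ..., <italic>i</italic><sub><italic>m</italic></sub>} be a set of items. Also, let {<italic>u</italic><sub>1</sub>, <italic>u</italic><sub>2</sub>, ..., <italic>u</italic><sub><italic>n̄</italic></sub>} be a
subset of group members who gave a specific rating <italic>r</italic> to the item <italic>i</italic> selected by the considered
strategy. Hence, we can define the explanations, for each aggregation strategy, as in Table 3,
while Table 4 shows the explanations obtained on the scenario used in the experiment (see
Table 1).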
</p>
      <table-wrap>
        <label>Table 4</label>
        <caption>
          <p>Basic and detailed explanations obtained on the scenario used in the experiment.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Strategy</th>
              <th>Basic explanation</th>
              <th>Detailed explanation</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>ADD</td>
              <td/>
              <td>“Restaurant B has been recommended to the group since it achieves the highest total rating (as the sum of the ratings of all members for Restaurant B is 13 which is higher than other items).”</td>
            </tr>
            <tr>
              <td>APP</td>
              <td>“Restaurant B has been recommended to the group since it achieves the highest number of ratings which are above 3.”</td>
              <td>“Restaurant B has been recommended to the group since it achieves the highest number of ratings which are above a threshold (as the three group members Anna, Sam, and Leo gave it ratings higher than 3).”</td>
            </tr>
            <tr>
              <td>LMS</td>
              <td>“Restaurant A has been recommended to the group since no group members has a real problem with it.”</td>
              <td>“Restaurant A has been recommended to the group since no group members has a real problem with it (as Alex and Anna gave it a rating of 2 which is the highest rating among the lowest ratings per restaurant).”</td>
            </tr>
            <tr>
              <td>MAJ</td>
              <td>“Restaurant B has been recommended to the group since most group members like it.”</td>
              <td>“Restaurant B has been recommended to the group since most group members like it (as 3 out of 4 group members gave it a high rating).”</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
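      <p>The aggregation rules behind the explanations quoted in this appendix can be sketched in a few lines. The following is only an illustrative sketch, not the authors' implementation: the ratings are hypothetical values (they are not the values of Table 1) chosen so that the outcomes agree with the explanations quoted in this appendix, the function names are our own, and the approval threshold of 3 is assumed from the APP explanation. MAJ is omitted because its exact rule is not spelled out in this section.</p>
      <preformat>
```python
# Hypothetical ratings on a 1-5 scale; these are NOT the values of Table 1.
RATINGS = {
    "Restaurant A": {"Alex": 2, "Anna": 2, "Sam": 4, "Leo": 4},
    "Restaurant B": {"Alex": 1, "Anna": 4, "Sam": 4, "Leo": 4},
    "Restaurant C": {"Alex": 5, "Anna": 1, "Sam": 2, "Leo": 1},
}
THRESHOLD = 3  # assumed approval threshold, taken from the quoted APP text


def additive(ratings):
    """ADD: item with the highest sum of member ratings."""
    return max(ratings, key=lambda item: sum(ratings[item].values()))


def approval(ratings, threshold=THRESHOLD):
    """APP: item with the most ratings strictly above the threshold."""
    return max(
        ratings,
        key=lambda item: sum(1 for r in ratings[item].values() if r > threshold),
    )


def least_misery(ratings):
    """LMS: item whose lowest member rating is the highest."""
    return max(ratings, key=lambda item: min(ratings[item].values()))


def most_pleasure(ratings):
    """MPL: item with the highest single member rating."""
    return max(ratings, key=lambda item: max(ratings[item].values()))


def detailed_add_explanation(ratings):
    """Fill the detailed ADD template quoted in this appendix."""
    item = additive(ratings)
    total = sum(ratings[item].values())
    return (
        f"{item} has been recommended to the group since it achieves the "
        f"highest total rating (as the sum of the ratings of all members for "
        f"{item} is {total} which is higher than other items)."
    )


print(additive(RATINGS), approval(RATINGS), least_misery(RATINGS), most_pleasure(RATINGS))
print(detailed_add_explanation(RATINGS))
```
      </preformat>
      <p>With these hypothetical ratings, ADD and APP select Restaurant B, LMS selects Restaurant A, and MPL selects Restaurant C, which displeases three of the four members.</p>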
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Chen</surname>
          </string-name>
          , L.-C. Cheng,
          <string-name>
            <given-names>C.-N.</given-names>
            <surname>Chuang</surname>
          </string-name>
          ,
          <article-title>A group recommendation system with consideration of interactions among group members</article-title>
          ,
          <source>Expert systems with applications 34</source>
          (
          <year>2008</year>
          )
          <fpage>2082</fpage>
          -
          <lpage>2090</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. U.</given-names>
            <surname>Ryu</surname>
          </string-name>
          ,
          <article-title>A group recommendation system for online communities</article-title>
          ,
          <source>International journal of information management 30</source>
          (
          <year>2010</year>
          )
          <fpage>212</fpage>
          -
          <lpage>219</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cosley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Polylens: A recommender system for groups of users</article-title>
          ,
          <source>in: ECSCW 2001</source>
          , Springer,
          <year>2001</year>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <article-title>Group modeling: Selecting a sequence of television items to suit a group of viewers</article-title>
          , in: Personalized digital television, Springer,
          <year>2004</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Najafian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>Generating consensus explanations for group recommendations: an exploratory study</article-title>
          ,
          <source>in: Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization</source>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>245</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hong</surname>
          </string-name>
          , Attentive group recommendation,
          <source>in: The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>645</fpage>
          -
          <lpage>654</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Najafian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Herzog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Inel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>You do not decide for me! evaluating explainable group aggregation strategies for tourism</article-title>
          ,
          <source>in: Proceedings of the 31st ACM Conference on Hypertext and Social Media</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>187</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          , Group recommender systems: aggregation, satisfaction and group attributes,
          <source>in: recommender systems handbook</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>743</fpage>
          -
          <lpage>776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Arrow</surname>
          </string-name>
          ,
          <article-title>A difficulty in the concept of social welfare</article-title>
          ,
          <source>Journal of political economy 58</source>
          (
          <year>1950</year>
          )
          <fpage>328</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Boratto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stettinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tkalčič</surname>
          </string-name>
          ,
          <article-title>Explanations for groups</article-title>
          , in: Group Recommender Systems, Springer,
          <year>2018</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <article-title>Effective explanations of recommendations: user-centered design</article-title>
          ,
          <source>in: Proceedings of the 2007 ACM conference on Recommender systems</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          , G. Friedrich,
          <source>Recommender systems: an introduction</source>
          , Cambridge University Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Gemmis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          , G. Semeraro,
          <article-title>Human decision making and recommender systems</article-title>
          ,
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS) 3</source>
          (
          <year>2013</year>
          )
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gedikli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <article-title>How should I explain? A comparison of different explanation types for recommender systems</article-title>
          ,
          <source>International Journal of Human-Computer Studies</source>
          <volume>72</volume>
          (
          <year>2014</year>
          )
          <fpage>367</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Boratto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stettinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tkalčič</surname>
          </string-name>
          ,
          <article-title>Explanations for groups</article-title>
          , in: Group Recommender Systems, Springer,
          <year>2018</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ntoutsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nørvåg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <article-title>Fast group recommendations by applying user clustering</article-title>
          ,
          <source>in: International conference on conceptual modeling</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>126</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T. N. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Samer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stettinger</surname>
          </string-name>
          ,
          <article-title>Towards social choice-based explanations in group recommender systems</article-title>
          ,
          <source>in: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Nosek</surname>
          </string-name>
          , G. Alter,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Banks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borsboom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Breckler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Buck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Chambers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Christensen</surname>
          </string-name>
          , et al.,
          <article-title>Promoting an open research culture</article-title>
          ,
          <source>Science</source>
          <volume>348</volume>
          (
          <year>2015</year>
          )
          <fpage>1422</fpage>
          -
          <lpage>1425</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Nosek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kidwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Spies</surname>
          </string-name>
          ,
          <article-title>Estimating the Reproducibility of Psychological Science</article-title>
          ,
          <source>Science</source>
          <volume>349</volume>
          (
          <year>2015</year>
          )
          <elocation-id>aac4716</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1126/science.aac4716</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Senot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kostadinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bouzid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Picault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aghasaryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernier</surname>
          </string-name>
          ,
          <article-title>Analysis of strategies for building group profiles</article-title>
          ,
          <source>in: International Conference on User Modeling, Adaptation, and Personalization</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <article-title>A survey of explanations in recommender systems</article-title>
          ,
          <source>in: 2007 IEEE 23rd international conference on data engineering workshop</source>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>801</fpage>
          -
          <lpage>810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Explaining collaborative filtering recommendations</article-title>
          ,
          <source>in: Proceedings of the 2000 ACM conference on Computer supported cooperative work</source>
          , ACM
          ,
          <year>2000</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swearingen</surname>
          </string-name>
          ,
          <article-title>The role of transparency in recommender systems</article-title>
          ,
          <source>in: CHI'02 extended abstracts on Human factors in computing systems</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>830</fpage>
          -
          <lpage>831</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Najafian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tkalcic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>Factors influencing privacy concern for explanations of group recommendation</article-title>
          ,
          <source>in: Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Najafian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Inel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>Someone really wanted that song but it was not me! evaluating which information to disclose in explanations for group recommendations</article-title>
          ,
          <source>in: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Najafian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Draws</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tkalcic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>Exploring user concerns about disclosing location and emotion information in group recommendations</article-title>
          ,
          <source>in: Proceedings of the 32nd ACM Conference on Hypertext and Social Media</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Kapcak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Spagnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Robbemond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vadali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Najafian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <article-title>Tourexplain: A crowdsourcing pipeline for generating explanations for groups of tourists</article-title>
          ,
          <source>in: Workshop on Recommenders in Tourism co-located with the 12th ACM Conference on Recommender Systems (RecSys 2018)</source>
          , CEUR,
          <year>2018</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Quijano-Sanchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Recio-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Diaz-Agudo</surname>
          </string-name>
          ,
          <article-title>Make it personal: a social explanation system applied to group recommendations</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>76</volume>
          (
          <year>2017</year>
          )
          <fpage>36</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>F.</given-names>
            <surname>Faul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Erdfelder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buchner</surname>
          </string-name>
          ,
          <article-title>G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences</article-title>
          ,
          <source>Behavior Research Methods</source>
          <volume>39</volume>
          (
          <year>2007</year>
          )
          <fpage>175</fpage>
          -
          <lpage>191</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3758/BF03193146</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <article-title>Likert scales, levels of measurement and the "laws" of statistics</article-title>
          ,
          <source>Advances in Health Sciences Education</source>
          <volume>15</volume>
          (
          <year>2010</year>
          )
          <fpage>625</fpage>
          -
          <lpage>632</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s10459-010-9222-y</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Napierala</surname>
          </string-name>
          ,
          <article-title>What Is the Bonferroni correction?</article-title>
          ,
          <year>2012</year>
          . URL: http://www.aaos.org/news/aaosnow/apr12/research7.asp.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gartrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beach</surname>
          </string-name>
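          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,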
          <string-name>
            <given-names>K.</given-names>
            <surname>Seada</surname>
          </string-name>
          ,
          <article-title>Enhancing group recommendation by incorporating social relationship interactions</article-title>
          ,
          <source>in: Proceedings of the 16th ACM international conference on Supporting group work</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bridge</surname>
          </string-name>
          ,
          <article-title>Conflict resolution in group decision making: insights from a simulation study</article-title>
          ,
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>29</volume>
          (
          <year>2019</year>
          )
          <fpage>895</fpage>
          -
          <lpage>941</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cervone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barile</surname>
          </string-name>
          ,
          <article-title>An altruistic-based utility function for group recommendation</article-title>
          ,
          <source>in: Transactions on Computational Collective Intelligence XXVIII</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>F.</given-names>
            <surname>Barile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <article-title>The adaptation of an individual's satisfaction to group context: the role of ties strength and conflicts</article-title>
          ,
          <source>in: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>357</fpage>
          -
          <lpage>358</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Delic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Neidhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Werthner</surname>
          </string-name>
          ,
          <article-title>How to use social relationships in group recommenders: empirical evidence</article-title>
          ,
          <source>in: Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Delic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Werthner</surname>
          </string-name>
          ,
          <article-title>The effects of group diversity in group decision-making process in the travel and tourism domain</article-title>
          ,
          <source>in: Information and Communication Technologies in Tourism 2020</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>E.</given-names>
            <surname>Peer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brandimarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Acquisti</surname>
          </string-name>
          ,
          <article-title>Beyond the turk: Alternative platforms for crowdsourcing behavioral research</article-title>
          ,
          <source>Journal of Experimental Social Psychology</source>
          <volume>70</volume>
          (
          <year>2017</year>
          )
          <fpage>153</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Langer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <article-title>Towards reproducibility in recommender-systems research</article-title>
          ,
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>26</volume>
          (
          <year>2016</year>
          )
          <fpage>69</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Knijnenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Reijmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Willemsen</surname>
          </string-name>
          ,
          <article-title>Each to his own: how different users call for different interaction methods in recommender systems</article-title>
          ,
          <source>in: Proceedings of the fifth ACM conference on Recommender systems</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Are explanations helpful? A comparative study of the effects of explanations in AI-assisted decision-making</article-title>
          ,
          <source>in: 26th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthoff</surname>
          </string-name>
          ,
          <article-title>Over- and underestimation in different product domains</article-title>
          ,
          <source>in: Workshop on Recommender Systems associated with ECAI</source>
          , Springer Boston, MA,
          <year>2008</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>