=Paper= {{Paper |id=Vol-3124/paper17 |storemode=property |title=Development of an Instrument for Measuring Users' Perception of Transparency in Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-3124/paper17.pdf |volume=Vol-3124 |authors=Marco Hellman,Diana C. Hernandez-Bocanegra,Jürgen Ziegler |dblpUrl=https://dblp.org/rec/conf/iui/HellmanH022 }} ==Development of an Instrument for Measuring Users' Perception of Transparency in Recommender Systems== https://ceur-ws.org/Vol-3124/paper17.pdf
Development of an Instrument for Measuring Users’
Perception of Transparency in Recommender Systems
Marco Hellmann, Diana C. Hernandez-Bocanegra and Jürgen Ziegler
University of Duisburg-Essen, Forsthausweg 2, 47057 Duisburg, Germany


Abstract
Transparency is increasingly seen as a critical requirement for achieving the goal of human-centered AI systems in general and also, specifically, recommender systems (RS). However, defining and operationalizing the concept is still difficult, due to its multi-faceted nature. Currently, there are hardly any measurement instruments to adequately assess the perceived transparency of RS in user studies. Thus, we present the development of a measurement instrument that aims at capturing perceived transparency as a multidimensional construct. The results of our validation show that transparency can be distinguished with respect to input (what data does the system use?), functionality (how and why is an item recommended?), output (why and how well does an item fit one's preferences?), and interaction (what needs to be changed for a different prediction?). The study is intended as a first iteration in the development of a reliable and fully validated measurement tool for assessing transparency in RS.

Keywords
Recommender systems, transparency, explanations, user study



Joint Proceedings of the ACM IUI Workshops 2022, March 2022, Helsinki, Finland
marco.hellmann@stud.uni-due.de (M. Hellmann); diana.hernandez-bocanegra@uni-due.de (D. C. Hernandez-Bocanegra); juergen.ziegler@uni-due.de (J. Ziegler)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)


1. Introduction

The request for more transparency in intelligent systems has become steadily louder in recent years, formulated in academic research as well as in most public and corporate policies concerning the ethics of artificial intelligence [1, 2]. Although there is now broad agreement that transparency is of high relevance for developing human-centred AI systems, the concept is still elusive due to its multi-faceted nature and the different objectives it is intended to serve. The questions raised when asking for transparency include, for example, which system aspects should be made transparent, or the riskiness of an AI function at an individual or societal level.

A need for greater transparency has also been noted for recommender systems (RS), a frequent, user-facing type of AI-driven technology, to better support users in their decision-making and to avoid potentially negative consequences, e.g. users getting trapped in filter bubbles [3]. Various methods have been proposed to this end, ranging from disclosing the user profile on which a recommendation is based to providing explicit explanations. Still, the multi-facetedness of the concept makes it difficult to design effective transparent RS. A central question that must be solved to this end is how the transparency of a RS can be measured and evaluated. While different aspects of the system, for example, the input data, the recommendation algorithm, or features of the recommended items, may be exposed to the user, transparency as a user-centric quality can only be assessed by measuring users' perception and understanding of those system aspects that are relevant for their decision making and trust in the system [4].

Despite the acclaimed relevance of transparency in RS, the instruments available for measuring it from a user perspective are still very limited. Some instruments for assessing overall recommendation quality include a small number of items related to perceived transparency [5], but these measures still seem far from covering the multiple facets involved. To the best of our knowledge, there is no instrument focusing specifically on RS transparency. A further shortcoming of existing instruments is that they do not sufficiently consider the cognitive processes involved in users' understanding of recommendations and in their ability to influence the system according to their needs, if such influence is possible.

In this paper, we describe steps towards a more holistic and cognitively grounded psychometric instrument for measuring perceived transparency in RS. We first explain the questionnaire development process that resulted in a validated set of items specifically focused on RS transparency. The candidate items for this development were chosen to reflect the different steps involved in cognitively processing the information provided about the recommendation process and its output. To further validate the instrument, we performed an analysis of the effects of perceived transparency, as measured by our new instrument, on factors related to trust in the RS and effectiveness of the recommendations. An influence of transparency on users' trust in the system and on the acceptance of the recommendations has been suggested in
prior research, e.g., in [5]. We analyzed these influences through structural equation modeling to show that the construct 'transparency' as measured by our instrument has in fact the assumed effects.

Our contribution is thus twofold: we provide a systematically derived and validated measurement instrument for transparency in RS, and we can show that the different transparency factors represented in the questionnaire have an impact on the effectiveness of recommendations and trust in the system, albeit to different degrees.


2. Related work

Users' perception of the transparency of a RS may be influenced by several factors. Providing explanations is one important aspect, and some studies have shown that transparency is positively influenced by the quality of the explanations given ([5], [6]) and that it is related to control over the system [7]. The effect of systematically varied levels and styles of explanation on perceived transparency has been studied and assessed via questionnaires (see e.g. [8], [9], [10]). Also, a positive influence of interaction possibilities as well as perceived control on the perceived transparency of the system was reported by [5]. Transparency perception seems to be enhanced both by the perceived quality of explanations and the perceived accuracy or quality of recommendations. In addition, the authors show a positive effect of transparency on trust and, through trust, an indirect effect on purchase intentions. According to [11], this can be related to evaluating the effectiveness of the RS. Moreover, studies suggest that perceived transparency promotes satisfaction with the system [12], [7].

The influence of personal factors on the perception of recommender systems has often been investigated in the light of the general decision-making behavior of users (see [13]). [9] showed that individuals with a rational decision-making style trusted the tested recommender system more and rated its efficiency and effectiveness higher. Furthermore, they showed that individuals with an intuitive decision-making style rate the quality of explanations better.

To date, however, few measurement tools exist to quantitatively assess the transparency of a RS as perceived by users. [6] surveyed perceived transparency using two items ("I understand why the system recommended the artworks it did"; "I understand what the system bases its recommendations on"), in the domain of art objects. [14] use a single item ("I did not understand why the items were recommended to me (reverse scale)"), for event recommendations. [8] proposed an item that explicitly refers to explanations: "Which explanation interfaces are considered to be transparent by the users?". [5] proposed an evaluation framework for RS, involving different domains and applications, and formulate the measurement of the construct transparency using only a single item ("I understood why the items were recommended to me"), the latter being a frequently used item for the evaluation of RS transparency.

Consequently, we set out to formulate and validate a more comprehensive way to measure the perceived transparency of a RS, as described in the methods section. We followed the typical procedure for developing psychometric measurement instruments (e.g. [15]):

(1) To operationalize a target construct, a larger number of candidate items is first formulated and compiled. Here, we draw on the basic structure of RS ([16], [17]) and on typical user questions related to artificial intelligence algorithms [18]. Second, items were also derived from a qualitative preliminary study, conducted to further analyze the uncertainties in users' mental models, which can be understood as the notions that users have about how a system or a certain type of system works [19].

(2) We examined the factor structure of the transparency construct, which was formed as a reflective factor in the sense of classical test theory (see also [20]). We considered four factors that could group individual questionnaire items and that might contribute to variance in perceived transparency, inspired by the dimensions defined by [18]: input ("what kind of data does the system learn from?"), output ("what kind of output does the system give?"), functionality ("how / why does the system make predictions?") and interaction, covering what-if and how-to-be-that questions ("what would the system predict if this instance changes to..?").

(3) The developed measurement instrument was validated. For this purpose, the framework model of [7] was used.

2.0.1. Mental models and stages of cognitive processing

Transparency is frequently discussed as an objective property of a system. A system only becomes transparent, however, if its users can understand the transparency-related information, such as explicit explanations, and evaluate it with respect to their goals. The degree of comprehension may depend on the mental model users have about how the system works [21], based either on preconceptions, previous experiences with similar systems, or on the interaction with and perception of the present system [22]. As discussed in [19], mental models that drift considerably from actual system functioning may result in broadening the "gulfs" described by [22]: 1) the gulf of execution, when the user's mental model is inaccurate in terms of how the system can be used to execute a task; 2) the gulf of evaluation, when the output (as a consequence of a user's action) differs from what is expected according to the user's mental model.
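The grouping of candidate items into the four transparency dimensions can be sketched as a small lookup structure. This is purely an illustrative aid, not part of the validated instrument; the guiding questions follow the dimensions inspired by [18], and each example item is taken from the final questionnaire (Table 1), though pairing them in code is our own illustration.

```python
# Illustrative mapping of the four transparency dimensions to their
# guiding user questions and one example questionnaire item each.
TRANSPARENCY_DIMENSIONS = {
    "input": {
        "guiding_question": "What kind of data does the system learn from?",
        "example_item": "It was clear to me what kind of data the system "
                        "uses to generate recommendations.",
    },
    "output": {
        "guiding_question": "What kind of output does the system give?",
        "example_item": "I understood why the items were recommended to me.",
    },
    "functionality": {
        "guiding_question": "How / why does the system make predictions?",
        "example_item": "The system provided information to understand why "
                        "the items were recommended.",
    },
    "interaction": {
        "guiding_question": "What would the system predict if this "
                            "instance changes to..?",
        "example_item": "I know what needs to be changed in order to get "
                        "better recommendations.",
    },
}

def items_for(dimension: str) -> str:
    """Return the example questionnaire item for a given dimension."""
    return TRANSPARENCY_DIMENSIONS[dimension]["example_item"]
```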
To bridge these gulfs, users must process the information provided by the system at different cognitive levels. The items of the proposed questionnaire were formulated to reflect the action levels according to [23]. According to their model, the quality of interaction with the system can be described through a cycle of evaluation and execution. For example, at first, the user may perceive the output of the system (e.g., the recommendations and explanations), then interpret the information gathered (e.g., how the system works), and thereby evaluate the state of the system (e.g., the performance of the system and the quality of the output). As a consequence, the user formulates goals they aim to achieve with the system or matches their goals with the evaluation of the system (e.g., getting more accurate or diverse recommendations). The user then pursues an intention (e.g., improving recommendations), which is translated into planning actions (e.g., changing the input), which they finally execute. While this cognitive cycle is well known in the HCI field, it has hardly been applied in the investigation of transparency for AI-based systems.

The authors in [23] assume that there are gaps between the users' goals and their knowledge about the system, and the extent to which the system provides descriptions of its functioning (the gulfs of execution and of evaluation mentioned beforehand). By taking actions to bridge those gaps (making system functions match goals, and making the output represent a "good conceptual model of the system that is easily perceived, interpreted and evaluated" [23]), system designers may contribute to minimizing users' cognitive effort [23] and to decreasing the discrepancy between the mental model of the system and its actual functioning, which may have an impact on the perception of transparency, as discussed by [19]. We argue, then, that a more comprehensive instrument to measure perceived transparency is still needed, so that such impact can be evaluated not only on the basis of general perceived understanding ("I understood why recommended"), but also on the basis of the extent to which the output and functionalities that reflect the conceptual model of the system are perceived, interpreted and evaluated by users.


3. Methods

To operationalize the construct of perceived transparency, we conducted the following steps, based on the typical procedure for developing measurement instruments (e.g., [15]): 1. Formulation and compilation of questionnaire items. 2. Examination of item quality and factor structure, based on an online study. 3. Validation of the measurement instrument. We describe each step below.

3.1. Formulation and compilation of questionnaire items

Here, we draw on the basic structure of RS ([16], [17]) and typical user questions to AI algorithms [18]. Candidate items were also chosen to cover the different stages of the cognitive action cycle described in the related work. Second, items were also derived from a qualitative pre-study, consisting of interviews with users, to further analyze the uncertainties in users' mental models [19] with regard to different commercial RS, such as Netflix, Spotify or Amazon.

A total of 6 interviews were conducted via video call, with voluntary participants. When selecting the interview partners, care was taken to represent different age groups and levels of experience with Internet applications in the sample. Students and non-students from different age groups (20 to 50 years) were interviewed. Overall, previous exposure to recommender systems was about equally strong among all participants; only one interviewee had lower experience and one had slightly higher experience.

The aim of the interviews was to capture the experience, perception and evaluation, as well as possible questions, of users regarding the functionality or transparency of recommender systems. The subjects were asked to explain the functionality of RS from their perspective and to create a corresponding sketch. Following this, uncertainties and possible lacks of transparency were discussed. Finally, prototypical explanations from [24] for increasing perceived transparency were evaluated by the interview partners. The explanations refer differently to the input used, the functionality and the output. In addition, they use different visual forms of representation, e.g. star ratings, profile lines, and text. In this way, uncertainties as well as users' wishes for more transparency could be identified. Each question encountered in the interviews was directly transformed into one or more items.

A resulting set of 92 items was collected and discussed by the research team, during which linguistic revision and elimination of redundancies were also performed. The discussions led to a reduction of the set to 34 items, which were used as input for the online validation described in the next section.

3.2. Online user study

We conducted a user study to examine item quality and factor structure, as described below.

Participants We recruited 171 participants (89 female, mean age 29, range 18 to 69) through the crowdsourcing platform Prolific. We restricted the task to workers in the U.S. and the U.K. with an approval rate greater than 98%. Participants were rewarded with £1.15.
Time devoted to the survey (in minutes): M = 13.2, SD = 7.33.

We applied a quality check to select participants with quality survey responses (we included attention checks in the survey, e.g. "This is an attention check. Please click here the option 'Disagree'"). We discarded participants with at least one failed attention check, and those who did not finish the survey. Thus, the responses of 17 of the 192 initial Prolific respondents were discarded and not paid. Four additional cases were removed due to suspicious response behavior, e.g. answering all questions on the same page with the same value. Thus, 171 cases were used for further analysis.

The target sample size was chosen to allow performing CFA. [25], p. 389, recommends a minimum of n > 50 or three times the number of indicators; [26], p. 102, recommends a minimum of n > 100 or five times the number of indicators. Thus, given that we wanted to evaluate a set of 34 items, the sample size was set to a minimum of 170 participants.

Questionnaires We utilized the set of 34 items resulting from the item formulation step described above. Additionally, aiming to further validate the final measurement instrument (Section 4.3), we used items from [5] to evaluate perception of control (how much users think they can influence the system), interaction adequacy, interface adequacy, information sufficiency and recommendation accuracy. Furthermore, we included items from [7] to evaluate the perception of system effectiveness (construct perceived system effectiveness: the system is useful and helps the user to make better choices), and of trust in the system [27] (constructs trusting beliefs, with subconstructs benevolence, integrity, and competence: the user considers the system to be honest and trusts its recommendations; and trusting intentions: the user is willing to share information and to follow advice). We used items from [28, 29] for explanation quality, and from [30] to evaluate decision-making style. All items were measured on a 5-point Likert scale (1: Strongly disagree, 5: Strongly agree).

Procedure Participants were asked to choose a service from five applications for which they were required to have an active account: Amazon, Spotify, Netflix, Tripadvisor, and Booking. Participants were instructed to open the application and browse it at their own discretion. They were explicitly told to select an item that was relevant to them and which they would actually buy or consume; an actual purchase of items was explicitly not requested. Participants were asked to return to the survey after completing the task and to answer questions about the system they used.

Data analysis We performed an exploratory factor analysis (EFA) to further reduce the initial set of items, and a confirmatory factor analysis (CFA) to test internal reliability and convergent validity. Furthermore, we evaluated the discriminant validity of the resulting set of items in relation to other constructs of the subjective evaluation of RS, for example explanation quality, effectiveness and overall satisfaction, according to the frameworks defined by [7] and [5].


4. Results

4.1. Exploratory Factor Analysis (EFA)

The factor structure was examined exploratively, aiming to further reduce the set of items. A total of 5 EFAs with principal axis factoring and promax rotation were performed. First, items that did not have a unique principal loading or had a principal loading that was too low (<.40) were removed. In the first 4 EFAs, 11 items were removed based on this criterion. Subsequently, a more stringent criterion was used (factor loadings <.50), based on the guideline values of [31], and 2 more items were removed. This resulted in a 6-factor structure with a total of 21 items and an explained variance of 62.45%. The reliability of the factors falls in the range 'good' to 'very good' (.782 to .888), as defined by [32]. The internal consistency across all items is .867.

4.2. Confirmatory Factor Analysis (CFA)

Following the exploration of the factor structure, the resulting model was tested for internal reliability and convergent validity using confirmatory analysis. A first CFA was performed, resulting in 8 items with low factor loadings, which were eliminated from the set. Two factors were removed in the process because they did not load on a second-level overall transparency factor. A final CFA with 4 factors was performed (model fit: X2 = 86.997, df = 61, p = .016; X2/df = 1.426; CFI = .975; TLI = .968; RMSEA = .050; SRMR = .047). Reliability across all items is .884. This model comprises a final set of four factors and 13 items, which are reported in Table 1 along with their factor loadings.

The four factors identified can be associated with the concepts Input, composed of 3 items, Output, also with 3 items, Functionality with 5 items, and Interaction with only 2 items. Although the initial item set comprised questions for all stages of the cognitive action cycle, after CFA, items related to the perception level were only left in the factor Functionality, comprising questions about whether users are aware of transparency-related information if provided by the system (e.g.: "The system provided information about how well the recommendations match my preferences"). This factor covers mostly
Table 1
Test results of internal reliability and convergent validity of our proposed transparency questionnaire. Each factor is listed with its Cronbach's alpha; each item with its factor loading.

Input (alpha = .842):
- It was clear to me what kind of data the system uses to generate recommendations. (0.817)
- I understood what data was used by the system to infer my preferences. (0.901)
- I understood which item characteristics were considered to generate recommendations. (0.712)

Output (alpha = .801):
- I understood why the items were recommended to me. (0.771)
- I understood why the system determined that the recommended items would suit me. (0.794)
- I can tell how well the recommendations match my preferences. (0.710)

Functionality (alpha = .847):
- The system provided information to understand why the items were recommended. (0.731)
- The system provided information about how the quality of the items was determined. (0.705)
- The system provided information about how my preferences were inferred. (0.736)
- The system provided information about how well the recommendations match my preferences. (0.696)
- I understood how the quality of the items was determined by the system. (0.760)

Interaction (alpha = .888):
- I know what actions to perform in the system so that it generates better recommendations. (0.896)
- I know what needs to be changed in order to get better recommendations. (0.892)


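The Cronbach's alpha values reported in Table 1 can be reproduced from raw item responses with a few lines of code. The sketch below is a generic implementation of the standard formula, not the actual analysis script used in the study; the response matrix is hypothetical.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (n_respondents, n_items) response matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of sum scores),
    using sample variances (ddof=1) throughout.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 5-point Likert responses to the two Interaction items:
responses = np.array([
    [4, 5],
    [3, 3],
    [5, 5],
    [2, 3],
    [4, 4],
])
print(round(cronbach_alpha(responses), 3))  # prints 0.93
```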
perception-related questions. The missing coverage of perception-related items in other factors is likely due to limitations of the systems used for the online study, which do not, for example, provide access to the data on which recommendations are based, thus preventing users from becoming aware of input data. The factor Output comprises items related to the interpretation and evaluation stages. The factor Interaction has the smallest scope, with 2 items, and covers only the facets of action planning and action execution. This factor thus describes whether users know which actions they would have to perform if they wanted to receive other recommendations.

4.3. Discriminant validity of the measurement instrument

We determined the discriminant validity of the instrument in relation to other constructs of the subjective evaluation of RS, for example explanation quality, effectiveness and overall satisfaction, according to the frameworks defined by [7] and [5]. Discriminant validity was assessed using inter-construct correlations (see results in Table 2). We found that the squared correlations between pairs of constructs were all less than the average variances shown in the diagonal, representing "a level of appropriate discriminant validity" [5].


5. Structural Equation Model (SEM)

To explore the relation between the transparency factors assessed by the questionnaire and the effects of perceived transparency on recommendation effectiveness and trust in the system, as well as the impact of factors influencing transparency, we set up a Structural Equation Model

fect of higher overall transparency on the perception of the recommendations and the overall system. Furthermore, we assumed that transparency is influenced by system-related aspects (accuracy, interaction quality, and explanation quality) as well as by personal characteristics (such as decision-making behavior), as described in the related work section. Some of these factors can also be expected to influence perceived control over the system, a construct that may mediate the impact of these factors on transparency perception. This led us to formulate the hypotheses shown in Table 3.

In the following, the relationships of the factors in the structural equation model are presented (see Fig. 1). Only significant paths with standardized path coefficients are shown. Indirect effects are only considered for the transparency factors relevant here. The final model is shown to have a very good fit: X2 = 75.767, df = 57, p = .049; X2/df = 1.329; CFI = .980; TLI = .965; RMSEA = .044; SRMR = .072. The model is thus adequate to describe the relationships in the data set.

Influences on perceived transparency of the system. Transparency with respect to interaction is rated higher when users are more likely to exhibit an intuitive decision-making style (0.186, p < .05) and when users report higher perceived control (0.293, p < .001). The latter is increased by the quality of interaction (0.502, p < .001) and of explanations (0.341, p < .001). Users thus know better how to influence recommendations when they have more opportunities to interact with the system, and can gather information about the system through explanations as well as through 'trial and error' (indirect effects: explanation quality → control → Transparency-interaction: 0.100, p < .01; interaction quality → control → Transparency-interaction: 0.147, p < .001).

Similar observations can be made for functionality. Again, transparency is rated higher when users are more
(SEM). The model is based on hypotheses we derived likely to exhibit an intuitive decision-making style (0.141,
from existing research that has shown the positive ef- p <.05) and users report higher perceived control (0.261,
Table 2
Inter-construct correlation matrix. Average Variance Extracted (AVE) on the main diagonal; correlations below the diagonal; squared correlations above the diagonal. Target value for AVE ≥ .5. p<0.05*, p<0.01**.
                               1         2         3         4         5          6         7        8         9         10        11        12        13        14        15        16        17        18      19
 1 Transp. - input           0.662     0.227     0.235     0.136     0.111     0.023      0.009    0.125     0.053     0.051     0.075     0.040     0.065     0.063     0.159     0.002     0.026     0.081     0.021
 2 Transp. - output          0.476**   0.577     0.231     0.121     0.157     0.039      0.019    0.146     0.187     0.054     0.094     0.291     0.071     0.041     0.239     0.001     0.186     0.240     0.074
 3 Transp. - function        0.485**   0.481**   0.527     0.155     0.246     0.021      0.061    0.341     0.153     0.022     0.114     0.094     0.147     0.153     0.183     0.021     0.127     0.168     0.094
 4 Transp. - interaction     0.369**   0.348**   0.394**   0.799     0.119     0.000      0.055    0.048     0.072     0.000     0.056     0.030     0.060     0.106     0.070     0.008     0.064     0.035     0.038
 5 Control                   0.333**   0.396**   0.496**   0.345**   0.775     0.004      0.018    0.242     0.366     0.052     0.154     0.090     0.153     0.198     0.156     0.007     0.144     0.141     0.118
 6 DM style - rational       0.153*    0.197**   0.146     0.016     0.061     0.454      0.041    0.062     0.004     0.073     0.018     0.032     0.017     0.017     0.026     0.004     0.036     0.058     0.030
 7 DM style - intuitive      0.092     0.138     0.246**   0.234**   0.136     -0.203**   0.502    0.022     0.027     0.000     0.000     0.019     0.006     0.011     0.012     0.012     0.001     0.001     0.005
 8 Explanation quality       0.353**   0.382**   0.584**   0.220**   0.492**   0.248**    0.148    0.557     0.091     0.080     0.265     0.151     0.112     0.100     0.230     0.030     0.199     0.177     0.171
 9 Interaction adequacy      0.230**   0.432**   0.391**   0.269**   0.605**   0.064      0.163*   0.301**   0.791     0.082     0.065     0.048     0.116     0.151     0.101     0.020     0.147     0.118     0.084
 10 Interface adequacy       0.226**   0.232**   0.147     0.008     0.228**   0.270**    -0.001   0.282**   0.286**   0.618     0.123     0.054     0.052     0.043     0.207     0.020     0.108     0.130     0.187
 11 Info. sufficiency        0.273**   0.307**   0.337**   0.236**   0.393**   0.133      -0.001   0.515**   0.254**   0.350**   —         0.104     0.064     0.063     0.182     0.048     0.216     0.188     0.170
 12 Recomm. accuracy         0.201**   0.539**   0.307**   0.174*    0.300**   0.180*     0.137    0.389**   0.220**   0.232**   0.323**   —         0.086     0.062     0.259     0.021     0.187     0.326     0.221
 13 Trust - benevolence      0.254**   0.266**   0.384**   0.245**   0.391**   0.130      0.079    0.334**   0.341**   0.228**   0.252**   0.293**   0.666     0.661     0.366     0.095     0.162     0.332     0.282
 14 Trust - integrity        0.250**   0.202**   0.391**   0.326**   0.445**   0.129      0.106    0.316**   0.388**   0.207**   0.251**   0.249**   0.813**   0.476     0.332     0.088     0.179     0.238     0.272
 15 Trust - competence       0.399**   0.489**   0.428**   0.265**   0.395**   0.162*     0.111    0.480**   0.318**   0.455**   0.427**   0.509**   0.605**   0.576**   0.608     0.030     0.278     0.440     0.358
 16 Trust - share info.      0.040     0.028     0.146     0.091     0.086     0.060      0.109    0.174*    0.141     0.140     0.219**   0.143     0.308**   0.297**   0.174*    —         0.064     0.062     0.078
 17 Trust - follow advice    0.160*    0.431**   0.356**   0.253**   0.379**   0.189*     0.028    0.446**   0.384**   0.328**   0.465**   0.433**   0.402**   0.423**   0.527**   0.252**   —         0.213     0.269
 18 Effectiveness            0.284**   0.490**   0.410**   0.187*    0.375**   0.241**    0.036    0.421**   0.344**   0.360**   0.434**   0.571**   0.576**   0.488**   0.663**   0.249**   0.461**   0.545     0.389
 19 Overall satisfaction     0.145     0.272**   0.306**   0.194*    0.343**   0.174*     0.069    0.414**   0.289**   0.432**   0.412**   0.470**   0.531**   0.522**   0.598**   0.280**   0.519**   0.624**   —
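The discriminant-validity criterion applied to Table 2 — each construct's AVE must exceed its squared correlation with every other construct (the Fornell-Larcker criterion) — can be sketched as a simple check. The values below are a small excerpt from Table 2; the function name is illustrative, not from the paper:

```python
# Sketch of the Fornell-Larcker discriminant-validity check behind Table 2:
# a pair of constructs is adequately discriminated if the squared correlation
# between them is smaller than each construct's AVE.

def fornell_larcker_ok(ave: dict, corr: dict) -> bool:
    """ave: AVE per construct; corr: correlation per construct pair."""
    for (a, b), r in corr.items():
        # squared correlation must stay below the smaller of the two AVEs
        if r ** 2 >= min(ave[a], ave[b]):
            return False
    return True

# Excerpt of the values reported in Table 2 (AVE diagonal, correlations below it).
ave = {"input": 0.662, "output": 0.577, "function": 0.527, "interaction": 0.799}
corr = {
    ("input", "output"): 0.476,
    ("input", "function"): 0.485,
    ("output", "function"): 0.481,
    ("input", "interaction"): 0.369,
}

print(fornell_larcker_ok(ave, corr))  # True: all squared correlations < AVEs
```

For example, the largest excerpted correlation (.485) squares to .235, which is below both relevant AVEs (.662 and .527), so the criterion holds.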




Table 3
Overview of hypotheses addressed in the SEM
 Hypotheses   Reference               Relevant factor           Explanation
 Factors influencing perceived transparency (X → perceived transparency)
 H-1.1        [6], [5]                Explanation quality       Comprehensibility and contribution of the explanations to the understanding of the system
 H-1.2        [5]                     Accuracy                  Match between the items and the user's preferences
 H-1.3        [5] (indirect effect)   Interaction quality       Possibilities of adaptation and feedback
 H-1.4        [5]                     Control                   Possibilities of personalization
 H-1.5        [10]                    Decision-making styles    Rational / intuitive
 Effects of perceived transparency (perceived transparency → Y)
 H-2.1        [3], [14]               Trust                     Trusting beliefs and intentions
 H-2.2        [11]                    Effectiveness             Usefulness of the system
 H-2.3        [14], [12]              Overall satisfaction      Satisfaction with the system
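The global fit statistics reported for the final model can be compared against conventional cutoffs (χ²/df ≤ 3, CFI/TLI ≥ .95, RMSEA ≤ .06, SRMR ≤ .08). A minimal sketch, noting that these cutoff values are common rules of thumb from the SEM literature, not thresholds stated in this paper:

```python
# Sketch: checking reported global fit statistics against conventional cutoffs.
# The cutoffs (chi2/df <= 3, CFI/TLI >= .95, RMSEA <= .06, SRMR <= .08) are
# common rules of thumb, not values taken from this paper.

def fit_acceptable(chi2, df, cfi, tli, rmsea, srmr):
    return (chi2 / df <= 3.0 and cfi >= 0.95 and tli >= 0.95
            and rmsea <= 0.06 and srmr <= 0.08)

# Fit of the final model as reported in the text.
print(fit_acceptable(chi2=75.767, df=57, cfi=0.980, tli=0.965,
                     rmsea=0.044, srmr=0.072))  # True
```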




p < .001). The quality of interaction, promoting perceived control, has a positive effect on transparency concerning how the system works (indirect: interaction quality → control → Transparency-functionality: 0.131, p < .001). The quality of the explanations both directly (0.416, p < .001) and indirectly increases the transparency of the functionality when users rate the explanations positively (indirect: explanation quality → control → Transparency-functionality: 0.089, p < .01).
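In standard SEM practice, a standardized indirect effect along a mediation path is the product of the standardized direct coefficients on that path; the indirect effects reported here are consistent with the direct paths shown in fig. 1. A minimal sketch (the helper name is illustrative):

```python
# Sketch: in SEM, the standardized indirect effect of a mediation path is the
# product of the standardized coefficients along that path. The reported
# indirect effects are consistent with the direct paths in the model.

def indirect_effect(*paths: float) -> float:
    out = 1.0
    for p in paths:
        out *= p
    return round(out, 3)

# explanation quality -> control (0.341), control -> Transparency-interaction (0.293)
print(indirect_effect(0.341, 0.293))  # 0.1   (reported: 0.100)
# interaction quality -> control (0.502), control -> Transparency-interaction (0.293)
print(indirect_effect(0.502, 0.293))  # 0.147 (reported: 0.147)
# interaction quality -> control (0.502), control -> Transparency-functionality (0.261)
print(indirect_effect(0.502, 0.261))  # 0.131 (reported: 0.131)
```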




Figure 1: Structural model. p<0.05*, p<0.01**, p<0.001***
   The input is perceived as more transparent the better users can interact with the system. Thus, here again, perceived control has a direct positive effect (0.200, p < .01), and the quality of the interaction again has an indirect effect (indirect: interaction quality → control → Transparency-input: 0.100, p < .05). Similarly to what has already been shown with regard to functionality, the quality of the explanations also has a direct, positive effect on transparency of the input (0.209, p < .01) in addition to the indirect effect (explanation quality → control → Transparency-input: 0.068, p < .05).
   The transparency of the output shows how well users can assess why a recommendation is made or should match the user's preferences. This is directly increased by the quality of the interaction with the system (0.311, p < .001), i.e. when possibilities are offered or used to indicate one's own preferences. On the other hand, there are no direct or indirect influences of the explanations. Instead, the accuracy of the recommendations has a positive influence on the transparency of the output (0.454, p < .001). Accordingly, the output is easier to understand if it is rated as suitable; unsuitable recommendations would thus be more difficult for the user to comprehend.
   As shown, transparency is positively influenced by the quality of explanations, the accuracy of recommendations, opportunities for interaction, and perceived control. Hypotheses 1.1, 1.2, 1.3 and 1.4 can thus be considered confirmed. The influence of the decision-making style is limited to the intuitive style; therefore, hypothesis 1.5 can only be partially confirmed.
   Effects of perceived transparency of the system. No effects can be observed for transparency with regard to interaction. It is possible that effects exist on factors that were not surveyed in this study. For the other transparency factors, however, significant positive effects can be observed.
   Transparency regarding the functionality has the strongest and most diverse effect. If users can understand the internal mechanisms, they trust the recommendation system more. Direct positive effects can be observed on benevolence (0.248, p < .01) and trust in the competence (0.188, p < .05) of the system. Indirectly, such transparency thus contributes to a better evaluation of the system's effectiveness (indirect: Transparency-functionality → Trust-benevolence → effectiveness: 0.074, p < .01; indirect: Transparency-functionality → Trust-competence → effectiveness: 0.055, p < .05). Via the increase in effectiveness, overall satisfaction with the system is also promoted (indirect: Transparency-functionality → Trust-benevolence → effectiveness → overall satisfaction: 0.024, p < .05). Via the increase in perceived benevolence, the willingness to share information about oneself is also increased (indirect: Transparency-functionality → Trust-benevolence → Trust-information sharing: 0.072, p < .05). Moreover, via trust in competence, the willingness to follow the advice of the recommendation system is increased (indirect: Transparency-functionality → Trust-competence → Trust-follow advice: 0.071, p < .05). Thus, an understanding of the internal mechanisms of recommender systems leads to trusting beliefs and, in turn, to trusting actions and a positive overall evaluation.
   Transparency with regard to the input has a negative effect. If users can see which data is used, this reduces the willingness to follow the advice of the recommendation system in this model (-0.144, p < .05). This shows a certain counterbalance to a transparent functionality, possibly triggered by too much information or a general distrust regarding data privacy; transparency can thus also have negative consequences, although these turn out to be comparatively small.
   Transparent output again has strong positive effects. If users can understand why the recommended item matches their preferences, this increases trust in the competence of the system (0.194, p < .01). Indirectly, transparency also promotes overall satisfaction via this increase in trust (indirect: Transparency-output → Trust-competence → overall satisfaction: 0.055, p < .05). Furthermore, the increase in transparency contributes indirectly (indirect: Transparency-output → Trust-competence → effectiveness: 0.056, p < .05), but also directly (0.127, p < .05), to a higher rating of the system's effectiveness. Indirectly, this in turn increases overall satisfaction with the system (indirect: Transparency-output → Trust-competence → effectiveness → overall satisfaction: 0.019, p < .05). Additionally, it increases the willingness to follow the advice of the recommendation system when users better understand the output (direct: 0.268, p < .001; indirect: Transparency-output → Trust-competence → Trust-follow advice: 0.073, p < .05).
   As shown, the transparency factors have clear effects on trust in the system, on the evaluation of effectiveness, and on overall satisfaction. Therefore, hypotheses 2.1, 2.2 and 2.3 can be considered confirmed. Thus, perceived transparency can also be viewed as a mediator of perceived control over the system, user characteristics, and other qualities of the system. The importance of the different factors of perceived transparency can be shown by the differentiated assessment.

6. Discussion

We aimed at developing a measurement tool that is specifically focused on capturing the transparency of RS as perceived by users. In an initial interview study, concerns and uncertainties in relation to RS transparency were identified, which are well in line with the general AI-related questions compiled by [18]. This indicates that the scheme developed by these authors can be a useful
starting point for developing measures also for specific systems such as RS, which address a wider range of users beyond the more expert users considered in the original work by [18].
   Our confirmatory analyses confirmed our hypothesis that subjectively perceived transparency can be characterized by the factors input, output, functionality and interaction. Adequate reliability as well as convergent and divergent validity was demonstrated, which indicates that the identified transparency factors can clearly be considered as independent, and that they can be distinguished from each other and also from other factors of the subjective evaluation of RS (trust, effectiveness, etc.).
   The identified factors in our analysis reflect the basic components of RS as defined by [16], i.e., the input (what data does the system use?), the functionality (how and why is an item recommended?), and the output (why and how well does an item match one's own preferences?). Additionally, the factor interaction could be extracted. This factor is consistent with the category interaction (what if / how to be that, what has to be changed for a different prediction?) of the prototypical questions to AI formulated by [18].
   Furthermore, the final set of items can also be considered through the lens of the different interaction stages as defined by Norman [22]. In our examined context, for example, the stage perception relates to the presence of system functions that explicitly reveal information on how the recommendations were derived, e.g. through explanations. Items of the type "The system provided information about how. . . ", grouped under the factor functionality, could be validated, indicating that making information about the recommendation process observable is a prerequisite for further cognitive processing. This indicates that the evaluation of perceived transparency should consider not only items related to users' interpretation (i.e. "user understands", as it has traditionally been evaluated in RS research), but also items related to the presence and perception of transparency-related system functions (e.g. "user notices that the system actually explains").
   Once the user perceives a system output (e.g. the features of a recommendation or an explanation), the next stage is the interpretation of the system state, in which users use their knowledge to interpret the new system state [22]: in our context, to assess the recommendation inferred by the system. Our validated final set includes items which are related to the interpretation stage and are of the type "I understood what data was used . . . " (grouped under the factor input) or "I understood how the system determined . . . " (grouped under the factor functionality of our developed scale). This group of items is also consistent with the definition of perceived transparency by [5], which focuses on the perceived understanding of the inner processes of RS.
   In a subsequent stage, users compare the interpreted system state to their own goal to decide about the next action, a stage defined by Norman [22] as evaluation. The item "I can tell how well the recommendations match my preferences" from our scale relates to this stage, by explicitly assessing the correspondence of the recommended items with one's own preferences. Items from the interaction group ("I know what needs to be changed in order to get better recommendations") can be associated with intent formation and the downstream path in the action cycle.
   As discussed by [23], designers can contribute to closing the gap between mental models (users' ideas of how the system works [22]) and the actual functioning of the system, by providing output and functionalities that reflect an adequate conceptual model of the system, one that can be "easily perceived, interpreted and evaluated" [23]. This can in turn impact perceived transparency [19]. Consequently, our instrument can contribute to a more comprehensive assessment of subjectively perceived transparency, by going beyond the one-dimensional construct addressing a general "why-recommended" understanding, and assessing instead the extent to which output and functionalities reflecting the system's conceptual model are in fact perceived, interpreted and evaluated.

7. Conclusion and Outlook

The instrument developed can be seen as a first step towards assessing transparency in RS in a more comprehensive and cognitively meaningful manner. Overall, the reliability and construct validity of the developed measurement instrument could be confirmed, identifying four transparency factors (input, output, functionality, interaction) and resulting in a 13-item questionnaire (see Table 1). The expected influence of system aspects and personal characteristics on the transparency factors could be demonstrated, with the exception of transparency regarding interaction, which may be due to the limited interaction possibilities in the applications used by participants. Furthermore, we could show the impact of different transparency aspects on trust in the system and on the overall evaluation of the system.
   The differentiated assessment of transparency makes it possible to elaborate the significance of individual aspects of transparency in more detail than was possible with previous measurement instruments. Thus, it could be shown that transparency with respect to functioning and output is of greater importance for the dependent variables considered than transparency with respect to interaction and input.
   The findings obtained here should be considered under the following limitations. Real systems were tested for this online study. On the one hand, this allowed us to obtain users' views with respect to applications they were
familiar with and that were fully functional. On the other hand, no controlled manipulation of influencing variables was possible. We also did not analyze the differences between the systems, which would have required a larger sample and would have addressed questions outside the scope of the present study. An effect of explanations could only be shown for the factors input and functionality, partly mediated by perceived control, which may also be due to the limited explanations provided by the systems used. In addition, only systems that were already known to the users were tested; thus, a stronger expression of trust and an overall more positive evaluation might be expected. In terms of social desirability or self-overestimation, perceived understanding might be valued higher than actual understanding would lead one to expect.
   Follow-up research should be guided by the limitations mentioned here for further validation of the measurement instrument. The degree of perceived transparency should also be compared with actual, genuine understanding using parallel qualitative methods [6]. Furthermore, it is important to check to what extent the questionnaire is also able to evaluate systems that are unknown to the users. Assessing unfamiliar systems or specifically designed prototypes would provide the opportunity to systematically vary components of the recommender system (input, functionality, output), the quality of explanations, and/or the interaction possibilities [9]. Thus, the influence of these features on the transparency factors, and likewise possible differences in their manifestation, should be further explored.
   Overall, a first validated version of a questionnaire to assess perceived transparency can be presented. The findings presented here also provide starting points for research into further elucidating the multi-faceted concept of transparency.

Acknowledgments

This work was funded by the German Research Foundation (DFG) under grant No. GRK 2167, Research Training Group "User-Centred Social Media".

References

 [1] N. Bostrom, E. Yudkowski, The Ethics of Artificial Intelligence, in: W. Ramsey, K. Frankish (Eds.), Cambridge Handbook of Artificial Intelligence, Cambridge University Press, 2014, pp. 316–334.
 [2] U. S. a. H. S. C. (SHS), Recommendation on the Ethics of Artificial Intelligence, Technical Report, UNESCO, 2021. URL: https://unesdoc.unesco.org/ark:/48223/pf0000379920.page=14.
 [3] R. Sinha, K. Swearingen, The role of transparency in recommender systems, in: CHI '02 Extended Abstracts on Human Factors in Computing Systems, 2002, pp. 830–831.
 [4] N. Tintarev, J. Masthoff, Explaining Recommendations: Design and Evaluation, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 353–382. doi:10.1007/978-1-4899-7637-6_10.
 [5] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, in: Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys '11), 2011, pp. 157–164.
 [6] H. Cramer, V. Evers, S. Ramlal, M. van Someren, L. Rutledge, N. Stash, L. Aroyo, B. Wielinga, The effects of transparency on trust in and acceptance of a content-based art recommender, User Modeling and User-Adapted Interaction 18 (2008) 455–496.
 [7] B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, User Modeling and User-Adapted Interaction 22 (2012) 441–504.
 [8] F. Gedikli, D. Jannach, M. Ge, How should I explain? A comparison of different explanation types for recommender systems, International Journal of Human-Computer Studies 72 (2014) 367–382.
 [9] D. C. Hernandez-Bocanegra, J. Ziegler, Explaining review-based recommendations: Effects of profile transparency, presentation style and user characteristics, Journal of Interactive Media 19 (2020) 181–200. doi:10.1515/icom-2020-0021.
[10] D. C. Hernandez-Bocanegra, J. Ziegler, Effects of interactivity and presentation on review-based explanations for recommendations, in: Human-Computer Interaction – INTERACT 2021, Springer International Publishing, 2021, pp. 597–618.
[11] N. Tintarev, J. Masthoff, Evaluating the effectiveness of explanations for recommender systems, User Modeling and User-Adapted Interaction 22 (2012) 399–439.
[12] C.-H. Tsai, P. Brusilovsky, Explaining recommendations in an interactive hybrid social recommender, in: 24th International Conference on Intelligent User Interfaces (IUI '19), 2019, pp. 391–396.
[13] A. Jameson, M. C. Willemsen, A. Felfernig, M. de Gemmis, P. Lops, G. Semeraro, L. Chen, Human decision making and recommender systems, Recommender Systems Handbook (2015) 611–648.
[14] S. Dooms, T. D. Pessemier, L. Martens, A user-centric evaluation of recommender algorithms for an event recommendation system, in: Proceedings of the RecSys 2011 Workshop on Human Decision Making in Recommender Systems (Decisions@RecSys '11) and User-Centric Evaluation of Recommender Systems and Their Interfaces-2 (UCERSTI 2), affiliated with the 5th ACM Conference on Recommender Systems (RecSys 2011), 2011, pp. 67–73.
[15] M. Bühner, Einführung in die Test- und Fragebogenkonstruktion, Pearson Studium, München, 2011.
[16] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender Systems: An Introduction, Cambridge University Press, 2011.
[17] J. Lu, Q. Zhang, G. Zhang, Recommender Systems: Advanced Developments, World Scientific Publishing, 2021.
[18] Q. V. Liao, D. Gruen, S. Miller, Questioning the AI: Informing design practices for explainable AI user experiences, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–15. doi:10.1145/3313831.3376590.
[19] T. Ngo, J. Kunkel, J. Ziegler, Exploring mental models for transparent and controllable recommender systems: A qualitative study, in: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (UMAP '20), 2020, pp. 183–191.
[20] D. Borsboom, G. J. Mellenbergh, J. van Heerden, The theoretical status of latent variables, Psychological Review 110 (2003) 203–219.
[21] J. Kunkel, T. Ngo, J. Ziegler, N. Krämer, Identifying Group-Specific Mental Models of Recommender Systems: A Novel Quantitative Approach, in: Human-Computer Interaction – INTERACT 2021, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 383–404. doi:10.1007/978-3-030-85610-6_23.
[22] D. A. Norman, Some Observations on Mental Models, in: D. Gentner, A. L. Stevens (Eds.), Mental Models, Psychology Press, New York, NY, USA, 1983.
[23] E. Hutchins, J. D. Hollan, D. A. Norman, Direct manipulation interfaces, Human-Computer Interaction 1 (1985) 311–338.
[24] Y. Zhang, X. Chen, Explainable recommendation: A survey and new perspectives, Foundations and Trends in Information Retrieval 14 (2020) 1–101.
[25] K. Backhaus, B. Erichson, R. Weiber, Multivariate Analysemethoden: Eine anwendungsorientierte Einführung, 13th ed., Berlin, 2011.
[26] J. F. Hair, W. C. Black, B. J. Babin, R. E. Anderson, Multivariate Data Analysis: A Global Perspective, 7th ed., Boston, 2010.
[27] D. H. McKnight, V. Choudhury, C. Kacmar, Developing and validating trust measures for e-commerce: An integrative typology, Information Systems Research 13 (2002).
[28] P. Kouki, J. Schaffer, J. Pujara, J. O'Donovan, L. Getoor, Personalized explanations for hybrid recommender systems, in: Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI '19), ACM, 2019, pp. 379–390.
[29] T. Donkers, T. Kleemann, J. Ziegler, Explaining recommendations by means of aspect-based transparent memories, in: Proceedings of the 25th International Conference on Intelligent User Interfaces, 2020, pp. 166–176.
[30] K. Hamilton, S.-I. Shih, S. Mohammed, The development and validation of the rational and intuitive decision styles scale, Journal of Personality Assessment 98 (2016) 523–535.
[31] A. G. Yong, S. Pearce, A beginner's guide to factor analysis: Focusing on exploratory factor analysis, Tutorials in Quantitative Methods for Psychology 9 (2013) 79–94.
[32] R. A. Peterson, A meta-analysis of Cronbach's coefficient alpha, Journal of Consumer Research 21 (1994) 381–391.