<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Development of an Instrument for Measuring Users' Perception of Transparency in Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Hellmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diana C. Hernandez-Bocanegra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jürgen Ziegler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Duisburg-Essen</institution>
          ,
          <addr-line>Forsthausweg 2, 47057 Duisburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Transparency is increasingly seen as a critical requirement for achieving the goal of human-centered AI systems in general and also, specifically, recommender systems (RS). However, defining and operationalizing the concept is still difficult, due to its multi-faceted nature. Currently, there are hardly any measurement instruments to adequately assess the perceived transparency of RS in user studies. Thus, we present the development of a measurement instrument that aims at capturing perceived transparency as a multidimensional construct. The results of our validation show that transparency can be distinguished with respect to input (what data does the system use?), functionality (how and why is an item recommended?), output (why and how well does an item fit one's preferences?), and interaction (what needs to be changed for a different prediction?). The study is intended as a first iteration in the development of a reliable and fully validated measurement tool for assessing transparency in RS.</p>
      </abstract>
      <kwd-group>
<kwd>Recommender systems</kwd>
        <kwd>transparency</kwd>
        <kwd>explanations</kwd>
        <kwd>user study</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The request for more transparency in intelligent systems has become steadily louder in recent years, formulated in academic research as well as in most public and corporate policies concerning the ethics of artificial intelligence [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>]. Although there is now broad agreement that transparency is of high relevance for developing human-centred AI systems, the concept is still elusive due to its multi-faceted nature and the different objectives it is intended to serve. The questions raised when asking for transparency include, for example, the system aspects that should be made transparent, or the riskiness of an AI function at an individual or societal level.</p>
      <p>A need for greater transparency has also been noted for recommender systems (RS), a frequent, user-facing type of AI-driven technology, to better support users in their decision-making and to avoid potentially negative consequences, e.g. users getting trapped in filter bubbles [<xref ref-type="bibr" rid="ref3">3</xref>]. Various methods have been proposed to this end, ranging from disclosing the user profile on which a recommendation is based to providing explicit explanations. Still, the multi-facetedness of the concept makes it difficult to design effective transparent RS. A central question that must be solved to this end is how the transparency of a RS can be measured and evaluated. While different aspects of the system, for example, the input data, the recommendation algorithm, or features of the recommended items, may be exposed to the user, transparency as a user-centric quality can only be assessed by measuring users' perception and understanding of those system aspects that are relevant for their decision making and trust in the system [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <p>Despite the acclaimed relevance of transparency in RS, the instruments available for measuring it from a user perspective are still very limited. Some instruments for assessing overall recommendation quality include a small number of items related to perceived transparency [<xref ref-type="bibr" rid="ref5">5</xref>], but these measures still seem far from covering the multiple facets involved. To the best of our knowledge, there is no instrument focusing specifically on RS transparency. A further shortcoming of existing instruments is the lack of sufficient consideration of the cognitive processes involved in users' understanding of recommendations and in their ability to influence the system according to their needs, if such influence is possible. Moreover, prior work often formulates the measurement of the construct transparency using only a single item ("I understood why the items were recommended to me"), this latter being a frequently used item for the evaluation of RS transparency.</p>
      <p>In this paper, we describe steps towards a more holistic and cognitively grounded psychometric instrument for measuring perceived transparency in RS. We first explain the questionnaire development process that resulted in a validated set of items specifically focused on RS transparency. The candidate items for this development were chosen to reflect the different steps involved in cognitively processing the information provided about the recommendation process and its output. To further validate the instrument, we performed an analysis of the effects of perceived transparency, as measured by our new instrument, on factors related to trust in the RS and effectiveness of the recommendations. An influence of transparency on users' trust in the system and on the acceptance of the recommendations has been suggested in prior research, e.g., in [<xref ref-type="bibr" rid="ref5">5</xref>]. We analyzed these influences through structural equation modeling to show that the construct 'transparency' as measured by our instrument has in fact the assumed effects.</p>
      <p>Consequently, we set out to formulate and validate a more comprehensive way to measure the perceived transparency of a RS, as described in the methods section. The development followed the typical procedure for psychometric measurement instruments (e.g. [15]):</p>
      <p>(1) To operationalize a target construct, first a larger number of candidate items is formulated and compiled. Here, we draw on the basic structure of RS ([16], [17]) and typical user questions related to artificial intelligence algorithms [18]. Second, items were also derived from a qualitative preliminary study, to further analyze the uncertainties in users' mental models, which can be understood as the notion that users have about how a system or a certain type of systems works [19].</p>
      <p>(2) We examined the factor structure of the transparency construct, which was formed as a reflective factor in the sense of classical test theory (see also [20]). We considered 4 factors that could group individual questionnaire items, and that might contribute to variances in perceived transparency, inspired by the dimensions defined by [18]: input ("what kind of data does the system learn from"), output ("what kind of output does the system give"), functionality ("how / why does the system make predictions") and interaction ("what if / how to be that": "what would the system predict if this instance changes to ..").</p>
      <p>(3) The developed measurement instrument was validated. For this purpose, the framework model of [<xref ref-type="bibr" rid="ref7">7</xref>] was used.</p>
      <p>Our contribution is thus twofold: we provide a systematically derived and validated measurement instrument for transparency in RS, and we can show that the different transparency factors represented in the questionnaire have an impact on the effectiveness of recommendations and trust in the system, albeit to different degrees.</p>
      <p>Joint Proceedings of the ACM IUI Workshops 2022, March 2022, Helsinki, Finland. Contact: marco.hellmann@stud.uni-due.de (M. Hellmann); diana.hernandez-bocanegra@uni-due.de (D. C. Hernandez-Bocanegra); juergen.ziegler@uni-due.de (J. Ziegler). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>2. Related work</p>
      <p>
        Users’ perception of the transparency of a RS may be
influenced by several factors. Providing explanations is
one important aspect, and some studies have shown that
transparency is positively influenced by the quality of
the explanations given ([
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) and that it is related to
control over the system [
        <xref ref-type="bibr" rid="ref7">7</xref>
]. The effect of systematically
varied levels and styles of explanation on perceived
transparency has been studied and assessed via questionnaires
(see e.g. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). Also, a positive influence of
interaction possibilities as well as perceived control on the
perceived transparency of the system was reported by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Transparency perception seems to be enhanced both by
the perceived quality of explanations and the perceived
accuracy or quality of recommendations. In addition, the
authors show a positive effect of transparency on trust
and, through trust, an indirect effect on purchase
intentions. According to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], this can be related to evaluating
the effectiveness of the RS. Moreover, studies suggest
that perceived transparency promotes satisfaction with
the system [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
<p>The influence of personal factors on the perception of recommender systems has often been investigated in the light of the general decision-making behavior of users (see [<xref ref-type="bibr" rid="ref13">13</xref>]). [<xref ref-type="bibr" rid="ref9">9</xref>] showed that individuals with a rational decision-making style trusted the recommender system tested more and rated its efficiency and effectiveness higher. Furthermore, they showed that individuals with an intuitive decision-making style rate the quality of explanations better.</p>
      <p>To date, however, few measurement tools exist to quantitatively assess the transparency of a RS as perceived by users. [<xref ref-type="bibr" rid="ref6">6</xref>] surveyed perceived transparency using two items (“I understand why the system recommended the artworks it did”; “I understand what the system bases its recommendations on”), in the domain of art objects. [<xref ref-type="bibr" rid="ref14">14</xref>] use a single item ("I did not understand why the items were recommended to me (reverse scale)"), for event recommendations. [<xref ref-type="bibr" rid="ref8">8</xref>] proposed an item that explicitly refers to explanations: "Which explanation interfaces are considered to be transparent by the users?". [<xref ref-type="bibr" rid="ref5">5</xref>] proposed an evaluation framework for RS, involving different domains and applications.</p>
      <p>2.0.1. Mental models and stages of cognitive processing</p>
      <p>Transparency is frequently discussed as if it were an objective property of a system. A system becomes transparent, however, only if its users can understand the transparency-related information, such as explicit explanations, and evaluate it with respect to their goals. The degree of comprehension may depend on the mental model users have about how the system works [21], based either on preconceptions, on previous experiences with similar systems, or on the interaction with and perception of the present system [22]. As discussed in [19], mental models that drift considerably from actual system functioning may result in broadening the "gulfs" described by [22]: 1) the gulf of execution, when the user's mental model is inaccurate in terms of how the system can be used to execute a task; 2) the gulf of evaluation, when the output (as a consequence of a user's action) differs from what is expected, according to the user's mental model.</p>
<p>To bridge these gulfs, users must process the information provided by the system at different cognitive levels. The items of the proposed questionnaire were formulated to reflect the action levels according to [23]. According to their model, the quality of interaction with the system can be described through a cycle of evaluation and execution. For example, at first, the user may perceive the output of the system (e.g., the recommendations and explanations), then interpret the information gathered (e.g., how the system works), and thereby evaluate the state of the system (e.g., performance of the system and quality of the output). As a consequence, the user formulates goals they aim to achieve with the system or matches their goals with the evaluation of the system (e.g., get more accurate or diverse recommendations). The user then pursues an intention (e.g., improve recommendations), which is translated into planning actions (e.g., change input), which they finally execute. While this cognitive cycle is well-known in the HCI field, it has hardly been applied in the investigation of transparency for AI-based systems.</p>
      <p>The authors in [23] assume that there are gaps between the users' goals and their knowledge about the system, and the extent to which the system provides descriptions of its functioning (the gulfs of execution and of evaluation, as mentioned beforehand). By taking actions to bridge those gaps (making system functions match goals, and making the output represent a “good conceptual model of the system that is easily perceived, interpreted and evaluated” [23]), system designers may contribute to minimizing cognitive effort by users [23], and to decreasing the discrepancy between the mental model of the system and its actual functioning, which may have an impact on the perception of transparency, as discussed by [19]. We argue, then, that a more comprehensive instrument to measure perceived transparency is still needed, so that such impact can be evaluated not only on the basis of general perceived understanding ("I understood why recommended"), but also on the basis of the extent to which output and functionalities that reflect the conceptual model of the system are perceived, interpreted and evaluated by users.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Methods</title>
      <p>To operationalize the construct of perceived transparency, we conducted the following steps, based on the typical procedure for developing measurement instruments (e.g., [15]): 1. Formulation and compilation of questionnaire items. 2. Examination of item quality and factor structure, based on an online study. 3. Validation of the measurement instrument. We describe each step below.</p>
      <p>3.1. Formulation and compilation of questionnaire items</p>
      <p>Here, we draw on the basic structure of RS ([16], [17]) and typical user questions to AI algorithms [18]. Candidate items were also chosen to cover different stages of the cognitive action cycle described in related work. Second, items were also derived from a qualitative pre-study, consisting of interviews with users, to further analyze the uncertainties in users' mental models [19] with regard to different commercial RS, like Netflix, Spotify or Amazon. A total of 6 interviews were conducted via video call, with voluntary participants. When selecting the interview partners, care was taken to represent different age groups and different levels of experience with Internet applications in the sample. Students and non-students from different age groups (20 to 50 years) were interviewed. Overall, previous exposure to recommender systems was equally strong among all participants. Only one interviewee had lower experience and one interviewee had slightly higher experience.</p>
      <p>The aim of the interviews was to capture the experience, perception and evaluation as well as possible questions of users regarding the functionality or transparency of recommender systems. The subjects were asked to explain the functionality of RS from their perspective and to create a corresponding sketch. Following this, uncertainties and possible lack of transparency were discussed. Finally, prototypical explanations from [24] for increasing the perceived transparency were evaluated by the interview partners. The explanations refer differently to the input used, the functionality and the output. In addition, they use different visual forms of representation, e.g. star ratings, profile lines, text. In this way, uncertainties as well as wishes for more transparency by users could be identified. Each question encountered in the interviews was directly transformed into one or more items.</p>
      <p>A resulting set of 92 items was collected and discussed by the research team, where linguistic revision and elimination of redundancies were also performed. The discussions led to a reduction of the set to 34 items, which were used as input for the online validation described in the next section.</p>
      <p>3.2. Online user study</p>
      <p>We conducted a user study to examine item quality and factor structure, as described below.</p>
      <p>Participants. We recruited 171 participants (89 female, mean age 29, age range 18 to 69) through the crowdsourcing platform Prolific. We restricted the task to workers in the U.S. and the U.K. with an approval rate greater than 98%. Participants were rewarded with £1.15.</p>
<p>Time devoted to the survey (in minutes): M = 13.2, SD = 7.33.</p>
      <p>We applied a quality check to select participants with quality survey responses: we included attention checks in the survey (e.g. “This is an attention check. Please click the option ‘Disagree’”). We discarded participants with at least 1 failed attention check, as well as those who did not finish the survey. Thus, the responses of 17 of the 192 initial Prolific respondents were discarded and not paid. 4 additional cases were removed due to suspicious response behavior, e.g. answering all questions within the same page with the same value. Thus, 171 cases were used for further analysis.</p>
      <p>The target sample size was chosen to allow performing CFA analysis. [25] (p. 389) recommends a minimum of n &gt; 50, or three times the number of indicators; [26] (p. 102) recommends a minimum of n &gt; 100, or five times the number of indicators. Thus, given that we wanted to evaluate a set of 34 items, the sample size was set to a minimum of 170 participants.</p>
      <p>Questionnaires. We utilized the set of 34 items resulting from the item formulation step described above. Additionally, aiming to further validate the final measurement instrument (4.3), we used items from [<xref ref-type="bibr" rid="ref5">5</xref>] to evaluate perception of control (how much users think they can influence the system), interaction adequacy, interface adequacy, information sufficiency and recommendation accuracy. Furthermore, we included items from [<xref ref-type="bibr" rid="ref7">7</xref>] to evaluate the perception of system effectiveness (construct perceived system effectiveness: the system is useful and helps the user to make better choices) and of trust in the system [27] (constructs trusting beliefs, with subconstructs benevolence, integrity, and competence: the user considers the system to be honest and trusts its recommendations; and trusting intentions: the user is willing to share information and to follow advice). We used items described in [28, 29] for explanation quality, and from [30] to evaluate decision-making style. All items were measured on a 1-5 Likert scale (1: Strongly disagree, 5: Strongly agree).</p>
      <p>Procedure. Participants were asked to choose a service from five applications, for which they were required to have an active account: Amazon, Spotify, Netflix, Tripadvisor, and Booking. Participants were instructed to open the application and browse it at their own discretion. They were explicitly told to select an item that was relevant to them and which they would actually buy or consume. A real purchase of items was explicitly not requested. Participants were asked to return to the survey after completing the task and to answer questions about the system they used.</p>
      <p>Data analysis. We performed an exploratory factor analysis (EFA) to further reduce the initial set of items and a confirmatory factor analysis (CFA) to test internal reliability and convergent validity. Furthermore, we evaluated discriminant validity of the resultant set of items in relation to other constructs of the subjective evaluation of RS, for example explanation quality, effectiveness and overall satisfaction, according to the frameworks defined by [<xref ref-type="bibr" rid="ref7">7</xref>] and [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
      <p>4. Results</p>
      <p>4.1. Exploratory Factor Analysis (EFA)</p>
      <p>The factor structure was examined exploratively, aiming to further reduce the set of items. A total of 5 EFAs with principal axis factoring and promax rotation were performed. First, items that did not have a unique principal loading or had a principal loading that was too low (&lt;.40) were removed. In the first 4 EFAs, 11 items were removed based on this criterion. Subsequently, a more stringent criterion was used (factor loadings &lt;.50); the guideline values are based on [31]. Thus, 2 further items were removed. This resulted in a 6-factorial structure with a total of 21 items and an explained variance of 62.45%. Reliability of the factors falls in the range ‘good’ to ‘very good’ (.782 to .888), as defined by [32]. The internal consistency across all items is .867.</p>
      <p>4.2. Confirmatory Factor Analysis (CFA)</p>
      <p>Following the exploration of the factor structure, the result obtained was tested for internal reliability and convergent validity using confirmatory analysis. A first CFA was performed, resulting in 8 items with low factor loadings, which were eliminated from the set. Two factors were removed in the process because they did not load on a second-level overall transparency factor. A final CFA with 4 factors was performed (model fit: X2 = 86.997, df = 61, p = .016; X2/df = 1.426; CFI = .975; TLI = .968; RMSEA = .050; SRMR = .047). Reliability across all items is equal to .884. This model comprises a final set of four factors and 13 items, which are reported in Table 1 along with factor loadings.</p>
      <p>The four factors identified can be associated with the concepts Input, composed of 3 items, Output, also with 3 items, Functionality, with 5 items, and Interaction, with only 2 items. Although the initial item set comprised questions for all stages of the cognitive action cycle, after CFA, items related to the perception level were only left for the factor Functionality, comprising questions about whether users are aware of transparency-related information if provided by the system (e.g.: "The system provided information about how well the recommendations match my preferences"). This factor covers mostly perception-related questions.</p>
      <p>Table 1. Items of the final measurement instrument (factor loadings and Cronbach's alpha per factor are reported in the original table):</p>
      <p>It was clear to me what kind of data the system uses to generate recommendations.</p>
      <p>I understood what data was used by the system to infer my preferences.</p>
      <p>I understood which item characteristics were considered to generate recommendations.</p>
      <p>I understood why the items were recommended to me.</p>
      <p>I understood why the system determined that the recommended items would suit me.</p>
      <p>I can tell how well the recommendations match my preferences.</p>
      <p>The system provided information to understand why the items were recommended.</p>
      <p>The system provided information about how the quality of the items was determined.</p>
      <p>The system provided information about how my preferences were inferred.</p>
      <p>The system provided information about how well the recommendations match my preferences.</p>
      <p>I understood how the quality of the items was determined by the system.</p>
<p>I know what actions to perform in the system so that it generates better recommendations.</p>
      <p>I know what needs to be changed in order to get better recommendations.</p>
      <p>
The missing coverage of perception-related items in other factors is likely due to limitations of the systems used for the online study, which do not, for example, provide access to the data on which recommendations are based, thus preventing users from becoming aware of input data. The factor Output comprises items related to the interpretation and evaluation stages. The factor Interaction has the smallest scope, with 2 items, and covers only the facets of action planning and action execution. This factor thus describes whether users know which actions they would have to perform if they wanted to receive other recommendations.
      </p>
      <p>4.3. Discriminant validity of the measurement instrument</p>
      <p>We determined discriminant validity of the instrument in relation to other constructs of the subjective evaluation of RS, for example explanation quality, effectiveness and overall satisfaction, according to the frameworks defined by [<xref ref-type="bibr" rid="ref7">7</xref>] and [<xref ref-type="bibr" rid="ref5">5</xref>]. Discriminant validity was assessed using inter-construct correlations (see results in Table 2). We found that the squared correlations between pairs of constructs were all less than the average variances shown on the diagonal, representing “a level of appropriate discriminant validity” [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
      <p>Table 2. Inter-construct correlation matrix. Average Variance Extracted (AVE) on the main diagonal; correlations below the diagonal; squared correlations above the diagonal. Target value for AVE ≥ .5. p&lt;0.05*, p&lt;0.01**.</p>
      <p>5. Structural Equation Model (SEM)</p>
      <p>To explore the relation between the transparency factors assessed by the questionnaire and the effects of perceived transparency on recommendation effectiveness and trust in the system, as well as the impact of factors influencing transparency, we set up a Structural Equation Model (SEM). The model is based on hypotheses we derived from existing research that has shown a positive effect of higher overall transparency on the perception of the recommendations and the overall system. Furthermore, we assumed that transparency is influenced by system-related aspects (accuracy, interaction quality, and explanation quality) as well as by personal characteristics (such as decision-making behavior), as described in the related work section. Some of these factors can also be expected to influence perceived control over the system, a construct that may mediate the impact of these factors on transparency perception. This led us to formulating the hypotheses shown in Table 3.</p>
      <p>In the following, the relationships of the factors in the structural equation model are presented (see Fig. 1). Only significant paths with standardized path coefficients are shown. Indirect effects are only considered for the transparency factors relevant here. The final model is shown to have a very good fit: X2 = 75.767, df = 57, p = .049; X2/df = 1.329; CFI = .980; TLI = .965; RMSEA = .044; SRMR = .072. The model is thus adequate to describe the relationships in the data set.</p>
      <p>Influences on perceived transparency of the system. Transparency with respect to interaction is rated higher when users are more likely to exhibit an intuitive decision-making style (0.186, p &lt;.05) and when users report higher perceived control (0.293, p &lt;.001). The latter is increased by the quality of interaction (0.502, p &lt;.001) and of explanations (0.341, p &lt;.001). Users thus know better how to influence recommendations when they have more opportunities to interact with the system, and can gather information about the system through explanations as well as through ‘trial and error’ (indirect: explanation quality → control → Transparency-interaction: 0.100, p &lt; .01; interaction quality → control → Transparency-interaction: 0.147, p &lt; .001). Similar observations can be made for functionality: again, transparency is rated higher when users are more likely to exhibit an intuitive decision-making style (0.141, p &lt;.05) and when users report higher perceived control (0.261).</p>
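      <p>The indirect effects reported for the SEM are the products of the standardized path coefficients along the mediating chain. As a quick numerical check, the values quoted above can be reproduced with a few lines of code (a generic sketch; the helper function is ours for illustration, and the SEM estimation itself, which also yields the significance levels, is of course not replicated here):</p>
      <p>
```python
# Indirect effect along a mediation chain = product of the
# standardized path coefficients on that chain.
def indirect_effect(*path_coefficients):
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product

# Explanation quality -> control -> Transparency-interaction
print(round(indirect_effect(0.341, 0.293), 3))  # 0.1
# Interaction quality -> control -> Transparency-interaction
print(round(indirect_effect(0.502, 0.293), 3))  # 0.147
```
</p>
      <p>The same check holds for the input paths reported below (e.g. 0.341 × 0.200 ≈ 0.068 and 0.502 × 0.200 ≈ 0.100), which is a useful sanity check when reading SEM path diagrams.</p>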
      <p>The input is perceived as more transparent the better users can interact with the system. Thus, here again, perceived control has a direct positive effect (0.200, p &lt;.01), and the quality of the interaction again has an indirect effect (indirect: interaction quality → control → Transparency-input: 0.100, p &lt; .05). Similarly to what has already been shown with regard to functionality, the quality of the explanations also has a direct, positive effect on transparency of the input (0.209, p &lt;.01), in addition to the indirect effect (explanation quality → control → Transparency-input: 0.068, p &lt; .05).</p>
      <p>The transparency of the output shows how well users can assess why a recommendation is made or should match the user's preferences. This is directly increased by the quality of the interaction with the system (0.311, p &lt;.001), i.e. when possibilities are offered or used to indicate one's own preferences. On the other hand, there are no direct or indirect influences of the explanations. Instead, the accuracy of the recommendation has a positive influence on the transparency of the output (0.454, p &lt;.001). Accordingly, the output is easier to understand if it is rated as suitable. Unsuitable recommendations would thus be more difficult for the user to comprehend.</p>
      <p>As shown, transparency is positively influenced by the quality of explanations, accuracy of recommendations, opportunities for interaction, and perceived control. Hypotheses 1.1, 1.2, 1.3 and 1.4 can thus be considered confirmed. The influence of the decision-making style is limited to the intuitive style. Therefore, hypothesis 1.5 can only be partially confirmed.</p>
      <p>Effects of perceived transparency of the system. No effects can be observed for transparency with regard to interaction. It is possible that effects exist on factors that were not surveyed in this study. For the other transparency factors, however, significant positive effects can be observed.</p>
      <p>Transparency regarding the functionality has the strongest and most diverse effect. If users can understand the internal mechanisms, they trust the recommendation system more. Direct positive effects can be observed on benevolence (0.248, p &lt;.01) and on trust in the competence of the system; in addition, the willingness to follow the advice of the recommendation system is increased (indirect: Transparency-functionality → Trust-competence → Trust-follow advice: 0.071, p &lt;.05). Thus, it is clear that an understanding of the internal mechanisms of recommender systems leads to trusting beliefs and thus to trusting actions and a positive overall evaluation.</p>
      <p>Transparency with regard to the input has a negative effect. If users can see which data is used, this has a negative effect on the willingness to follow the advice of the recommendation system in this model (-0.144, p &lt;.05). This constitutes a certain counterbalance to a transparent functionality, possibly triggered by too much information or a general distrust regarding data privacy. Transparency can thus also have negative consequences; however, these turn out to be comparatively small.</p>
      <p>Transparent output again has strong positive effects. If users can understand why the recommended item matches their preferences, this increases trust in the competence of the system (0.194, p &lt;.01). Indirectly, transparency also promotes overall satisfaction via this increase in trust (indirect: Transparency-output → Trust-competence → overall satisfaction: 0.055, p &lt;.05). Furthermore, the increase in transparency contributes indirectly (indirect: Transparency-output → Trust-competence → effectiveness: 0.056, p &lt;.05), but also directly (0.127, p &lt;.05), to a higher rating of the system's effectiveness. Indirectly, this in turn increases overall satisfaction with the system (indirect: Transparency-output → Trust-competence → effectiveness → overall satisfaction: 0.019, p &lt;.05). Additionally, it increases the willingness to follow the advice of the recommendation system when users better understand the output (direct: 0.268, p &lt;.001; indirect: Transparency-output → Trust-competence → Trust-follow advice: 0.073, p &lt;.05).</p>
      <p>As shown, the transparency factors have clear effects on trust in the system, on the evaluation of effectiveness and on overall satisfaction. Therefore, hypotheses 2.1, 2.2 and 2.3 can be considered confirmed. Thus, perceived transparency can also be viewed as a mediator of perceived control over the system, user characteristics, and other
tence (0.188, p &lt;.05) of the system. Indirectly, such trans- qualities of the system. The importance of the diferent
parency thus contributes to a better evaluation of the sys- factors of perceived transparency can be shown by the
tem’s efectiveness (indirect: Transparency-functionality diferentiated assessment.
→Trust-benevolence →efectiveness: 0.074, p &lt;.01;
indirect: Transparency-functionality →Trust-competence
→efectiveness: 0.055, p &lt;.05). Via the increase in ef- 6. Discussion
fectiveness, overall satisfaction with the system is also
promoted (indirect: Transparency-functionality →Trust- We aimed at developing a measurement tool that is
specifbenevolence →efectiveness →overall satisfaction: 0.024, ically focused on capturing the transparency of RS as
p &lt;.05). Via the increase in perceived benevolence, the perceived by users. In an initial interview study,
conwillingness to share information about oneself is also cerns and uncertainties in relation to RS transparency
increased (indirect: Transparency-functionality →Trust- were identified, which are well in line with the general
benevolence →Trust-information sharing: 0.072, p &lt;.05). AI-related questions compiled by [18]. This indicates that
Moreover, via trust in competence, the willingness to the scheme developed by these authors can be a useful
starting point for developing measures also for specific system state to their own goal to decide about the next
systems such as RS, which address a wider range of users action, a stage defined by Norman [ 22] as evaluation. The
beyond more expert users as in the original work by [18]. item “I can tell how well the recommendations match my</p>
      <p>Our confirmatory analyses confirmed our hypothesis preferences” from our scale relates to this stage, by
assessthat subjective perceived transparency can be charac- ing explicitly the correspondence of the recommended
terized by the factors: input, output, functionality and items with one’s own preferences. Items from the
interinteraction. Adequate reliability as well as convergent action group ("I know what needs to be changed in order
and divergent validity was demonstrated, which indicates to get better recommendations") can be associated with
that identified transparency factors can clearly be consid- intent formation and the downstream path in the action
ered as independent, and they can be distinguished from cycle.
each other and also from other factors of the subjective As discussed by [23], designers can contribute to close
evaluation of RS (trust, efectiveness, etc.). the gap between mental models (users’ idea on how the</p>
      <p>The identified factors in our analysis reflect the basic system works [22]), and the actual system’s functioning,
components of RS as defined by [ 16], i.e., the input (what by providing output and functionalities reflecting an
addata does the system use?), the functionality (how and equate system’s conceptual model, that can be “easily
why is an item recommended?), and the output (why and perceived, interpreted and evaluated” [23]. The above
how well does an item match one’s own preferences?). can in turn impact perceived transparency [19].
ConseAdditionally, the factor interaction could be extracted. quently, our instrument can contribute to a more
compreThis factor is consistent with the category interaction hensive assessment of subjective perceived transparency,
(what if / how to be that, what has to be changed for a by going beyond the one-dimensional construct
addressdiferent prediction?) of the prototypical questions to AI, ing a general "why-recommended" understanding, and
formulated by [18]. assessing instead, the extent to which output and
func</p>
      <p>Furthermore, the final set of items can also be consid- tionalities reflecting the system’s conceptual model are
ered through the lens of the diferent interaction stages in fact perceived, interpreted and evaluated.
as defined by Norman [ 22]. In our examined context, for
example, the stage perception relates to the presence of
system functions that explicitly reveal information on 7. Conclusion and Outlook
how the recommendations were derived, e.g. through
explanations. Items of the type “The system provided in- The instrument developed can be seen as a first step
formation about how. . . ”, grouped under the factor func- towards assessing transparency in RS in a more
comtionality, could be validated, indicating that making in- prehensive and cognitively meaningful manner. Overall,
formation about the recommendation process observable reliability and construct validity of the developed
meais a prerequisite for further cognitive processing. This surement instrument could be confirmed, identifying
indicates that the evaluation of perceived transparency four transparency factors (input, output, functionality,
should consider not only items related to users’ inter- interaction) and resulting in a 13 item questionnaire (see
pretation (i.e. “user understands”, as it has traditionally Table 1). The expected influence of system aspects and
been evaluated in RS research), but also items related to personal characteristics on the transparency factors could
the presence and perception of transparency-related sys- be demonstrated for the developed factors with the
exceptem functions (e.g. “user notices that the system actually tion of transparency regarding interaction, which may be
explains”). due to the limited interaction possibilities in the
applica</p>
      <p>
        Once the user perceives a system output (e.g. the fea- tions used by participants. Furthermore, we could show
tures of a recommendation or an explanation), the next the impact of diferent transparency aspects on trust in
stage is the interpretation of the system state, in which the system and on the overall evaluation of the system.
users use their knowledge to interpret the new system The diferentiated assessment of transparency makes it
state [22]: in our context, to assess the recommendation possible to elaborate the significance of individual aspects
inferred by the system. Our validated final set includes of transparency in more detail than it was possible with
items which are related to the interpretation stage, and previous measurement instruments. Thus, it could be
are of the type “I understood what data was used . . . ”, shown that transparency with respect to functioning
which can be grouped under the factor input), or “I under- and output is of greater importance for the dependent
stood how the system determined . . . ”, grouped under the variables considered than transparency with respect to
factor functionality of our developed scale. This group of interaction and input.
items is also consistent with the definition of perceived The findings obtained here should be considered under
transparency by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which focuses on the perceived un- the following limitations. Real systems were tested for
derstanding of the inner processes of RS. this online study. On the one hand, this allowed us to
ob
      </p>
      <p>In a subsequent stage, users compare the interpreted tain users’ views with respect to applications they were
familiar with and that were fully functional. On the other
hand, no controlled manipulation of influencing variables
was possible. We also did not analyze the diferences
between the systems which would have required a larger
sample, also addressing questions outside the scope of
the present study. An efect of explanations could only
be shown for the factors input and functionality, partly
mediated by perceived control, which may also be due to
the limited explanations provided by the systems used.</p>
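      <p>The serial indirect effects reported in Section 5 follow the usual product-of-paths logic for mediation: a mediated effect is the product of the standardized coefficients along its chain. A minimal sketch of this bookkeeping (only the 0.194 path is taken from our results; the other two coefficients are illustrative placeholders, not estimates from this study):

```python
# Serial mediation: an indirect effect equals the product of the
# standardized path coefficients along the mediation chain.
paths = {
    ("transparency_output", "trust_competence"): 0.194,  # reported direct effect
    ("trust_competence", "effectiveness"): 0.29,         # illustrative placeholder
    ("effectiveness", "overall_satisfaction"): 0.34,     # illustrative placeholder
}

def indirect_effect(chain, paths):
    """Multiply the coefficients of consecutive variable pairs in the chain."""
    effect = 1.0
    for src, dst in zip(chain, chain[1:]):
        effect *= paths[(src, dst)]
    return effect

chain = ["transparency_output", "trust_competence",
         "effectiveness", "overall_satisfaction"]
print(round(indirect_effect(chain, paths), 3))  # prints 0.019
```

With these placeholder values the three-path chain multiplies out to 0.019, i.e. the same order of magnitude as the serial indirect effects reported above, which is why such effects are small even when every individual path is substantial.</p>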
      <p>In addition, only systems that were already known to the
users were tested. Thus, a stronger expression of trust
and overall more positive evaluation might be expected.</p>
      <p>In terms of social desirability or self-overestimation,
perceived understanding might be valued higher than actual
understanding would lead one to expect.</p>
      <p>
        Follow-up research should be guided by the limitations mentioned here for further validation of the measurement instrument. The degree of perceived transparency should also be compared with actual understanding using parallel qualitative methods [<xref ref-type="bibr" rid="ref6">6</xref>]. Furthermore, it is important to check to what extent the questionnaire is also able to evaluate systems that are unknown to the users. Assessing unfamiliar systems or specifically designed prototypes would provide the opportunity to systematically vary components of the recommender system (input, functionality, output), the quality of explanations, and/or the interaction possibilities [<xref ref-type="bibr" rid="ref9">9</xref>]. Thus, the influence of these features on the transparency factors, and likewise possible differences in their manifestation, should be further explored.
      </p>
      <p>Overall, a first validated version of a questionnaire to assess perceived transparency can be presented. The findings presented here also provide starting points for research into further elucidating the multi-faceted concept of transparency.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was funded by the German Research
Foundation (DFG) under grant No. GRK 2167, Research Training
Group “User-Centred Social Media”.
</p>
    </sec>
    <sec id="sec-4">
      <title>References (continued)</title>
      <p>[15] M. Bühner, Einführung in die Test- und Fragebogenkonstruktion, Pearson Studium, München, 2011.</p>
      <p>[16] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender Systems: An Introduction, Cambridge University Press, 2011.</p>
      <p>[17] J. Lu, Q. Zhang, G. Zhang, Recommender Systems: Advanced Developments, World Scientific Publishing, 2021.</p>
      <p>[18] Q. V. Liao, D. Gruen, S. Miller, Questioning the AI: Informing design practices for explainable AI user experiences, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–15. doi:10.1145/3313831.3376590.</p>
      <p>[19] T. Ngo, J. Kunkel, J. Ziegler, Exploring mental models for transparent and controllable recommender systems: A qualitative study, in: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (UMAP 20), 2020, pp. 183–191.</p>
      <p>[20] D. Borsboom, G. J. Mellenbergh, J. van Heerden, The theoretical status of latent variables, Psychological Review 110 (2003) 203–219.</p>
      <p>[21] J. Kunkel, T. Ngo, J. Ziegler, N. Krämer, Identifying group-specific mental models of recommender systems: A novel quantitative approach, in: Human-Computer Interaction – INTERACT 2021, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 383–404. doi:10.1007/978-3-030-85610-6_23.</p>
      <p>[22] D. A. Norman, Some observations on mental models, in: D. Gentner, A. L. Stevens (Eds.), Mental Models, Psychology Press, New York, NY, USA, 1983.</p>
      <p>[23] E. Hutchins, J. D. Hollan, D. A. Norman, Direct manipulation interfaces, Human-Computer Interaction 1 (1985) 311–338.</p>
      <p>[24] Y. Zhang, X. Chen, Explainable recommendation: A survey and new perspectives, Foundations and Trends in Information Retrieval 14 (2020) 1–101.</p>
      <p>[25] K. Backhaus, B. Erichson, R. Weiber, Multivariate Analysemethoden: Eine anwendungsorientierte Einführung, 13. Aufl., Berlin, 2011.</p>
      <p>[26] J. F. Hair, W. C. Black, B. J. Babin, R. E. Anderson, Multivariate Data Analysis: A Global Perspective, 7. Aufl., Boston, 2010.</p>
      <p>[27] D. H. McKnight, V. Choudhury, C. Kacmar, Developing and validating trust measures for e-commerce: An integrative typology, Information Systems Research 13 (2002).</p>
      <p>[28] P. Kouki, J. Schafer, J. Pujara, J. O’Donovan, L. Getoor, Personalized explanations for hybrid recommender systems, in: Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI 19), ACM, 2019, pp. 379–390.</p>
      <p>[29] T. Donkers, T. Kleemann, J. Ziegler, Explaining recommendations by means of aspect-based transparent memories, in: Proceedings of the 25th International Conference on Intelligent User Interfaces, 2020, pp. 166–176.</p>
      <p>[30] K. Hamilton, S.-I. Shih, S. Mohammed, The development and validation of the rational and intuitive decision styles scale, Journal of Personality Assessment 98 (2016) 523–535.</p>
      <p>[31] A. G. Yong, S. Pearce, A beginner’s guide to factor analysis: Focusing on exploratory factor analysis, Tutorials in Quantitative Methods for Psychology 9 (2013) 79–94.</p>
      <p>[32] R. A. Peterson, A meta-analysis of Cronbach’s coefficient alpha, Journal of Consumer Research 21 (1994) 381–391.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bostrom</surname>
          </string-name>
, E. Yudkowsky,
          <source>The Ethics of Artificial Intelligence</source>
          , in: W. Ramsey,
K. Frankish (Eds.),
          <source>Cambridge Handbook of Artificial Intelligence</source>
          , Cambridge University Press,
          <year>2014</year>
          , pp.
          <fpage>316</fpage>
          -
          <lpage>334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>U. S. a. H. S. C.</surname>
          </string-name>
          (SHS),
          <source>Recommendation on the Ethics of Artificial Intelligence</source>
          ,
          <source>Technical Report, UNESCO</source>
          ,
          <year>2021</year>
. URL: https://unesdoc.unesco.org/ark:/48223/pf0000379920.page=14.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swearingen</surname>
          </string-name>
          ,
          <article-title>The role of transparency in recommender systems</article-title>
          ,
          <source>CHI EA '02 CHI '02 Extended Abstracts on Human Factors in Computing Systems</source>
          (
          <year>2002</year>
          )
          <fpage>830</fpage>
          -
          <lpage>831</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthof</surname>
          </string-name>
          , Explaining Recommendations:
          <article-title>Design and Evaluation</article-title>
          , in: F.
          <string-name>
            <surname>Ricci</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Rokach</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          Shapira (Eds.),
          <source>Recommender Systems Handbook</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>353</fpage>
          -
          <lpage>382</lpage>
. URL: https://doi.org/10.1007/978-1-4899-7637-6_10. doi:10.1007/978-1-4899-7637-6_10.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>A user-centric evaluation framework for recommender systems</article-title>
          ,
          <source>in: Proceedings of the fifth ACM conference on Recommender systems - RecSys 11</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Evers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramlal</surname>
          </string-name>
          , M. van
          <string-name>
            <surname>Someren</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Rutledge</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Stash</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Wielinga</surname>
          </string-name>
          ,
<article-title>The effects of transparency on trust in and acceptance of a content-based art recommender</article-title>, <source>User Modeling and User-Adapted Interaction</source>
          <volume>18</volume>
          (
          <year>2008</year>
          )
          <fpage>455</fpage>
          -
          <lpage>496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Knijnenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Willemsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gantner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soncu</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
Newell, <article-title>Explaining the user experience of recommender systems</article-title>, <source>User Modeling and User-Adapted Interaction</source>
          ,
          <year>2012</year>
          , p.
          <fpage>441</fpage>
          -
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gedikli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <article-title>How should i explain? a comparison of diferent explanation types for recommender systems</article-title>
          ,
          <source>International Journal of Human-Computer Studies</source>
          <volume>72</volume>
          (
          <year>2014</year>
          )
          <fpage>367</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Hernandez-Bocanegra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <article-title>Explaining review-based recommendations: Efects of profile transparency, presentation style and user characteristics</article-title>
          ,
          <source>Journal of Interactive Media</source>
          <volume>19</volume>
          (
          <year>2020</year>
          )
          <fpage>181</fpage>
          -
          <lpage>200</lpage>
          . doi:https://doi.org/10. 1515/icom-2020-0021.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Hernandez-Bocanegra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <article-title>Efects of interactivity and presentation on review-based explanations for recommendations</article-title>
          ,
          <source>in: HumanComputer Interaction - INTERACT 2021</source>
          , Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>597</fpage>
          -
          <lpage>618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthof</surname>
          </string-name>
          ,
          <article-title>Evaluating the efectiveness of explanations for recommender systems, User Modeling and User-Adapted Interaction 22 (</article-title>
          <year>2012</year>
          )
          <fpage>399</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>C.-H. Tsai</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <article-title>Explaining recommendations in an interactive hybrid social recommender</article-title>
          ,
          <source>in: 24th International Conference on Intelligent User Interfaces (IUI 19)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jameson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Willemsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          , M. de Gemmis, P. Lops,
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , L. Chen,
          <article-title>Human decision making and recommender systems, Recommender Systems Handbook (</article-title>
          <year>2015</year>
          )
          <fpage>611</fpage>
          -
          <lpage>648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dooms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Pessemier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <article-title>A usercentric evaluation of recommender algorithms for an event recommendation system</article-title>
          ,
          <source>in: Proceedings of the RecSys</source>
          <year>2011</year>
          :
<article-title>Workshop on Human Decision Making in Recommender Systems (Decisions RecSys 11) and User-Centric Evaluation of Recommender Systems and Their Interfaces - 2 (UCERSTI 2)</article-title>, affiliated with the 5th ACM Conference on Recommender Systems (RecSys 2011), 2011, pp. 67–73.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>