<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interactive Preference Elicitation for Scientific and Cultural Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eduardo Veas</string-name>
          <email>eduveas@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cecilia di Sciascio</string-name>
          <email>cdissciascio@know-center.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information and Communications Technologies, National University of Cuyo</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledge Visualization, Know-Center GmbH</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents a visual interface, developed on the basis of control and transparency, to elicit preferences in the scientific and cultural domain. Preference elicitation is a recognized challenge in user modeling for personalized recommender systems. The amount of feedback a user is willing to provide depends on how trustworthy the system seems and how invasive the elicitation process is. Our approach ranks a collection of items with a controllable text-analytics model. It integrates control with the ranking and uses it as implicit preference for content-based recommendations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>A recommender system (RS) depends on a model of a user to
be accurate. To build a model of the user, behavioral
recommenders collect preferences from browsing and purchasing
history, whereas rating recommenders require a user to rate
a set of items to state their preferences (implicit and explicit
methods respectively) [Pu et al., 2011]. Preference elicitation
is fundamental for the whole operational lifecycle of a RS: it
affects the recommendations for a new user and also those of
the whole system community, given what the RS learns from
each new user [Cremonesi et al., 2012]. Whichever method
is chosen, preference elicitation represents an added effort,
which users may avoid altogether, to the detriment of their own
satisfaction. The amount of feedback a user is willing to provide
is a tradeoff between system aspects and personal
characteristics, for example privacy vs. recommendation quality
[Knijnenburg et al., 2012].</p>
      <p>In their seminal work, Swearingen et al. pointed out one
challenge: the recommender has to convince the user to try
the recommended items [Swearingen and Sinha, 2001]. To
do so, the recommendation algorithm has to propose items
effectively, but also the interfaces must deliver
recommendations in a way that can be compared and explained [Ricci et
al., 2011]. The willingness to provide feedback is directly
related to the overall perception and satisfaction the user has
of the RS [Knijnenburg et al., 2012]. Explanation
interfaces increase confidence in the system (trust) by explaining
how the system works (transparency) [Tintarev and Masthoff,
2012] and allowing users to tell the system when it is wrong
(scrutability) [Kay, 2006]. Hence, to warrant increased user
involvement, the RS has to justify its recommendations and let
the user customize how they are generated. Transparency and
controllability are key facilities of a self-explanatory RS that
promote trust and satisfaction [Tintarev and Masthoff, 2012].</p>
      <p>Our work is set in the scientific and cultural domain. In
this frame, users are most often engaged in exploration and
production tasks that involve gathering and organizing large
collections in preparatory steps (e.g., for writing, preparing
a lecture or presentation). A federated system (FS)
compiles scientific documents or electronic cultural content
(images) upon an explicit or implicit query, with little control
over the way results are generated. Content takes the form of
text document surrogates comprising title and abstract. They
also include minimal additional metadata, like creator, URL,
provider and year of publication.</p>
      <p>This paper introduces a visual tool to support exploration
of scientific and cultural collections. The approach includes
a metaphor to represent a set of documents, with which the
user interacts to understand and define themes of interest.
The contribution of this work is the interactive
personalization feature that, instead of presenting a static ranked list,
allows users to dynamically re-sort the document set in the
visual representation and re-calculate relevance scores with
regard to their own interests. The visual interface employs
controllable methods and represents their results in a transparent
manner which, rather than adding effort, reduces the complexity
of the overall task.</p>
    </sec>
    <sec id="sec-2">
      <title>The Approach</title>
      <p>
        The proposed approach was designed to quickly reorganize
a large collection in terms of its relevance to a set of
keywords expressing the choice of topic. In a nutshell, the goal
is to interactively discover the topics in a collection,
building up the user's knowledge along the way. But, instead of trying to
infer a hidden topic structure fully automatically
        <xref ref-type="bibr" rid="ref1">(as in [Blei,
2012])</xref>
        , we propose an interactive approach, which works as
a conversation between the user and the RS to build a
personalized theme structure. Controllability and transparency
are crucial for the user to understand how a topic came about
from their personal exploration. The challenge for the
interface is to clearly explain the recommendation process, and for
the analytics method to reduce the computational problem to
interactive terms.
      </p>
      <sec id="sec-2-1">
        <title>2.1 Visual Interface</title>
        <p>To search and explore documents based on the themes that
run through them, we build an interface that allows the user
to establish a conversation with the RS. The interface comprises
two main parts: the topic summary and the
recommendation pane. The topic summary is built from keywords
extracted from the whole collection. Keywords are presented in
a Tag Box, organized and encoded in terms of their frequency
of occurrence in the collection (tf-idf), see Fig. 1. The
recommendation list initially shows the unranked collection.</p>
        <p>As the user interacts with the contents choosing words to
express her information needs, the recommendation list is
ranked on-the-fly (see Fig. 1). The RankView shows the
contribution each keyword has on the overall score of a
document. With a slider, the user can assign a weight to a
keyword and modify its contribution to the score. Furthermore,
the TagBox and RankView illustrate the possible effect of
user actions in a quick overview: hovering over a keyword in
the TagBox shows a micro-chart with the proportion of
documents affected, and the RankView highlights those documents
in view that would be affected by choosing the keyword.</p>
        <p>It is important to note that the user is aware and in control
of the ranking and organization of the document collection at
all times. With the visual interface, the user describes her
information needs and chooses documents from the collection
that better reflect those needs. Chosen items can be assigned
to a collection. The act of choosing an item is considered
an expression of preference. With the collection, the system
stores the keywords and score of each document. Although this
feedback is not yet incorporated in our ranking approach, we
analyze its effects with a user study and outline future
directions to integrate this additional information in the system.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Text Analytics and Ranking</title>
        <p>Keyword extraction plays two roles: it summarizes the
topics in the collection, and it also provides the basis for the
fast ranking of documents. Preprocessing involves
part-of-speech tagging, singularizing plural nouns, and stemming
with a Porter stemmer. The resulting terms form a document
vector, which also constitutes its index. Subsequently, individual
terms are scored with TF-IDF (term frequency - inverse
document frequency). TF-IDF measures how important a term is to a document
in a collection, as the product of its frequency in the
document and the logarithm of the inverse of its frequency across the
collection of documents. The more frequent a term is in a
document and the fewer documents it appears in, the
higher its score will be. TF-IDF-scored terms are added to the
metadata of each document. To provide an overview of the
contents in the collection, keywords from all documents are
collected in a global set of keywords. Global keywords are
sorted by the accumulated document frequency (DF),
calculated as the number of documents in which a keyword appears
- regardless of the frequency within the documents.</p>
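<p>As a minimal sketch of this scoring step (function and variable names are ours, not the system's), the per-term TF-IDF scores and the global document frequencies can be computed as follows:</p>

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Per-document tf-idf scores plus global document frequencies.

    docs: list of token lists (already tagged, singularized, stemmed).
    Returns (per_doc, df): per_doc[i] maps each term of document i to
    its tf-idf score; df maps each keyword to the number of documents
    it appears in, which is used to sort the global keyword set.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))            # count each term once per document
    per_doc = []
    for tokens in docs:
        tf = Counter(tokens)
        # term frequency times log of the inverse document frequency
        per_doc.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return per_doc, df
```

<p>The global keyword set shown in the Tag Box would then be sorted by df in descending order.</p>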
        <p>Quick exploration of content depends on quickly re-sorting
the documents according the information needs of the user,
expressed with a query built from a subset of the global
keyword collection. We assume that some keywords are more
important to the topic model than others and allow the user to
assign weights to them.</p>
<p>The documents in the set are then ranked and sorted as
follows. Given a set of documents D = {d_1, ..., d_n}, a set of
keywords K = {k_1, ..., k_m}, and a set of selected keywords
T = {t_1, ..., t_p}, T ⊆ K, the overall score for document d_i is
calculated as the sum of the weighted scores of its keywords
matching selected keywords:
s_{d_i} = ∑_{j=1}^{p} w_{t_j} m_{d_i t_j},</p>
<p>where w_{t_j} is the weight assigned by the user to the selected
keyword t_j, such that ∀j : 0 ≤ w_{t_j} ≤ 1, and m_{d_i t_j} is the
tf-idf score for keyword t_j in document d_i. D is next sorted by
overall score using the quicksort algorithm. Documents in D
are now elements of a sequence Q with order determined by
Q = (q_i)_{i=1}^{n}, with q_i, q_{i+1} ∈ D ∧ s_{q_i} ≥ s_{q_{i+1}}.</p>
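<p>The weighted scoring and descending sort can be sketched as follows (a simplified illustration with hypothetical names, not the system's code):</p>

```python
def rank_documents(doc_scores, weights):
    """Sort documents by overall score s_di = sum_j w_tj * m_di,tj.

    doc_scores: list of {keyword: tf-idf score} dicts, one per document.
    weights: {selected keyword: user-assigned weight in [0, 1]}.
    Returns (document index, overall score) pairs, highest score first.
    """
    overall = [
        (i, sum(w * m.get(t, 0.0) for t, w in weights.items()))
        for i, m in enumerate(doc_scores)
    ]
    # Python's built-in sort is Timsort rather than quicksort, but the
    # result (descending by overall score) is the same.
    return sorted(overall, key=lambda pair: pair[1], reverse=True)
```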
<p>Finally, the ranking position is calculated in such a way that
items with equivalent overall score share the same position.
The position for a sorted document q_i is calculated as
r_{q_i} = i − |C|,</p>
<p>where C = {q_j : s_{q_j} = s_{q_{j−1}}, 1 &lt; j ≤ i} represents the set
of items up to q_i whose overall score equals that of their
immediate superior in the ranking.</p>
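<p>One way to realize this tie-sharing position assignment (a sketch under our reading of the formula; the function name is ours):</p>

```python
def rank_positions(sorted_scores):
    """Positions for a descending score-sorted sequence.

    Items with equal overall score share the same position: the
    position of the i-th item (1-based) is i minus the number of
    ties to an immediate predecessor encountered so far.
    """
    positions = []
    ties = 0
    for i, s in enumerate(sorted_scores, start=1):
        if i > 1 and s == sorted_scores[i - 2]:
            ties += 1                 # same score as the item above
        positions.append(i - ties)
    return positions
```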
<p>The current approach employs a term-frequency-based
scheme to compute document scores, as it is better suited than a
single similarity measure to computing and highlighting
individual term contributions.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setup</title>
<p>We performed a preliminary study to determine whether
controllability and transparency increase the complexity of, and pose
extra effort in, the task of building topic-oriented document
collections. Thus, participants had the task to “gather relevant
items” using our tool (U) or using a recommendation list (L)
with the usual tools (keyword search). We chose two variations of
dataset size in terms of item count: S (30 items) and L (60 items).</p>
      <sec id="sec-3-1">
<title>3.1 Method</title>
        <p>The study was structured as a repeated measures design, with
four iterations of the same tasks, each with a different
combination of the independent variables (e.g., US-LL-UL-LS). To
counter the effects of habituation, we used four topics
covering a spectrum of cultural, technical and scientific content:
women in the workforce (WW), robotics (RO), augmented
reality (AR), circular economy (CE). Each of these topics has
a well-defined Wikipedia page, which was used as a seed to
retrieve a collection from a federated system. The system
creates a query from the text of the page and forwards it to
a number of content providers. The result is a joint list of
items from each provider. The federated system cannot
establish how relevant the items are. Furthermore, the resulting
collection refers to the whole text, but there is no indication
of subtopics. We collected sets of 60 and 30 items as static
datasets for each topic. We simulated the proposed scenario
of reorganizing the collection by choosing subtopics for each
task in the study. The combinations were randomized and
assigned using a balanced Latin Square.</p>
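<p>A balanced Latin Square of this kind can be generated with a standard construction (a sketch; names are ours, not the study software's). For an even number of conditions, every condition occupies every serial position equally often, and every ordered pair of adjacent conditions occurs equally often across rows:</p>

```python
def balanced_latin_square(n):
    """Rows of condition orders (0..n-1), one row per starting point.

    Standard construction: row p interleaves p, p-1, p+1, p-2, ...
    (mod n), so successive rows shift the whole pattern by one.
    """
    rows = []
    for p in range(n):
        row, step_up, step_down = [], 0, 1
        for i in range(n):
            if i % 2 == 0:
                row.append((p + step_up) % n)
                step_up += 1
            else:
                row.append((p + n - step_down) % n)
                step_down += 1
        rows.append(row)
    return rows
```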
<p>Each condition had two fundamental tasks: finding the items most
relevant to a set of given keywords, and finding the items most relevant
to a short text. In the former, participants were given the
keywords and they just had to explore the collection. There were
two iterations of this task for each condition. The short text
task required participants to come up with the keywords
describing the topic by themselves.</p>
<p>Twenty-four participants took part in the study (11
female, 13 male, between 22 and 37 years old). They were
recruited from the medical university and from the computer
science graduate population. None of them was
majoring in the topic areas selected for the study.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Procedure</title>
        <p>A study session started with an intro video, which explained
the functionality of the tool. Each participant got exactly the
same instructions. There was a short training session on a
dummy dataset to let participants familiarize themselves with the tool.
Thereafter, the first condition started. The system showed a
short text to introduce the topic. After reading the text,
participants pressed start, opening the interface for the first task. At
the beginning of the task, the items in the collection were
ordered randomly, ensuring that an item would not appear in the
same position again. The instructions for the task were shown
in the upper part of the screen. In all conditions participants
were able to collect items and inspect their collections. In the
(L) condition the main interface was a list of items, whereas
the (U) condition used the proposed interface. Participants
had to click the finished button to conclude the task. It was
possible to finish without collecting all items. After each
condition, participants had to fill in a NASA TLX questionnaire to
assess, among other dimensions, cognitive load, performance, and effort.</p>
        <p>The procedure was repeated for each of the four iterations.
Thereafter, participants were interviewed for comments.</p>
      </sec>
      <sec id="sec-3-3">
<title>3.2 Results</title>
<p>NASA TLX data were analyzed using a repeated measures
ANOVA with the independent variables tool and dataset size.
Post-hoc effects were computed using Bonferroni-corrected
pairwise comparisons. The two-by-two experimental design
ensures that sphericity is necessarily met. The repeated
measures ANOVA revealed a significant effect of tool on
perceived workload: F(1,23) = 35.254, p &lt; 0.01, ϵ = 0.18. A
post-hoc paired-samples t-test revealed a significantly lower
workload when using uRank (p &lt; 0.001). Further, repeated
measures ANOVA in each dimension of the workload
measure showed significant effects of tool in all dimensions as
shown in Table 1.</p>
<p>To test the proposed recommender, we gathered and
compared, for each topic (WW, RO, AR, CE), the most popular
items collected using the list (L-MP) and using our approach
(U-MP) with the scores produced by our ranking algorithm (U).
We performed intra-class correlations (ICC), using a
two-way, consistency, average-measures model. Results are
summarized in Table 2. For broad exploration (q1 &amp; q2), we
found good to excellent ICCs. A closer look at the
distribution of scores in Fig. 2 underlines the fact that highly ranked
documents (U) were a popular choice with U-MP and also
relatively popular with L-MP. For q3, the ranking (U)
produced widespread scores with fewer individual favorites; items in
L-MP were generally the least popular. U-MP resulted in the
most focused of the three (fewer blocks with higher intensity).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Outlook</title>
      <p>Results show that the fast-ranking method in our content
recommender helps users quickly reorganize collections. The
preference elicitation method was well received and quickly
adopted. Participants experienced less effort and overall
workload using our tool. Still, they took time to check their
choices carefully in both U and L conditions. Comparing
most popular choices after the experiment reinforces our
assumptions: the fast-ranking method (U) correlates with the most
popular choices made with the tool (U-MP) but also without
it (L-MP). Yet, widespread results in some cases call for a
personalized recommendation method. Our preference
elicitation forms the backbone of personalized recommendations.
In the future we will explore recommendations of related
items in context, showing keywords used to collect the item,
other items collected together and under which collections.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is partially funded by CONICET (project
VisualI-Lab, Res. 4050-13) and by Know-Center. Know-Center is
funded by the Austrian Research Promotion Agency (FFG)
under the COMET Program.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
[Blei, 2012]
          <string-name>
            <given-names>David M.</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <article-title>Probabilistic topic models</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>55</volume>
          (
          <issue>4</issue>
          ):
          <fpage>77</fpage>
          -
          <lpage>84</lpage>
          ,
          <year>April 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Cremonesi et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Cremonesi</surname>
          </string-name>
, Franca Garzotto, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Turrin</surname>
          </string-name>
          .
          <article-title>User effort vs. accuracy in rating-based elicitation</article-title>
          .
          <source>In Proc. of the Sixth ACM Conf. on Recommender Systems</source>
          , RecSys '12, pg.
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[Kay, 2006]
          <string-name>
            <given-names>Judy</given-names>
            <surname>Kay</surname>
          </string-name>
          .
          <article-title>Scrutable adaptation: Because we can and must</article-title>
          . In Vincent P. Wade, Helen Ashman, and Barry Smyth, editors,
          <source>AH</source>
          , volume
          <volume>4018</volume>
          of Lecture Notes in Computer Science, pg.
          <fpage>11</fpage>
          -
          <lpage>19</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Knijnenburg et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Bart P.</given-names>
            <surname>Knijnenburg</surname>
          </string-name>
          , Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Newell</surname>
          </string-name>
          .
          <article-title>Explaining the user experience of recommender systems</article-title>
          .
<source>User Modeling and User-Adapted Interaction</source>
          ,
          <volume>22</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>441</fpage>
          -
          <lpage>504</lpage>
          ,
          <year>October 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Pu et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Pearl</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Li</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rong</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <article-title>A usercentric evaluation framework for recommender systems</article-title>
          .
          <source>In Proc. of the Fifth ACM Conf. on Recommender Systems</source>
          , RecSys '11, pg.
          <fpage>157</fpage>
          -
          <lpage>164</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Ricci et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ricci</surname>
          </string-name>
          , Lior Rokach, Bracha Shapira, and
          <string-name>
            <given-names>Paul B.</given-names>
            <surname>Kantor</surname>
          </string-name>
          . Introduction to Recommender Systems Handbook, pg. 1-
          <fpage>35</fpage>
          . Springer US,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
[Swearingen and Sinha, 2001]
          <string-name>
            <given-names>K.</given-names>
            <surname>Swearingen</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          .
<article-title>Beyond algorithms: An HCI perspective on recommender systems</article-title>
          .
          <source>In ACM SIGIR. Workshop on Recommender Systems</source>
, volume
<volume>13</volume>
, numbers 5-6, pg.
          <fpage>393</fpage>
          -
          <lpage>408</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
[Tintarev and Masthoff, 2012]
          <string-name>
            <given-names>Nava</given-names>
            <surname>Tintarev</surname>
          </string-name>
          and
          <string-name>
            <given-names>Judith</given-names>
            <surname>Masthoff</surname>
          </string-name>
          .
          <article-title>Evaluating the effectiveness of explanations for recommender systems</article-title>
          .
<source>User Modeling and User-Adapted Interaction</source>
          ,
          <volume>22</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>399</fpage>
          -
          <lpage>439</lpage>
          ,
          <year>October 2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>