<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OAUC at CLEF2016 SBS Lab: Using Appeal Elements to Improve Automatic Book Recommendation - Proof of Concept</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Preminger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gjertrud Fludal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oslo and Akershus University College of Applied Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article we describe OAUC's participation in the CLEF 2016 SBS Search Suggestion track. We try to represent appeal elements, as used in readers' advisory theory and practice, to see whether they can be used in an automatic retrieval and recommendation context. We are still working with the pace appeal element, used for fiction to capture how quickly the story or plot builds up. New this year is the use of intellectually coded appeal-element data produced by EBSCO as part of the NoveList® service (our gratitude to EBSCO for providing the data).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        As pointed out in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], there are many qualities to books besides their formal
characteristics, such as title, author and subject (the latter being examples of
metadata). Books, particularly fiction, also evoke the readers' emotions, which
is arguably their major mission. This article continues our exploration of how
the emotion-evoking as well as other subtle qualities can be discovered in
user-generated data and subsequently used in a system for automatic classification
of books, as a part of an automatic recommender system. For this year's task
we are still focusing on the pace element. The challenge is twofold: to
identify certain emotion-evoking characteristics of books, and to measure whether
identification of such characteristics helps us match readers' wishes based on
a similar characterization of their recommendation requests. We are working on
operationalizing pace using document model creation as well as occurrences
of adjectives / adjective types in fast-paced vs. leisurely-paced books.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Theoretic Approach and related work</title>
      <p>Emotion-evoking characteristics are properties of books that are not usually a
part of the metadata, technically because they are difficult to trace back. Even
though most people might agree on one or the other subtle property of a book,
there is potential for dispute.</p>
      <p>
        In addition to Saricks' work on appeal elements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a couple
of works following it up are definitely worth mentioning.
      </p>
      <sec id="sec-2-1">
        <title>Saricks' framework of appeal</title>
        <p>
          Saricks [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] has developed a framework / terminology that enables librarians, or other
reading promoters, to discuss books through short excerpts, user reviews and
the like, boiling down to "appeal". Appeal has a number of elements, described below.
        </p>
        <p>
          Pace. According to Saricks, pace is the most important appeal element, and has
the best potential for distinguishing between potential readers. Pace has to do with the
build-up of the story / plot in a book, and how quickly the reader is drawn
into it. Some readers (in some situations) will prefer fast-paced books, others
will rather embark on a slow-paced book. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] also have a pace value they call
"Intensifying", for which we do not have any books in our database.
        </p>
        <p>
          Characterization. This element has to do with the introversion or extroversion of
the characters in the book. Readers often remember the characters in the book
more easily than they remember the plot. Alas, the conception of a well-developed
character varies greatly among users (see footnote 1), making this element less amenable to
analysis than the pace element.
        </p>
        <p>Frame. The frame is about the tone of a book (melancholic, positive), its feeling
(funny or romantic), and its atmosphere (menacing or elevating). Though difficult
to define, this element is often decisive for the reader's choice. The book can be
amusing, bleak or bittersweet (see footnote 2).</p>
        <p>Storyline. The storyline is of course dependent on the previously discussed
elements, but typical values (see footnote 3) are Issue-oriented, Nonlinear or Open-ended.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Representing and modelling appeal elements</title>
        <p>The appeal elements are not directly manifest in the book text, let alone its
metadata, and we need to find some representation so that a recommender
system can take them into account. To this end we need to find some manifest
indicators that can automatically match a recommendation request and a book
using the appeal elements as evidence (in addition to other evidence), when
recommending a book based on this recommendation request.</p>
        <p>
          Finding and using such indicators is a challenge whose character differs
among the elements. Being metadata of different kinds rather than full
content, the texts we have are sparse, but on the other hand (for a portion of the
books) they include reader reviews, which should be a condensed summarization of the
book done by readers, the target group of a recommender system.
        </p>
        <p>
          Footnote 1: [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] lists 30 types of characters that can appear in fiction. Footnote 2: [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] lists 58 categories of "Tone". Footnote 3: [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] lists 9 types of storyline.
        </p>
        <p>One way of implicitly representing an appeal element (an element-name /
element-value pair) is to use occurrences of sentences that are characteristic of some value of
an appeal element. When these are fed to a Natural Language Processing (NLP) system, the
system may identify functionally similar sentences in any analyzed
book review and use them in the classification of the books. In our implementation, a model of
an appeal element is a summary of sentences that are likely to appear in a review
of a book that has this value (or valence) of this element. This method has a
potential for accuracy, but needs quite a large set of reader reviews of books
with known values (and valences) of the appeal element, and is extremely prone
to overfitting. A simpler but less exact method is to identify single words or
word combinations, particularly adjectives, used by readers when reviewing books
with different values / valences of appeal elements. Such words need somehow to be
classified, so that a system looking for appeal elements in reviews has a broader
repertoire of words to look for than those occurring in the training set.</p>
        <p>Matching can thereafter be done by attempting to apply the same, or a
slightly different, model to the recommendation request, assuming that a
recommendation request and a review belong to the same genre. Here we have
several options:
- Retrieving books by a traditional retrieval model (using text-based metadata
elements for matching) and then reranking so that books with appeal element
values matching that of the request rank ahead of other books
- Weighting up books with matching appeal at retrieval time
- Traditional retrieval accompanied by pseudo relevance feedback based on the
appeal models.</p>
        <p>As our current main line of experimentation is around pace, we will discuss
pace in more detail than the other elements.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Pace</title>
        <p>Pace can be seen as a binary variable, either "High-paced" ("Fast-paced" in
NoveList terminology) or "Low-paced" ("Leisurely-paced"), making it the
easiest element to model and represent, but at the same time less controllable.
Saricks poses some questions, the answers to which may provide clues as to the
pacing:
- Is the book densely written?
- Are there short sentences, short paragraphs, short chapters?
- Is there a straight-line plot?</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related work and our approach</title>
      <p>
        Our work belongs in the realm of content-based recommender systems, for
example [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The main advantage of such systems is their independence of users
and their history of reading and recommendations. As such, these systems have
a better ability to recommend items not yet recommended to anyone, thereby
better supporting serendipity. They are also less likely to serve material very close
to what a reader has already read, thereby supporting novelty, but are prone to
over-specialization, which somewhat counteracts the advantages above.
      </p>
      <p>Saricks' framework is reportedly used extensively in libraries, and in
recent years it has started to gain more systematic use, prominently in a readers'
advisory resource like NoveList. NoveList (see footnote 4) is a paid service by EBSCO, marketed
towards the Reader Advisory (RA) services of libraries, and active since the late 1990s.
Among other book characteristics used for recommendation, they have, since
2010, also been recording and utilizing Saricks' appeal elements.</p>
      <p>
        In a more research-related context, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has developed a conceptual approach
(guiding the current research) for using Saricks' elements in book
recommendations. As part of a PhD project, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have experimented with automatic
extraction of appeal elements from reviews using rules related to occurrences /
co-occurrences of types of words in reader reviews. Both the design approach
and the evaluation approach are quite straightforward. The appeal element
extraction combines a finite list of words (mostly adjectives), expanded by
WordNet-extracted synonyms, with rules for these words' occurrence in the
sentences of a review. The rules analyse governor - subordinate relations between
pairs of words.
      </p>
      <p>
        Interestingly, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have assessed the quality of their ABET extractor by directly
comparing its performance to NoveList's recommendations, using appraisals by
Amazon Mechanical Turk workers as the gold standard, and finding ABET more
accurate. They also compared the performance of their entire system, Rabbit (of
which ABET is a component), to other recommendation services, using
Mechanical Turk appraisers as the gold standard when choosing new books that "best
relate" to each one from a sample of ten books. The evaluation strategy taken
here is very practical, and the results are certainly promising. Still, we feel that our
challenge here is different, as we wish to match books with recommendation
requests (not having other books to relate our recommendations to), and we
therefore feel that we need to take a slightly more general approach, which is
based on a broader classification of parts of speech, particularly adjectives.
      </p>
      <sec id="sec-3-1">
        <title>Footnote 4: https://www.ebscohost.com/novelist</title>
        <p>
          Resembling [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], we will also need to take a part / whole approach, trying to see
(a) whether our NLP classification has the potential to elicit individual appeal
elements, (b) whether it is possible to classify recommendation requests in the same
way as user reviews (whether or not those two types belong to the same genre),
and (c) whether correct identification indeed gives us better recommendations
on the basis of textual recommendation requests.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data, Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Our Data</title>
        <p>The SBS Suggestion Track's (SST) data consist of metadata drawn from
LibraryThing and Amazon, describing about 2.8 million books, keyed by their
ISBN (meaning that the number of distinct works is somewhat lower, as an ISBN identifies
a manifestation of a work). About half of these (over 1.3 million) have reader
reviews as a part of their metadata. It is these reviews (free texts) that constitute
the most important data of this paper.</p>
        <p>In order to prepare the data for adjective-based analysis, we have so far been
taking the following steps:
- POS-tagging all free texts of the reviews using Apache OpenNLP (https://opennlp.apache.org/)
- Collecting all adjectives: basic (&lt;JJ&gt;), comparative (&lt;JJR&gt;) and
superlative (&lt;JJS&gt;)
- Normalizing the adjective forms captured by the POS-tagger, and linking
each review to the normalized forms of the adjectives.</p>
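        <p>The collection and normalization steps listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the OpenNLP-based pipeline used for the experiments: it assumes the reviews have already been POS-tagged with Penn Treebank tags, and the names normalize_adjective, adjectives_per_review and the small IRREGULAR table are hypothetical, with deliberately naive normalization rules.</p>
        <preformat>
```python
from collections import Counter

# Penn Treebank adjective tags: basic, comparative, superlative.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}

# Hypothetical hand-made lookup for irregular forms; a real system
# would rely on a proper lemmatizer instead.
IRREGULAR = {"better": "good", "best": "good", "worse": "bad", "worst": "bad"}


def normalize_adjective(word, tag):
    """Map an adjective to a rough base form (naive, illustrative rules)."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if tag == "JJR" and w.endswith("er") and len(w) > 4:
        w = w[:-2]
    elif tag == "JJS" and w.endswith("est") and len(w) > 5:
        w = w[:-3]
    else:
        return w
    # Undo consonant doubling ("bigger" -> "big") and y -> i ("happier" -> "happy").
    if len(w) > 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
        w = w[:-1]
    elif w.endswith("i"):
        w = w[:-1] + "y"
    return w


def adjectives_per_review(tagged_reviews):
    """Link each review id to a Counter of its normalized adjectives.

    tagged_reviews maps a review id to a list of (token, pos_tag) pairs.
    """
    return {
        review_id: Counter(
            normalize_adjective(token, tag)
            for token, tag in tokens
            if tag in ADJECTIVE_TAGS
        )
        for review_id, tokens in tagged_reviews.items()
    }


# Toy, hand-tagged input standing in for POS-tagger output.
tagged = {
    "r1": [("A", "DT"), ("faster", "JJR"), ("read", "NN"), ("than", "IN"),
           ("the", "DT"), ("slowest", "JJS"), ("classics", "NNS")],
    "r2": [("Gripping", "JJ"), ("and", "CC"), ("fast", "JJ")],
}
adjectives = adjectives_per_review(tagged)
```
        </preformat>
        <p>In this toy input, "faster" and "fast" end up under the same normalized form, which is the property that would let adjective counts be compared across reviews.</p>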
        <p>As we were preparing this year's experiments, we saw that crunching the data
for the adjective-based analysis is extremely time-consuming, and at the time of
writing the data are still in the preparation stage. Therefore we will, in this paper,
limit ourselves to experiments based on document categorization.</p>
        <p>In addition, we have, with great gratitude, obtained EBSCO NoveList
data, categorizing books into value categories based on several appeal elements.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Our Purpose and Overall Research Design</title>
        <p>As an overall guiding design principle, we intend
to assign values or valences of appeal elements to unseen books (represented
by their respective review texts), based on values intellectually assigned to a
subset of the books, building models from the reviews of the latter
as described in the following subsections. We conduct a traditional text-based
search in an index built from selected parts of the metadata, and rerank the
result so that books with appeal elements that match the appeal element of the
request are assigned higher prominence in the result set, as depicted in Figure 2.</p>
        <sec id="sec-4-2-1">
          <title>Training the models</title>
          <p>The most straightforward way of training appeal-element
categorizers based on existing NLP tools is to use reviews of books with known
values (previously assigned by experts) to build document-categorization
models, which can then use reviews of unseen books (where available) to classify those into
appropriate categories (low vs. high valence, different intervals of element values,
and so on). Classifying the recommendation requests as described in the previous
section can provide us with an additional piece of evidence when matching those
requests to books.</p>
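          <p>To make the idea of document categorization from labelled reviews concrete, the following sketch trains a tiny multinomial Naive Bayes classifier in Python. This is a swapped-in technique for illustration only and is not the OpenNLP Document Categorizer model used in the experiments; the class NaiveBayesCategorizer, the tokenizer and the four toy reviews are all hypothetical.</p>
          <preformat>
```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, labelled_reviews):
        # labelled_reviews: iterable of (review_text, pace_label) pairs.
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        for text, label in labelled_reviews:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training reviews with expert-assigned pace values (invented examples).
training = [
    ("a breathless thriller I could not put down", "fast-paced"),
    ("quick chapters and nonstop action", "fast-paced"),
    ("a slow meditative story that takes its time", "leisurely-paced"),
    ("gentle unhurried prose to savour slowly", "leisurely-paced"),
]
model = NaiveBayesCategorizer().fit(training)
prediction = model.predict("nonstop action from the first chapters")
```
          </preformat>
          <p>The same predict call could be applied to the text of a recommendation request, which is the sense in which a request and a review are treated as belonging to the same genre.</p>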
          <p>In the pace case, with two mutually exclusive values, we use the Apache
OpenNLP (https://opennlp.apache.org/) Document Categorizer tool
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.doccat)
to train a number of models for distinguishing
(unseen) Fast-paced from Leisurely-paced books by their review texts. The trained
models differ in the number of training reviews (50, 75, 100, 150,
200, 250, 300, 400, 500, 750 and 1000 of each kind, so that 2000 reviews are
used to build the largest model), in order to find out how the
ability to discriminate between fast-paced and leisurely-paced books depends on the model size (number
of reviews). The 2000 books (1000 of each value) comprising the largest model
constitute a superset of all the other training sets (see the illustration in Figure
3(a)). We use the remaining NoveList pace-categorized books as a test set.</p>
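          <p>The nesting of the training sets described above can be sketched as follows in Python. The function name split_pace_data, the fixed random seed and the numbers of available books (1200 and 1500) are assumptions made for the example; only the list of per-value sizes comes from the text.</p>
          <preformat>
```python
import random

# Number of training reviews per pace value, as listed in the text.
SIZES = [50, 75, 100, 150, 200, 250, 300, 400, 500, 750, 1000]


def split_pace_data(fast_ids, leisurely_ids, sizes=SIZES, seed=0):
    """Build nested training sets plus a held-out test set.

    Each training set takes the first n shuffled ids of each pace value,
    so every smaller set is contained in every larger one. Whatever is
    not in the largest training set becomes the test set.
    """
    rng = random.Random(seed)
    fast, leisurely = list(fast_ids), list(leisurely_ids)
    rng.shuffle(fast)
    rng.shuffle(leisurely)
    largest = max(sizes)
    if largest > min(len(fast), len(leisurely)):
        raise ValueError("not enough books for the largest training set")
    training = {n: fast[:n] + leisurely[:n] for n in sorted(sizes)}
    test = fast[largest:] + leisurely[largest:]
    return training, test


# Hypothetical pool sizes; the real NoveList counts are not given here.
fast_ids = [f"fast-{i}" for i in range(1200)]
leisurely_ids = [f"leisurely-{i}" for i in range(1500)]
training, test = split_pace_data(fast_ids, leisurely_ids)
```
          </preformat>
          <p>Taking prefixes of one shuffled list is one simple way to guarantee the superset property; sampling each size independently would not.</p>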
          <p>In Figure 3(b) we show the prediction power of the model as a function of
the number of documents in the training set used to create it. It is interesting
to see the overall increase in prediction power with the size of
the model. This indicates the soundness of the approach, meaning that the pace
element tells something about the book, and that the reader review entails a kind
of representation of the pace value. Also interesting is the better prediction
for the leisurely-paced books. This is due to the fact that in this case the test
set is closer in size to the training set, so more of the variability of the latter is
represented in the former. Paradoxically this also indicates the overall soundness
of the approach. We stress that the categorizer is used with default settings, and
that work still remains in fine-tuning the tool for the categorization task.</p>
          <p>Fig. 3. (a) The EBSCO NoveList pace-set; (b) Prediction power as a function of size</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Using Pace in Recommendation / Retrieval</title>
        <p>To test the approach's potential to assist in a recommendation situation, we employ a two-step procedure:
- ranking documents with traditional retrieval methods
- reranking afterwards by matching the paces of the recommendation request
and the book reviews</p>
        <p>The assumption is that pace is additional evidence in the ranking process
for recommendation satisfaction, and that its effect is best measured at the top
of the ranked list. We reorder the first n ranked documents returned for
each topic, so that documents whose pace value matches that of the request (as
predicted by the procedure described below) are promoted to the top of the list,
keeping their mutual order as in the original list; to that end we use a stable sort.</p>
        <p>For the task of determining the pace of the topic itself, two approaches have
been tested:
- Predicting the values of the works attached to the topic in the topic set and
applying a majority vote of those as the request value (Figure 4(a))
- Treating the request text as a review text and predicting its value directly
(Figure 4(b)).</p>
        <p>As the direct prediction strategy (the second above) gives better results in the
current experiments, we only report the results of this strategy in the present
paper, but consider the other strategy viable for later experiments. The attached-work
approach (first item above) also has the weakness that not all works attached to
the topic have reader reviews associated with them.</p>
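        <p>The majority vote over attached works and the stable reranking of the first n results can be illustrated with a short Python sketch. The function names majority_pace and rerank_top_n, the book identifiers and the pace labels are hypothetical; the sketch relies only on the fact that Python's sorted() is a stable sort.</p>
        <preformat>
```python
from collections import Counter


def majority_pace(predicted_paces, default=None):
    """Majority vote over the pace labels predicted for a topic's attached works.

    Works without a prediction (for example, no reader reviews) are passed
    as None and ignored.
    """
    votes = Counter(pace for pace in predicted_paces if pace is not None)
    if not votes:
        return default
    return votes.most_common(1)[0][0]


def rerank_top_n(ranked_books, book_pace, request_pace, n):
    """Promote books among the first n whose pace matches the request.

    sorted() is stable, so matching books keep their original mutual
    order, as do the non-matching ones. Positions beyond n are untouched.
    """
    head = sorted(
        ranked_books[:n],
        key=lambda book: book_pace.get(book) != request_pace,
    )
    return head + ranked_books[n:]


# Toy ranked list from a baseline retrieval run (invented identifiers).
ranked = ["b1", "b2", "b3", "b4", "b5", "b6"]
book_pace = {"b1": "leisurely", "b2": "fast", "b3": "leisurely",
             "b4": "fast", "b6": "fast"}  # b5 has no predicted pace

request_pace = majority_pace(["fast", "leisurely", "fast", None])
reranked = rerank_top_n(ranked, book_pace, request_pace, n=4)
```
        </preformat>
        <p>With n=4 only the first four positions are reordered, so "b6" stays where it was even though its pace matches the request.</p>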
        <p>Pace as evidence. Appeal elements, pace among these, are expected to serve
as evidence contributing to a successful recommendation. One way of
finding the optimal contribution is to rerank upper sublists of various
lengths of the original ranked list, so that works whose pace matches the request
pace (predicted by the direct approach from the previous section) are promoted, and to measure the
performance of these reranked lists against that of the original list (baseline).</p>
        <p>As seen in Figure 5, reranking only the sublists of length n=5-20 performs better than
the baseline (level of evidence 0), whereas reranking longer parts of the list gives
unpredictable results, introducing many books with matching pace but otherwise
irrelevant to the recommendation request.</p>
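        <p>Comparing reranked sublists of different lengths against the baseline can be sketched as below. Precision at k is used here only as an assumed stand-in for the evaluation measure, and the function names, the relevance judgements and the toy ranking are all hypothetical; the numbers it produces say nothing about the actual results in Figure 5.</p>
        <preformat>
```python
def rerank_top_n(ranked, book_pace, request_pace, n):
    """Stable promotion of pace-matching books within the first n positions."""
    head = sorted(ranked[:n], key=lambda book: book_pace.get(book) != request_pace)
    return head + ranked[n:]


def precision_at_k(ranked, relevant, k):
    """Fraction of the first k books that are judged relevant."""
    return sum(1 for book in ranked[:k] if book in relevant) / k


def sweep_rerank_depth(ranked, book_pace, request_pace, relevant, depths, k):
    """Return {depth: precision at k}; depth 0 is the unreranked baseline."""
    return {
        n: precision_at_k(
            rerank_top_n(ranked, book_pace, request_pace, n), relevant, k
        )
        for n in [0] + list(depths)
    }


# Toy data: ten ranked books, four predicted as fast-paced, three relevant.
ranked = [f"b{i}" for i in range(1, 11)]
book_pace = {book: "leisurely" for book in ranked}
for book in ("b2", "b5", "b7", "b9"):
    book_pace[book] = "fast"
relevant = {"b2", "b5", "b7"}

results = sweep_rerank_depth(ranked, book_pace, "fast", relevant,
                             depths=[4, 8], k=3)
```
        </preformat>
        <p>In this toy case the reranking depth only helps once it is large enough to reach the relevant, pace-matching books further down the list.</p>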
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>As already explained, we see this as a continuous research endeavour, where the
purpose is to directly utilize appeal elements in generating better
recommendations based on recommendation requests. We have started out by trying to model
the pace appeal element in both books' reader reviews and the topics
(recommendation requests), to see if matching those can give better
recommendations. The results so far are promising in that they seem to indicate the overall
soundness of the approach, but a better baseline would be needed to test the
approach under more realistic conditions.</p>
      <p>
        We have so far been using the document categorization strategy as proposed
by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The next endeavour will be based on occurrences of adjectives in reviews
(indirectly inspired by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>Owing to the relatively large collection of pace values intellectually assigned
to books that we obtained from EBSCO, the current results are remarkably
better than the 2015 results. The possibility of using the data further, also combining appeal
elements other than pace, indicates that the potential is far from exhausted.</p>
      <p>Fig. 4. Overall design - two strategies of predicting request pace: (a) pace predicted through attached works; (b) pace directly predicted from request</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fugleberg</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preminger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Oauc's participation in the clef2016 sbs search suggestion track</article-title>
          . In:
          <source>Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum</source>
          . Volume
          <volume>1391</volume>
          , http://ceur-ws.org
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Saricks</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Readers' Advisory Service in the Public Library</source>
          . ALA editions. American Library Association (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Caplinger</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coleman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coulter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kage</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keyser</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reaser</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>The secret language of books, a guide to appeal</article-title>
          . http://www.ebsco.com/promo/novelist-the-secret-language-of-books (
          <year>2015</year>
          ). Promotion brochure by EBSCO. Accessed: 2015-07-11.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Aciar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debenham</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Informed recommender: Basing recommendations on consumer product reviews</article-title>
          .
          <source>Intelligent Systems, IEEE</source>
          <volume>22</volume>
          (
          <issue>3</issue>
          ) (May
          <year>2007</year>
          )
          <fpage>39</fpage>
          -
          <lpage>47</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fugleberg</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Automatisk klassifikasjon av bøker basert på brukeranmeldelser: Et konsept</article-title>
          .
          <source>Master's thesis, Høgskolen i Oslo og Akershus</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pera</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>Y.K.</given-names>
          </string-name>
          :
          <article-title>Automating readers' advisory to make book recommendations for k-12 readers</article-title>
          .
          <source>In: Proceedings of the 8th ACM Conference on Recommender Systems. RecSys '14</source>
          , New York, NY, USA, ACM (
          <year>2014</year>
          )
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>