OAUC at CLEF 2016 SBS Lab: Using Appeal Elements to Improve Automatic Book Recommendation - Proof of Concept

Michael Preminger1 and Gjertrud Fludal1

1 Oslo and Akershus University College of Applied Sciences

Abstract. In this article we describe OAUC's participation in the CLEF 2016 SBS Search Suggestion track. We are trying to represent appeal elements, used in readers' advisory theory and practice, to see whether they can be used in an automatic retrieval and recommendation context. We are still working with the pace appeal element, used in fiction to capture how quickly the story or plot builds up. New this year is the use of appeal-element data coded intellectually by EBSCO as part of the NoveList® service (our gratitude to EBSCO for providing the data).

1 Introduction

As pointed out in [1], there are many qualities to books besides their formal characteristics, such as title, author and subject (all typical metadata elements). Books, particularly fiction, also evoke the readers' emotions, which is arguably their major mission. This article continues our exploration of how emotion-evoking and other subtle qualities can be discovered in user-generated data and subsequently used for automatic classification of books, as part of an automatic recommender system.

For this year's task we are still focusing on the subtle element pace. The challenge is twofold: to identify certain emotion-evoking characteristics of books, and to measure whether the identification of such characteristics helps us match readers' wishes based on a similar characterization of their recommendation requests. We are working on operationalizing pace using document-model creation as well as occurrences of adjectives / adjective types in fast-paced vs. leisurely-paced books.

2 Theoretical Approach and related work

Emotion-evoking characteristics are properties of books that are usually not part of the metadata, technically because they are difficult to trace and record. Even though most people might agree on one or the other subtle property of a book, there is potential for dispute. In addition to Saricks' work on appeal elements [2], introduced in [1], a couple of works following it up are definitely worth mentioning.

2.1 Saricks' framework of appeal

[2] has developed a framework and terminology that enable librarians, or other reading promoters, to discuss books through short excerpts, user reviews and the like, boiling them down to "appeal". Appeal has a number of elements:

Pace According to Saricks, pace is the most important appeal element, with the best potential for distinguishing potential readers. Pace has to do with the build-up of the story / plot in a book, and how quickly the reader is drawn into it. Some readers (in some situations) will prefer fast-paced books; others would rather embark on a slow-paced book. [3] also has a pace value called "Intensifying", for which we do not have any books in our database.

Characterization This element has to do with the introversion or extroversion of the characters in the book. Readers often remember the characters in a book more easily than they remember the plot. Alas, the conception of a well-developed character varies greatly among users1, making this element less hospitable to appeal analysis than is the case for the pace element.

1 [3] lists 30 types of characters that can appear in fiction.

Frame The frame is about the tone of a book (melancholic, positive), its feeling (funny or romantic), and its atmosphere (menacing or elevating).
Although difficult to define, this element is often decisive for the reader's choice. The book can for example be amusing, bleak or bittersweet2.

2 [3] lists 58 categories of "Tone".

Storyline The storyline is of course dependent on the previously discussed elements, but typical values3 are Issue-oriented, Nonlinear or Open-ended.

3 [3] lists 9 types of storyline.

2.2 Representing and modelling appeal elements

The appeal elements are not directly manifest in the book text, let alone in its metadata, and we need to find some representation of them that a recommender system can take into account. To this end we need manifest indicators that can automatically match a recommendation request and a book, using the appeal elements as evidence (in addition to other evidence) when recommending a book based on this recommendation request.

Finding and using such indicators is a challenge whose character differs among the elements. Being metadata of different kinds rather than full content, the texts we have are sparse, but (for a portion of the books) they include reader reviews, which should be condensed summarizations of the book done by readers, the target group of a recommender system.

One way of implicitly representing an appeal element (element name and element value) is using occurrences of sentences that are characteristic of some value of an appeal element. Feeding these to a Natural Language Processing (NLP) system, the system may identify functionally similar sentences in any analysed book review and use them in the classification of the books. In our implementation, a model of an appeal element is a summary of sentences that are likely to appear in a review of a book that has a given value (or valence) of this element. This method has a potential for accuracy, but needs quite a large set of reader reviews given to books with known values (and valences) of the appeal element, and is extremely prone to overfitting.

A simpler but less exact method is identifying single words or word combinations, particularly adjectives, used by readers when reviewing books with different values / valences of appeal elements (a toy illustration of this idea is sketched at the end of Section 2.3). Such words somehow need to be classified, so that a system looking for appeal elements in reviews has a broader repertoire of words to look for than the one occurring in the training set.

Matching can thereafter be done by applying the same, or a slightly different, model to the recommendation request, assuming that a recommendation request and a review belong to the same genre. Here we have several options:

– Retrieving books by a traditional retrieval model (using text-based metadata elements for matching) and then reranking, so that books with appeal element values matching those of the request rank ahead of other books
– Weighting up books with matching appeal at retrieval time
– Traditional retrieval accompanied by pseudo relevance feedback based on the appeal models

As our current main line of experimentation is around pace, we discuss pace in more detail than the other elements.

2.3 Pace

Pace can be seen as a binary variable, either "High-paced" ("Fast-paced" in NoveList terminology) or "Low-paced" ("Leisurely-paced"), making it the easiest element to model and represent, but at the same time less controllable. Saricks poses some questions whose answers may provide clues as to the pacing:

– Is the book densely written?
– Are there short sentences / short paragraphs / short chapters?
– Is there a straight-line plot?
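Purely as an illustration of the word-level indicator idea from Section 2.2, applied to pace, the following toy scorer counts hand-picked seed adjectives in a review text. Both seed lists and the label strings are invented for this sketch; a real system would learn the word lists from labelled reviews and expand them (e.g. with synonyms) rather than hard-code them.

```java
import java.util.List;
import java.util.Locale;

/** Toy pace scorer based on hand-picked seed adjectives (illustration only). */
public class PaceSeedScorer {

    // Invented seed lists; a real system would learn and expand these from labelled reviews.
    private static final List<String> FAST_CUES =
            List.of("fast", "gripping", "breathless", "suspenseful");
    private static final List<String> LEISURELY_CUES =
            List.of("slow", "leisurely", "meandering", "contemplative");

    /** Returns "fast_paced", "leisurely_paced", or null when the review gives no clear signal. */
    public static String predictPace(String reviewText) {
        String text = reviewText.toLowerCase(Locale.ROOT);
        long fastHits = FAST_CUES.stream().filter(text::contains).count();
        long slowHits = LEISURELY_CUES.stream().filter(text::contains).count();
        if (fastHits == slowHits) {
            return null; // no usable evidence either way
        }
        return fastHits > slowHits ? "fast_paced" : "leisurely_paced";
    }
}
```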
3 Related work and our approach

Our work belongs in the realm of content-based recommender systems, like for example [4]. The main advantage of such systems is their independence of users and their history of reading and recommendations. As such, these systems have a better ability to recommend items not yet recommended to anyone, thereby better supporting serendipity. They are also less likely to serve material very close to what a reader has already read, thereby supporting novelty, but they are prone to over-specialization, which somewhat counteracts the advantages above.

Fig. 1. Characteristics of the pace appeal elements

Saricks' framework is reportedly used extensively in libraries, and in recent years it has started to gain more systematic use, prominently in a Readers' Advisory resource like NoveList. NoveList4 is a paid service by EBSCO, marketed towards the Readers' Advisory (RA) services of libraries and active since the late 90's. Among other book characteristics used for recommendation, they have since 2010 also been recording and utilizing Saricks' appeal elements.

4 https://www.ebscohost.com/novelist

In a more research-related context, [5] has developed a conceptual approach (guiding the current research) for using Saricks' elements in book recommendations. As part of a PhD work, [6] have experimented with automatic extraction of appeal elements from reviews, using rules related to occurrences / co-occurrences of types of words in reader reviews. Both the design approach and the evaluation approach are quite straightforward. The appeal element extraction is a combination of a finite list of words (mostly adjectives), expanded by WordNet-extracted synonyms, and rules for these words' occurrence in the sentences of a review. The rules analyse governor-subordinate relations between pairs of words.

Interestingly, [6] have assessed the quality of their ABET extractor by directly comparing its performance to NoveList's recommendations, using appraisals by Amazon Mechanical Turk workers as the gold standard, and finding ABET more accurate. They also compared the performance of their entire system Rabbit (of which ABET is a component) to other recommendation services, again using Mechanical Turk appraisers as the gold standard when choosing new books that "best relate" to each one from a sample of ten books. The evaluation strategy taken there is very practical, and the results are certainly promising. Still, we feel that our challenge here is different, as we wish to match books with recommendation requests (not having other books to relate our recommendations to), and we therefore need to take a slightly more general approach, based on a broader classification of parts of speech, particularly adjectives.

Resembling [6], we will also need to take a part / whole approach, trying to see (a) whether our NLP classification has the potential to elicit individual appeal elements, (b) whether it is possible to classify recommendation requests the same way as user reviews (whether or not those two types of text belong to the same genre), and (c) whether correct identification indeed gives us better recommendations on the basis of textual recommendation requests.

4 Data, Experiments and Results

4.1 Our Data

The SBS Suggestion Track's (SST) data consist of metadata drawn from LibraryThing and Amazon, describing about 2.8 million books, keyed by their ISBN (meaning that the number of distinct works is somewhat lower, as an ISBN keys a manifestation of a work). About half of these (over 1.3 million) have reader reviews as part of their metadata. It is these reviews (free texts) that constitute the most important data of this paper.

In order to prepare the data for adjective-based analysis, we have so far taken the following steps (a sketch of the first two steps is given below):

– POS-tagging of all free texts of the reviews using Apache OpenNLP5
– Collecting all adjectives: basic (<JJ>), comparative (<JJR>) and superlative (<JJS>)
– Normalizing the adjective forms captured by the POS tagger, and linking each review to the normalized forms of its adjectives

5 https://opennlp.apache.org/
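A minimal sketch of the tagging and adjective-collection steps, using the Apache OpenNLP tokenizer and POS tagger; the model file paths and the simple lower-casing used as "normalization" here are placeholders for whatever the actual pipeline uses.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

/** Collects normalized adjectives (JJ, JJR, JJS) from a single review text. */
public class AdjectiveCollector {

    private final TokenizerME tokenizer;
    private final POSTaggerME tagger;

    public AdjectiveCollector(String tokenizerModelPath, String posModelPath) throws Exception {
        // Pre-trained English models, e.g. en-token.bin and en-pos-maxent.bin (paths are placeholders).
        try (InputStream tokIn = new FileInputStream(tokenizerModelPath);
             InputStream posIn = new FileInputStream(posModelPath)) {
            this.tokenizer = new TokenizerME(new TokenizerModel(tokIn));
            this.tagger = new POSTaggerME(new POSModel(posIn));
        }
    }

    /** Returns the adjectives of the review in a crude normalized (lower-cased) form. */
    public Set<String> adjectivesIn(String reviewText) {
        String[] tokens = tokenizer.tokenize(reviewText);
        String[] tags = tagger.tag(tokens);
        Set<String> adjectives = new TreeSet<>();
        for (int i = 0; i < tokens.length; i++) {
            // JJ = basic, JJR = comparative, JJS = superlative adjective
            if (tags[i].startsWith("JJ")) {
                adjectives.add(tokens[i].toLowerCase(Locale.ROOT));
            }
        }
        return adjectives;
    }
}
```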
As we were preparing this year's experiments, based on the approach we have taken, we have seen that the data crunching needed for the adjective-based preparation is extremely time-consuming, and at the moment of writing these data are still in the preparation stage. Therefore we will, in this paper, limit ourselves to experiments based on document categorization. In addition, we have, with great gratitude, obtained EBSCO NoveList data categorizing books into value categories based on several appeal elements.

4.2 Our Purpose and Overall Research Design

As an overall, guiding design principle when approaching this issue, we intend to assign values or valences of appeal elements to unseen books (represented by their respective review texts), based on values intellectually assigned to a subset of the books, building models from the reviews of the latter as described in the following subsections. We conduct a traditional text-based search against an index created from selected parts of the metadata, and rerank the result so that books with appeal element values matching those of the request are given higher prominence in the result set, as depicted in Figure 2.

Fig. 2. Overall experiment design

Training the models The most straightforward way of training appeal-element categorizers based on existing NLP tools is to use reviews of books with known values (previously assigned by experts) to build document-categorization models, which can then use reviews of unseen books (where available) to classify those books into appropriate categories (low vs. high valence, different intervals of element values and so on). Classifying the recommendation requests in the same way, as described in the previous section, can provide us with an additional piece of evidence when matching those requests to books.

In the pace case, with two mutually exclusive values, we use the Apache OpenNLP (https://opennlp.apache.org/) Document Categorizer tool (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.doccat) to train a number of models and test their ability to distinguish (unseen) Fast-paced from Leisurely-paced books by their review texts. The trained models differ by the number of training reviews of each kind (50, 75, 100, 150, 200, 250, 300, 400, 500, 750 and 1000, respectively, so that 2000 reviews were used to build the largest model), in order to find out how the ability to discriminate fast-paced from leisurely-paced books depends on the model size (number of reviews). The 2000 books (1000 of each value) comprising the largest model constitute a superset of all the other training sets (see the illustration in Figure 3(a)). We use the remaining NoveList pace-categorized books as a test set. In Figure 3(b) we show the prediction power of the model as a function of the number of documents in the training set used to create it.
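A minimal sketch of how such a pace categorizer can be trained and applied with the OpenNLP Document Categorizer (1.6-style API). The training file name, the label strings and the example review are our own placeholders, and the tool is used with default settings, as in the experiments reported here.

```java
import java.io.File;

import opennlp.tools.doccat.DoccatFactory;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class PaceCategorizerSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical training file: one review per line, prefixed by its
        // NoveList-derived pace label, e.g.
        //   fast_paced A breathless thriller that never slows down ...
        //   leisurely_paced A quiet, meandering family chronicle ...
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("pace-train.txt")), "UTF-8");
        ObjectStream<DocumentSample> samples = new DocumentSampleStream(lines);

        // Train with default settings, as in the experiments reported here.
        DoccatModel model = DocumentCategorizerME.train(
                "en", samples, TrainingParameters.defaultParams(), new DoccatFactory());
        samples.close();

        // Classify the review text of an unseen book (or a recommendation request).
        DocumentCategorizerME categorizer = new DocumentCategorizerME(model);
        String review = "The plot races along and every chapter ends on a cliffhanger.";
        String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(review);
        double[] outcomes = categorizer.categorize(tokens);
        System.out.println(categorizer.getBestCategory(outcomes));
    }
}
```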
It is interesting to see the trend-wise increase in prediction power as a function of the size of the model. This indicates the soundness of the approach, meaning that the pace element does tell something about the book, and that the reader reviews entail a kind of representation of the pace value. Also interesting is the better prediction of the leisurely-paced books. This owes to the fact that in this case the test set is closer in size to the training set, so more of the variability of the latter is represented in the former. Paradoxically, this also indicates the overall soundness of the approach. We stress that the categorizer is used with default settings, and that work still remains to fine-tune the tool for the categorization task.

Fig. 3. The prediction power of the model in terms of prediction precision: (a) the EBSCO NoveList pace set; (b) prediction power as a function of training-set size

Using Pace in Recommendation / Retrieval To test the approach's potential to assist in a recommendation situation, we employ a two-step procedure:

– ranking documents with traditional retrieval methods
– reranking afterwards by matching the pace of the recommendation request with that of the book reviews

The assumption is that pace constitutes additional evidence in the ranking process for recommendation satisfaction, and that its effect is best measured at the top part of the ranking list. We reorder the first n ranked documents returned for each topic, so that documents whose pace value matches that of the request (as predicted by the procedure described below) are promoted to the top of the list, keeping their mutual order as in the original list; to that end we use a stable sort (a minimal sketch of this reordering is given at the end of Section 4).

For the task of determining the pace of the topic itself, two approaches have been tested:

– predicting the pace values of the works attached to the topic in the topic set and applying a majority vote of those as the request value (Figure 4(a))
– treating the request text as a review text and predicting its value directly (Figure 4(b))

Fig. 4. Overall design - two strategies of predicting request pace: (a) pace predicted through attached works; (b) pace directly predicted from the request

As the direct prediction strategy (the second above) gives better results in the current experiments, we only report the results of this strategy in the present paper, but hold the other strategy viable for later experiments. The attached-works approach (the first item above) also has the weakness that not all works attached to a topic have reader reviews associated with them.

Pace as evidence Appeal elements, pace among them, are expected to serve as evidence contributing to successful recommendation. One way of finding the optimal contribution is to rerank upper sublists of various lengths of the original ranked list, so that works whose pace matches the request pace (predicted by the direct approach from the previous section) are promoted, and to measure the performance of these reranked lists against that of the original list (the baseline). As seen in Figure 5, reranking only the upper sublists of length n = 5 to 20 performs better than the baseline (level of evidence 0), whereas reranking longer parts of the list gives unpredictable results, introducing many books with matching pace that are otherwise irrelevant to the recommendation request.

Fig. 5. Performance based on level of evidence (length of reordered upper sublist)
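As a minimal sketch of the reordering step described above (the RankedBook record, the label strings and the way predicted pace labels are attached to retrieved books are placeholders): only the top-n window is touched, and a stable sort keeps the original mutual order of both the promoted and the non-promoted books.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Stable promotion of pace-matching books within the first n ranks only. */
public class PaceReranker {

    /** Minimal placeholder for a retrieved book with its predicted pace label. */
    public record RankedBook(String isbn, String predictedPace) {}

    public static List<RankedBook> rerankTopN(List<RankedBook> ranking, String requestPace, int n) {
        int window = Math.min(n, ranking.size());
        List<RankedBook> head = new ArrayList<>(ranking.subList(0, window));

        // List.sort is stable, so books whose pace matches the request move to the
        // front of the window while keeping their original mutual order.
        head.sort(Comparator.comparingInt(
                (RankedBook b) -> requestPace.equals(b.predictedPace()) ? 0 : 1));

        List<RankedBook> reranked = new ArrayList<>(head);
        reranked.addAll(ranking.subList(window, ranking.size()));
        return reranked;
    }
}
```

Because only the top of the list is reordered, the level of evidence can be varied simply by varying n, as in Figure 5.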
5 Conclusion

As already explained, we see this as a continuing research endeavour, whose purpose is to directly utilize appeal elements in generating better recommendations based on recommendation requests. We have started out trying to model the pace appeal element in both the books' reader reviews and the topics (recommendation requests), trying to see whether matching those can give better recommendations. The results so far are promising in that they seem to indicate the overall soundness of the approach, but a better baseline would be needed to test the approach under more realistic conditions.

We have so far been using the document categorization strategy as proposed by [5] and [1]. The next endeavour is based on the occurrence of adjectives in reviews (indirectly inspired by [6]). Owing to the relatively large collection of pace values intellectually assigned to books that we obtained from EBSCO, the current results are remarkably better than the 2015 results. Further use of the data, combining appeal elements other than pace as well, indicates that the potential is far from exhausted.

References

1. Fugleberg, J.R., Preminger, M.: OAUC's participation in the CLEF 2015 SBS Search Suggestion Track. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Volume 1391, http://ceur-ws.org (2015)
2. Saricks, J.: Readers' Advisory Service in the Public Library. ALA Editions. American Library Association (2005)
3. Caplinger, V., Coleman, E., Coulter, D., Gardner, L., Kage, L., Keyser, C., Morgan, A., Reaser, E., Young, R.: The Secret Language of Books: A Guide to Appeal. Promotion brochure by EBSCO. http://www.ebsco.com/promo/novelist-the-secret-language-of-books (2015) Accessed: 2015-07-11
4. Aciar, S., Zhang, D., Simoff, S., Debenham, J.: Informed recommender: Basing recommendations on consumer product reviews. IEEE Intelligent Systems 22(3) (May 2007) 39-47
5. Fugleberg, J.R.: Automatisk klassifikasjon av bøker basert på brukeranmeldelser: Et konsept [Automatic classification of books based on user reviews: a concept]. Master's thesis, Høgskolen i Oslo og Akershus (2014)
6. Pera, M.S., Ng, Y.K.: Automating readers' advisory to make book recommendations for K-12 readers. In: Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14), New York, NY, USA, ACM (2014) 9-16