BIR 2017 Workshop on Bibliometric-enhanced Information Retrieval

Exploring Choice Overload in Related-Article Recommendations in Digital Libraries

Felix Beierle1, Akiko Aizawa2, and Joeran Beel2,3

1 Service-centric Networking, Technische Universität Berlin / Telekom Innovation Laboratories, Berlin, Germany, beierle@tu-berlin.de
2 National Institute of Informatics (NII), Digital Content and Media Sciences Research Division, Tokyo, Japan, {aizawa,beel}@nii.ac.jp
3 Trinity College Dublin, School of Computer Science and Statistics, Intelligent Systems Discipline, Knowledge and Data Engineering Group, ADAPT Centre, Dublin, Ireland, joeran.beel@adaptcentre.ie

Abstract. We investigate the problem of choice overload – the difficulty of making a decision when faced with many options – when displaying related-article recommendations in digital libraries. So far, research regarding how many items should be displayed has mostly been conducted in the fields of media recommendations and search engines. We analyze the number of recommendations displayed by current digital libraries. When browsing fullscreen with a laptop or desktop PC, all of them display a fixed number of recommendations; 72% display three, four, or five recommendations, and none display more than ten. We provide results from an empirical evaluation conducted with GESIS' digital library Sowiport, with recommendations delivered by the recommendations-as-a-service provider Mr. DLib. We use click-through rate as a measure of recommendation effectiveness, based on 3.4 million delivered recommendations. Our results show lower click-through rates for higher numbers of recommendations and twice as many clicked recommendations when displaying ten related articles instead of one. Our results indicate that users might quickly feel overloaded by choice.

Keywords: recommendation, recommender system, recommendations as a service, digital library, choice overload

1 Introduction

More and more information is available online for academic researchers in digital libraries [18]. One way to deal with the flood of information is to utilize recommender systems that filter information and recommend articles related to those a user liked previously or is currently reading. A major challenge in recommending a list of related articles is to decide how many related articles to recommend before a user becomes dissatisfied with the recommender system due to choice overload.

Developing Mr. DLib (Machine-readable Digital Library, http://mr-dlib.org) [6, 4], a recommendations-as-a-service (RaaS) provider, we currently deliver recommendations to the digital library Sowiport (http://sowiport.gesis.org) [12]. Soon, we will also deliver recommendations to JabRef (http://www.jabref.org) [11]. When developing such a recommender system for digital libraries, there is currently no information available about how many recommendations to deliver and display. Figure 1 shows an example of Mr. DLib being used in Sowiport.

Fig. 1. Screenshot of the Sowiport digital library showing related items on the left-hand side.

While a recommender system can filter for the most relevant content for the user, the displayed recommended items can still be overwhelming. Schwartz describes the issue as the "tyranny of choice" [19]: confronted with too many options, participants in studies tend not to decide for any option.
In this paper, we investigate the influence of the size of the recommendation set on choice overload in digital libraries. In order to do so, we:

1. Examine how many items other recommender systems in digital libraries recommend
2. Conduct an empirical evaluation to see how different numbers of recommendations affect the clicks on related-article recommendations

All data relating to this paper is available at http://datasets.mr-dlib.org, including a table of the delivered and clicked recommendations, the information about the investigated digital libraries, and the figures presented in this paper.

2 Related Work

There have been several studies investigating choice overload with respect to consumer goods (for example [1, 10]). Based on the MovieLens dataset, Bollen et al. investigated the relationship between item set variety, item set attractiveness, choice difficulty, and choice satisfaction [8]. They suggest diversifying the recommendation set by including some lower-quality recommended items in order to increase perceived recommendation variety and choice satisfaction. In another study, also based on the MovieLens dataset, Willemsen et al. further analyzed the relationship between diversification, choice difficulty, and satisfaction [21].

Other related studies looked into the number of search results to be displayed. Jones et al. conclude that screen size is a determining factor with respect to how many search result items users interact with [14], which is confirmed in a newer study by Kim et al. [16]. Linden reports that although Google users claimed to want more search results, traffic dropped when an increased number of search results was displayed [17]. Google suspected the extra loading time to play a role in this. Azzopardi and Zuccon developed a cost model for browsing search results, taking into account screen size and search result page size [2]. They concluded that displaying 10 results is close to the minimum cost. Kelly and Azzopardi studied the effects of displaying different search result page sizes [15]. In their study, they used three, six, and ten search results. One of their main findings is that subjects who were shown ten search results per page viewed and saved significantly more documents, while more time was spent on earlier search results when fewer results were shown per page. While Chiravirakul and Payne's study suggests that choice dissatisfaction happens when there is a lack of time for choosing links [9], the study by Oulasvirta et al. suggests that it is the search result page size that causes choice overload or the "paradox of choice." Oulasvirta et al. conducted a user study with 24 participants and looked into the users' satisfaction with the results when displaying six or 24 results. For future work, they suggest also looking into objective behavioral measurements like click-through rates.

In our work, we focus on a different domain. We investigate the problem of choice overload in digital libraries – in contrast to movie recommendations and search results. To the best of our knowledge, the question of how many recommendations to display, and the resulting choice overload, has not been studied in the domain of digital libraries.
In a recent literature survey of more than 200 articles about research-paper recommender systems, none discussed or researched this topic [5]. Furthermore, instead of user interviews or small-scale studies, we investigate the real clicks logged in an actively used system. We explore to what extent click-through rates (CTR) reflect the findings of the cited studies.

3 Methodology

In order to investigate choice overload in digital libraries, we first examine how many items other recommender systems in digital libraries display. We investigated 63 digital libraries (most of them listed on https://en.wikipedia.org/wiki/List_of_digital_library_projects) and reference managers with search interfaces (see http://datasets.mr-dlib.org for detailed results). We considered recommendations that are displayed when an item from the search results is selected. In a few cases, the number of displayed recommendations of related items depended on the size of the browser window. For the numbers given in the following, we assumed a full-screen browser window on a laptop computer (13" display with 1280x800 resolution).

In a second step, we conducted an experimental evaluation to investigate how different numbers of recommendations affect the click rates on related-article recommendations. Click-through rates are a good way to study the users' actual behavior when recommended items are displayed in real situations. We analyze data from 3.4 million recommendations. The data was obtained from users of the academic search engine Sowiport (some explanations about Sowiport and Mr. DLib are taken from [3]), which is run by GESIS – Leibniz Institute for the Social Sciences (http://www.gesis.org), the largest infrastructure institution for the social sciences in Germany. Sowiport contains about 9.6 million literature references and 50,000 research projects from 18 different databases, mostly relating to the social and political sciences. Literature references usually cover keywords, classifications, author(s), and journal or conference information, and, if available, citations, references, and links to full texts.

Sowiport co-operates with Mr. DLib, an open web service that provides scholarly literature recommendations as a service (Figure 2). This means that all computations relating to the recommendations run on Mr. DLib's servers, while the presentation takes place on Sowiport's website.

Fig. 2. The recommendation process of Sowiport and Mr. DLib.

Our recommender system shows related-article recommendations on each article's detail page in Sowiport (see Figure 1). Whenever such a detail page is requested by a user, the recommender system randomly chooses one of four recommendation approaches to generate recommendations: 1. stereotype recommendations, 2. most-popular recommendations, 3. content-based filtering (CBF), and 4. random recommendations. We measured the effectiveness of the recommendation approaches with click-through rate (CTR). CTR is the ratio of clicked to delivered recommendations. For instance, when 1,000 recommendations were delivered and 8.4 of these recommendations were clicked, the average CTR would be 8.4/1,000 = 0.84%. The assumption is that the higher the CTR, the more effective the recommendation approach.
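As an illustration of this metric, the following minimal sketch aggregates delivered and clicked recommendations per recommendation-set size. The log format and field names (set_size, clicks) are hypothetical stand-ins and not Mr. DLib's actual logging schema.

# Minimal sketch of the CTR computation per recommendation-set size.
# Each log entry describes one delivered recommendation set: how many
# items were displayed ("set_size") and how many of them were clicked.
from collections import defaultdict

def ctr_by_set_size(log_entries):
    delivered = defaultdict(int)
    clicked = defaultdict(int)
    for entry in log_entries:
        n = entry["set_size"]
        delivered[n] += n              # every displayed item counts as delivered
        clicked[n] += entry["clicks"]  # clicks observed for this set
    # CTR = clicked / delivered recommendations, grouped by set size
    return {n: clicked[n] / delivered[n] for n in delivered}

# Example: 1,000 sets of size one, eight of which were clicked -> CTR = 0.8%
example_log = [{"set_size": 1, "clicks": 1}] * 8 + [{"set_size": 1, "clicks": 0}] * 992
print(ctr_by_set_size(example_log))  # {1: 0.008}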
There is some discussion about to what extent CTR is appropriate for measuring recommendation effectiveness, but overall it has been demonstrated to be a meaningful and well-suited metric [13, 7, 20].

For our evaluation, we randomly displayed one to fifteen recommendations. Our expectation is that, at first, increasing the number of displayed recommendations will increase the CTR. By displaying more and more recommendations, we expect to reach a maximum CTR at some point. After that, when displaying even more recommendations, we expect the CTR to drop, indicating choice overload. Similarly, for the clicks, we would expect a maximum at a certain number of displayed recommendations. Figure 3 plots the CTR and the average clicks to illustrate this expectation. Here, the CTR (the orange line) increases with the number of displayed recommendations, reaches a maximum at four, and decreases afterwards, indicating choice overload. The clicks (the gray line) reach a maximum at five displayed recommendations. In that case, we would decide for four (maximum CTR) or five (maximum clicks) recommendations. Alternatively, we would have expected results as in Figure 4. Here, the CTR declines with an increasing number of displayed recommendations, while the absolute clicks have a maximum at four. The question then would be: What is better – a higher CTR or the maximum of absolute clicks? Depending on the answer, we would choose a recommendation set size of between one and four. As we will show in the following section, the results are quite different from our expectations.

Fig. 3. Expected click-through rate and clicks by displayed number of recommendations.

Fig. 4. Alternative expected click-through rate and clicks by displayed number of recommendations.

4 Results

4.1 Number of Recommendations in Existing Digital Libraries

19 (30%) of the 63 digital libraries displayed recommendations for related items. Figure 5 shows the distribution of recommendations for those 19 libraries. Most libraries (72% of those that display recommendations) display three, four, or five recommendations; none displays more than 10 or fewer than three.

Fig. 5. Number of displayed recommendations in current digital libraries.

It is also notable that all of them always show a fixed number of related articles; one could also imagine displaying varying numbers. When looking for related items in the database, the relevance scores of the candidates can vary. One way of displaying a varying number of related-article recommendations would be to take such relevance scores into account, e.g., giving 10 recommendations if there are 10 highly relevant related articles, and giving only two recommendations if only two related articles have a relevance score above a certain threshold.
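A minimal sketch of such a threshold-based cut-off is given below; the relevance threshold of 0.7 and the upper bound of ten items are purely illustrative assumptions, not values used by Mr. DLib or any of the reviewed systems.

# Sketch of a variable-length recommendation list cut off by relevance score.
# The threshold (0.7) and the maximum of ten items are illustrative assumptions.
def select_recommendations(candidates, threshold=0.7, max_items=10):
    """candidates: list of (article_id, relevance_score) pairs."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    selected = [article for article, score in ranked if score >= threshold]
    return selected[:max_items]

# Ten highly relevant candidates would yield ten recommendations,
# while only two candidates above the threshold yield just two:
print(select_recommendations([("a", 0.9), ("b", 0.8), ("c", 0.4)]))  # ['a', 'b']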
We do not know how the numbers of related articles in the reviewed recommender systems were chosen by their operators. We assume that they either chose the numbers arbitrarily or did some experiments but did not publish the results. In the following, by measuring CTRs, we investigate the effect the number of displayed recommendations has, and whether the CTR indicates choice overload for certain numbers of displayed recommendations.

4.2 Experiment with Varying Number of Recommendations

The solid orange line in Figure 6 shows the CTR by the number of displayed recommendations. The higher the number of recommendations, the lower the overall CTR. The dashed line shows the average absolute number of clicked recommendations (per 1,000 recommendation sets). The bigger the recommendation set, the higher the number of absolute clicks.

Fig. 6. Click-through rates (solid) and average absolute number of clicked recommendations (dashed) with respect to the number of displayed recommendations.

When only one recommendation was displayed, the CTR was 0.84% on average. This means that when our recommender system delivered one recommendation 1,000 times, 8.4 recommendations were clicked. For two displayed recommendations, the CTR was only 0.49% on average. This means that when our recommender system delivered two recommendations 1,000 times (2,000 in total), 9.8 recommendations were clicked (half of the clicks on the first recommendation, half on the second). Overall, whether one or two recommendations are displayed makes little difference in the absolute number of clicks; it only increases by 17% (from 8.4 to 9.8). When fifteen recommendations were shown, the CTR was at its minimum of 0.14% on average, while the absolute number of clicks was at its maximum of 21.4.

Comparing these results with our expectations from Section 3, we can see that the CTR unexpectedly decreases rapidly and has a clear maximum at one displayed recommendation. Furthermore, contrary to our expectations, the absolute clicks keep increasing instead of reaching a maximum at a small number of recommendations. The results show an under-proportional increase in average clicks on the displayed recommendations: displaying twice as many recommendations does not double the clicks. In order for the absolute number of clicked recommendations (per 1,000 sets) to double from 8.4 (for one displayed recommendation) to 17, the number of displayed related articles has to be raised to 10 or 11. When 15 recommendations are displayed, only 2.5 times as many recommendations are clicked compared to displaying a single recommendation. Regarding choice overload, this implies that having more recommendations to choose from, in general, only creates a small incentive for the user to click on more of the displayed recommended items.
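The dashed line in Figure 6 follows directly from the solid line: with 1,000 recommendation sets of size n, 1,000 * n recommendations are delivered, of which the fraction given by the CTR is clicked. The following small sketch reproduces the reported values; small differences to Figure 6 (21.0 vs. 21.4) stem from the rounded CTR values used here.

# Clicks per 1,000 recommendation sets derived from CTR and set size.
def clicks_per_1000_sets(ctr, set_size):
    return ctr * set_size * 1000

print(clicks_per_1000_sets(0.0084, 1))   # ~8.4 clicks  (CTR 0.84%, one recommendation)
print(clicks_per_1000_sets(0.0049, 2))   # ~9.8 clicks  (CTR 0.49%, two recommendations)
print(clicks_per_1000_sets(0.0014, 15))  # ~21 clicks   (CTR 0.14%, fifteen recommendations)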
There are some points to consider when interpreting these results. Many documents in Sowiport only have sparse information, so they might not be interesting for the users. Another possible reason why the experimental results are inconclusive about the number of recommendations to display is that the relevance of the recommendations might have been too low, so that many users did not click further recommendations after clicking the first one. In that case, the research should be repeated once we are able to deliver better recommendations. The session length is another aspect to consider. For instance, if one user visits two pages and gets 15 recommendations on each page, we assume the CTR will be higher than for a user who looks at 10 pages and gets 15 recommendations on each page. An additional aspect of sessions is that, so far, we did not filter out recommendations that had already been shown to a user. So, if a user looks at 15 detail pages and gets 10 related-article recommendations on each, there will most likely be duplicate recommendations, and hence the CTR decreases the more recommendations are shown. Another aspect to consider when interpreting the results is how users might use Sowiport. If the user clicks on a recommendation, a new tab is opened. If the recommendation was good, she might forget about the other open tab – especially if there are further good recommendations shown in the new tab.

5 Conclusion and Future Work

The average clicks on displayed recommendations increase under-proportionally with the number of displayed items. In order for the clicks to double, the size of the recommendation set has to be increased from one to 10 or 11. These numbers might imply that users quickly feel overloaded by choice. The results differ from our expectations. Based on these numbers alone, we could conclude that we should only display one recommendation because the CTR is highest – or that we should keep displaying even more than 15 recommendations until the number of absolute clicks does not increase anymore. Further research will be necessary to determine a good number of recommended items to display.

Our results are based on Sowiport. Further research is necessary to confirm whether our findings also apply to other digital libraries. We therefore plan to repeat our research, for instance, with JabRef and the library of the Technical University of Munich. Future work could also include using evaluation methods and metrics other than CTR (e.g., a user study, user ratings, or tracking which recommended items were actually exported or saved) or conducting a survey among the operators of the other digital libraries asking how they decided on the number of displayed recommendations. Furthermore, a question to be discussed is what recommender systems in digital libraries should try to achieve, e.g., maximizing CTR, maximizing the number of clicked recommendations, etc.

Acknowledgments. This work has received funding from project DYNAMIC (http://www.dynamic-project.de, grant No 01IS12056), which is funded as part of the Software Campus initiative by the German Federal Ministry of Education and Research (BMBF). This work was also supported by a fellowship within the FITweltweit programme of the German Academic Exchange Service (DAAD). This publication has also emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 13/RC/2106. We are further grateful for the support provided by Sophie Siebert.

References
1. Arunachalam, B., Henneberry, S.R., Lusk, J.L., Norwood, F.B.: An Empirical Investigation into the Excessive-Choice Effect. American Journal of Agricultural Economics 91(3), 810–825 (Aug 2009)
2. Azzopardi, L., Zuccon, G.: Two Scrolls or One Click: A Cost Model for Browsing Search Results. In: Advances in Information Retrieval. pp. 696–702. Springer, Cham (Mar 2016)
3. Beel, J., Dinesh, S., Mayr, P., Carevic, Z., Raghvendra, J.: Stereotype and Most-Popular Recommendations in the Digital Library Sowiport. In: Proceedings of the 15th International Symposium of Information Science (ISI). pp. 96–108. Verlag Werner Hülsbusch, Glückstadt, Germany (2017)
4. Beel, J., Gipp, B., Aizawa, A.: Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia (Pre-print). In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) (2017)
5. Beel, J., Gipp, B., Langer, S., Breitinger, C.: Research paper recommender systems: A literature survey. International Journal on Digital Libraries pp. 1–34 (2016)
6. Beel, J., Gipp, B., Langer, S., Genzmehr, M., Wilde, E., Nürnberger, A., Pitman, J.: Introducing Mr. DLib, a Machine-readable Digital Library. In: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL'11). pp. 463–464. ACM (2011), available at http://docear.org
7. Beel, J., Langer, S.: A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems. In: Research and Advanced Technology for Digital Libraries. pp. 153–168. Springer, Cham (Sep 2015)
8. Bollen, D., Knijnenburg, B.P., Willemsen, M.C., Graus, M.: Understanding Choice Overload in Recommender Systems. In: Proceedings of the Fourth ACM Conference on Recommender Systems. pp. 63–70. RecSys '10, ACM (2010)
9. Chiravirakul, P., Payne, S.J.: Choice Overload in Search Engine Use? In: Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. pp. 1285–1294. CHI '14, ACM (2014)
10. Fasolo, B., McClelland, G.H., Todd, P.M.: Escaping the tyranny of choice: When fewer attributes make choice easier. Marketing Theory 7(1), 13–26 (Mar 2007)
11. Feyer, S., Siebert, S., Gipp, B., Aizawa, A., Beel, J.: Integration of the Scientific Recommender System Mr. DLib into the Reference Manager JabRef. In: Proceedings of the 39th European Conference on Information Retrieval (ECIR) (2017)
12. Hienert, D., Sawitzki, F., Mayr, P.: Digital Library Research in Action: Supporting Information Retrieval in Sowiport. D-Lib Magazine 21(3/4) (Mar 2015)
13. Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately Interpreting Clickthrough Data As Implicit Feedback. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 154–161. SIGIR '05, ACM (2005)
14. Jones, M., Marsden, G., Mohd-Nasir, N., Boone, K., Buchanan, G.: Improving Web interaction on small displays. Computer Networks 31(11–16), 1129–1137 (May 1999)
15. Kelly, D., Azzopardi, L.: How Many Results Per Page? A Study of SERP Size, Search Behavior and User Experience. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 183–192. SIGIR '15, ACM (2015)
16. Kim, J., Thomas, P., Sankaranarayana, R., Gedeon, T., Yoon, H.J.: Eye-Tracking Analysis of User Behavior and Performance in Web Search on Large and Small Screens. Journal of the Association for Information Science and Technology 66(3), 526–544 (Mar 2015)
17. Linden, G.: Marissa Mayer at Web 2.0. http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html (Nov 2006)
18. Noorden, R.V.: Global scientific output doubles every nine years. Nature Blog, http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html (2014)
19. Schwartz, B.: The tyranny of choice. Scientific American 290(4), 70–75 (2004)
20. Schwarzer, M., Schubotz, M., Meuschke, N., Breitinger, C., Markl, V., Gipp, B.: Evaluating Link-based Recommendations for Wikipedia. In: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL). pp. 191–200 (Jun 2016)
21. Willemsen, M.C., Graus, M.P., Knijnenburg, B.P.: Understanding the role of latent feature diversification on choice difficulty and satisfaction. User Modeling and User-Adapted Interaction 26(4), 347–389 (Oct 2016)