Recommending Interesting Writing using a Controllable, Explanation-Aware Visual Interface

Rohan Bansal, The Browser, rohan@thebrowser.com
Jordan Olmstead, The Browser, jordan@thebrowser.com
Uri Bram, The Browser, uri@thebrowser.com
Robert Cottrell, The Browser, robert@thebrowser.com
Gabriel Reder, Stanford University, gkreder@stanford.edu
Jaan Altosaar, Princeton University, altosaar@princeton.edu

[Figure 1 panels: Data Collection, RankFromSets, Visual Interface, Human Evaluation]

Figure 1: End-to-end pipeline for recommending writing to editors at The Browser with a controllable, explanation-aware visual interface. The RANKFROMSETS recommendation model [2] is trained on data consisting of positive examples from the editors' history of curated articles, and negative examples from news sources. After training and offline evaluation of the recommendation model, RANKFROMSETS is deployed as a microservice on Amazon Web Services Lambda, with the visual interface hosted on Github Pages. Editors can control the recommender system using the visual interface, which can aid in their decision-making. The editors' interrogation of the recommendation model informs further data collection and training.

ABSTRACT
We build a visual interface for recommending articles to editors at The Browser, a curation service for interesting writing. From a large list of candidates, editors decide which articles are selected and shared with subscribers. To aid the editors in this decision-making task, we build a visual interface for a recommendation model, RANKFROMSETS (RFS) [2], that classifies articles based on their words. Control of the recommendation model is built into the visual interface. For example, an editor can use a topic slider to receive a new list of recommendations according to topical words in articles. These topic sliders might be used to increase or decrease the ranking of articles with words related to crime, business, or technology. The visual interface is also designed to be explanation-aware: words that contribute positively or negatively to an article's ranking are displayed. For the backend of the visual interface, RFS is trained on historical data. In an offline empirical study, we find that RFS outperforms BERT [4], a competitive classification model, in terms of recall. Further, we measure RFS to be 10 times faster to train and to return predictions 2000 times faster than BERT. This speed is a beneficial property for the visual interface, and we demonstrate that RFS can be deployed on the free tier of AWS Lambda using a short python script with numpy as its only dependency. For reproducibility, transparency, and trust of the visual interface, we open source and release a public demonstration,^1 data collection, training and deployment scripts, and model parameters.^2

1 https://the-browser.github.io/recommending-interesting-writing/
2 https://github.com/the-browser/recommending-interesting-writing

Author Keywords
content-based recommendation, open source, visual interface

CCS Concepts
•Applied computing → Document searching; •Computing methodologies → Learning from implicit feedback;

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
IntRS '20 - Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, September 26, 2020, Virtual Event

INTRODUCTION
Creative nonfiction, longform journalism, and blog posts are examples of the types of articles curated by The Browser's team of editors. The editors read a large number of articles from various publications to select content to recommend to subscribers.

In building a recommender system to help editors sift through many documents, it is instructive to highlight the trade-off in user privacy intrinsic to recommender systems. A machine learning model must exploit information about a user. However, the incentive structures of operating a recommender system within a business can influence decisions around privacy and transparency [5]. For example, business models that rely on online advertising may engender recommender systems that upweight attention-grabbing content and hence time spent looking at ads. Such content might maximize a user's time spent with a service at the expense of long-term user experience or consent. In comparison, privacy-preserving and open source tools such as the Signal encrypted messaging service^3 may provide improved user experience in terms of privacy-preserving, transparent, and explainable algorithms and visual interfaces [3]. But the incentive structures for releasing recommender systems and visual interfaces that exploit private information about users are poor. There are few examples of end-to-end, open source, free-to-deploy pipelines for recommending content to users using a visual interface. This motivates building and deploying a recommendation model and a corresponding explanation-aware visual interface to give users control, and to inform them about how data is being used to make recommendations.

3 https://signal.org/

We build an end-to-end recommender system visual interface to address two aims: (1) to aid editors at The Browser in their decision-making task, and give them control through an explanation-aware interface, and (2) to release a lightweight, performant, open-source visual interface framework for explanation-aware document recommendation. In an offline evaluation, we show that the recommendation model we use for the visual interface outperforms BERT, a competitive document classification model. In a qualitative study, the control and explanations provided by the visual interface help editors in their decision-making and help find bugs in the recommendation model.

Figure 2: The visual interface to RANKFROMSETS includes topic sliders for setting user preferences, as well as the most important topic words for each article found.

RECOMMENDATION MODEL
RANKFROMSETS (RFS) is the recommendation model that powers the visual interface; it is the main part of the pipeline illustrated in Figure 1. RFS scales to large numbers of articles, and can maximize the evaluation metric of recall [1, 2]. Recall, or the fraction of true positives returned by a recommendation model, is an appropriate evaluation metric for recommending interesting writing to editors at The Browser. A recommendation model such as RFS can be readily backtested with recall as an evaluation metric, as historical data contains positive examples (articles selected by the editors) but rarely contains negative examples (articles seen but not selected by the editors). Further, our goal is to build an explanation-aware visual interface that can also serve to control recommendations, and RFS is fast, interpretable, and simple to integrate into a user interface, as we describe later.

RFS is a recommendation model defined by a binary classifier. For a user $u$ and item $m$ with attributes $x_m$ (the set of unique words in an article), RFS is described by the probability of $y_{um} = 1$ (user $u$ consuming item $m$):

$$p(y_{um} = 1 \mid u, m) = \sigma\left(f(u, x_m)\right),$$

where $\sigma$ is the sigmoid function. To parameterize the binary classifier in RFS, we use an inner product architecture:

$$f(u, x_m) = \theta_u^\top \left( \frac{1}{|x_m|} \sum_{j \in x_m} \beta_j \right). \tag{1}$$

In this architecture, the user embedding $\theta_u$ includes a dimension that is fixed to unity. Word embeddings $\beta_j$ (including a bias dimension for every word) and the publication embedding are fit with maximum likelihood estimation, and negative examples are sampled uniformly at random to balance positive examples [1].

VISUAL INTERFACE
The visual interface is designed with RFS as the backend recommendation model. We describe how the inner product architecture of RFS enables a visual interface that is interpretable, in that it explains why an item is recommended, and controllable, in that users can filter recommendations to help with decision-making.

Explanation-aware recommendation. The user embedding $\theta_u$ and word embeddings $\beta_j$ in Equation (1) can be used to interpret a recommendation. The logit for a given document with a set of words $x_m$ is the sum of per-word logits scaled by $1/|x_m|$, where each per-word logit is the inner product of the user embedding and a word embedding. The per-word contribution of a word in a document to the logit that determines the document's ranking in a list of recommendations is

$$w_{uj} = \theta_u^\top \beta_j. \tag{2}$$

This weight $w_{uj}$ helps explain why a document was recommended, using information about both the user $u$ and the word $j$. In the visual interface, words in a document are first sorted by their contributions $w_{uj}$ to the document's logit, and the top words are displayed. Similarly, words that lower a document's ranking are also displayed, to inform a user of which words detract from the recommendation of a document.

Interface for controlling recommendations. In a decision-making task, a user such as an editor for The Browser may wish to filter recommendations according to topics such as crime, technology, or business. The recommendations output by RFS can be controlled by altering the per-word contributions in Equation (2) according to whether a word is topical. This is accomplished by first calculating words related to a topic word using pre-trained word embeddings from BERT [4, 7]. Words related to a topic are defined by a heuristic: the cosine similarity between all words and a topic word such as 'business' is computed, and the top 15 words closest in cosine distance are stored as topical words. Then, a slider in the visual interface is used to increase or decrease the per-word contributions of topical words to a document's logit. Let the user-input slider value be $\alpha$, and the set of topical word indices be $T$. Then the user-controlled version of Equation (1) becomes

$$f(u, x_m) = \theta_u^\top \left( \frac{1}{|x_m|} \sum_{j \in x_m} (1 - \mathbb{I}[j \in T])\, \beta_j + \mathbb{I}[j \in T]\, \alpha \operatorname{sgn}(w_{uj})\, \beta_j \right). \tag{3}$$

The sign function $\operatorname{sgn}(\cdot)$ is applied to the per-word contribution to a document's logit. This is included since a word might contribute negatively to a document's logit, yet a user may wish to increase the weight of a related topical word.

EVALUATION
We conduct an offline empirical study of RANKFROMSETS to assess its performance as a recommendation model. Then we qualitatively evaluate the visual interface to study whether the explanation-aware, controllable interface enabled by RFS can help editors at The Browser make better decisions.

Data collection and preprocessing. For positive examples, we use the historical set of articles curated by editors at The Browser. We augment the training data with articles selected by the editors of other curation services, and, due to a paucity of data, treat all positively-labeled examples curated by editors as data from a single user. We use articles from news websites as examples with negative labels, and collect additional articles with negative labels from the websites most featured by the editors, to mimic the editorial process of reading a large swath of articles in a feed and distilling an article list to a select few. For preprocessing the data we use the tokenizer released by Devlin et al. [4] and discard words not recognized by the tokenizer. This procedure results in a dictionary with 30k words, and 646k datapoints with 27k positive labels.

Metrics. Performance of the recommendation models is assessed with recall, and 15% of the datapoints are held out for each of the validation and test sets.

Experimental setup: RankFromSets. We cross-validate using the RMSProp optimizer [6] with a momentum of 0.9 and grid search over learning rates of $\{10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}\}$, whether or not to initialize from pre-trained BERT embeddings [7], and embedding sizes of $\{10, 25, 50, 100, 500, 1000\}$. This model is trained on an NVIDIA Tesla P100 GPU.

Experimental setup: BERT. To fine-tune BERT, we use the AdamW optimizer with a linear learning rate scheduler and warmup steps, with a batch size of 32 and maximum input length of 512 as in Devlin et al. [4] and Wolf et al. [7]. A grid search is performed over learning rates of $\{2, 3, 4, 5\} \times 10^{-5}$, warmup steps of $\{10^2, 10^3, 10^4\}$, and total training steps of $\{10^2, 10^3, 10^4, 10^5\} \times 5$. The model is trained on an NVIDIA Tesla V100 GPU.

Recommendation Model    Recall @ 1000 (%)
RANKFROMSETS            53.1
BERT                    46.6

Table 1: RFS outperforms BERT in an offline evaluation, on a task of predicting which articles editors at The Browser would feature based on words in the articles.

[Figure 3 plot: validation recall (0% to 60%) against training time in seconds (0 to 1500), for RankFromSets and BERT]

Figure 3: RANKFROMSETS achieves better performance faster than BERT in terms of validation recall during training.

The best-performing RFS model is selected for deployment, and recall is evaluated on the test set, after using early stopping according to validation recall. The results are shown in Table 1: RFS outperforms BERT by 14% in relative terms (53.1% versus 46.6% recall at 1000). Further, RFS achieves better performance ten times faster than BERT, as shown in Figure 3. In a test to measure the speed of recommending $10^4$ held-out articles, RFS ranked all articles in 120 ms on a CPU, while BERT took 4 m 54 s to rank the articles on an NVIDIA Tesla V100 GPU. This represents a more than 2000-fold improvement in speed, which is beneficial for the controllable visual interface, as it requires Equation (3) to be computed quickly in response to user input.

Qualitative evaluation. In a user study, editors at The Browser provided feedback that they used the visual interface to choose articles, and found this to be an improved workflow. The control over recommendations and the explanation-aware visual interface provided by RFS helped elicit bugs in data collection (such as foreign language sources, or fiction writing) and provided an enjoyable user experience.

DEPLOYMENT
The visual interface is deployed on Github Pages, with the backend, RFS, deployed as a microservice on Amazon Web Services Lambda. Equation (3) is cheap to compute, so the lambda function is a short python script that requires only numpy as a dependency, whereas BERT would require a hosted GPU solution. RFS recommends recent articles from the editors' reading list of feeds. As a proof of concept, we include a tab for coronavirus-related articles that users can search through using the sliders and Equation (3).

DISCUSSION
We built a visual interface for a recommender system powered by RFS, a flexible recommendation model. Empirically, we demonstrated that RFS outperforms BERT in an offline evaluation, while being orders of magnitude faster during training and recommendation. By deploying RFS to AWS Lambda and hosting the visual interface on Github Pages, we demonstrated a fully open-source pipeline for creating an explanation-aware, controllable visual interface for document recommendation for editorial decision-making. Future work includes studying whether the transparency and control provided by open-source recommendation systems can improve user experience and inform users as to how recommendation models influence attention online.

Acknowledgments
The authors are grateful to Christian Bjartli for help with data collection.

References
[1] Jaan Altosaar. "Probabilistic Modeling of Structure in Science: Statistical Physics to Recommender Systems". PhD thesis. Princeton University, 2020.
[2] Jaan Altosaar, Wesley Tansey, and Rajesh Ranganath. "RankFromSets: Scalable Set Recommendation with Optimal Recall". In: American Statistical Association Symposium on Data Science & Statistics (2020).
[3] K. Cohn-Gordon et al. "A Formal Security Analysis of the Signal Messaging Protocol". In: 2017 IEEE European Symposium on Security and Privacy (EuroS&P). 2017, pp. 451–466.
[4] Jacob Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". In: Association for Computational Linguistics. Minneapolis, Minnesota: Association for Computational Linguistics, June 2019, pp. 4171–4186. DOI: 10.18653/v1/N19-1423. URL: https://www.aclweb.org/anthology/N19-1423.
[5] Nicholas Diakopoulos. "Accountability, Transparency, and Algorithms". In: Oxford Handbook of Ethics and AI. Ed. by Markus Dubber, Frank Pasquale, and Sunit Das. Oxford University Press, 2020.
[6] Tijmen Tieleman and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude". In: COURSERA: Neural Networks for Machine Learning (2012).
[7] Thomas Wolf et al. "HuggingFace's Transformers: State-of-the-art Natural Language Processing". In: ArXiv abs/1910.03771 (2019).
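
As an illustration of Equations (1) to (3), the following is a minimal numpy sketch of the scoring, explanation, and slider logic. It is not the released implementation: the function names, the toy embeddings, and the decision to let the topic word count among its own topical words are all assumptions made for this example.

```python
import numpy as np


def per_word_contributions(theta, beta):
    """Equation (2): w_uj = theta_u^T beta_j for every word j in the vocabulary."""
    return beta @ theta


def logit(theta, beta, doc_words):
    """Equation (1): theta_u^T times the mean embedding of the document's words."""
    return float(theta @ beta[doc_words].mean(axis=0))


def controlled_logit(theta, beta, doc_words, topical, alpha):
    """Equation (3): topical words are reweighted by alpha * sgn(w_uj)."""
    w = per_word_contributions(theta, beta)[doc_words]
    scale = np.where(np.isin(doc_words, topical), alpha * np.sign(w), 1.0)
    # By linearity, theta^T (scale_j * beta_j) equals scale_j * w_uj.
    return float(np.mean(scale * w))


def explain(theta, beta, doc_words, top=3):
    """Words of a document with the highest and the lowest contributions."""
    w = per_word_contributions(theta, beta)[doc_words]
    order = np.argsort(-w)
    ranked = [(int(doc_words[i]), float(w[i])) for i in order]
    return ranked[:top], ranked[::-1][:top]


def topical_words(embeddings, topic_index, k=15):
    """Indices of the k words with highest cosine similarity to the topic word.

    Assumption: the topic word itself is kept in the returned set.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.argsort(-(unit @ unit[topic_index]))[:k]


# Hypothetical toy data: a 4-word vocabulary with 3-dimensional embeddings.
theta = np.array([1.0, -1.0, 1.0])  # last dimension fixed to unity
beta = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]])
doc = np.array([0, 1, 2])  # word indices x_m of one document

base = logit(theta, beta, doc)
boosted = controlled_logit(theta, beta, doc, topical=[1], alpha=2.0)
```

In this toy setting, word 1 contributes negatively to the logit, yet moving its slider to 2 raises the document's score, which is the behavior the sign function in Equation (3) is meant to produce. With an empty topical set, `controlled_logit` reduces to `logit`.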