Recommending Interesting Writing using a Controllable, Explanation-Aware Visual Interface

Rohan Bansal, The Browser, rohan@thebrowser.com
Jordan Olmstead, The Browser, jordan@thebrowser.com
Uri Bram, The Browser, uri@thebrowser.com
Robert Cottrell, The Browser, robert@thebrowser.com
Gabriel Reder, Stanford University, gkreder@stanford.edu
Jaan Altosaar, Princeton University, altosaar@princeton.edu


[Figure 1 diagram: Data Collection → RankFromSets → Visual Interface → Human Evaluation]

Figure 1: End-to-end pipeline for recommending writing to editors at The Browser with a controllable, explanation-aware visual interface. The RANKFROMSETS recommendation model [2] is trained on data consisting of positive examples from the editors' history of curated articles and negative examples from news sources. After training and offline evaluation of the recommendation model, RANKFROMSETS is deployed as a microservice on Amazon Web Services Lambda, with the visual interface hosted on GitHub Pages. Editors can control the recommender system using the visual interface, which can aid their decision-making. The editors' interrogation of the recommendation model informs further data collection and training.

ABSTRACT
We build a visual interface for recommending articles to editors at The Browser, a curation service for interesting writing. From a large list of candidates, editors decide which articles are selected and shared with subscribers. To aid the editors in this decision-making task, the visual interface is built on a recommendation model, RANKFROMSETS (RFS) [2], that classifies articles based on their words. Control of the recommendation model is built into the visual interface. For example, an editor can use a topic slider to receive a new list of recommendations according to topical words in articles; such sliders can increase or decrease the ranking of articles with words related to crime, business, or technology. The visual interface is also explanation-aware: it displays the words that contribute positively or negatively to an article's ranking. For the backend of the visual interface, RFS is trained on historical data. In an offline empirical study, we find that RFS outperforms BERT [4], a competitive classification model, in terms of recall. Further, RFS is 10 times faster to train than BERT and returns predictions more than 2000 times faster. This speed benefits the visual interface, and we demonstrate that RFS can be deployed on the free tier of AWS Lambda using a short Python script with a NumPy dependency. For reproducibility, transparency, and trust in the visual interface, we open source and release a public demonstration¹, data collection, training and deployment scripts, and model parameters².

¹ https://the-browser.github.io/recommending-interesting-writing/
² https://github.com/the-browser/recommending-interesting-writing

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
IntRS '20 - Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, September 26, 2020, Virtual Event

Author Keywords
content-based recommendation, open source, visual interface

CCS Concepts
•Applied computing → Document searching; •Computing methodologies → Learning from implicit feedback;

INTRODUCTION
Creative nonfiction, longform journalism, and blog posts are examples of the types of articles curated by The Browser's team of editors. The editors read a large number of articles from various publications to select content to recommend to subscribers.
In building a recommender system to help editors sift through many documents, it is worth highlighting the trade-off in user privacy that is intrinsic to recommender systems: a machine learning model must exploit information about a user to make recommendations. However, the incentive structures of operating a recommender system within a business can influence decisions around privacy and transparency [5]. For example, business models that rely
on online advertising may engender recommender systems that upweight attention-grabbing content, and hence time spent looking at ads. Such content might maximize the time a user spends with a service at the expense of long-term user experience or consent. In comparison, privacy-preserving and open source tools such as the Signal encrypted messaging service³ may improve user experience through privacy-preserving, transparent, and explainable algorithms and visual interfaces [3]. However, businesses have little incentive to release recommender systems and visual interfaces that exploit private information about users, and there are few examples of end-to-end, open source, free-to-deploy pipelines for recommending content to users through a visual interface. This motivates building and deploying a recommendation model and a corresponding explanation-aware visual interface that gives users control and informs them about how their data is used to make recommendations.

We build an end-to-end recommender system with a visual interface to address two aims: (1) to aid editors at The Browser in their decision-making task and give them control through an explanation-aware interface, and (2) to release a lightweight, performant, open-source visual interface framework for explanation-aware document recommendation. In an offline evaluation, we show that the recommendation model behind the visual interface outperforms BERT, a competitive document classification model. In a qualitative study, the control and explanations provided by the visual interface help editors in their decision-making and help find bugs in the recommendation model.

³ https://signal.org/

RECOMMENDATION MODEL
RANKFROMSETS (RFS) is the recommendation model that powers the visual interface, and is the main part of the pipeline illustrated in Figure 1. RFS scales to large numbers of articles and can maximize the evaluation metric of recall [1, 2]. Recall, the fraction of all positive items that a recommendation model places among its top recommendations, is an appropriate evaluation metric for recommending interesting writing to editors at The Browser. A recommendation model such as RFS can be readily backtested with recall because computing recall requires only positive labels: historical data contains positive examples (articles selected by the editors) but rarely contains negative examples (articles seen but not selected by the editors). Further, our goal is to build an explanation-aware visual interface that can also be used to control recommendations, and RFS is fast, interpretable, and simple to integrate into a user interface, as we describe later.
RFS is a recommendation model defined by a binary classifier. For a user $u$ and item $m$ with attributes $x_m$ (the set of unique words in an article), RFS models the probability of $y_{um} = 1$ (user $u$ consuming item $m$):
$$p(y_{um} = 1 \mid u, m) = \sigma\big(f(u, x_m)\big),$$
where $\sigma$ is the sigmoid function. To parameterize the binary classifier in RFS, we use an inner product architecture:
$$f(u, x_m) = \theta_u^\top \Big( \frac{1}{|x_m|} \sum_{j \in x_m} \beta_j \Big). \tag{1}$$
In this architecture, the user embedding $\theta_u$ includes a dimension that is fixed to unity. Word embeddings $\beta_j$ (including a bias dimension for every word) and the publication embedding are fit with maximum likelihood estimation, and negative examples are sampled uniformly at random to balance positive examples [1].

Figure 2: The visual interface to RANKFROMSETS includes topic sliders for setting user preferences, and displays the most important topic words for each article found.

VISUAL INTERFACE
The visual interface is designed with RFS as the backend recommendation model. We describe how the inner product architecture of RFS makes the visual interface interpretable, by providing explanations for why an item is recommended, and controllable, so that users can filter recommendations to help with decision-making.

Explanation-aware recommendation The user embedding $\theta_u$ and word embeddings $\beta_j$ in Equation (1) can be used to interpret a recommendation. Because Equation (1) is linear in the word embeddings, the logit for a document with a set of words $x_m$ is the average of per-word logits, each computed as the inner product of the user embedding and a word embedding. Up to the factor $1/|x_m|$ shared by every word in the document, the contribution of word $j$ to the logit that determines the document's ranking in a list of recommendations is
$$w_{uj} = \theta_u^\top \beta_j. \tag{2}$$
This weight $w_{uj}$ helps explain why a document was recommended, using information about both the user $u$ and the word $j$. In the visual interface, the words in a document are sorted by their contributions $w_{uj}$ to the document's logit, and the top words are displayed. Words that lower a document's ranking are also displayed, to show a user which words detract from the recommendation of a document.

Interface for controlling recommendations In a decision-making task, a user such as an editor for The Browser may wish to filter recommendations according to topics such as crime, technology, or business. The recommendations output by RFS can be controlled by altering the per-word contributions in Equation (2) according to whether a word is topical. This is accomplished by first calculating words related to a topic word using pre-trained word embeddings from BERT [4, 7]. Words related to a topic are defined by a heuristic: the cosine similarity between all words and a topic word such
as 'business' are computed, and the 15 words with the highest cosine similarity are stored as topical words. Then, a slider in the visual interface is used to increase or decrease the per-word contributions of topical words to a document's logit. Let the user-input slider value be $\alpha$, and the set of topical word indices be $T$. Then the user-controlled version of Equation (1) becomes
$$f(u, x_m) = \theta_u^\top \Big( \frac{1}{|x_m|} \sum_{j \in x_m} \big[ (1 - \mathbb{I}[j \in T])\, \beta_j + \mathbb{I}[j \in T]\, \alpha \operatorname{sgn}(w_{uj})\, \beta_j \big] \Big). \tag{3}$$
The sign function $\operatorname{sgn}(\cdot)$ is applied to the per-word contribution to a document's logit. This is included since a word might contribute negatively to a document's logit, yet a user may wish to increase the weight of a related topical word. With the sign term, a topical word contributes $\alpha \operatorname{sgn}(w_{uj})\, w_{uj} = \alpha |w_{uj}|$, so the slider value $\alpha$ alone determines whether topical words raise ($\alpha > 0$) or lower ($\alpha < 0$) a document's ranking.

EVALUATION
We first conduct an offline empirical study to assess the performance of RANKFROMSETS as a recommendation model. We then qualitatively evaluate the visual interface to study whether the explanation-aware, controllable interface enabled by RFS can help editors at The Browser make better decisions.

Data collection and preprocessing For positive examples, we use the historical set of articles curated by editors at The Browser. We augment the training data with articles selected by the editors of other curation services and, because data are scarce, treat all positively labeled examples curated by editors as data from a single user. We use articles from news websites as examples with negative labels, and collect additional negatively labeled articles from the websites most featured by the editors; this mimics the editorial process of reading a large swath of articles in a feed and distilling them to a select few. To preprocess the data, we use the tokenizer released by Devlin et al. [4] and discard words not recognized by the tokenizer. This procedure results in a vocabulary of 30k words, and 646k datapoints with 27k positive labels.

Metrics Performance of the recommendation models is assessed with recall (Table 1 reports recall at 1000), and 15% of the datapoints are held out for each of the validation and test sets.

Experimental setup: RankFromSets We cross-validate hyperparameters by grid search, training with the RMSProp optimizer [6] with a momentum of 0.9. The grid covers learning rates of $\{10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}\}$, whether or not to initialize from pre-trained BERT embeddings [7], and embedding sizes of $\{10, 25, 50, 100, 500, 1000\}$. This model is trained on an NVIDIA Tesla P100 GPU.

Experimental setup: BERT To fine-tune BERT, we use the AdamW optimizer with a linear learning rate scheduler and warmup steps, a batch size of 32, and a maximum input length of 512, as in Devlin et al. [4] and Wolf et al. [7]. A grid search is performed over learning rates of $\{2, 3, 4, 5\} \times 10^{-5}$, warmup steps of $\{10^2, 10^3, 10^4\}$, and total training steps of $\{10^2, 10^3, 10^4, 10^5\} \times 5$. The model is trained on an NVIDIA Tesla V100 GPU.

The best-performing RFS model is selected for deployment, using early stopping according to validation recall, and recall is evaluated on the test set. The results are shown in Table 1: RFS outperforms BERT by 14% in relative terms (53.1% versus 46.6% recall at 1000). Further, RFS reaches better validation recall ten times faster than BERT during training, as shown in Figure 3. In a test to measure the speed of recommending $10^4$ held-out articles, RFS ranked all articles in 120 ms on a CPU, while BERT took 4 min 54 s to rank them on an NVIDIA Tesla V100 GPU. This more than 2000-fold speedup (294 s versus 0.12 s) benefits the controllable visual interface, which requires Equation (3) to be computed quickly in response to user input.

Recommendation Model    Recall@1000 (%)
RANKFROMSETS            53.1
BERT                    46.6
Table 1: RFS outperforms BERT in an offline evaluation on the task of predicting which articles editors at The Browser would feature, based on the words in the articles.

[Figure 3 plot: validation recall (y-axis) versus training time in seconds (x-axis) for RankFromSets and BERT]
Figure 3: RANKFROMSETS achieves better performance faster than BERT in terms of validation recall during training.

Qualitative evaluation In a user study, editors at The Browser reported that they used the visual interface to choose articles and found it to be an improved workflow. The control over recommendations and the explanation-aware visual interface provided by RFS helped surface bugs in data collection (such as foreign-language sources or fiction writing) and provided an enjoyable user experience.

DEPLOYMENT
The visual interface is deployed on GitHub Pages, and the backend, RFS, is deployed as a microservice on Amazon Web Services Lambda. Because Equation (3) is cheap to compute, the Lambda function is a short Python script that depends on NumPy, whereas BERT would require a hosted GPU solution. RFS recommends recent articles from the editors' reading list of feeds. As a proof of concept, we include a tab of coronavirus-related articles that users can search through using the sliders and Equation (3).

DISCUSSION
We built a visual interface for a recommender system powered by RFS, a flexible recommendation model. Empirically, we demonstrated that RFS outperforms BERT in an offline evaluation while being ten times faster to train and more than 2000 times faster at recommendation. By deploying RFS to AWS Lambda and hosting the visual interface on GitHub Pages, we demonstrated a fully open-source pipeline for creating an explanation-aware, controllable visual interface for document recommendation
for editorial decision-making. Future work includes studying
whether the transparency and control provided by open-source
recommendation systems can improve user experience and
inform users as to how recommendation models influence
attention online.
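The pipeline summarized above fits in a short NumPy script on AWS Lambda because Equations (1), (2), and (3) reduce to a few array operations. The following is a minimal sketch of those computations, plus recall at K as used in the evaluation, under stated assumptions. It is not the authors' released deployment script: every function and variable name is hypothetical, the toy vocabulary and embeddings are made up, `embeddings` stands in for the pre-trained BERT word embeddings used to find topical words, and the publication embedding is omitted.

```python
import numpy as np

# Minimal sketch; all names and toy values are hypothetical, not the authors' code.


def per_word_contributions(theta_u, beta, word_ids):
    """Equation (2): w_uj = theta_u^T beta_j for each word j in word_ids."""
    return beta[word_ids] @ theta_u


def controlled_logit(theta_u, beta, word_ids, topic_ids=(), alpha=1.0):
    """Equation (3); with no topical words it reduces to Equation (1).

    Non-topical words keep w_uj; topical words contribute
    alpha * sgn(w_uj) * w_uj = alpha * |w_uj|.
    """
    word_ids = np.unique(word_ids)  # x_m is a set of unique words
    w = per_word_contributions(theta_u, beta, word_ids)
    topical = np.isin(word_ids, np.asarray(list(topic_ids), dtype=int))
    w = np.where(topical, alpha * np.sign(w) * w, w)
    return w.mean()  # (1 / |x_m|) * sum over j in x_m


def probability(logit):
    """p(y_um = 1 | u, m) = sigmoid(f(u, x_m))."""
    return 1.0 / (1.0 + np.exp(-logit))


def explain(theta_u, beta, word_ids, vocab, k=3):
    """Top-k words raising the logit (w_uj > 0) and lowering it (w_uj < 0)."""
    word_ids = np.unique(word_ids)
    w = per_word_contributions(theta_u, beta, word_ids)
    order = np.argsort(w)[::-1]  # indices by descending contribution
    raising = [(vocab[word_ids[i]], float(w[i])) for i in order if w[i] > 0][:k]
    lowering = [(vocab[word_ids[i]], float(w[i])) for i in order[::-1] if w[i] < 0][:k]
    return raising, lowering


def topical_words(embeddings, topic_id, n=15):
    """Indices of the n words most cosine-similar to the topic word.

    The topic word itself (cosine similarity 1) is included.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[topic_id]
    return np.argsort(-sims)[:n]


def recall_at_k(scores, labels, k):
    """Fraction of all positive items ranked among the top k scores."""
    top_k = np.argsort(-scores)[:k]
    return labels[top_k].sum() / labels.sum()


# Toy example: the first user dimension is fixed to 1, so the first
# column of beta acts as a per-word bias.
vocab = ["crime", "court", "startup", "poem"]
theta_u = np.array([1.0, 2.0])
beta = np.array([[0.1, 0.5], [0.0, -0.5], [0.2, 0.1], [-0.3, 0.0]])
doc = [0, 1, 2]  # word indices of one article

print(probability(controlled_logit(theta_u, beta, doc)))
print(probability(controlled_logit(theta_u, beta, doc, topic_ids=[0, 1], alpha=2.0)))
print(explain(theta_u, beta, doc, vocab, k=1))
```

Because a document's logit is an average of per-word terms, the $w_{uj}$ can be computed once per candidate; moving a slider then only rescales the topical terms, which keeps re-ranking cheap enough to run on each user input.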
Acknowledgments
The authors are grateful to Christian Bjartli for help with data
collection.
References
[1] Jaan Altosaar. "Probabilistic Modeling of Structure in Science: Statistical Physics to Recommender Systems". PhD thesis. Princeton University, 2020.
[2] Jaan Altosaar, Wesley Tansey, and Rajesh Ranganath. "RankFromSets: Scalable Set Recommendation with Optimal Recall". In: American Statistical Association Symposium on Data Science & Statistics (2020).
[3] K. Cohn-Gordon et al. "A Formal Security Analysis of the Signal Messaging Protocol". In: 2017 IEEE European Symposium on Security and Privacy (EuroS&P). 2017, pp. 451–466.
[4] Jacob Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". In: Association for Computational Linguistics. Minneapolis, Minnesota: Association for Computational Linguistics, June 2019, pp. 4171–4186. DOI: 10.18653/v1/N19-1423. URL: https://www.aclweb.org/anthology/N19-1423.
[5] Nicholas Diakopoulos. "Accountability, Transparency, and Algorithms". In: Oxford Handbook of Ethics and AI. Ed. by Markus Dubber, Frank Pasquale, and Sunit Das. Oxford University Press, 2020.
[6] Tijmen Tieleman and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." In: COURSERA: Neural Networks for Machine Learning (2012).
[7] Thomas Wolf et al. "HuggingFace's Transformers: State-of-the-art Natural Language Processing". In: ArXiv abs/1910.03771 (2019).