<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Joint Optimization of Profit and Relevance for Recommendation Systems in E-commerce∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raphael Louca</string-name>
          <email>rlouca@etsy.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moumita Bhattacharya</string-name>
          <email>mbhattacharya@etsy.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diane Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liangjie Hong</string-name>
          <email>lhong@etsy.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>New York</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>U.S.A</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Revenue Optimization</institution>
          ,
          <addr-line>Recommendation systems, e-commerce</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>20</volume>
      <issue>2019</issue>
      <abstract>
        <p>Traditionally, recommender systems for e-commerce platforms are designed to optimize for relevance (e.g., purchase or click probability). Although such recommendations typically align with users' interests, they may not necessarily generate the highest profit for the platform. In this paper, we propose a novel revenue model which jointly optimizes both for probability of purchase and profit. The model is tested on a recommendation module at Etsy.com, a two-sided marketplace for buyers and sellers. Notably, optimizing for profit, in addition to purchase probability, benefits not only the platform but also the sellers. We show that the proposed model outperforms several baselines by increasing ofline metrics associated with both relevance and profit.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computer systems organization → Embedded systems;
Redundancy; Robotics; • Networks → Network reliability.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>In recent years, online e-commerce platforms such as Amazon, Ebay,
and Etsy have seen tremendous growth. Unlike traditional brick and
mortar stores, such platforms do not manufacture, store, or source
products, rather they operate as a two-sided marketplace between
buyers and sellers, facilitating a convenient and safe transaction
process. In exchange, they collect a percentage of the transaction
amount as a fee. Because of the large selection of products available,
such platforms rely predominantly on recommendation systems
to help users find items that appeal to their tastes and interests.
Traditionally, these recommendation systems focus on optimizing
for relevance by predicting the purchase or click probability of an
item. This relevance-centric approach manifests itself in increased
conversion rates. However, it does not explicitly maximize the profit
generated for the platform, or the sellers. Thus, the question is, how
∗Copyright 2019 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Presented at the RMSE workshop held in conjunction with the 13th ACM Conference
on Recommender Systems (RecSys), 2019, in Copenhagen, Denmark.
*These authors have equally contributed to this work.
can we design recommendation systems that jointly optimize for
relevance and profit?
Summary of Contributions: In this work, we propose a novel revenue
model, which optimizes both probability of purchase and profit.
Specifically, we show that the proposed model can make strategic
recommendations by surfacing items that are both relevant to users
and profit-maximizing for Etsy. To the best of our knowledge, no
current studies have jointly optimized for both objectives, although
a few have directly optimized just for profit [
        <xref ref-type="bibr" rid="ref3 ref4 ref8">3, 4, 8</xref>
        ]. In addition
to several well-studied metrics, we propose two new metrics to
evaluate the eficacy of the revenue model. Our results show that
the model achieves statistically significantly higher (p-value &lt; 0.05)
performance compared to multiple baselines.
2
      </p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Because of the propensity to optimize for user relevance when
designing recommendation systems, only a few works thus far
have proposed methods that optimize the profit generated for the
e-commerce platform by the recommendation system [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref8">2–4, 8</xref>
        ]. One
such study by Chen et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose a simple profit-aware
recommendation system, where candidate items are ranked in decreasing
order of expected profit. The expected profit of a candidate item
is computed by simply multiplying the probability of purchase of
said item with its price. In our work, we observed that this
approach tends to rank items according to decreasing order of price
(cf. Section 4). In another study, Das et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] propose an
optimization problem, which maximizes the expected profit subject to
constraints, which ensure that the similarity (as defined by the Dice
or Jaccard measure) between the vector of ratings of recommended
items and the user’s true rating vector is less than a certain
threshold. Essentially, the authors develop a model that maximizes the
vendor’s expected profit while maintaining a level of “trust” with
the customer.
      </p>
      <p>
        In a separate line of work, Lu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] propose a dynamic model
that takes into account a variety of factors including prices,
valuations, saturation efects, and competition amongst products to
recommend items. Their work is orthogonal to ours as the model
ifnds a recommendation strategy that maximizes the expected total
revenue over a given time horizon. In all of these studies, however,
it is assumed that the e-commerce platform has access to a model
that optimizes for relevance and yields either a set of purchase
probabilities or a ‘true’ rating vector for each user. Our work is
diferent in that it proposes a model that jointly optimizes for both
relevance and profit.
3
      </p>
    </sec>
    <sec id="sec-4">
      <title>METHODS</title>
      <p>In this section, we propose a novel objective function that optimizes
both the likelihood of purchase as well as profit. Below, we describe
the revenue model, the baselines, and the evaluation metrics.
3.1</p>
    </sec>
    <sec id="sec-5">
      <title>Revenue Model</title>
      <p>Suppose that we are given training data collected from user sessions
at Etsy.com. Each training instance i = 1, . . . , m is described by
feature vector xi ∈ Rn , and a label yi ∈ {−1, 1} indicating whether the
corresponding recommended item has been purchased. We assume
that each Bernoulli random variable yi (random, that is, before we
observe the results) can be modeled by a logistic regression model,
where</p>
      <p>prob[yi = 1|xi ; w] = σ (w⊤xi + b) = 1/(1 + exp(−(w⊤xi + b)))
and σ : Rn → R is the sigmoid function. Traditionally, the objective
is to find a maximum likelihood estimate of the model parameters
(w, b), which requires solving the following convex optimization
problem:
maximize
w ∈Rn, b ∈R</p>
      <p>m
ℓ(w, b) := − Õ log(1 + exp(−yi (w⊤xi + b))). (1)</p>
      <p>
        i=1
This objective function, however, does not explicitly maximize for
profit. This naturally points in the direction of designing a custom
objective function that yields parameters that trade-of between
optimizing for probability of purchase and for profit. Let πi denote
the price of item i . The expected revenue generated by a set of m
recommended items is given by
ρ(w, b) :=
where the last equality follows by the fact that yi is a Bernoulli
random variable that takes values in {−1, 1}. This gives rise to the
following optimization problem
maximize
w ∈Rn, b ∈R
ℓ(w, b) + µρ (w, b),
(2)
whose objective is to find parameters that fit the data (via ℓ(w, b))
while maximizing the expected revenue (via ρ(w, b)). Here, µ ≥ 0 is
a hyperparameter of the model that controls the tradeof between
the two objectives. Because this is a maximization problem and
πi ≥ 0 for all i, the model in (2) will find parameters (w, b) that
increase σ (w⊤xi + b) for higher-priced items i while ensuring that
said parameters are able to explain the data. It is to be noted that the
log-likelihood function ℓ(w, b) is concave in (w, b) and can
therefore be maximized. The expected revenue term ρ(w, b), however, is
a weighted sum of sigmoid functions, which is known to be
nonconvex [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Therefore, the solution to problem (2) is only guaranteed
to be locally optimal. In our experiments, we use interior point
methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to obtain a solution for (2).
      </p>
      <p>Once the optimal parameters are learned, we use them to rank a
set of candidate items. In particular we consider two rankers, one
is based on probabilities and the other on the expected revenue as
follows:
(1) Raw Ranker (RR): Ranks a set of K items according to
increasing value of xi⊤w∗ + b∗, i = 1, . . . , K . Because the sigmoid
function is an increasing function of (w, b), the RR is
equivalent to a probability ranker, which ranks items according to
increasing value of σ (xi⊤w∗ + b∗).
(2) Expected Revenue Ranker (ER): Ranks a set of K items
according to increasing value of πi · σ (xi⊤w∗ + b∗), i = 1, . . . , K .
3.2</p>
    </sec>
    <sec id="sec-6">
      <title>Baselines</title>
      <p>We compare the revenue model (2) with the following baselines:
• Logistic Regression (LOR): Obtained from the revenue model
by setting µ = 0.
• Weighted Logistic Regression (WLOR): A variant of LR,
where purchased items are weighted by their price. In
particular, let P = {i | item i is purchased}. WLR is formulated
as
maximize
w ∈Rn, b ∈R
−</p>
      <p>πi log(1 + exp(−yi (w⊤xi + b))
− Õ log(1 + exp(−yi (w⊤xi + b)).</p>
      <p>Õ
i ∈ P
i&lt;P
• Linear Regression (LIR): We consider a linear regression
model where the label yi of item i is equal to the profit
generated by item i. More precisely, yi = πi 1i ∈ P . In linear regression,
the optimal parameters are chosen to minimize the squared
error between predictions and labels, i.e.,
minimize
w ∈Rn, b ∈R
m
Õ
i=1</p>
      <p>(xi⊤w + b − yi )2.</p>
      <p>The above optimization problem admits a closed form solution.
3.3</p>
    </sec>
    <sec id="sec-7">
      <title>Evaluation Metrics</title>
      <p>We use the following metrics to evaluate the performance of the
proposed model and baselines. Let r be a ranking such that r1 ≥
r2 ≥ · · · ≥ rK . Also, let π = [π1, . . . , πK ]⊤ be the vector of prices
of items 1, . . . , K .</p>
      <p>
        • Profit@ k: Given a position k ∈ NK and a ranking r , the
profit@ k is defined as the profit generated by the k highest
ranked items. More precisely, it is given by Ík
i=1 πri 1ri ∈ P ,
where 1 is the indicator function and P is the set of purchased
items.
• Average Price (AP) @k: Given a position k ∈ NK and a
ranking r , the AP@k is defined to be equal to the average price of
the k highest ranked items. It is given by Ík
i=1 πri .
• Price-Based Normalized Discounted Cumulative Gain @k
(P-NDCG) P-NDCG@k is defined as NDCG@ k [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where the
gain of item ri is equal to πri /∥π ∥.
• Area Under the Curve (AUC): AUC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is the area under the
receiver operating characteristic curve, and it can be interpreted
as the probability that the classifier will rank a randomly
chosen positive instance higher than a randomly chosen negative
instance. We use AUC to measure relevance.
4 EXPERIMENTS AND DISCUSSION
In this section, we present ofline experiments to evaluate the
performance of the proposed revenue model (2). We use a training set
consisting of implicit feedback data collected over a day from an
item-to-item recommendation module that is placed on item pages
at Etsy.com. An example of the this module is shown in Figure 1
The training data is sampled so that 40% of the data are positive
instances and the remaining are negative. Our feature set consists
of item features (e.g., item purchase count) and cross features
between the target and candidate items (e.g., tfidf similarity between
the two items). We evaluate the model on next day’s data collected
from the same recommendation module. It is to be noted that for
the revenue model, we do not rank according to expected revenue
(ER) because such ranking is meaningful only if the underlying
model maximizes just for the likelihood of purchase (e.g., logistic
regression model (1)).
      </p>
      <p>In Figure 3, we plot the distribution of predictions returned by
the optimal parameters (w, b) of problem (2) as function of µ . It
is worth noting that for µ = 0, the boxplot depicts the
distribution of predictions for the logistic regression model. Compared to
this distribution, we observe that for µ = 100, 10000, the spread
in the distribution of predictions induced by the optimal
parameters increases while for µ = 1 it decreases. The median (red line
in boxplot) is observe to increase for all values of µ &gt; 0. This
is expected since πi ≥ 0 and the revenue term is maximizing
πi prob[yi = 1|xi ; w] = πi σ (xi⊤w + b).</p>
      <p>In Table 1, we observe that the proposed revenue model attains the
highest AUC and profit@ k, among all other models, for all three
values of k. In particular, we observe a 3.57% increase in AUC and
9.50% in profit@ 1 compared to the LR model using the raw ranker
(LOR/RR). It is also worth noting that compared to LOR/RR, the
proposed revenue model also increases P-NDCG@k and AP@k for
all k by at least 3.57% and 23.08%. Therefore, the proposed model
ranks relevant but high-priced items higher. Similar comparisons
can be made between the revenue model and the WLOR model
using RR. In Table 1, we also observe that the LOR model using the
expected revenue ranker (LOR/ER) attains the highest values for
PNDCG@k and AP@k. Thus, it favors higer-priced items. However,
unlike our model, this model results in a 10.76% and 16.06% decrease
in AUC compared to the LOR/RR model and the revenue model
respectively.</p>
      <p>The results shown in Table 1 are further supported by Figure 2,
which shows the six candidate items rankned in the order that is
generated by the LOR/RR (1st row), LOR/ER (2nd row), and revenue
according to decreasing order of price, thus attaining the highest
possible AP@k and P-NDCG@k for any k (i.e., P-NDCG@k=1, for
all k = 1, . . . , 6). It is also straightforward to verify that in this
example, the revenue model outperforms the LOR/RR model both
in terms of P-NDCG@k and AP@k for all values of k.
5</p>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>In this prelimimary study we propose a novel model that optimizes
both profit and probablity of purchase, while generating
recommendations. We show that the recommendations produced by our
model is able to increase profit for the platform while retaining high
relevancy for users. In future work we plan to train our model on
much larger datasets and assess its performance in the face of real
user-trafic in Etsy.com by launcihing an online A/B experiment.
model (3rd row). The item in the blue dashed-lined box is the item
that was purchased. As shown in the figure, the revenue model is
able to assign the highest rank to the purchased item while the
LOR/RR and LOR/ER models rank that item last and second,
respectively. It is worth noting that the revenue model ranks the second
highest-priced item first, which is also the one being purchased.
Therefore, our model generates a ranking that trades-of between
optimizing for relevance and profit. We can also observe from the
ifgure that the ranking obtained by the LOR/ER model sorts items
1
0.8
0.6
0.4
0.2
0
0
0.01
1
100
10000</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Boyd</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lieven</given-names>
            <surname>Vandenberghe</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Convex optimization</article-title>
          . Cambridge university press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Long-Sheng</surname>
            <given-names>Chen</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Hao</surname>
            <given-names>Hsu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mu-Chen Chen</surname>
          </string-name>
          , and
          <string-name>
            <surname>Yuan-Chia Hsu</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Developing recommender systems with the consideration of product profitability for sellers</article-title>
          .
          <source>Information Sciences 178</source>
          ,
          <issue>4</issue>
          (
          <year>2008</year>
          ),
          <fpage>1032</fpage>
          -
          <lpage>1048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Aparna</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <surname>Claire Mathieu</surname>
            , and
            <given-names>Daniel</given-names>
          </string-name>
          <string-name>
            <surname>Ricketts</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Maximizing profit using recommender systems</article-title>
          .
          <source>arXiv preprint arXiv:0908.3633</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Wei</given-names>
            <surname>Lu</surname>
          </string-name>
          , Shanshan Chen,
          <string-name>
            <given-names>Keqian</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <source>Laks VS Lakshmanan</source>
          .
          <year>2014</year>
          .
          <article-title>Show me the money: dynamic recommendations for revenue maximization</article-title>
          .
          <source>Proceedings of the VLDB Endowment 7</source>
          ,
          <issue>14</issue>
          (
          <year>2014</year>
          ),
          <fpage>1785</fpage>
          -
          <lpage>1796</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          , Prabhakar Raghavan, and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Introduction to information retrieval</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>16</volume>
          ,
          <issue>1</issue>
          (
          <year>2010</year>
          ),
          <fpage>100</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kevin</surname>
            <given-names>P</given-names>
          </string-name>
          <string-name>
            <surname>Murphy</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Machine learning: a probabilistic perspective</article-title>
          . MIT press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Madeleine</given-names>
            <surname>Udell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Boyd</surname>
          </string-name>
          . [n. d.].
          <article-title>Maximizing a sum of sigmoids</article-title>
          .
          <source>([n. d.])</source>
          . http://www.stanford.edu/~boyd/papers/max_sum_sigmoids.html
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Peng</given-names>
            <surname>Ye</surname>
          </string-name>
          , Julian Qian, Jieying Chen,
          <string-name>
            <surname>Chen-hung Wu</surname>
            , Yitong Zhou, Spencer De Mars, Frank Yang,
            <given-names>and Li</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Customized Regression Model for Airbnb Dynamic Pricing</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <volume>932</volume>
          -
          <fpage>940</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>