<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Sales from the Language of Product Descriptions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Reid Pryzant</string-name>
          <email>rpryzant@stanford.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Young-joo Chung</string-name>
          <email>yjchung@acm.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dan Jurafsky</string-name>
          <email>jurafsky@stanford.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rakuten Institute of Technology</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Stanford University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>What can a business say to attract customers? E-commerce vendors frequently sell the same items but use different marketing strategies to present their goods. Understanding consumer responses to this heterogeneous landscape of information is important both as business intelligence and, more broadly, a window into consumer attitudes. When studying consumer behavior, the existing literature is primarily concerned with product reviews. In this paper we posit that textual product descriptions are also important determinants of consumer choice. We mine 90,000+ product descriptions on the Japanese e-commerce marketplace Rakuten and identify actionable writing styles and word usages that are highly predictive of consumer purchasing behavior. In the process, we observe the inadequacies of traditional feature extraction algorithms, namely their inability to control for the implicit effects of confounds like brand loyalty and pricing strategies. To circumvent this problem, we propose a novel neural network architecture that leverages an adversarial objective to control for confounding factors, and attentional scores over its input to automatically elicit textual features as a domain-specific lexicon. We show that these textual features can predict the sales of each product, and investigate the narratives highlighted by these words. Our results suggest that appeals to authority, polite language, and mentions of informative and seasonal language win over the most customers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Content analysis and feature
selection; • Computing methodologies → Information
extraction; Neural networks;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>The internet has dramatically altered consumer shopping habits.
Whereas customers of physical stores can physically manipulate,
test, and evaluate products before making purchasing decisions,
the remote nature of e-commerce renders such tactile evaluations
obsolete.</p>
      <p>
        In lieu of in-store evaluation, online shoppers increasingly rely
on alternative sources of information. This includes “word-of-mouth”
recommendations from outside sources [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and local product
reviews [
        <xref ref-type="bibr" rid="ref13 ref18 ref20">13, 18, 20</xref>
        ]. These factors, though well studied, are only
indirectly controllable from a business perspective [
        <xref ref-type="bibr" rid="ref25 ref52">25, 52</xref>
        ].
Business owners have considerably stronger control over their own
product descriptions. The same products may be sold by multiple
vendors, with each item having a different textual description (note
that we take product to mean a purchasable object, and item to mean
an individual e-commerce listing). Studying consumers’ reactions
to these descriptions is valuable both as business intelligence and
as a new window into consumer attitudes.
      </p>
      <p>
        The hypothesis that business-generated product descriptions
affect consumer behavior (manifested in sales) has received strong
support in prior empirical studies [
        <xref ref-type="bibr" rid="ref22 ref26 ref34 ref37 ref39">22, 26, 34, 37, 39</xref>
        ]. However, these
studies have only used summary statistics of these descriptions (e.g.
readability, length, completeness). We propose that embedded in
these product descriptions are narratives that affect shoppers, which
can be studied by examining the words in each description.
      </p>
      <p>Our hypothesis is that product descriptions are fundamentally
a kind of social discourse, one whose linguistic contents have real
control over consumer purchasing behavior. Business owners
employ narratives to portray their products, and consumers react
according to their beliefs and attitudes.</p>
      <p>To test this hypothesis, we mine 93,591 product descriptions and
sales records from the Japanese e-commerce website rakuten.co.jp
(“Rakuten”). First, we build models that can explain how the textual
content of product descriptions impacts sales. Second, we use these
models to conduct an explanatory analysis, identifying what
linguistic aspects of product descriptions are the most important
determinants of success.</p>
      <p>
        We seek to unearth actionable phrases that can help
e-commerce vendors increase their sales regardless of what’s
being sold. Thus, we want to study the effect of language on sales
in isolation, i.e. find textual features that are untangled from the
effects of pricing strategies [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], brand loyalty [
        <xref ref-type="bibr" rid="ref17 ref48">17, 48</xref>
        ], and product
identity. Choosing features for such a task is a challenging problem,
because product descriptions are embedded in a larger e-commerce
experience that leverages the shared power of these confounds to
market a product. For a not-so-subtle example, product
descriptions frequently boast “free shipping!”, overtly pointing to a pricing
strategy with known power over consumer choice [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>We develop a new text feature selection algorithm to operate
in this confound-controlled setting. This algorithm makes use of a
novel neural network architecture. The network uses attentional
scores over its input and an adversarial objective to select a
lexicon that is simultaneously predictive of consumer behavior and
controlled for confounds such as brand and price.</p>
      <p>
        We evaluate our feature selection algorithm on two pools of
feature candidates: morphemes obtained with the JUMAN tokenizer<sup>1</sup>,
and sub-word units obtained via byte-pair encoding (“BPE”) [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ].
From these pools we select features with one of (1) our proposed
neural network, (2) odds ratios [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], (3) mutual information [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ],
and (4) the features with nonzero coefficients of an L1-regularized
linear regression. Our results suggest that lexicons produced by the
neural model are both less correlated with confounding factors and
the most powerful predictors of sales.
      </p>
      <p>In summary, our contributions are as follows:
• We demonstrate that the narratives embedded in e-commerce
product descriptions influence sales.
• We propose a novel neural architecture to mine features
for the task.
• We discover actionable writing styles and words that have
especially high influence on these outcomes.</p>
    </sec>
    <sec id="sec-3">
      <title>PREDICTING SALES FROM DESCRIPTIONS</title>
      <p>Our task is to predict consumer demand (measured in log(sales))
from the narratives embedded in product descriptions. To do so,
we mine features from these textual data and fit a statistical model.
In this section, we review our feature-mining baselines, present
our novel approach to feature-mining, and outline our statistical
technique for predicting sales from these features while accounting
for confounding factors like brand loyalty and product identity.</p>
    </sec>
    <sec id="sec-4">
      <title>Feature Mining Preliminaries</title>
      <p>We approach the featurization problem by first segmenting
product descriptions into sequences of tokens, then selecting tokens
from the vocabulary of tokens that are predictive of high sales.
We take subsets of these vocabularies (rather than one feature per
vocabulary item) because (1) we need to be able to examine the
linguistic contents of the resulting feature sets, and (2) we need
models that are highly generalizable, and not too closely adapted
to the peculiarities of these data’s vocabulary distributions.</p>
      <p>We select predictive subsets of the data’s tokenized vocabularies
in four ways. Three of these (Section 2.2) are traditional feature
selection methods that serve as strong baselines for our proposed
method (Section 2.3).</p>
    </sec>
    <sec id="sec-5">
      <title>Traditional Feature Mining</title>
      <p>Odds Ratio (OR) finds words that are over-represented in one
corpus when compared to another (e.g. descriptions of high-selling
items versus those of their low-selling counterparts). Formally,
this is:
<disp-formula><label>(1)</label><tex-math>\frac{p_i / (1 - p_i)}{p_j / (1 - p_j)}</tex-math></disp-formula>
where <inline-formula><tex-math>p_i</tex-math></inline-formula> is the probability of the word in corpus i (e.g. high-selling
descriptions) and <inline-formula><tex-math>p_j</tex-math></inline-formula> is the probability of the word in corpus j
(e.g. low-selling descriptions). Note that this method requires
dichotomized targets, which we discuss further in Section 3.1.
<fn id="fn1"><label>1</label><p>JUMAN (a User-Extensible Morphological Analyzer for Japanese), http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN</p></fn></p>
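      <p>As an illustration, the odds ratio of Equation 1 can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the experiments: it assumes that the probability of a word in a corpus is estimated as the fraction of descriptions containing it, and the function name and the additive smoothing term (which avoids division by zero) are assumptions added for the example.</p>
      <preformat preformat-type="code">
```python
def odds_ratio(word, corpus_i, corpus_j, smoothing=0.5):
    """Odds ratio of `word` between two corpora (lists of token lists).

    p_i and p_j are estimated as the fraction of descriptions that contain
    the word. `smoothing` is an assumption added to keep the odds finite;
    it is not part of Equation 1.
    """
    def presence_probability(corpus):
        hits = sum(1 for description in corpus if word in description)
        return (hits + smoothing) / (len(corpus) + 2 * smoothing)

    p_i = presence_probability(corpus_i)
    p_j = presence_probability(corpus_j)
    return (p_i / (1 - p_i)) / (p_j / (1 - p_j))
```
      </preformat>
      <p>Values above 1 indicate tokens that are over-represented in the first corpus (e.g. high-selling descriptions), and values below 1 indicate the opposite.</p>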
      <p>Mutual information (MI) is a measurement of how informative
the presence of a token is to making correct classification decisions.
Formally, the mutual information MI(t, c) of a token t and binary
class c is
<disp-formula><label>(2)</label><tex-math>MI(t, c) = \sum_{I_t \in \{1,0\}} \sum_{I_c \in \{1,0\}} P(I_t, I_c) \log \frac{P(I_t, I_c)}{P(I_t) P(I_c)}</tex-math></disp-formula>
where <inline-formula><tex-math>I_t</tex-math></inline-formula> and <inline-formula><tex-math>I_c</tex-math></inline-formula> are indicators on term presence and class label for
a given description. Like OR, this method requires dichotomized
sales targets.</p>
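      <p>Equation 2 can likewise be sketched directly from co-occurrence counts. The following is an illustrative sketch, assuming probabilities are estimated as plain relative frequencies and that cells with zero joint probability contribute nothing to the sum; the function name and argument layout are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def mutual_information(token, descriptions, labels):
    """MI(t, c) between token presence and a binary class label.

    `descriptions` is a list of token lists and `labels` a parallel list of
    0/1 class labels (e.g. 1 = high-selling).
    """
    n = len(descriptions)
    joint = {(i_t, i_c): 0 for i_t in (0, 1) for i_c in (0, 1)}
    for tokens, label in zip(descriptions, labels):
        joint[(int(token in tokens), int(label))] += 1

    total = 0.0
    for (i_t, i_c), count in joint.items():
        if count == 0:
            continue  # an empty cell contributes nothing
        p_joint = count / n
        p_t = (joint[(i_t, 0)] + joint[(i_t, 1)]) / n
        p_c = (joint[(0, i_c)] + joint[(1, i_c)]) / n
        total += p_joint * math.log(p_joint / (p_t * p_c))
    return total
```
      </preformat>
      <p>A token that perfectly separates two balanced classes scores log 2 (in nats), while a token that appears equally often in both classes scores 0.</p>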
      <p>
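        Lasso Regularization (L1) can perform variable selection on a
linear regression model [
        <xref ref-type="bibr" rid="ref51">51</xref>
        ] by adding a regularization term to
the least squares objective. This term penalizes the L1 norm of the
model parameters:
<disp-formula><label>(3)</label><tex-math>\arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j} \beta_j x_{ij} \Big)^2 \right\}</tex-math></disp-formula>
<disp-formula><label>(4)</label><tex-math>\text{subject to } \sum_{j} |\beta_j| \le \alpha,</tex-math></disp-formula>
where <inline-formula><tex-math>y_i</tex-math></inline-formula> is the ith target, <inline-formula><tex-math>\beta_0</tex-math></inline-formula> is an intercept, and <inline-formula><tex-math>\beta_j</tex-math></inline-formula> is the coefficient
of the jth predictor, whose value for the ith example is <inline-formula><tex-math>x_{ij}</tex-math></inline-formula>. The parameter <inline-formula><tex-math>\alpha</tex-math></inline-formula> determines
the amount of regularization and can be obtained by
minimizing the error in cross-validation.</p>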
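      <p>To make the variable-selection behavior concrete, the sketch below fits a lasso by coordinate descent. It is illustrative only and is not the scikit-learn routine used in the experiments. It optimizes the penalized (Lagrangian) form of the problem, which corresponds to the constrained form in Equations 3 and 4 for a suitable choice of penalty; the function names, the 1/(2n) scaling of the squared error, and the fixed number of sweeps are all assumptions.</p>
      <preformat preformat-type="code">
```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; small values become 0."""
    if value > threshold:
        return value - threshold
    if -threshold > value:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=200):
    """Minimise (1 / 2n) * squared error + penalty * sum(|beta_j|)."""
    n, d = len(X), len(X[0])
    beta0, beta = sum(y) / n, [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Residual with every feature except j subtracted out.
            partial = [
                y[i] - beta0 - sum(beta[k] * X[i][k] for k in range(d) if k != j)
                for i in range(n)
            ]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, penalty) / scale if scale > 0 else 0.0
        # The intercept is not penalised.
        beta0 = sum(
            y[i] - sum(beta[k] * X[i][k] for k in range(d)) for i in range(n)
        ) / n
    return beta0, beta
```
      </preformat>
      <p>Features whose coefficients are driven exactly to zero are discarded; the remaining features form the selected set.</p>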
    </sec>
    <sec id="sec-6">
      <title>Deep Adversarial Feature Mining</title>
      <p>An important limitation of all the aforementioned feature selection
methods is that they are incapable of selecting features that are
decorrelated from confounds like brand and price. Recall from
Section 1 the price-related example of “free shipping!”. Consider
the brand-related example of “the quality you know and love from
Daison”. Though effective marketing tools, these phrases leverage
the power of pricing strategies and brand loyalty, factors with
known power over consumers. We wish to study the impact of
linguistic structures in product descriptions in isolation, beyond
those indicators of price or branding. Thus, we consider brand,
product, and price information as confounding factors that obscure
the effect of language on consumers.</p>
      <p>As a solution to this problem, we propose a novel feature-selecting
neural network (RNN+/-GF), sketched in Figure 1. The model uses
an attention mechanism to produce estimates for log(sales), brand,
and price. We omit product because it is only present in our test
data; see Section 3.1 for details. During training, the model uses
an adversarial objective to discourage feature effectiveness with
respect to two of these prediction targets: brand and price. That is,
the model finds features that are good at predicting sales, and bad
at predicting brand and price.</p>
      <p>Deep learning review. Before we describe the model, we review
its primary building blocks.</p>
      <sec id="sec-6-1">
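        <title>Feedforward Neural Networks (FFNNs)</title>
        <p>FFNNs are composed of a series of fully connected layers, where each layer takes on the form
<disp-formula><label>(5)</label><tex-math>y = f(Wx + b).</tex-math></disp-formula></p>
        <p>Note that <inline-formula><tex-math>x \in \mathbb{R}^{n}</tex-math></inline-formula> is a vector of inputs (e.g. from a previous layer),
<inline-formula><tex-math>W \in \mathbb{R}^{y \times n}</tex-math></inline-formula> is a matrix of parameters, <inline-formula><tex-math>b \in \mathbb{R}^{y}</tex-math></inline-formula> is a vector of biases,
<inline-formula><tex-math>y \in \mathbb{R}^{y}</tex-math></inline-formula> is an output vector, and <inline-formula><tex-math>f(\cdot)</tex-math></inline-formula> is some nonlinear activation
function, e.g. the ReLU: <inline-formula><tex-math>\mathrm{ReLU}(x) = \max\{0, x\}</tex-math></inline-formula>.</p>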
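        <p>A single fully connected layer of this form can be sketched in plain Python as follows. The function names are hypothetical, and the weight matrix is assumed to be stored as a list of rows.</p>
        <preformat preformat-type="code">
```python
def relu(values):
    """Element-wise ReLU: max(0, x)."""
    return [max(0.0, v) for v in values]


def dense_layer(W, x, b, activation=relu):
    """Compute y = f(Wx + b) for a weight matrix W given as a list of rows."""
    pre_activation = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]
    return activation(pre_activation)
```
        </preformat>
        <p>Stacking several such layers, each feeding its output to the next, gives a feedforward network.</p>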
        <p>
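          Recurrent Neural Networks (RNNs) are effective tools for
learning structure from sequential data [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. RNNs take a vector
<inline-formula><tex-math>x_t</tex-math></inline-formula> at each timestep. They compute a hidden state vector <inline-formula><tex-math>h_t \in \mathbb{R}^{h}</tex-math></inline-formula>
at each timestep by applying nonlinear maps to the previous
hidden state <inline-formula><tex-math>h_{t-1}</tex-math></inline-formula> and the current input <inline-formula><tex-math>x_t</tex-math></inline-formula> (note that <inline-formula><tex-math>h_0</tex-math></inline-formula> is initialized
to the zero vector):
<disp-formula><label>(6)</label><tex-math>h_t = \sigma\left( W^{(hx)} x_t + W^{(hh)} h_{t-1} \right).</tex-math></disp-formula>
        </p>
        <p>
          <inline-formula><tex-math>W^{(hx)} \in \mathbb{R}^{h \times n}</tex-math></inline-formula> and <inline-formula><tex-math>W^{(hh)} \in \mathbb{R}^{h \times h}</tex-math></inline-formula> are parameter matrices. We
use Long Short-Term Memory Network (LSTM) cells, a variant of
the traditional RNN cell that can more effectively model long-term
temporal dependencies [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].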
        </p>
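        <p>The recurrence in Equation 6 can be sketched as follows. This illustrates the plain RNN cell only, not the LSTM cell used in the model; it assumes that the nonlinearity is the logistic sigmoid, and the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def rnn_step(W_hx, W_hh, x_t, h_prev):
    """One update h_t = sigma(W_hx x_t + W_hh h_prev)."""
    h_t = []
    for row_x, row_h in zip(W_hx, W_hh):
        total = sum(w * x for w, x in zip(row_x, x_t))
        total += sum(w * h for w, h in zip(row_h, h_prev))
        h_t.append(sigmoid(total))
    return h_t


def encode(W_hx, W_hh, inputs):
    """Run the recurrence over a sequence, starting from a zero hidden state."""
    hidden = [0.0] * len(W_hh)
    states = []
    for x_t in inputs:
        hidden = rnn_step(W_hx, W_hh, x_t, hidden)
        states.append(hidden)
    return states
```
        </preformat>
        <p>The list returned by the hypothetical encode function holds one hidden state per timestep, which is the input that an attention mechanism summarizes.</p>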
        <p>
          Attention mechanisms. Attentional mechanisms allow neural
models to focus on parts of the encoded input before producing
predictions. We calculate Bahdanau-style attentional contexts [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
because these have been shown to perform well for other tasks
like translation and language modeling [
          <xref ref-type="bibr" rid="ref11 ref31">11, 31</xref>
          ], and preliminary
experiments suggested that this mechanism worked best for our
problem setting.
        </p>
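        <p>Bahdanau-style attention computes the attentional context as a
weighted average of hidden states. The weights are computed as
follows: pass each hidden state <inline-formula><tex-math>h_i</tex-math></inline-formula> through a fully-connected neural
network, then compute a dot product with a vector of parameters to
produce an intermediary scalar <inline-formula><tex-math>\hat{a}_i</tex-math></inline-formula> (eq. 7). Next, the <inline-formula><tex-math>\hat{a}_i</tex-math></inline-formula>’s are scaled
by a softmax function so that they map to a distribution over hidden
states (eq. 8). Finally, this distribution is used to compute a weighted
average of hidden states c (eq. 9). Formally, this can be written as:
<disp-formula><label>(7)</label><tex-math>\hat{a}_i = v_a^{\top} \tanh(W_a h_i)</tex-math></disp-formula>
<disp-formula><label>(8)</label><tex-math>a = \mathrm{softmax}(\hat{a})</tex-math></disp-formula>
<disp-formula><label>(9)</label><tex-math>c = \sum_{j} a_j h_j</tex-math></disp-formula></p>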
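        <p>Equations 7 to 9 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: hidden states are lists of floats, the projection matrix is stored as a list of rows, and the function and argument names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def attention_context(hidden_states, W_a, v_a):
    """Return (weights, context) following Equations 7 to 9."""
    scores = []
    for h in hidden_states:
        projected = [
            math.tanh(sum(w * h_k for w, h_k in zip(row, h))) for row in W_a
        ]
        scores.append(sum(v * p for v, p in zip(v_a, projected)))  # Eq. 7

    # Numerically stable softmax over timesteps (Eq. 8).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    normalizer = sum(exps)
    weights = [e / normalizer for e in exps]

    # Weighted average of hidden states (Eq. 9).
    dim = len(hidden_states[0])
    context = [
        sum(a * h[k] for a, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return weights, context
```
        </preformat>
        <p>The returned weights form a distribution over timesteps, so they can be read as per-token importance scores.</p>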
        <p>Our model. We continue by describing our adversarial feature
mining model. The process of obtaining features from the model can
be thought of as a three-stage algorithm: (1) forward pass, where
predictions are generated, (2) backward pass, where parameters
are updated, and, after repeated iterations of 1 and 2, (3) feature
selection, where we use attentional scores to elicit lexicons.</p>
        <p>The forward pass operates as follows:
(1) The segmented input is fed into an LSTM to produce hidden
state encodings for each timestep.
(2) We compute an attentional summary of these hidden states
to obtain a single vector encoding of the input.
(3) We feed this encoding into three FFNNs. One is a
regression network that tries to minimize <inline-formula><tex-math>L = \lVert \hat{y} - y \rVert^{2}</tex-math></inline-formula>, the
squared loss between the predicted and true log(sales).
The second and third are classification networks, which
predict a likelihood distribution over all possible labels,
and are trained to minimize <inline-formula><tex-math>L = -\log p(y)</tex-math></inline-formula>, the negative
log probability of the correct class label. We attach
classification networks for brand id and a dichotomization of
price (see Section 3.1 for details). We dichotomized sales in
this way to create a fair comparison between this method
and the baselines: other feature selection algorithms (OR,
MI) are not so flexible and require dichotomized targets.</p>
        <p>
          The backward pass draws on prior work in leveraging
adversarial objective functions to match feature distributions in different
settings [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. In particular, we draw from a line of research in the
style of [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. This method involves passing gradients
through a gradient reversal layer, which multiplies gradients by
a negative constant, i.e. -1, as they propagate back through the
network. Intuitively, this encourages parameters to update away
from the optimization objective.
        </p>
        <p>If <inline-formula><tex-math>L_{sales}</tex-math></inline-formula>, <inline-formula><tex-math>L_{brand}</tex-math></inline-formula>, and <inline-formula><tex-math>L_{price}</tex-math></inline-formula> are the regression and classification
losses from each prediction network, then the final loss we are
optimizing is <inline-formula><tex-math>L = L_{sales} + L_{brand} + L_{price}</tex-math></inline-formula>. However, when
backpropagating from each prediction network to the encoder, we reverse
the gradients of the networks that are predicting confounds. This
means that the prediction networks still learn to predict brand and
price, but the encoder is forced to learn brand- and price-invariant
representations that are not useful for predicting these confounds.
We hope that such representations encourage the model to attend
to confound-decorrelated tokens.</p>
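        <p>The effect of gradient reversal on the shared encoder can be illustrated with a small, framework-free sketch. It is not the TensorFlow implementation used in the experiments: the class, the function, and the representation of gradients as plain lists are assumptions made for the example.</p>
        <preformat preformat-type="code">
```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -strength backward."""

    def __init__(self, strength=1.0):
        self.strength = strength

    def forward(self, encoding):
        return list(encoding)

    def backward(self, upstream_gradient):
        return [-self.strength * g for g in upstream_gradient]


def encoder_gradient(grad_sales, grad_brand, grad_price, reversal):
    """Combine per-task gradients as they reach the shared encoder.

    The sales gradient passes through unchanged, while the gradients of the
    confound predictors (brand, price) are reversed first.
    """
    reversed_brand = reversal.backward(grad_brand)
    reversed_price = reversal.backward(grad_price)
    return [
        s + b + p for s, b, p in zip(grad_sales, reversed_brand, reversed_price)
    ]
```
        </preformat>
        <p>Because only the gradients flowing into the encoder are reversed, the brand and price networks themselves still receive ordinary gradients and keep learning to predict their targets.</p>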
        <p>The lexicon induction stage uses the trained model described above
to select textual features that are predictive of sales, but control for
the influence of brand and price. This stage operates as follows:
(1) Generate predictions for each test item, but rather than
saving those predictions, save the attentional distribution
over each source sequence.
(2) Standardize these distributions. For each input i,
standardize the distribution over timesteps <inline-formula><tex-math>p^{(i)}</tex-math></inline-formula> by computing
<disp-formula><label>(10)</label><tex-math>z^{(i)} = \frac{p^{(i)} - \mu_{p^{(i)}}}{\sigma_{p^{(i)}}}</tex-math></disp-formula>
(3) Merge these standardized distributions over each input
sequence. If there is a word collision (i.e. we observe the same
token in multiple input sequences and the model assigned
each observation a different z-score), take the max of those
words’ z-scores.
(4) Select the k tokens with highest z-scores. This is our
induced lexicon.</p>
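        <p>Steps (2) to (4) of the lexicon induction stage can be sketched as follows. This is an illustrative sketch under stated assumptions: attention distributions are given as lists of floats parallel to the token sequences, the population standard deviation is used, and sequences with uniform attention are skipped. The function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import statistics


def induce_lexicon(sequences, attention, k):
    """Select the k tokens with the highest standardized attention.

    `sequences` is a list of token lists and `attention` a parallel list of
    attention distributions (one weight per token).
    """
    best = {}
    for tokens, weights in zip(sequences, attention):
        mean = statistics.mean(weights)
        std = statistics.pstdev(weights)
        if std == 0:
            continue  # uniform attention gives no ranking signal (assumption)
        for token, weight in zip(tokens, weights):
            z = (weight - mean) / std
            # On a collision, keep the maximum z-score seen for the token.
            best[token] = max(z, best.get(token, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:k]
```
        </preformat>
        <p>Standardizing within each sequence before merging keeps long and short descriptions comparable, since raw attention weights shrink as sequences grow.</p>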
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Using Features to Predict Sales</title>
      <p>
        Once we have mined textual features from product descriptions, we
need a statistical model that accounts for the effects of confounding
variables like product identity and brand loyalty in predicting the
sales of each item. We use a mixed-effects model, a type of
hierarchical regression that assumes observations can be explained with
two types of categorical variables: fixed effect variables and random
effect variables [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
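      <p>We model textual features as fixed effects. We take the product
that each item corresponds to and the brand selling each item as
random effects. Thus, we force the model to assume that product
and brand information is decorrelated from everything else, and we
expect to observe the explanatory power of text features without
the influence of brand or product. Note that the continuous nature
of the “price” confound precludes our ability to model it (Section
3.1).</p>
      <p>We proceed with a formal description of our mixed-effects model.
Let <inline-formula><tex-math>y_{ijk}</tex-math></inline-formula> be the log(sales) of item i, which is product j and sold
by brand k. The description for this item is written as <inline-formula><tex-math>x_{ijk}</tex-math></inline-formula>, and
each <inline-formula><tex-math>x_{ijk}^{(h)} \in x_{ijk}</tex-math></inline-formula> is the hth feature of this description. With these
definitions, we can write our mixed-effects model as
<disp-formula><label>(11)</label><tex-math>y_{ijk} = \beta_0 + \sum_{h} \beta_h x_{ijk}^{(h)} + \gamma_j + \alpha_k + \epsilon_{ijk}</tex-math></disp-formula>
<disp-formula><label>(12)</label><tex-math>\gamma_j \sim \mathcal{N}(0, \sigma_\gamma^2)</tex-math></disp-formula>
<disp-formula><label>(13)</label><tex-math>\alpha_k \sim \mathcal{N}(0, \sigma_\alpha^2)</tex-math></disp-formula>
<disp-formula><label>(14)</label><tex-math>\epsilon_{ijk} \sim \mathcal{N}(0, \sigma_\epsilon^2)</tex-math></disp-formula>
where <inline-formula><tex-math>\gamma_j</tex-math></inline-formula> and <inline-formula><tex-math>\alpha_k</tex-math></inline-formula> are the random effects of product and brand,
respectively, and <inline-formula><tex-math>\epsilon_{ijk}</tex-math></inline-formula> is an item-specific effect, i.e. this item’s deviation
from the mean item sales.</p>
      <p>
        Nakagawa and Schielzeth [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] introduced the marginal and
conditional <inline-formula><tex-math>R^2</tex-math></inline-formula> (<inline-formula><tex-math>R_m^2</tex-math></inline-formula> and <inline-formula><tex-math>R_c^2</tex-math></inline-formula>) as summary statistics of mixed-effects
models. Marginal <inline-formula><tex-math>R_m^2</tex-math></inline-formula> is the <inline-formula><tex-math>R^2</tex-math></inline-formula> of the textual effects only. It reports the
proportion of variance in the model’s predictions that can be explained
with the fixed effects variables <inline-formula><tex-math>x_{ijk}^{(h)}</tex-math></inline-formula>. It is written as:
<disp-formula><label>(15)</label><tex-math>R_m^2 = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\gamma^2 + \sigma_\alpha^2 + \sigma_\epsilon^2},</tex-math></disp-formula>
<disp-formula><label>(16)</label><tex-math>\sigma_f^2 = \mathrm{var}\left( \sum_{h} \beta_h x_{ijk}^{(h)} \right).</tex-math></disp-formula>
      </p>
      <p>Conditional <inline-formula><tex-math>R_c^2</tex-math></inline-formula> is the <inline-formula><tex-math>R^2</tex-math></inline-formula> of the entire model (text + product +
brand). It conditions on the variances of the random factors we are
controlling for (product and brand):
<disp-formula><label>(17)</label><tex-math>R_c^2 = \frac{\sigma_f^2 + \sigma_\gamma^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\gamma^2 + \sigma_\alpha^2 + \sigma_\epsilon^2}.</tex-math></disp-formula></p>
    </sec>
    <sec id="sec-8">
      <title>EXPERIMENTS</title>
      <p>We now detail a series of experiments that were conducted to
evaluate the effectiveness of each feature set, and, more generally, to test
the hypothesis that narratives embedded in product descriptions
are indeed predictive of sales.</p>
    </sec>
    <sec id="sec-9">
      <title>Product and Sales Data</title>
      <p>We obtained data on e-commerce product descriptions, sales,
vendors, and prices from a December 2012 snapshot of the Rakuten
marketplace<sup>2</sup>. We focused on items belonging to two product
categories: chocolate and health. These two categories are both popular
on the marketplace, but their characteristics are different. There is
more variability among chocolate products than health products;
many chocolate vendors are boutiques that sell handmade goods. Health
vendors, on the other hand, are often large providers of pharmaceutical
goods, sometimes wholesale.</p>
      <p>We segment product descriptions two ways. First, we tokenize
descriptions into morphological units (morphemes) with the JUMAN
tokenizer<sup>3</sup>. Second, we break descriptions into frequently occurring
sub-word units<sup>4</sup>. From here on we refer to the morpheme features
as “morph”, and sub-word features as “BPE”.
<fn id="fn2"><label>2</label><p>Please refer to https://rit.rakuten.co.jp/opendata.html for details on data acquisition.</p></fn>
<fn id="fn3"><label>3</label><p>Using JUMAN (a User-Extensible Morphological Analyzer for Japanese), http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN</p></fn></p>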
      <p>Details of these data can be found in Table 1. Notably, the ratio
of vocabulary size (unique keywords) to token count
(keyword occurrences) in the chocolate category is twice as large
as that of the health category, as listed under (%) in Table 1. This implies
that product descriptions in the chocolate category are written with
more diverse language.</p>
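      <p>The vocabulary-to-token ratio discussed above can be computed with a short helper. This is a sketch with a hypothetical function name; it assumes each description has already been segmented into a list of tokens.</p>
      <preformat preformat-type="code">
```python
def vocabulary_ratio(descriptions):
    """Percentage of unique tokens among all token occurrences."""
    tokens = [token for description in descriptions for token in description]
    return 100.0 * len(set(tokens)) / len(tokens)
```
      </preformat>
      <p>A higher percentage means that fewer tokens are repeated, i.e. the descriptions draw on a more diverse vocabulary.</p>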
      <p>Recall that some feature selection algorithms (OR, MI) require
dichotomized prediction targets. Thus, we dichotomized the data
on log(sales), taking the top-selling 30% and bottom-selling 30% as
positive and negative examples, respectively. Our textual features
were selected using these dichotomized data.</p>
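      <p>The dichotomization described above can be sketched as follows. The function name is hypothetical, and the handling of ties and of rounding at the cut-off is an assumption, since neither is specified in the text.</p>
      <preformat preformat-type="code">
```python
def dichotomize(log_sales, percent=30):
    """Label the top `percent` of items 1 and the bottom `percent` 0.

    Returns a dict mapping item index to label; items in the middle of the
    ranking are left out.
    """
    ranked = sorted(range(len(log_sales)), key=lambda i: log_sales[i])
    cutoff = len(ranked) * percent // 100
    labels = {index: 0 for index in ranked[:cutoff]}
    labels.update({index: 1 for index in ranked[len(ranked) - cutoff:]})
    return labels
```
      </preformat>
      <p>Items that fall between the two cut-offs receive no label and are excluded from the dichotomized data.</p>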
      <p>In order to evaluate mixed-effects regression models on these
data, we consider the vendor selling an item as its “brand identifier”
(vendors have unique branding on the Rakuten platform). We also
need to know what product each item corresponds to, something
not present in the data. Thus, we hand-labeled 2,131 items with
product identifiers and set these aside as a separate dataset for
testing (Table 2). Our experimental results are reported on this test
data set.</p>
    </sec>
    <sec id="sec-10">
      <title>Experimental Protocol</title>
      <p>
        All deep learning models were implemented using the
TensorFlow framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In order to obtain features from the proposed
RNN+GF model, we conducted a brief hyperparameter search on a
held-out development set. This set consisted of 2,000 examples
randomly drawn from the pool of training data. The final model
used 32-dimensional word vectors, an LSTM with 64-dimensional
hidden states, and 32-dimensional intermediate Bahdanau vectors
as described in Figure 1. Dropout at a rate of 0.2 was applied to the
input of each LSTM cell. We optimized using Adam with a batch size
of 128 and a learning rate of 0.0001 [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. All models took
approximately three hours to reach convergence on an Nvidia TITAN X
GPU.
        <fn id="fn4"><label>4</label><p>Using https://github.com/google/sentencepiece</p></fn>
      </p>
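      <p>For reference, the reported settings can be collected in a single configuration mapping. The values are those stated above; the variable and key names are illustrative and do not come from the original implementation.</p>
      <preformat preformat-type="code">
```python
# Hyperparameters reported in the text; the key names are illustrative.
RNN_GF_HYPERPARAMETERS = {
    "embedding_dim": 32,         # word vector size
    "lstm_hidden_dim": 64,       # LSTM hidden state size
    "attention_dim": 32,         # intermediate Bahdanau vector size
    "lstm_input_dropout": 0.2,   # dropout on the input of each LSTM cell
    "optimizer": "adam",
    "batch_size": 128,
    "learning_rate": 0.0001,
    "dev_set_size": 2000,        # held-out examples for the search
}
```
      </preformat>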
      <p>
        The L1 regularization parameter α was obtained with the
scikit-learn library [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ] by minimizing the error in four-fold
cross-validation on the training set.
      </p>
      <p>
        In all of our experiments, we analyzed the log(sales) of an item
as a function of textual description features. We used mixed-effects
regression to model the relationship between these two entities.
We included linguistic features obtained by the methods of Sections
2.2 and 2.3 as fixed effect variables, and the confounding
product/vendor identifiers in the test set as random effect variables. We
used the “lme4” package in the R software environment v. 3.3.3
to perform these analyses [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To evaluate feature effectiveness
and goodness of fit, we obtained conditional and marginal <inline-formula><tex-math>R^2</tex-math></inline-formula>
values with the “MuMIn” R package [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We also performed t-tests
to obtain significance measurements on the model’s fitted
parameters. For this we obtained degrees of freedom with Satterthwaite
approximations [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] with the “lmerTest” R package [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
      </p>
      <p>In addition to keywords, we experimented with two additional
types of features: description length in number of keywords and
part-of-speech tags obtained with JUMAN.</p>
    </sec>
    <sec id="sec-11">
      <title>Experimental Results</title>
      <p>Influence of narratives. Figure 2 depicts the performance of
mixed-effects regression models fitted with the top 500 features
from each approach. Overall, these results strongly support the
hypothesis that narrative elements of product descriptions are
predictive of consumer behavior. Adding text features to the model
increased its explanatory power in all settings. The marginal <inline-formula><tex-math>R_m^2</tex-math></inline-formula> values of
each approach are listed in Table 3. The RNN+GF method selected
features superior in both marginal and conditional <inline-formula><tex-math>R^2</tex-math></inline-formula>. This implies
that it could select features that perform well in both isolated and
confound-combined settings.</p>
      <p>To investigate whether the high performance of RNN+GF
features is simply a phenomenon of model capacity, we compared
RNN+GF and one of the best-performing baselines, the lasso.
We varied the number of features each algorithm is allowed to
select and compared the resulting conditional <inline-formula><tex-math>R^2</tex-math></inline-formula> values, finding
that RNN+GF features are consistently on par with or outperform
those of the lasso, regardless of feature count, as shown in Figure 3.</p>
      <p>Effect of gradient reversal. To determine the role of gradient
reversal in the efficacy of the RNN+GF features, we conducted
an ablation test, toggling the gradient reversal layer of our model
and observing the performance of the elicited features. From
Table 4, it is apparent that the confound-invariant representations
encouraged by gradient reversal lead to more effective features
being selected. Apart from summary statistics, this observation
can be seen in the features themselves. For example, one of the
highest scoring morphemes without gradient reversal was 無料
(“free”). The RNN+GF features, on the other hand, are devoid of
words relating to brand/vendor/price.</p>
      <p>Comparison of different feature mining strategies. To
investigate whether the proposed method successfully discovered
features that are simultaneously explanatory of sales and
untangled from the confounding effects of product, brand, and price, we
computed the correlations between BPE tokens selected by
different methods and these non-linguistic confounds. For each feature
set, the average per-feature Cramer’s V was computed for product
and brand, while the average per-feature point-biserial correlation
coefficient was computed for price. Our results indicate that the
RNN+GF features are less correlated with these confounds than
those of any other method (Table 5).</p>
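      <p>The point-biserial coefficient used for the price confound can be sketched as below. This is an illustrative sketch rather than the code used in the experiments; it assumes the standard formulation with the population standard deviation, and the function name is hypothetical. Cramer’s V, which is used for the categorical confounds, is not shown.</p>
      <preformat preformat-type="code">
```python
import statistics


def point_biserial(presence, values):
    """Correlation between a binary feature and a continuous variable.

    `presence` holds 0/1 indicators (token present or not) and `values` the
    parallel continuous measurements (e.g. price).
    """
    group1 = [v for p, v in zip(presence, values) if p]
    group0 = [v for p, v in zip(presence, values) if not p]
    n1, n0, n = len(group1), len(group0), len(values)
    spread = statistics.pstdev(values)
    mean_gap = statistics.mean(group1) - statistics.mean(group0)
    return mean_gap / spread * (n1 * n0 / n ** 2) ** 0.5
```
      </preformat>
      <p>Averaging the coefficient over every token in a lexicon gives a single per-lexicon score, in the spirit of the per-feature averages reported in Table 5.</p>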
      <p>Examining the keywords selected by different methods suggests
the same story as Table 5. Morpheme features with high importance
values are listed in Table 6. Note that the RNN+GF approach was the
only method that did not select any keywords correlated with
product, brand, or price. Additionally, every method except RNN+GF
selected pecan (ピーカン・ペカン). Lalala’s pecan chocolate is
one of the most popular products on the marketplace. Although
it is understandable that these tokens contribute to sales, they are
product-specific and thus not generalizable. On the other hand,
RNN+GF gave high scores to location-related words. Similar
tendencies were observed in the health category. BPE tokens, though
not listed, followed similar patterns.</p>
    </sec>
    <sec id="sec-12">
      <title>Analysis</title>
      <p>Influential words. To investigate the influence of keywords on
sales, we performed t-tests on the coefficients of mixed-effects
models trained with RNN+GF-selected features (both morphemes and
BPE). We found that influential descriptions generally contained
words in the following four categories:
• Informativeness. This includes informative appeals to
logos with language other than raw product attributes
(i.e. brand name, product name, ingredients, price, and
shipping). Words like “family size” (ファミリーサイズ),
“package design” (パッケージデザイン), “souvenir” (お土産),
delimiters of structured information (“】【”, “★”,
“●”), and indicators of detail (“x2”, “70%”, etc.) belong to
this category.
• Authority. This includes appeals to authority, in the form
of authoritative figures or long-standing tradition. Words
such as “staff” (スタッフ), “old-standing shop” (老舗),
and “doctor” (お医者様) belong to this category.
• Seasonality. These words suggest seasonal dependencies.
Words such as “Christmas” (クリスマス), “Mother’s day”
(母の日), and “year-end gift” (歳暮) belong to this category.
Note that words related to out-of-season events had low
influence on sales.
• Politeness. These expressions show politeness,
respectfulness, and humbleness. Honorific Japanese (special words
and conjugations reserved for polite contexts) such as “ing”
(しており), “will do” (致します), “receive” (いただく)
belong to this category.</p>
      <p>The following are two differing descriptions of the exact same
product. Words with high coefficients are shown in bold.</p>
      <p>Royce’s chocolate has become a standard Hokkaido
souvenir. They are packaged one by one so your
hands won’t get dirty! Also, our staff recommends
this product!
北海道のお土産で定番品となっているロイズ. 手が汚れないように1本ずつパッケージされているのもありがたい! 当店スタッフもおすすめするロイズの自信作です！</p>
      <p>Four types of nuts: almonds, cashews, pecans, macadamia,
as well as cookie crunch and almond puff were
packed carefully into each chocolate bar. This item
is shipped with a refrigerated courier service
during the summer.
アーモンド、カシュー、ペカン、マカダミアの4種類のナッツとクッキークランチやアーモンドパフを一本のチョコレートバーにぎっしり詰め込みました。こちらは夏期クール便発送商品です。</p>
      <p>The item with the former description was preferred by customers.
It contains words suggestive of authority (“standard”, “staff”),
informativeness (“package”, “souvenir”), and concern for the customer,
while the latter description is primarily concerned with ingredients.</p>
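      <p>As a minimal sketch of the coefficient tests described above, the following Python snippet computes a t statistic (estimate divided by standard error) and a two-sided p-value for each word. The function names, the toy estimates and standard errors, and the normal approximation to the t distribution are assumptions made for illustration; none of the numbers are taken from the models fitted in this study.</p>
      <preformat>
```python
import math


def coefficient_t_test(estimate, std_error):
    """Return (t statistic, two-sided p-value) for H0: coefficient == 0.

    Assumption: the t distribution is approximated by a standard normal,
    which is only reasonable when the degrees of freedom are large.
    """
    t_stat = estimate / std_error
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


# Hypothetical (made-up) fixed-effect estimates and standard errors.
toy_coefficients = {
    "souvenir": (0.30, 0.10),
    "staff": (0.24, 0.08),
    "filler": (0.01, 0.10),
}


def influential_words(coefficients, alpha=0.05):
    """Keep words whose coefficient is positive and significant at alpha."""
    selected = []
    for word, (estimate, std_error) in sorted(coefficients.items()):
        _, p_value = coefficient_t_test(estimate, std_error)
        if estimate > 0 and alpha > p_value:
            selected.append(word)
    return selected


print(influential_words(toy_coefficients))
```
      </preformat>
      <p>On these toy inputs, only the words whose estimates are large relative to their standard errors are retained.</p>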
      <sec id="sec-12-1">
        <title>Influential part-of-speech tags</title>
        <p>We found a large number of
adjectives and adverbs in our influential word lists. This agrees with
the influential word categories mentioned previously, because
adjectives and adverbs can be indicative of informativeness. We found
that adjectives were more frequently influential in the chocolate
category, while adverbs were more common in the health category.
Adjectives describing additional information such as “loved” (大好きだ),
“healthy” (健康だ), and “perfect for” (ぴったりだ) had high
coefficients in the chocolate category. Adverbs describing
symptoms or effects such as “irritated” (イライラ) and “vigorously”
(ガンガン) appeared in the health category.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>RELATED WORK</title>
      <p>In using large-scale text mining to characterize the behavior of
e-commerce consumers, we draw on a large body of prior work in
the space. Our inspiration comes from research on (i) unearthing
the drivers of purchasing behavior in e-commerce, (ii) modeling the
relationship between product presentations and business outcomes,
and (iii) text mining and feature discovery in a confound-controlled
setting.</p>
      <p>
        There is an extensive body of literature on the drivers of
e-commerce purchasing behavior. Classic work in psychology has
shown that human judgment and behavior are influenced by
persuasive rhetoric [
        <xref ref-type="bibr" rid="ref12 ref49">12, 49</xref>
        ]. When our notions of human behavior
are narrowed to purchasing decisions on the internet, despite the
extreme diversity of online shoppers [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ], prior work suggests
that vendor-disseminated information exhibits a strong
persuasive influence. In fact, vendor-disseminated information affects
purchase likelihood just as much as user-generated information
like word-of-mouth reviews [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The work of [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] incorporated
vendor-disseminated product information into a model of customer
satisfaction, a precursor of purchasing behavior [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Similar work
has shown that product presentation (which entails textual
descriptions) has a significant impact on perceived convenience [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and
credibility [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ].
      </p>
      <p>
        We also draw from prior research concerned with mining
e-commerce information and predicting sales outcomes. Most of the
work in this space is concerned with product reviews, not
descriptions. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] mined product reviews for textual features that
are predictive of economic outcomes. This research used summary
statistics of review text like length, Flesch-Kincaid readability scores
[
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], or, in the paradigm of [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], cluster membership in a semantic
embedding space. Similar to us, [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] used product reviews to
generate a domain-specific lexicon. However, this lexicon was used to
predict sentiment, and then sales were predicted from sentiment. Some
research has incorporated information from textual descriptions,
but, to the best of our knowledge, the effect of descriptions
alone has not been studied. [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] used human subjects to elicit preferences
between descriptions and actual products, but did not compare
between descriptions. [
        <xref ref-type="bibr" rid="ref53">53</xref>
        ] tagged product descriptions with
sentiment information and used this alongside review information to
predict sales. Similarly, [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and [
        <xref ref-type="bibr" rid="ref54">54</xref>
        ] used description labellings and
summary statistics alongside other features to predict purchasing
intent. Importantly, none of the prior work in this space seeks to
untangle the influence of confounding hidden variables (e.g. brand
loyalty, pricing strategies) from mined features.
      </p>
      <p>
        Another body of research we draw from is that concerned with
text mining and lexicon discovery in a confound-controlled setting.
Using odds ratios to select features and hierarchical regression to
determine their importance is a canonical technique in the
computational linguistics literature [
        <xref ref-type="bibr" rid="ref19 ref28">19, 28</xref>
        ]. In general, alternative feature
mining methods for downstream regression or classification tasks
are rarely explored. [
        <xref ref-type="bibr" rid="ref50">50</xref>
        ] began with a set of hand-compiled corpora,
then ran t-tests to prune these corpora of insignificant keywords.
[
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] developed a neural architecture that picks out keywords from
a passage. However, this group did not use an attention mechanism
to pick these words, and the model was developed for
summarization applications. In the e-commerce literature, alternatives
to odds ratios still rely on uncontrolled co-occurrence statistics [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
      </p>
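      <p>For reference, odds-ratio feature selection of the kind mentioned above can be sketched in a few lines of Python. This is a generic illustration rather than the procedure of any cited work: the function names, the toy descriptions, the binary high-sales label, and the additive smoothing constant are all assumptions. Note that the ranking relies on raw co-occurrence counts only, so it does nothing to control for confounds.</p>
      <preformat>
```python
from collections import Counter


def odds_ratio(high_with, high_without, low_with, low_without, smoothing=0.5):
    """Odds of a word appearing in high-sales vs. low-sales descriptions.

    The additive smoothing is an assumption of this sketch; it avoids
    division by zero when a cell of the 2x2 table is empty.
    """
    a = high_with + smoothing
    b = high_without + smoothing
    c = low_with + smoothing
    d = low_without + smoothing
    return (a / b) / (c / d)


def top_words_by_odds_ratio(descriptions, is_high_sales, k):
    """Rank words by odds ratio and return the k highest-scoring ones."""
    high_counts, low_counts = Counter(), Counter()
    n_high = sum(1 for flag in is_high_sales if flag)
    n_low = len(is_high_sales) - n_high
    for tokens, flag in zip(descriptions, is_high_sales):
        target = high_counts if flag else low_counts
        target.update(set(tokens))  # count each word once per description
    vocabulary = set(high_counts) | set(low_counts)
    scores = {
        word: odds_ratio(high_counts[word], n_high - high_counts[word],
                         low_counts[word], n_low - low_counts[word])
        for word in vocabulary
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(vocabulary, key=lambda w: (-scores[w], w))[:k]


# Hypothetical tokenized descriptions and sales labels.
toy_descriptions = [
    ["souvenir", "staff", "chocolate"],
    ["souvenir", "chocolate"],
    ["nuts", "chocolate"],
    ["nuts", "almond", "chocolate"],
]
toy_is_high_sales = [True, True, False, False]

print(top_words_by_odds_ratio(toy_descriptions, toy_is_high_sales, 2))
```
      </preformat>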
    </sec>
    <sec id="sec-14">
      <title>CONCLUSION</title>
      <p>In this paper, we discovered that seasonal, polite,
authoritative, and informative product descriptions led to the best business
outcomes in Japanese e-commerce.</p>
      <p>In making these observations, we presented a statistical method
that infers consumer demand from e-commerce product
descriptions. We showed for the first time that words in the embedded
narratives of product descriptions are important determinants of
sales, even when accounting for the influence of factors like brand
loyalty and item identity.</p>
      <p>In the process, we noted the inadequacies of traditional text
feature-selection algorithms, namely their inability to select features
that are decorrelated from these factors. To this end we presented
a novel neural network feature selection method. The features
generated by this model are both high-performance and
confound-decorrelated.</p>
      <p>There are many directions for future work. These include
extending our feature selectors to the broader setting of generalized
lexicon induction, and applying our statistical models to e-commerce
markets in other consumer cultures.</p>
    </sec>
    <sec id="sec-15">
      <title>ACKNOWLEDGMENTS</title>
      <p>We are grateful to David Jurgens and Will Hamilton for their advice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Martín</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis,
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Devin</surname>
          </string-name>
          , and others.
          <year>2016</year>
          .
          <article-title>Tensorflow: Large-scale machine learning on heterogeneous distributed systems</article-title>
          .
          <source>arXiv preprint arXiv:1603.04467</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Archak</surname>
          </string-name>
          , Anindya Ghose, and Panagiotis G Ipeirotis.
          <year>2011</year>
          .
          <article-title>Deriving the pricing power of product features by mining consumer reviews</article-title>
          .
          <source>Management Science</source>
          <volume>57</volume>
          ,
          <issue>8</issue>
          (
          <year>2011</year>
          ),
          <fpage>1485</fpage>
          -
          <lpage>1509</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>International Conference on Learning Representations (ICLR)</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Billy</given-names>
            <surname>Bai</surname>
          </string-name>
          , Rob Law, and
          <string-name>
            <given-names>Ivan</given-names>
            <surname>Wen</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The impact of website quality on customer satisfaction and purchase intentions: Evidence from Chinese online visitors</article-title>
          .
          <source>International journal of hospitality management</source>
          <volume>27</volume>
          ,
          <issue>3</issue>
          (
          <year>2008</year>
          ),
          <fpage>391</fpage>
          -
          <lpage>402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Kamil</given-names>
            <surname>Bartoń</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>MuMIn: Multi-model inference</article-title>
          .
          <source>R package version 1.9.13</source>
          . The Comprehensive R Archive Network (CRAN), Vienna, Austria (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Douglas</given-names>
            <surname>Bates</surname>
          </string-name>
          , Martin Mächler, Ben Bolker, and
          <string-name>
            <given-names>Steve</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Fitting Linear Mixed-Effects Models Using lme4</article-title>
          .
          <source>Journal of Statistical Software</source>
          <volume>67</volume>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Douglas M</given-names>
            <surname>Bates</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>lme4: Mixed-effects modeling with R</article-title>
          . (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Shai</given-names>
            <surname>Ben-David</surname>
          </string-name>
          , John Blitzer, Koby Crammer, Fernando Pereira, and others.
          <year>2007</year>
          .
          <article-title>Analysis of representations for domain adaptation</article-title>
          .
          <source>Advances in neural information processing systems</source>
          <volume>19</volume>
          (
          <year>2007</year>
          ),
          <fpage>137</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Bickart</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert M</given-names>
            <surname>Schindler</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Internet forums as influential sources of consumer information</article-title>
          .
          <source>Journal of interactive marketing</source>
          <volume>15</volume>
          ,
          <issue>3</issue>
          (
          <year>2001</year>
          ),
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J Martin</given-names>
            <surname>Bland</surname>
          </string-name>
          and
          <string-name>
            <given-names>Douglas G</given-names>
            <surname>Altman</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>The odds ratio</article-title>
          .
          <source>BMJ</source>
          <volume>320</volume>
          ,
          <issue>7247</issue>
          (
          <year>2000</year>
          ),
          <fpage>1468</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Denny</given-names>
            <surname>Britz</surname>
          </string-name>
          , Anna Goldie, Thang Luong, and
          <string-name>
            <given-names>Quoc</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Massive Exploration of Neural Machine Translation Architectures</article-title>
          .
          <source>arXiv preprint arXiv:1703.03906</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Shelly</given-names>
            <surname>Chaiken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark P</given-names>
            <surname>Zanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James M</given-names>
            <surname>Olson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C Peter</given-names>
            <surname>Herman</surname>
          </string-name>
          .
          <year>1987</year>
          .
          <article-title>The heuristic model of persuasion</article-title>
          .
          <source>In Social Influence: The Ontario Symposium</source>
          , Vol.
          <volume>5</volume>
          . Hillsdale, NJ: Lawrence Erlbaum,
          <fpage>3</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Judith A</given-names>
            <surname>Chevalier</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dina</given-names>
            <surname>Mayzlin</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>The effect of word of mouth on sales: Online book reviews</article-title>
          .
          <source>Journal of marketing research</source>
          <volume>43</volume>
          ,
          <issue>3</issue>
          (
          <year>2006</year>
          ),
          <fpage>345</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Jeffrey L</given-names>
            <surname>Elman</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>Finding structure in time</article-title>
          .
          <source>Cognitive science</source>
          <volume>14</volume>
          ,
          <issue>2</issue>
          (
          <year>1990</year>
          ),
          <fpage>179</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Richard</given-names>
            <surname>Friberg</surname>
          </string-name>
          , Mattias Ganslandt, and
          <string-name>
            <given-names>Mikael</given-names>
            <surname>Sandström</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Pricing strategies in e-commerce: Bricks vs. clicks</article-title>
          .
          <source>Technical Report</source>
          . IUI working paper.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Yaroslav</given-names>
            <surname>Ganin</surname>
          </string-name>
          , Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Domain-adversarial training of neural networks</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          ,
          <issue>59</issue>
          (
          <year>2016</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>David</given-names>
            <surname>Gefen</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Customer loyalty in e-commerce</article-title>
          .
          <source>Journal of the association for information systems</source>
          <volume>3</volume>
          ,
          <issue>1</issue>
          (
          <year>2002</year>
          ),
          <fpage>2</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Anindya</given-names>
            <surname>Ghose</surname>
          </string-name>
          and Panagiotis G Ipeirotis.
          <year>2011</year>
          .
          <article-title>Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>23</volume>
          ,
          <issue>10</issue>
          (
          <year>2011</year>
          ),
          <fpage>1498</fpage>
          -
          <lpage>1512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Anindya</given-names>
            <surname>Ghose</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arun</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Evaluating pricing strategy using e-commerce data: Evidence and estimation challenges</article-title>
          .
          <source>Statist. Sci</source>
          . (
          <year>2006</year>
          ),
          <fpage>131</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>David</given-names>
            <surname>Godes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dina</given-names>
            <surname>Mayzlin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Using online conversations to study word-of-mouth communication</article-title>
          .
          <source>Marketing science</source>
          <volume>23</volume>
          ,
          <issue>4</issue>
          (
          <year>2004</year>
          ),
          <fpage>545</fpage>
          -
          <lpage>560</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Herhausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Binder</surname>
          </string-name>
          , Marcus Schoegel, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Herrmann</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Integrating bricks with clicks: retailer-level and channel-level outcomes of online-offline channel integration</article-title>
          .
          <source>Journal of Retailing</source>
          <volume>91</volume>
          ,
          <issue>2</issue>
          (
          <year>2015</year>
          ),
          <fpage>309</fpage>
          -
          <lpage>325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Chin-Fu</given-names>
            <surname>Ho</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wen-Hsiung</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Antecedents of customer satisfaction on the Internet: an empirical study of online shopping</article-title>
          .
          <source>In Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii International Conference on. IEEE</source>
          , 9-pp.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          <volume>9</volume>
          ,
          <issue>8</issue>
          (
          <year>1997</year>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Minqing</given-names>
            <surname>Hu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Mining and summarizing customer reviews</article-title>
          .
          <source>In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <fpage>168</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Nan</given-names>
            <surname>Hu</surname>
          </string-name>
          , Jie Zhang, and Paul A Pavlou.
          <year>2009</year>
          .
          <article-title>Overcoming the J-shaped distribution of product reviews</article-title>
          .
          <source>Commun. ACM</source>
          <volume>52</volume>
          ,
          <issue>10</issue>
          (
          <year>2009</year>
          ),
          <fpage>144</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Ling</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhilin</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Minjoon</given-names>
            <surname>Jun</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Measuring consumer perceptions of online shopping convenience</article-title>
          .
          <source>Journal of Service Management</source>
          <volume>24</volume>
          ,
          <issue>2</issue>
          (
          <year>2013</year>
          ),
          <fpage>191</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Fredrik D.</given-names>
            <surname>Johansson</surname>
          </string-name>
          , Uri Shalit, and
          <string-name>
            <given-names>David</given-names>
            <surname>Sontag</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Learning Representations for Counterfactual Inference</article-title>
          .
          <source>In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 (ICML'16)</source>
          . JMLR.org,
          <fpage>3020</fpage>
          -
          <lpage>3029</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          , Victor Chahuneau,
          <string-name>
            <given-names>Bryan R</given-names>
            <surname>Routledge</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Noah A</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Narrative framing of consumer sentiment in online restaurant reviews</article-title>
          .
          <source>First Monday</source>
          <volume>19</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J Peter</given-names>
            <surname>Kincaid</surname>
          </string-name>
          , Robert P Fishburne Jr, Richard L Rogers, and Brad S Chissom.
          <year>1975</year>
          .
          <article-title>Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel</article-title>
          .
          <source>Technical Report. DTIC Document.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Diederik</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>International Conference on Learning Representations</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Kiros</surname>
          </string-name>
          , Ruslan Salakhutdinov, and Richard S Zemel.
          <year>2014</year>
          .
          <article-title>Unifying visual-semantic embeddings with multimodal neural language models</article-title>
          .
          <source>arXiv preprint arXiv:1411.2539</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Kuznetsova</surname>
          </string-name>
          , Per Bruun Brockhoff, and Rune Haubo Bojesen Christensen
          .
          <year>2015</year>
          . Package 'lmerTest'.
          <source>R package version 2</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Raymond YK</given-names>
            <surname>Lau</surname>
          </string-name>
          , Wenping Zhang,
          <string-name>
            <given-names>Peter D</given-names>
            <surname>Bruza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kam-Fai</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Learning domain-specific sentiment lexicons for predicting product sales</article-title>
          .
          <source>In e-Business Engineering (ICEBE), 2011 IEEE 8th International Conference on. IEEE</source>
          ,
          <fpage>131</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Eun-Ju</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>Soo Yun</given-names>
            <surname>Shin</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>When do consumers buy online product reviews? Effects of review quality, product type, and reviewer's photo</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>31</volume>
          (
          <year>2014</year>
          ),
          <fpage>356</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eric T</given-names>
            <surname>Bradlow</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Automatic construction of conjoint attributes and levels from online customer reviews</article-title>
          . University of Pennsylvania, The Wharton School Working Paper (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Ziqi</given-names>
            <surname>Liao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael Tow</given-names>
            <surname>Cheung</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Internet-based e-shopping and consumer attitudes: an empirical study</article-title>
          .
          <source>Information &amp; Management</source>
          <volume>38</volume>
          ,
          <issue>5</issue>
          (
          <year>2001</year>
          ),
          <fpage>299</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Moez</given-names>
            <surname>Limayem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Khalifa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Anissa</given-names>
            <surname>Frini</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>What makes consumers buy from Internet? A longitudinal study of online shopping</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans</source>
          <volume>30</volume>
          ,
          <issue>4</issue>
          (
          <year>2000</year>
          ),
          <fpage>421</fpage>
          -
          <lpage>432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Ying</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geng</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Benfu</given-names>
            <surname>Lv</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chong</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Online purchaser segmentation and promotion strategy selection: evidence from Chinese E-commerce market</article-title>
          .
          <source>Annals of Operations Research</source>
          <volume>233</volume>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <fpage>263</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
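          <string-name>
            <given-names>Gerald L</given-names>
            <surname>Lohse</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Spiller</surname>
          </string-name>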
          .
          <year>1998</year>
          .
          <article-title>Quantifying the effect of user interface design features on cyberstore traffic and sales</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          . ACM Press/Addison-Wesley Publishing Co.,
          <fpage>211</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Lowd</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Meek</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Adversarial learning</article-title>
          .
          <source>In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining</source>
          . ACM,
          <fpage>641</fpage>
          -
          <lpage>647</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          , and others.
          <year>1999</year>
          .
          <article-title>Foundations of statistical natural language processing</article-title>
          . Vol.
          <volume>999</volume>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Deborah Brown</given-names>
            <surname>McCabe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen M</given-names>
            <surname>Nowlis</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>The effect of examining actual products or product descriptions on consumer preference</article-title>
          .
          <source>Journal of Consumer Psychology</source>
          <volume>13</volume>
          ,
          <issue>4</issue>
          (
          <year>2003</year>
          ),
          <fpage>431</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sanqiang</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shuguang</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daqing</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yu</given-names>
            <surname>Chi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep Keyphrase Generation</article-title>
          .
          <source>Annual Meeting of the Association for Computational Linguistics</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Shinichi</given-names>
            <surname>Nakagawa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Holger</given-names>
            <surname>Schielzeth</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A general and simple method for obtaining R<sup>2</sup> from generalized linear mixed-effects models</article-title>
          .
          <source>Methods in Ecology and Evolution</source>
          <volume>4</volume>
          ,
          <issue>2</issue>
          (
          <year>2013</year>
          ),
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>Franklin E</given-names>
            <surname>Satterthwaite</surname>
          </string-name>
          .
          <year>1946</year>
          .
          <article-title>An approximate distribution of estimates of variance components</article-title>
          .
          <source>Biometrics Bulletin</source>
          <volume>2</volume>
          ,
          <issue>6</issue>
          (
          <year>1946</year>
          ),
          <fpage>110</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Rico</given-names>
            <surname>Sennrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Barry</given-names>
            <surname>Haddow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Birch</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Neural Machine Translation of Rare Words with Subword Units</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12</source>
          ,
          <year>2016</year>
          , Berlin, Germany.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>Srini S</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rolph</given-names>
            <surname>Anderson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kishore</given-names>
            <surname>Ponnavolu</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Customer loyalty in e-commerce: an exploration of its antecedents and consequences</article-title>
          .
          <source>Journal of Retailing</source>
          <volume>78</volume>
          ,
          <issue>1</issue>
          (
          <year>2002</year>
          ),
          <fpage>41</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>Brian</given-names>
            <surname>Sternthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ruby</given-names>
            <surname>Dholakia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Clark</given-names>
            <surname>Leavitt</surname>
          </string-name>
          .
          <year>1978</year>
          .
          <article-title>The persuasive effect of source credibility: Tests of cognitive response</article-title>
          .
          <source>Journal of Consumer Research</source>
          <volume>4</volume>
          ,
          <issue>4</issue>
          (
          <year>1978</year>
          ),
          <fpage>252</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>Chenhao</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter</article-title>
          .
          <source>In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          . ACL, Baltimore, Maryland,
          <fpage>175</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Regression shrinkage and selection via the lasso</article-title>
          .
          <source>Journal of the Royal Statistical Society. Series B (Methodological)</source>
          (
          <year>1996</year>
          ),
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>Lou W</given-names>
            <surname>Turley</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ronald E</given-names>
            <surname>Milliman</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Atmospheric effects on shopping behavior: a review of the experimental evidence</article-title>
          .
          <source>Journal of Business Research</source>
          <volume>49</volume>
          ,
          <issue>2</issue>
          (
          <year>2000</year>
          ),
          <fpage>193</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>Hui</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qian</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Raymond</given-names>
            <surname>Lau</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Topic sentiment mining for sales performance prediction in e-commerce</article-title>
          .
          <source>Annals of Operations Research</source>
          (
          <year>2017</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>Cai-Nicolas</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Georg</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Exploiting semantic product descriptions for recommender systems</article-title>
          .
          <source>In Proceedings of the 2nd ACM SIGIR Semantic Web and Information Retrieval Workshop</source>
          .
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>