Ranking sentences from product description & bullets for better
                          search
                  Prateek Verma                                      Aliasgar Kutiyanawala                                   Ke Shen
            Jet.com and Walmart Labs                                  Jet.com and Walmart Labs                     Jet.com and Walmart Labs
               Hoboken, New Jersey                                       Hoboken, New Jersey                          Hoboken, New Jersey
             prateek.verma@jet.com                                         aliasgar@jet.com                             ke.shen@jet.com

ABSTRACT
Products in an ecommerce catalog contain information-rich fields
like description and bullets that can be useful to extract entities
(attributes) using NER based systems. However, these fields are
often verbose and contain a lot of information that is not relevant
from a search perspective. Treating each sentence within these
fields equally can lead to poor full text match and introduce prob-
lems in extracting attributes to develop ontologies, semantic search
etc. To address this issue, we describe two methods based on ex-
tractive summarization with reinforcement learning by leveraging
information in product titles and search click through logs to rank
sentences from bullets, description, etc. Finally, we compare the
precision of these two models.
                                                                                                 Figure 1: Sample SKU Image and Bullets
CCS CONCEPTS
• Information systems → Information extraction; Summarization.
                                                                                      present in the sentence and how well it describes the item, we consider
                                                                                      the third bullet more relevant to the item than the second bullet
KEYWORDS                                                                              from a search perspective. Tokens highlighted in Figure 1 with red
Search and Ranking, Information Retrieval, Extractive Summariza-                      and green color denote irrelevant and relevant features respectively
tion, Reinforcement Learning, E-Commerce, Information Extraction                      for the SKU.
ACM Reference Format:
Prateek Verma, Aliasgar Kutiyanawala, and Ke Shen. 2019. Ranking sen-                 Typically, this problem tends to appear in fields like product descrip-
tences from product description & bullets for better search. In Proceedings           tion and bullets which are often verbose and contain information
of the SIGIR 2019 Workshop on eCommerce (SIGIR 2019 eCom), 6 pages.                   about the SKU (stock keeping unit, a term used to describe an item
                                                                                      sold on the site) that is not pertinent to the item. Circuitous descriptions
1 INTRODUCTION                                                                        of the product and keyword stuffing are real concerns
Many search engine frameworks like Solr [7] and ElasticSearch [6]                     in ecommerce. Keyword stuffing refers to the practice of loading
treat each sentence within a field in the document equally and this                   product data with keywords that may not be relevant to the item
can lead to irrelevant documents present in the recall set. Consider                  being sold. Figure 2, which is a description of a SKU, illustrates this.
Figure 1 which shows a sample item and some information associ-
ated in bullet form from an ecommerce website. The second bullet                      Product descriptions also tend to contain negations. That is, they
contains the terms "soups", "casseroles" and "meat" because of                        describe what the product is NOT and what it is not suitable for.
which, the item (mushroom) will be present in the recall set for the                  These kinds of sentences are technically legitimate but pose a challenge
search queries containing tokens like "soups" and "casseroles"                        for search engines and have the effect of returning misleading
due to full text match, leading to poor search relevancy. Relevant                    or irrelevant results.
features for this SKU can be thought of as attributes that could be
used in a search query to find this product. Thus, "gluten free"                      A naive solution is to ignore these fields completely for search.
and "non-GMO" are considered relevant. Based on the attributes                        While this may improve precision, it would be at the cost of recall,
                                                                                      as relevant information might be lost. Such relevancy problems
                                                                                      are mitigated by having semantic search using methods like query
Copyright © 2019 by the paper’s authors. Copying permitted for private and academic   understanding. However, they require SKUs to have relevant at-
purposes.
In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.):                        tributes (atomic phrases that provide more information about an
Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at   item [10]) present in them to match it with user’s intent. Thus,
http://ceur-ws.org                                                                    attribute extraction from the catalog data is often done in order to
                                                                                      enrich SKUs (documents) with relevant attributes.
SIGIR 2019 eCom, July 2019, Paris, France                                             Prateek Verma, Aliasgar Kutiyanawala, and Ke Shen


                                                                         Extractive Summarization was traditionally done using hand en-
                                                                        gineered features, such as sentence position, length [21], words
                                                                        present in the sentence, their part of speech tags, frequency etc [18].
                                                                        However, with the recent success of encoder-decoder model, it is
                                                                        being used in Extractive Summarization as well, such as [3] [17]
                                                                        and [16]. In [3], Cheng et al. developed a framework composed of
                                                                        hierarchical document encoder and attention based extractor for ex-
                                                                        tractive summarization. In [17], Narayan et al. used the hierarchical
                                                                        encoder and attention based decoder to leverage side information
                                                                        like title, image caption etc. and in [16] they introduced a new ob-
                                                                        jective function based on ROUGE and used reinforcement learning
                                                                        to optimize it.
              Figure 2: Sample SKU Description
                                                                        In this paper, we try to rank sentences using summarization tech-
                                                                        niques for the purpose of improving search relevancy. There hasn’t
                                                                        been a lot of work done in this area. One work that is aligned
In this paper we describe a method to rank sentences based on whether   with our objective is from Ryen et al. [26], published in 2002. They
they are relevant from search perspective, and select top K sen-        use statistical measures like frequency of query terms present in the
tences for search from these fields. Top K ranked set of sentences      sentence to rank them, and recommend user documents from the
can lead to better full text match and can also help in extracting      recall set by presenting them with ranked set of sentences for web
attributes for developing the ontology for semantic search [10] as      search. However, our work focuses on ecommerce setting where
higher ranked sentences would have larger probability of attributes     we leverage Reinforcement Learning paradigm to rank sentences
correctly describing the product. In our experiment, we limit K         with the purpose of improving search by affecting recall/precision.
to 3. Thus, given a description of length greater than three, we
always pick top three sentences generated by the model as our final
                                                                        3    PROBLEM FORMULATION
summary.
                                                                        Our objective is to rank sentences in product description and bullets
Our contribution in this paper is, we demonstrate how Extractive        from a search perspective. Search perspective means that when we
Summarization can be used to rank sentences present in product          extract attributes from sentences, they are relevant to the item and
description and bullets using product title and user queries obtained   are likely to be used in a search query for that item. Methods like
from click through log. One of the benefits of this method is that the  query understanding can benefit from ranked sentences as they use
cost of obtaining training data is low and the model can be run on      attributes in SKU to match with the user’s intent. Higher ranked
items that have little or no click data associated with them. We also   sentences are more likely to contain relevant attributes than lower
provide a comparison of the two models by measuring precision@k         ranked sentences. Having a set of top ranked sentences would also
of relevant sentences in the summary.                                   help in full text match by avoiding queries to match with irrelevant
                                                                        sentences. We use Extractive Summarization to achieve this. Our
                                                                        work is based on [16] which treats summarization task as a ranking
2   RELATED WORK                                                        problem and training is done by optimizing combination of ROUGE
                                                                        metric and cross entropy using reinforcement learning (described
Summarization is the process of shortening a text document in order     in 3.2). ROUGE stands for Recall-Oriented Understudy for Gisting
to create a summary while retaining major points of the original doc-   Evaluation. It is a metric to compare automatically generated sum-
ument. There are two kinds of summarization techniques: Abstrac-        mary with the reference summary. ROUGE makes use of the count
tive and Extractive summarization. Abstractive summarization in-        of overlapping units such as N-gram between the two summaries
volves using internal semantic representation and natural language      to measure the quality of system generated summary [13]. Here we
generation techniques to create the summary [2] [23], [24]. Ex-         specifically use F1 score of three ROUGE scores mentioned below:
tractive summarization involves selecting existing subset of words,
phrases and sentences in the original text to generate the sum-
mary [5], [14], [28].                                                       • ROUGE-1: refers to the overlap of 1-gram between candidate
                                                                              summary and the reference summary (in our case title and
Recently, a lot of work has been done on Abstractive Summarization            queries)
using attentional encoder-decoder model that was proposed by                • ROUGE-2: refers to the overlap of bi-gram
Sutskever et al. in [25]. In [15], Nallapati et al. modeled abstractive     • ROUGE-L: measures Longest Common Subsequence based
summarization using Attentional Encoder Decoder Recurrent                     statistics to compute similarity between the two summaries
Neural Networks. In [20], Paulus et al. introduced a new
objective function that combined cross entropy loss with rewards        We use ROUGE because it is well aligned with our objective of
from policy gradient reinforcement learning which improved state        finding relevant sentences from SKU description and bullets that are
of the art in abstractive summarization.                                similar to the title and user engagement data (queries). It is the evaluation
                                                                        metric used in most summarization systems, and training the


model on a combination of ROUGE and cross entropy is shown to be         demonstrated in previous work [25], [11], [17].
superior to using just cross entropy [16]. The REINFORCE algorithm
is shown to improve sequence to sequence based text rewriting            Finally, the sentence extractor sequentially labels each sentence as 1 or 0
systems by optimizing non-differentiable objective function like         depending upon if the sentence is relevant or not. It is implemented
ROUGE [22] [12], so we use reinforcement learning to optimize our        using RNN with LSTM cells and a softmax layer. At time ti , it makes
reward function.                                                         a binary prediction conditioned on the document representation
                                                                         and previously labelled sentences. This lets it identify locally and
We use title and queries obtained from click through log as part         globally important sentences. Sentences are then ranked by the
of the target summary. Title is one of the key fields in ecommerce       score $p(y_i = 1 \mid s_i, D, \theta)$. Here $s_i$ is the $i$-th sentence, D is the document,
catalog provided by the merchant, it captures essential information      $\theta$ is the model parameter and $p(y_i = 1 \mid s_i, D, \theta)$ is the probability
about the item and queries can be thought of as keywords                 of sentence $s_i$ being included in the summary. We learn to rank
users think are relevant attributes for the product. The intuition       by training the network in a reinforcement learning framework
is, having them in the target summary would allow the model to           optimizing ROUGE.
capture important sentences present in the description and bullets.
We create two models, one that uses just the title as target summary     We use a combination of maximum likelihood cross entropy loss and
and the second model that uses top five queries that led to clicks       rewards from policy gradient reinforcement learning as objective
on the item, along with the title as target summary.                     function to globally optimize ROUGE. This lets the model optimize
                                                                         the evaluation metric directly and makes it better at discriminating
Finally, we choose top K sentences as determined by the model as         sentences, i.e., it ranks the sentence higher if it appears often in the
our final summary. Since ecommerce product descriptions tend to          summary.
be short and less repetitive, the issue of repetition and diversity is
not a concern in our summarization task.                                 3.2    Policy Learning
                                                                         Reinforcement Learning is an area of machine learning where a
                                                                         software agent learns to take actions in an environment to maxi-
3.1    Network Architecture                                              mize cumulative reward. It differs from supervised learning in the
                                                                         way that labelled input/output pairs need not be provided, nor do
Figure 3 depicts the network architecture of the extractive summarizer.  sub-optimal actions need to be explicitly corrected. Rather, the focus
It aims to extract sentences $\{s_1, \ldots, s_m\}$ from a document D composed  is on the balance between exploration and exploitation. Exploitation
of sentences $\{s_1, \ldots, s_n\}$, where n > m, and labels them 1 or 0 based  is the act of preferring an action that it has tried in the past and was
on whether they should be included in the summary or not. It learns      found to be effective, whereas exploration is the act of discovering
to assign a score $p(y_i \mid s_i, D, \theta)$ to each sentence which is directly  such actions, i.e. trying out actions that it has not selected before.
proportional to its relevance within the summary. Here, $\theta$ denotes
the model parameter, $s_i$ denotes the $i$-th sentence and D represents  We conceptualize the summarization model in a reinforcement
the document. Summary is chosen by selecting the sentences with          learning paradigm. The model can be thought of as an agent interacting
top $p(y_i \mid s_i, D, \theta)$ score. Our network and the objective function are  with the environment, which consists of documents. The
based on the paper [16]. We choose a sequence to sequence network        agent reads the document D and assigns a score to each sentence
which is composed of three main components: sentence encoder,            $s_i \in D$ using the policy $p(y_i \mid s_i, D, \theta)$. We then rank and get the
document encoder and sentence extractor.                                 sampled sentences as the summary. The agent is then given a re-
                                                                         ward based on how close the generated summary is with the gold
These components are described in detail below:                          standard summary. We use F1 score of ROUGE-1, ROUGE-2, and
                                                                         ROUGE-L as the reward r . In our case, gold standard summary is
Sentence encoder is composed of convolutional encoder which en-          the title and user queries. Agent is then updated based on the re-
codes a sentence into a continuous representation and is shown           ward using the REINFORCE algorithm [27]. REINFORCE algorithm
to capture salient features [4], [9], [8]. The encoding is performed     minimizes negative expected reward:
using kernel filter K of width h over a window of h words present
in the sentence s. This is applied to each possible window of words
                                                $$L(\theta) = -\mathbb{E}_{\hat{y} \sim p_\theta}[r(\hat{y})]$$
in the sentence s to produce a feature map $f \in \mathbb{R}^{k-h+1}$, where k
is the length of the sentence. Then max pooling is performed over        Here, $p_\theta$ stands for $p(y \mid D, \theta)$, where $\theta$ is the model parameter, D is
time on the feature maps and max value is taken corresponding to         the document and r is the reward.
this particular filter K. Specifically, we use filters of size 2 and 4.
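The convolution and max-over-time pooling steps of the sentence encoder can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names (`feature_map`, `max_over_time`, `encode_sentence`) and any word vectors or kernel weights passed in are hypothetical, and the bias term and non-linearity that a trained encoder would normally apply are left out.

```python
from typing import List

Vector = List[float]


def feature_map(sentence: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide a kernel of width h over every window of h words.

    `sentence` holds k word vectors and `kernel` holds h weight vectors of
    the same dimension. The result has k - h + 1 entries.
    """
    k, h = len(sentence), len(kernel)
    if k < h:
        raise ValueError("sentence is shorter than the kernel width")
    values = []
    for start in range(k - h + 1):
        window = sentence[start:start + h]
        # Dot product between the kernel and the current window of words.
        value = sum(
            weight * x
            for kernel_row, word in zip(kernel, window)
            for weight, x in zip(kernel_row, word)
        )
        values.append(value)
    return values


def max_over_time(values: List[float]) -> float:
    """Keep the single largest response of a filter across all windows."""
    return max(values)


def encode_sentence(sentence: List[Vector], kernels: List[List[Vector]]) -> List[float]:
    """Return one pooled feature per kernel (e.g. kernels of width 2 and 4)."""
    return [max_over_time(feature_map(sentence, kernel)) for kernel in kernels]
```

Because each kernel contributes exactly one pooled value, sentences of different lengths map to vectors of the same size, which is what allows the document encoder to consume them as a sequence.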
                                                                         The REINFORCE algorithm is based on the fact that the gradient of the expected-reward
Document encoder: The output of sentence encoder is fed to the document  objective, even for a non-differentiable reward function, can be computed as:
encoder. It composes the sequence of sentences to obtain a
document representation. We use LSTM to achieve this. Given a                          $$\nabla L(\theta) = -\mathbb{E}_{\hat{y} \sim p_\theta}[r(\hat{y}) \nabla \log p(\hat{y} \mid D, \theta)]$$
document D and sequence of sentences $(s_1, \ldots, s_n)$, we feed sentences    Calculating the expected gradient in the above expression can be expensive
in reverse order to the model. This approach allows the model            as each document can have a very large number of candidate
to learn that the top sentences are more important and has been          summaries. It can be approximated by taking a single sample ŷ from


                                                           Figure 3: Network architecture


Figure 4: Retrieval by matching query understanding with
SKU understanding


$p_\theta$ for each training example in a batch, following which the above                     Figure 5: Precision @ k for the baseline
expression gets simplified to:

$$\begin{aligned}
\nabla L(\theta) &\approx -r(\hat{y}) \nabla \log p(\hat{y} \mid D, \theta) \\
                 &\approx -r(\hat{y}) \sum_{i=1}^{n} \nabla \log p(\hat{y}_i \mid s_i, D, \theta)
\end{aligned}$$
                                                                            generating the reference summary. For input, we use product descriptions
                                                                            and bullets for both the models.
                                                                            We preprocess the title, description and queries before passing them
Since the REINFORCE algorithm starts with a random policy, and
                                                                            to the model. Preprocessing step consists of sentence segmentation,
because our task can involve a large number of candidate summaries
                                                                            tokenization, conversion of tokens into vocabulary ids, truncation
for the document, training the model can be time consuming. So,
                                                                            and padding to a fixed length. We use SKUs from the grocery category
we limit the search space of ŷ to a smaller number of high probability
                                                                            of our catalog to evaluate the models. For Model 1 we used all the
samples Ŷ consisting of top k extracts. The way we choose these
                                                                            SKUs from the grocery category and for Model 2 we used a subset
top k extracts is, we select p sentences which have highest ROUGE
                                                                            of SKUs from the category which had engagement above a certain
scores on their own and then generate all possible combinations
                                                                            threshold. Though Model 2 had less training data, it was richer
using these p sentences with the constraint that maximum length of
                                                                            since it had queries (top 5) associated with each SKU as part of the
the extract can be m. We rank these against the gold summary using
                                                                            summary. One advantage of both methods is that they require almost no
F1 score by taking the mean of ROUGE-1, ROUGE-2 and ROUGE-L. We
                                                                            manual effort to get the training data, and are thus very cheap. Figure 4
choose top k of these ranked summaries as Ŷ. During training, we
                                                                            describes how the two models are set up for training.
sample ŷ from Ŷ instead of p(ŷ|θ, D).
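The construction of the candidate set Ŷ can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: ROUGE is computed on pre-tokenized lists without stemming or stop-word handling, the reference (title and queries) is flattened into a single token list, and the names `reward` and `candidate_summaries` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def f1(overlap, candidate_total, reference_total):
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Dynamic programming over one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


def reward(candidate, reference):
    """Mean of the ROUGE-1, ROUGE-2 and ROUGE-L F1 scores."""
    return (rouge_n_f1(candidate, reference, 1)
            + rouge_n_f1(candidate, reference, 2)
            + rouge_l_f1(candidate, reference)) / 3


def candidate_summaries(sentences, reference, p, m, k):
    """Return the top-k extracts as tuples of sentence indices.

    Picks the p sentences with the highest individual reward, enumerates
    every combination of at most m of them, and keeps the k best.
    """
    by_score = sorted(range(len(sentences)),
                      key=lambda i: reward(sentences[i], reference),
                      reverse=True)
    top_p = sorted(by_score[:p])  # restore document order
    scored = []
    for size in range(1, m + 1):
        for combo in combinations(top_p, size):
            tokens = [tok for i in combo for tok in sentences[i]]
            scored.append((reward(tokens, reference), combo))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [combo for _, combo in scored[:k]]
```

During training, one extract would then be drawn from the returned list (for example with `random.choice`) and used as the sample ŷ whose reward scales the gradient.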
                                                                            Since our objective is to have better full text match or attributes
3.3    Input Data for model                                                 from the ranked set of sentences, each sentence can be independent
We create two summarization models, one with title as its target            of each other. This insight is well aligned with the framework of
summary (Model 1) and the other with title plus top five queries            reinforcement learning based extractive summarization that opti-
for which the product was clicked as the target summary (Model              mizes ROUGE.
2). Title and each query are treated as independent sentences when


4   BASELINE MODEL
Tfidf is one of the commonly used frequency driven approaches for
weighting terms to measure the importance of a sentence for extractive
summarization [1], [19]. It measures the importance of words and
identifies very common words in the documents by giving low
weights to words appearing in most documents. The weight of each
word is computed by the formula:

$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$$

$$\mathrm{idf}(t, D) = \frac{N}{|\{d \in D : t \in d\}|}$$
Here, tf(t, d) is the count of the term t in the document d.
idf(t, D) is the inverse document frequency. N is the total num-
ber of documents in the corpus. |{d ∈ D : t ∈ d}| is the number of
documents where the term t appears. If the term is not present
in the corpus, it will lead to division by zero. To avoid this, it is a
common practice to adjust the denominator to 1 + |{d ∈ D : t ∈ d}|.
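The formula above, with the adjusted denominator, can be sketched as follows. This is an illustrative sketch only: documents are assumed to be lists of tokens, and the function names `idf` and `tfidf` are hypothetical. The code follows the ratio exactly as written in the text; many implementations additionally take the logarithm of this ratio.

```python
from collections import Counter


def idf(term, documents):
    """Inverse document frequency with the 1 + df denominator."""
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    # Adding 1 avoids division by zero for terms absent from the corpus.
    return n_docs / (1 + doc_freq)


def tfidf(term, document, documents):
    """Raw count of `term` in `document` times its idf over `documents`."""
    tf = Counter(document)[term]
    return tf * idf(term, documents)
```

A term that appears in most documents gets a ratio close to 1, while a rare term gets a larger weight, which matches the intent of down-weighting very common words.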
                                                                          Figure 6: Precision @ k for Model 1 (Title only), Model 2 (Title
                                                                          and queries) and the baseline
For the baseline, we use a tfidf based model. Our baseline consists of
three approaches that utilize tfidf to score the sentences to select
top K. For the first approach, we sum up (unweighted) tfidf score
of the words to measure importance of a sentence and then select          as our baseline, as it has the best performance.
top K as the summary. Here, tf is computed at the sentence level
and idf is across all the SKUs (documents).                               Figure 6 shows precision@k for the two sequence to sequence
                                                                          based models and the baseline. The blue line indicates the model that
For the second approach (weighted), we weigh the tfidf score of           was trained using just the title as target summary (Model 1), orange
tokens in the description that also appear in the title by multiplying    line indicates the model that was trained using title and top five
it with a factor of $w_i$. The optimal weight $w_i$ was found by using    queries that led to clicks on the SKU (Model 2), while the gray line is
grid search. In our case, it was found to be 2.                           the precision@k for the baseline. We found that both Model 1 and
                                                                          Model 2 outperform the baseline. Model 2 was better by 3.125% and
For the third approach (filtered), we sum up the tfidf score of only      12.08% over Model 1 for precision@2 and precision@3 respectively.
those tokens in description that appear in the title.                     We believe the reason for Model 2 to outperform Model 1 is that
                                                                          queries provide additional context regarding which sentences are
Figure 5 shows precision@k for the three models. As we can see            important and capture key information of the product, which is
from the graph, the weighted approach has the highest precision@k;        key to summarization.
this shows that the words present in the title do indicate which sentences
are of relatively higher importance. However, it is also not the          This demonstrates that words present in the title capture key information
right strategy to exclude all the other words, as demonstrated by the     of the product being sold. Title is provided by the merchant,
higher precision@k of unweighted model over filtered model. Thus,         so it provides merchant’s point of view regarding what aspect of
in summary, boosting words present in title while also retaining          the product is important. Whereas, words present in user queries
other words for the computation of tfidf score of a sentence seems        indicate the attributes of product that the user cares about. So com-
to yield best result among all the baseline approaches.                   bining these two sources of information is a good way to infer
                                                                          relevant sentences of description from a search perspective. Also,
5   EVALUATION                                                            since not all SKUs (documents) have user clicks or may have comparatively
Our purpose of ranking is to find sentences that are relevant to the      less engagement data associated with them, creating a model
product and contain attributes of the product that customers might        leveraging title and click through log to find relevant sentences
use in their search queries. This will improve results of full text       provides a way to generalize it to SKUs (documents) that have little
match as well as query understanding, since it depends on matching        or no engagement data.
user’s intent with attributes extracted from the SKU. To analyze
this, we reviewed 100 SKUs randomly sampled from the grocery              We provide one instance from our evaluation set as an example.
category and manually labeled the sentences based on whether they         Figure 7 shows a sample product description that is fed to the model.
were relevant or not. We evaluated the model using precision@k,           Figures 8 and 9 show output of Model 1 and Model 2 respectively.
with k as 1, 2 and 3.
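The evaluation metric can be sketched as below. This is a minimal illustration under assumptions: each SKU is represented by a hypothetical list of 0/1 relevance labels ordered by the model's ranking, and precision is divided by k even when fewer than k sentences are available.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked sentences that were labeled relevant.

    `ranked_labels` holds the manual labels (1 = relevant, 0 = not relevant)
    in the order produced by the model, best sentence first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k


def mean_precision_at_k(labels_per_sku, k):
    """Average precision@k over all evaluated SKUs."""
    return sum(precision_at_k(labels, k) for labels in labels_per_sku) / len(labels_per_sku)
```

Calling `mean_precision_at_k` with k set to 1, 2 and 3 over the labeled SKUs would give the three points plotted for each model.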
                                                                          Sentences that have keyword stuffing tend to be grammatically
Based on the evaluation of the three tfidf based models as described      incorrect, structurally dissimilar to the title and generally longer.
in Section 4, we chose the weighted model (the second approach)           Thus, the intuition is that summarization models described above


would rank such sentences lower.

Figure 7: Input to the model: product description

Figure 8: Model 1’s output (title)

Figure 9: Model 2’s output (title + query)

6    CONCLUSION AND FUTURE WORK

We implemented a framework that ranks sentences from product
descriptions and bullets. It is based on extractive summarization
and uses reinforcement learning to optimize ROUGE and
maximum-likelihood cross entropy, thus enabling the model to learn
to rank the sentences. We compared two models: one that uses just
the title, and one that uses queries from the click-through log along
with the title. Both models achieve higher precision in finding
relevant sentences than the baseline, a tf-idf based method for
selecting top sentences. Typically, search engines either ignore such
fields (product descriptions, bullets, etc.) or give them a very low
weight compared to fields like the product title. With a framework
that ranks the sentences, we can assign a higher weight to the
top-ranked sentences. In addition, the top N sentences from the
ranked set can be used to extract attributes and help build the
ontology.

Our future plans are to 1) measure the precision with two separate
models, one for descriptions and one for bullets, as they tend to
have different grammatical structure; 2) investigate the effect of
query length on the ranking of sentences; and 3) develop an
algorithmic method to decide the cutoff (top N) for selecting top
sentences from each SKU, because the number of relevant sentences
may differ as the length of the content varies across SKUs.
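
As an illustration of the weighting idea stated in the conclusion, the sketch below splits a ranked list of sentences into a top-N field and a remainder field, then scores a query with per-field weights. It is a toy term-overlap scorer, not the scoring function of any particular search engine. The field names, the weights in `FIELD_WEIGHTS`, and the example SKU text are all hypothetical.

```python
from typing import Dict, List

# Hypothetical field weights: the title counts most, and the top-ranked
# sentences count more than the remaining description text.
FIELD_WEIGHTS: Dict[str, float] = {
    "title": 3.0,
    "top_sentences": 2.0,
    "other_sentences": 0.5,
}


def split_ranked_sentences(ranked: List[str], top_n: int) -> Dict[str, str]:
    """Split ranked sentences (best first) into a high- and a low-weight field."""
    return {
        "top_sentences": " ".join(ranked[:top_n]),
        "other_sentences": " ".join(ranked[top_n:]),
    }


def field_weighted_score(query: str, fields: Dict[str, str]) -> float:
    """Sum over fields of weight * number of distinct query tokens in the field."""
    query_tokens = set(query.lower().split())
    score = 0.0
    for name, text in fields.items():
        field_tokens = set(text.lower().split())
        score += FIELD_WEIGHTS.get(name, 0.0) * len(query_tokens & field_tokens)
    return score


# Hypothetical SKU: sentences are assumed to be already ranked by the model.
ranked = [
    "Stainless steel water bottle keeps drinks cold",
    "Makes a great gift for any occasion",
]
doc = {"title": "Insulated water bottle", **split_ranked_sentences(ranked, top_n=1)}
score = field_weighted_score("steel water bottle", doc)
```

In a production engine the same effect would more likely come from indexing the top-N sentences as a separate field and boosting that field at query time. The choice of N and of the weights would need tuning.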