Ranking sentences from product description & bullets for better search

Prateek Verma, Jet.com and Walmart Labs, Hoboken, New Jersey, prateek.verma@jet.com
Aliasgar Kutiyanawala, Jet.com and Walmart Labs, Hoboken, New Jersey, aliasgar@jet.com
Ke Shen, Jet.com and Walmart Labs, Hoboken, New Jersey, ke.shen@jet.com

ABSTRACT

Products in an ecommerce catalog contain information-rich fields like description and bullets that can be useful for extracting entities (attributes) using NER-based systems. However, these fields are often verbose and contain a lot of information that is not relevant from a search perspective. Treating each sentence within these fields equally can lead to poor full-text match and introduce problems in extracting attributes to develop ontologies, semantic search, etc. To address this issue, we describe two methods based on extractive summarization with reinforcement learning that leverage information in product titles and search click-through logs to rank sentences from bullets, description, etc. Finally, we compare the precision of these two models.

CCS CONCEPTS

• Information systems → Information extraction; Summarization.

KEYWORDS

Search and Ranking, Information Retrieval, Extractive Summarization, Reinforcement Learning, E-Commerce, Information Extraction

ACM Reference Format:
Prateek Verma, Aliasgar Kutiyanawala, and Ke Shen. 2019. Ranking sentences from product description & bullets for better search. In Proceedings of the SIGIR 2019 Workshop on eCommerce (SIGIR 2019 eCom), 6 pages.

Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.): Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at http://ceur-ws.org

1 INTRODUCTION

Many search engine frameworks like Solr [7] and ElasticSearch [6] treat each sentence within a field of a document equally, and this can lead to irrelevant documents appearing in the recall set. Consider Figure 1, which shows a sample item and some associated information in bullet form from an ecommerce website. The second bullet contains the terms "soups", "casseroles" and "meat". Because of this, the item (mushroom) will be present in the recall set for search queries containing tokens like "soups" and "casseroles" due to full-text match, leading to poor search relevancy. Relevant features for this SKU can be thought of as attributes that could be used in a search query to find this product. Thus, "gluten free" and "non-GMO" are considered relevant. Based on the attributes present in a sentence and how well it describes the item, we consider the third bullet more relevant to the item than the second bullet from a search perspective. Tokens highlighted in Figure 1 in red and green denote irrelevant and relevant features for the SKU, respectively.

Figure 1: Sample SKU Image and Bullets

Typically, this problem tends to appear in fields like product description and bullets, which are often verbose and contain information about the SKU (stock keeping unit, a term used to describe an item sold on the site) that is not pertinent to the item. Circuitous descriptions of the product and keyword stuffing are real concerns in ecommerce. Keyword stuffing refers to the practice of loading product data with keywords that may not be relevant to the item being sold. Figure 2, which is a description of a SKU, illustrates this.

Figure 2: Sample SKU Description

Product descriptions also tend to contain negations. That is, they describe what the product is NOT and what it is not suitable for. These kinds of sentences are technically legitimate but pose a challenge for search engines and have the effect of returning misleading or irrelevant results.

A naive solution is to ignore these fields completely for search. While this may improve precision, it would come at the cost of recall, as relevant information might be lost. Such relevancy problems are mitigated by semantic search using methods like query understanding. However, these methods require SKUs to have relevant attributes (atomic phrases that provide more information about an item [10]) present in them so that they can be matched with the user's intent. Thus, attribute extraction from the catalog data is often done in order to enrich SKUs (documents) with relevant attributes.

In this paper we describe a method to rank sentences based on whether they are relevant from a search perspective, and to select the top K sentences from these fields for search. The top K ranked sentences can lead to better full-text match and can also help in extracting attributes for developing the ontology for semantic search [10], as attributes in higher ranked sentences are more likely to describe the product correctly. In our experiment, we limit K to 3. Thus, given a description with more than three sentences, we always pick the top three sentences generated by the model as our final summary.

Our contribution in this paper is to demonstrate how Extractive Summarization can be used to rank sentences present in product description and bullets using the product title and user queries obtained from the click-through log. One benefit of this method is that training data is cheap to obtain and the model can be run on items that have little or no click data associated with them. We also compare the two models by measuring precision@k of relevant sentences in the summary.

2 RELATED WORK

Summarization is the process of shortening a text document in order to create a summary while retaining the major points of the original document. There are two kinds of summarization techniques: Abstractive and Extractive summarization. Abstractive summarization involves using an internal semantic representation and natural language generation techniques to create the summary [2], [23], [24]. Extractive summarization involves selecting an existing subset of words, phrases and sentences in the original text to generate the summary [5], [14], [28].

Recently, a lot of work has been done on Abstractive Summarization using the attentional encoder-decoder model that was proposed by Sutskever et al. in [25]. In [15], Nallapati et al. modeled abstractive summarization using Attentional Encoder Decoder Recurrent Neural Networks. In [20], Paulus et al. introduced a new objective function that combined cross entropy loss with rewards from policy gradient reinforcement learning, which improved the state of the art in abstractive summarization.

Extractive Summarization was traditionally done using hand-engineered features, such as sentence position and length [21], the words present in the sentence, their part-of-speech tags, frequency, etc. [18]. However, with the recent success of the encoder-decoder model, it is being used in Extractive Summarization as well, as in [3], [17] and [16]. In [3], Cheng et al. developed a framework composed of a hierarchical document encoder and an attention-based extractor for extractive summarization. In [17], Narayan et al. used a hierarchical encoder and attention-based decoder to leverage side information like title, image caption, etc., and in [16] they introduced a new objective function based on ROUGE and used reinforcement learning to optimize it.

In this paper, we rank sentences using summarization techniques for the purpose of improving search relevancy. There has not been much work done in this area. One work that is aligned with our objective is from White et al. [26], published in 2002. They use statistical measures like the frequency of query terms present in a sentence to rank sentences, and recommend documents from the recall set to users by presenting them with a ranked set of sentences for web search. Our work, however, focuses on the ecommerce setting, where we leverage the Reinforcement Learning paradigm to rank sentences with the purpose of improving search by affecting recall/precision.

3 PROBLEM FORMULATION

Our objective is to rank sentences in product description and bullets from a search perspective. Search perspective means that when we extract attributes from sentences, they are relevant to the item and are likely to be used in a search query for that item. Methods like query understanding can benefit from ranked sentences, as they use attributes in the SKU to match with the user's intent. A higher ranked sentence is more likely to contain relevant attributes than a lower ranked sentence. Having a set of top ranked sentences would also help in full-text match by preventing queries from matching irrelevant sentences.

We use Extractive Summarization to achieve this. Our work is based on [16], which treats the summarization task as a ranking problem, and training is done by optimizing a combination of the ROUGE metric and cross entropy using reinforcement learning (described in Section 3.2). ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It is a metric for comparing an automatically generated summary with a reference summary. ROUGE uses the count of overlapping units, such as N-grams, between the two summaries to measure the quality of the system-generated summary [13]. Here we specifically use the F1 score of the three ROUGE scores mentioned below:

• ROUGE-1: refers to the overlap of 1-grams between the candidate summary and the reference summary (in our case, title and queries)
• ROUGE-2: refers to the overlap of bi-grams
• ROUGE-L: measures Longest Common Subsequence based statistics to compute similarity between the two summaries

We use ROUGE because it is well aligned with our objective of finding relevant sentences from SKU description and bullets that are similar to the title and user engagement data (queries). It is the evaluation metric used in most summarization systems, and training the model on a combination of ROUGE and cross entropy has been shown to be superior to using just cross entropy [16]. The REINFORCE algorithm has been shown to improve sequence-to-sequence based text rewriting systems by optimizing non-differentiable objective functions like ROUGE [22], [12], so we use reinforcement learning to optimize our reward function.

We use the title and queries obtained from the click-through log as part of the target summary. The title is one of the key fields in an ecommerce catalog. It is provided by the merchant and captures essential information about the item, while queries can be thought of as keywords that users think are relevant attributes for the product. The intuition is that having them in the target summary allows the model to capture important sentences present in the description and bullets. We create two models: one that uses just the title as the target summary, and a second that uses the top five queries that led to clicks on the item, along with the title, as the target summary.

Finally, we choose the top K sentences as determined by the model as our final summary. Since ecommerce product descriptions tend to be short and less repetitive, repetition and diversity are not a concern in our summarization task.

3.1 Network Architecture

Figure 3 depicts the network architecture of the extractive summarizer. It aims to extract sentences $\{s_1, \ldots, s_m\}$ from a document $D$ composed of sentences $\{s_1, \ldots, s_n\}$, where $n > m$, and labels them 1 or 0 based on whether they should be included in the summary. It learns to assign a score $p(y_i \mid s_i, D, \theta)$ to each sentence, which is directly proportional to its relevance within the summary. Here, $\theta$ denotes the model parameters, $s_i$ denotes the $i$-th sentence and $D$ represents the document. The summary is chosen by selecting the sentences with the top $p(y_i \mid s_i, D, \theta)$ scores. Our network and objective function are based on [16]. We choose a sequence-to-sequence network composed of three main components: sentence encoder, document encoder and sentence extractor.

Figure 3: Network architecture

These components are described in detail below:

Sentence encoder: This is a convolutional encoder that encodes a sentence into a continuous representation and has been shown to capture salient features [4], [9], [8]. The encoding is performed using a kernel filter $K$ of width $h$ over a window of $h$ words present in the sentence $s$. This is applied to each possible window of words in the sentence $s$ to produce a feature map $f \in \mathbb{R}^{k-h+1}$, where $k$ is the length of the sentence. Max pooling over time is then performed on the feature map, and the maximum value is taken as the feature corresponding to this particular filter $K$. Specifically, we use filters of size 2 and 4.

Document encoder: The output of the sentence encoder is fed to the document encoder. It composes the sequence of sentences to obtain a document representation. We use an LSTM to achieve this. Given a document $D$ and a sequence of sentences $(s_1, \ldots, s_n)$, we feed the sentences to the model in reverse order. This approach allows the model to learn that the top sentences are more important, and has been demonstrated in previous work [25], [11], [17].

Sentence extractor: Finally, the sentence extractor sequentially labels each sentence as 1 or 0 depending on whether the sentence is relevant or not. It is implemented using an RNN with LSTM cells and a softmax layer. At time $t_i$, it makes a binary prediction conditioned on the document representation and the previously labelled sentences. This lets it identify locally and globally important sentences. Sentences are then ranked by the score $p(y_i = 1 \mid s_i, D, \theta)$. Here $s_i$ is the $i$-th sentence, $D$ is the document, $\theta$ denotes the model parameters and $p(y_i = 1 \mid s_i, D, \theta)$ is the probability of sentence $s_i$ being included in the summary. We learn to rank by training the network in a reinforcement learning framework that optimizes ROUGE.

We use a combination of maximum likelihood cross entropy loss and rewards from policy gradient reinforcement learning as the objective function to globally optimize ROUGE. This lets the model optimize the evaluation metric directly and makes it better at discriminating between sentences, i.e., it ranks a sentence higher if it appears often in the summary.

3.2 Policy Learning

Reinforcement Learning is an area of machine learning where a software agent learns to take actions in an environment to maximize cumulative reward. It differs from supervised learning in that labelled input/output pairs need not be provided, nor do sub-optimal actions need to be explicitly corrected. Rather, the focus is on the balance between exploration and exploitation. Exploitation is the act of preferring an action that the agent has tried in the past and found to be effective, whereas exploration is the act of discovering such actions, i.e., trying out actions that it has not selected before.

We conceptualize the summarization model in a reinforcement learning paradigm. The model can be thought of as an agent interacting with the environment, which consists of documents. The agent reads the document $D$ and assigns a score to each sentence $s_i \in D$ using the policy $p(y_i \mid s_i, D, \theta)$. We then rank the sentences and take the sampled sentences as the summary. The agent is then given a reward based on how close the generated summary is to the gold standard summary. We use the F1 score of ROUGE-1, ROUGE-2 and ROUGE-L as the reward $r$. In our case, the gold standard summary is the title and user queries. The agent is then updated based on the reward using the REINFORCE algorithm [27]. The REINFORCE algorithm minimizes the negative expected reward:

\[
L(\theta) = -\mathbb{E}_{\hat{y} \sim p_\theta}\left[ r(\hat{y}) \right]
\]

Here, $p_\theta$ stands for $p(y \mid D, \theta)$, where $\theta$ denotes the model parameters, $D$ is the document and $r$ is the reward.

The REINFORCE algorithm is based on the fact that the gradient of this expected reward, even for a non-differentiable reward function, can be computed as:

\[
\nabla L(\theta) = -\mathbb{E}_{\hat{y} \sim p_\theta}\left[ r(\hat{y}) \, \nabla \log p(\hat{y} \mid D, \theta) \right]
\]

Calculating the expected gradient in the above expression can be expensive, as each document can have a very large number of candidate summaries. It can be approximated by taking a single sample $\hat{y}$ from $p_\theta$ for each training example in a batch, after which the above expression simplifies to:

\[
\nabla L(\theta) \approx -r(\hat{y}) \, \nabla \log p(\hat{y} \mid D, \theta) \approx -r(\hat{y}) \sum_{i=1}^{n} \nabla \log p(\hat{y}_i \mid s_i, D, \theta)
\]

Since the REINFORCE algorithm starts with a random policy, and because our task can involve a large number of candidate summaries for a document, training the model can be time consuming. So, we limit the search space of $\hat{y}$ to a smaller set of high-probability samples $\hat{Y}$ consisting of the top $k$ extracts. We choose these top $k$ extracts as follows: we select the $p$ sentences that have the highest ROUGE scores on their own and then generate all possible combinations of these $p$ sentences, with the constraint that the maximum length of an extract is $m$. We rank these against the gold summary using the F1 score, taking the mean of ROUGE-1, ROUGE-2 and ROUGE-L. We choose the top $k$ of these ranked summaries as $\hat{Y}$. During training, we sample $\hat{y}$ from $\hat{Y}$ instead of $p(\hat{y} \mid \theta, D)$.

3.3 Input Data for model

We create two summarization models, one with the title as its target summary (Model 1) and the other with the title plus the top five queries for which the product was clicked as the target summary (Model 2). The title and each query are treated as independent sentences when generating the reference summary. For input, we use product descriptions and bullets for both models.

We preprocess the title, description and queries before passing them to the model. The preprocessing step consists of sentence segmentation, tokenization, conversion of tokens into vocabulary ids, and truncation and padding to a fixed length. We use SKUs from the grocery category of our catalog to evaluate the models. For Model 1 we used all the SKUs from the grocery category, and for Model 2 we used a subset of SKUs from the category that had engagement above a certain threshold. Though Model 2 had less training data, its data was richer, since it had queries (top 5) associated with each SKU as part of the summary. One advantage of both methods is that they require almost no manual effort to obtain training data, which makes them very cheap. Figure 4 describes how the two models are set up for training.

Figure 4: Retrieval by matching query understanding with SKU understanding

Since our objective is to obtain better full-text match or attributes from the ranked set of sentences, each sentence can be treated as independent of the others. This insight is well aligned with the framework of reinforcement learning based extractive summarization that optimizes ROUGE.

4 BASELINE MODEL

Tf-idf is one of the commonly used frequency-driven approaches for weighting terms to measure the importance of a sentence for extractive summarization [1], [19]. It measures the importance of words and identifies very common words in the documents by giving low weights to words appearing in most documents. The weight of each word is computed by the formula:

\[
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
\]

\[
\mathrm{idf}(t, D) = \frac{N}{|\{d \in D : t \in d\}|}
\]

Here, $\mathrm{tf}(t, d)$ is the count of the term $t$ in the document $d$, $\mathrm{idf}(t, D)$ is the inverse document frequency, $N$ is the total number of documents in the corpus, and $|\{d \in D : t \in d\}|$ is the number of documents in which the term $t$ appears. If the term is not present in the corpus, this leads to division by zero. To avoid this, it is common practice to adjust the denominator to $1 + |\{d \in D : t \in d\}|$.

For the baseline, we use a tf-idf based model. Our baseline consists of three approaches that use tf-idf to score sentences and select the top K. For the first approach (unweighted), we sum up the tf-idf scores of the words to measure the importance of a sentence and then select the top K as the summary. Here, tf is computed at the sentence level and idf across all the SKUs (documents).

For the second approach (weighted), we weigh the tf-idf score of tokens in the description that also appear in the title by multiplying it by a factor $w_i$. The optimal weight $w_i$ was found using grid search. In our case, it was found to be 2.

For the third approach (filtered), we sum up the tf-idf scores of only those tokens in the description that appear in the title.

Figure 5: Precision @ k for the baseline

Figure 5 shows precision@k for the three models. As we can see from the graph, the weighted approach has the highest precision@k. This shows that the words present in the title do indicate which sentences are of relatively higher importance. However, excluding all the other words is not the right strategy either, as demonstrated by the higher precision@k of the unweighted model over the filtered model. In summary, boosting words present in the title while also retaining other words in the computation of a sentence's tf-idf score seems to yield the best result among all the baseline approaches.

5 EVALUATION

Our purpose in ranking is to find sentences that are relevant to the product and contain attributes of the product that customers might use in their search queries. This will improve the results of full-text match as well as query understanding, since the latter depends on matching the user's intent with attributes extracted from the SKU. To analyze this, we reviewed 100 SKUs randomly sampled from the grocery category and manually labeled the sentences based on whether they were relevant or not. We evaluated the model using precision@k, with k as 1, 2 and 3.

Based on the evaluation of the three tf-idf based models described in Section 4, we chose the weighted model (the second approach) as our baseline, as it has the best performance.

Figure 6: Precision @ k for Model 1 (Title only), Model 2 (Title and queries) and the baseline

Figure 6 shows precision@k for the two sequence-to-sequence based models and the baseline. The blue line indicates the model that was trained using just the title as the target summary (Model 1), the orange line indicates the model that was trained using the title and the top five queries that led to clicks on the SKU (Model 2), and the gray line is the precision@k for the baseline. We found that both Model 1 and Model 2 outperform the baseline. Model 2 was better than Model 1 by 3.125% and 12.08% for precision@2 and precision@3 respectively. We believe Model 2 outperforms Model 1 because queries provide additional context regarding which sentences are important and capture key information about the product, which is key to summarization.

This demonstrates that words present in the title capture key information about the product being sold. The title is provided by the merchant, so it gives the merchant's point of view regarding which aspects of the product are important, whereas words present in user queries indicate the attributes of the product that the user cares about. Combining these two sources of information is therefore a good way to infer the relevant sentences of a description from a search perspective. Also, since not all SKUs (documents) have user clicks, and some have comparatively little engagement data associated with them, creating a model that leverages the title and click-through log to find relevant sentences provides a way to generalize to SKUs (documents) that have little or no engagement data.

We provide one instance from our evaluation set as an example. Figure 7 shows a sample product description that is fed to the model. Figures 8 and 9 show the output of Model 1 and Model 2 respectively.

Figure 7: Input to the model: product description

Figure 8: Model 1's output (title)

Figure 9: Model 2's output (title + query)

Sentences that have keyword stuffing tend to be grammatically incorrect, structurally dissimilar to the title and generally longer. Thus, the intuition is that the summarization models described above would rank such sentences lower.

6 CONCLUSION AND FUTURE WORK

We implemented a framework to rank sentences from product description & bullets based on Extractive Summarization that uses reinforcement learning to optimize ROUGE and maximum likelihood cross entropy, thus enabling the model to learn to rank the sentences. We compare two models, one that uses just the title and the other that uses queries from the click-through log along with the title. We show that these two models have higher precision in finding relevant sentences than the baseline, which is a tf-idf based method for selecting top sentences. Typically, in search engines, such fields (product descriptions, bullets, etc.) are either ignored or given a very low weight compared to fields like product title. Using this framework that ranks the sentences, we can assign a higher weight to the ranked set of sentences. In addition, the top N sentences from the ranked set can also be used to extract attributes and help build the ontology.

Our future plan involves: 1) measuring the precision with two separate models, one for description and one for bullets, as they tend to have different grammatical structure; 2) investigating the effect of query length on the ranking of sentences; 3) developing an algorithmic method to decide on the cut-off (top N) for selecting top sentences from each SKU. This is because, as the length of the content in each SKU varies, the number of relevant sentences could differ.

REFERENCES

[1] Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D Trippe, Juan B Gutierrez, and Krys Kochut. 2017. Text summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268 (2017).
[2] Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, and Hui Jiang. 2016. Distraction-based neural networks for document summarization. arXiv preprint arXiv:1610.08462 (2016).
[3] Jianpeng Cheng and Mirella Lapata. 2016. Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252 (2016).
[4] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of machine learning research 12, Aug (2011), 2493–2537.
[5] Günes Erkan and Dragomir R Radev. 2004. Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research 22 (2004), 457–479.
[6] Clinton Gormley and Zachary Tong. 2015. Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine. O'Reilly Media, Inc.
[7] Trey Grainger, Timothy Potter, and Yonik Seeley. 2014. Solr in action. Manning Cherry Hill.
[8] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014).
[9] Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014).
[10] Aliasgar Kutiyanawala, Prateek Verma, and Zheng Yan. 2018. Towards a simplified ontology for better e-commerce search. CoRR abs/1807.02039 (2018). arXiv:1807.02039 http://arxiv.org/abs/1807.02039
[11] Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. 2015. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057 (2015).
[12] Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541 (2016).
[13] Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out (2004).
[14] Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. Summarunner: A recurrent neural network based sequence model for extractive summarization of documents. In Thirty-First AAAI Conference on Artificial Intelligence.
[15] Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. 2016. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023 (2016).
[16] Shashi Narayan, Shay B Cohen, and Mirella Lapata. 2018. Ranking sentences for extractive summarization with reinforcement learning. arXiv preprint arXiv:1802.08636 (2018).
[17] Shashi Narayan, Nikos Papasarantopoulos, Shay B Cohen, and Mirella Lapata. 2017. Neural extractive summarization with side information. arXiv preprint arXiv:1704.04530 (2017).
[18] Ani Nenkova, Lucy Vanderwende, and Kathleen McKeown. 2006. A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 573–580.
[19] Joel Larocca Neto, Alexandre D Santos, Celso AA Kaestner, Neto Alexandre, D Santos, et al. 2000. Document clustering and text summarization. (2000).
[20] Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304 (2017).
[21] Dragomir R Radev, Timothy Allison, Sasha Blair-Goldensohn, John Blitzer, Arda Celebi, Stanko Dimitrov, Elliott Drabek, Ali Hakim, Wai Lam, Danyu Liu, et al. 2004. MEAD-a platform for multidocument multilingual text summarization. (2004).
[22] Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2015. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732 (2015).
[23] Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685 (2015).
[24] Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 (2017).
[25] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104–3112.
[26] Ryen W White, Ian Ruthven, and Joemon M Jose. 2002. Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 57–64.
[27] Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3-4 (1992), 229–256.
[28] Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan, and Dragomir Radev. 2017. Graph-based neural multi-document summarization. arXiv preprint arXiv:1706.06681 (2017).
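
The reward described in Sections 3 and 3.2 (the mean of the F1 scores of ROUGE-1, ROUGE-2 and ROUGE-L between a candidate summary and the reference built from title and queries) can be illustrated with a minimal sketch. This is not the implementation used in the experiments: it assumes pre-tokenized, lowercased input, treats the reference as one flat token list rather than as independent sentences, and the function names (rouge_n_f1, rouge_l_f1, rouge_reward) are hypothetical.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Multiset of n-grams in a token list (empty if the list is too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the sizes of candidate and reference."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """F1 of clipped n-gram overlap (ROUGE-1 for n=1, ROUGE-2 for n=2)."""
    cand, ref = _ngrams(candidate, n), _ngrams(reference, n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """F1 based on the longest common subsequence (ROUGE-L)."""
    return _f1(_lcs_length(candidate, reference), len(candidate), len(reference))


def rouge_reward(candidate, reference):
    """Mean of ROUGE-1, ROUGE-2 and ROUGE-L F1, used here as the reward r."""
    return (rouge_n_f1(candidate, reference, 1)
            + rouge_n_f1(candidate, reference, 2)
            + rouge_l_f1(candidate, reference)) / 3
```

A candidate identical to the reference receives a reward of 1, and a candidate sharing no tokens with it receives 0.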
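
The construction of the restricted candidate set Ŷ in Section 3.2 (pick the p sentences with the highest individual scores, enumerate their combinations up to length m, keep the top k) can be sketched as below. This is a sketch under stated assumptions: the reward function is passed in as a parameter, and the unigram_f1 stand-in used here is only a placeholder for the mean ROUGE F1 described in the paper. The names top_k_extracts and unigram_f1 and the default values of p, m and k are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unigram_f1(candidate, reference):
    """Placeholder reward: F1 of clipped unigram overlap (stand-in for ROUGE)."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def top_k_extracts(sentences, reference, reward_fn, p=3, m=2, k=2):
    """Return the k best (reward, sentence_indices) extracts.

    sentences: list of token lists (the description and bullets of one SKU)
    reference: token list of the gold summary (title, optionally with queries)
    """
    # Step 1: the p sentences that score highest on their own.
    best = sorted(range(len(sentences)),
                  key=lambda i: reward_fn(sentences[i], reference),
                  reverse=True)[:p]
    # Step 2: every combination of those sentences with at most m sentences,
    # kept in document order.
    candidates = []
    for size in range(1, m + 1):
        for combo in combinations(sorted(best), size):
            tokens = [tok for i in combo for tok in sentences[i]]
            candidates.append((reward_fn(tokens, reference), combo))
    # Step 3: rank the extracts against the gold summary and keep the top k.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return candidates[:k]


sentences = [["gluten", "free", "mushrooms"],
             ["great", "in", "soups"],
             ["non", "gmo"]]
reference = ["gluten", "free", "non", "gmo", "mushrooms"]
y_hat_set = top_k_extracts(sentences, reference, unigram_f1, p=2, m=2, k=2)
```

During training, a sample ŷ would then be drawn from this list rather than from the full space of possible extracts. How the draw is made (for example, uniformly) is not specified here.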
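
The single-sample approximation of the REINFORCE gradient in Section 3.2 can be made concrete with a deliberately simplified policy. The sketch below assumes each sentence has its own free logit and an independent Bernoulli inclusion probability. This is an illustrative simplification, not the LSTM-based extractor of Section 3.1. Under that assumption the gradient of log p(ŷ_i) with respect to logit i is ŷ_i minus the inclusion probability. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_grad(logits, sampled_labels, reward):
    """Single-sample estimate of the gradient of L = -E[r(y_hat)].

    For a Bernoulli policy with p_i = sigmoid(logit_i), the derivative of
    log p(y_i) with respect to logit_i is (y_i - p_i), so each component is
    -r(y_hat) * (y_i - p_i).
    """
    return [-reward * (y - sigmoid(z)) for z, y in zip(logits, sampled_labels)]


def sgd_step(logits, grad, lr=0.1):
    """One gradient-descent step on the per-sentence logits."""
    return [z - lr * g for z, g in zip(logits, grad)]


# Two sentences, both initially at probability 0.5. The sampled extract keeps
# only the first sentence and earns a reward of 0.5.
logits = [0.0, 0.0]
grad = reinforce_grad(logits, sampled_labels=[1, 0], reward=0.5)
updated = sgd_step(logits, grad, lr=1.0)
```

After the step, the logit of the selected sentence rises and the logit of the unselected sentence falls, in proportion to the reward. This is how a rewarded extract pushes its sentences up the ranking.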
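
The weighted tf-idf baseline of Section 4 can be sketched as follows, assuming tokenized sentences and a precomputed document-frequency table. The sketch uses the idf form given in Section 4 with the adjusted denominator 1 + document frequency, sentence-level term counts, and a title boost w (2 in the paper's grid search). Setting w to 1 gives the unweighted variant. The function names and the toy frequencies are hypothetical.

```python
from collections import Counter


def idf(term, doc_freq, n_docs):
    """Inverse document frequency with the 1 + df denominator from Section 4."""
    return n_docs / (1 + doc_freq.get(term, 0))


def sentence_score(tokens, title_tokens, doc_freq, n_docs, w=2.0):
    """Sum of tf-idf over a sentence, boosting tokens that occur in the title."""
    title = set(title_tokens)
    score = 0.0
    for term, count in Counter(tokens).items():  # tf at the sentence level
        boost = w if term in title else 1.0
        score += boost * count * idf(term, doc_freq, n_docs)
    return score


def top_k_sentences(sentences, title_tokens, doc_freq, n_docs, k=3, w=2.0):
    """Indices of the k highest-scoring sentences (ties keep document order)."""
    scored = [(sentence_score(s, title_tokens, doc_freq, n_docs, w), i)
              for i, s in enumerate(sentences)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in scored[:k]]


# Toy corpus statistics: 10 documents, with made-up document frequencies.
doc_freq = {"mushrooms": 1, "soups": 3, "great": 9}
title = ["organic", "mushrooms"]
sentences = [["mushrooms"], ["great", "soups"]]
ranking = top_k_sentences(sentences, title, doc_freq, n_docs=10, k=1)
```

Because the score is a plain sum over tokens, longer sentences accumulate higher scores. The sketch keeps that behavior since Section 4 describes summing the scores without length normalization.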
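
The precision@k measure used in Section 5 can be computed as in the sketch below. It assumes that each SKU has a model-produced ranking of sentence indices and a manually labeled set of relevant sentence indices. It divides by k, the usual convention, so a SKU with fewer than k sentences is penalized. The paper does not state how such cases were handled. The function names are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked sentences that were labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for i in ranked_ids[:k] if i in relevant_ids)
    return hits / k


def mean_precision_at_k(rankings, relevance_sets, k):
    """Average precision@k over a collection of SKUs."""
    scores = [precision_at_k(r, rel, k) for r, rel in zip(rankings, relevance_sets)]
    return sum(scores) / len(scores)


# One SKU whose model ranking is [2, 0, 1]; annotators marked 0 and 2 relevant.
p_at_1 = precision_at_k([2, 0, 1], {0, 2}, 1)
p_at_3 = precision_at_k([2, 0, 1], {0, 2}, 3)
```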