Topical Sentence Embedding for Query Focused Document Summarization

Yang Gao
Beijing Institute of Technology (BIT); Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications
gyang@bit.edu.cn

Linjing Wei
BIT; Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University
weilinjing@bit.edu.cn

Heyan Huang
BIT; Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications
hhy63@bit.edu.cn

Qian Liu
BIT; Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University
liuqian2013@bit.edu.cn



Abstract

Distributed vector representations for sentences have been used in summarization, since they simplify cosine-based semantic comparison between sentences as well as between a sentence and a document. Many extensions incorporate latent topics and word embeddings, but few of them assign sentences explicit topics. Besides, most sentence embedding frameworks follow the same spirit of predicting a word within the sentence, which ignores sentence-to-sentence coherence. To address these problems, we propose a novel sentence embedding framework that combines the current sentence representation, its word-based content, and its topic assignment to predict the representation of the next sentence. Experiments on summarization tasks show that our model outperforms state-of-the-art methods.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: Proceedings of the IJCAI Workshop on Semantic Machine Learning (SML 2017), Aug 19-25 2017, Melbourne, Australia, published at http://ceur-ws.org

1 Introduction

Text summarization is an important task in natural language processing: a system is expected to understand the meaning of the documents and then produce a coherent, informative but brief summary of the original documents within a limited length. The main approaches to text summarization fall into two categories: extractive and generative. Most extractive summarization systems extract parts of the document (a few sentences or a few words) that are deemed interesting by some metric (e.g., inverse document frequency) and join them to form a summary. Conventionally, sentence selection relies on feature engineering, extracting surface statistics (e.g., TF-IDF cosine similarity) to compare sentences with the query and document representations.

Recently, distributed semantic vector representations for words and sentences have achieved overwhelming success in summarization [KMTD14, KNY15, YP15], since they convert high-dimensional, sparse linguistic data into dense semantic vectors of manageable dimension. This makes it more straightforward for generic summarization to compute similarity (or, to some extent, relevance) and facilitates semantic calculation. Inspired by the successful word2vec model [MCCD13, MSC+13], the Paragraph Vector (PV) model [LM14] (where the "paragraph" can be a sentence, paragraph, or document) predicts the next word given a sequential word context and the current paragraph representation. It inherits the semantic representation and efficiency of word2vec, and further captures word order in the sentence representation. Moreover, the sentence vector can benefit summaries since it directly characterises the relevance between queries and candidate sentences.
However, most sentence embedding models [LM14, YP15] are trained on the task of predicting a word within the sentence. In these models, sentences are learnt independently from their local word content, and the coherence relationship between sentences is ignored. A summarization system cares about more comprehensive attributes of sentences, such as sentence coherence, sentence topic, and sentence representation. Using conventional sentence vectors may therefore neglect both the coherence between candidate sentences and their topics. Although models that combine topics with word embeddings, such as TWE [LLCS15], have achieved good results in some NLP tasks, very little work represents sentences with topics. Consider, for example, a user query about possible plans, progress, and problems of hydroelectric projects. The query contains several topics, such as "plans", "progress", "problems" and "hydroelectric projects". Ordinary vector-based models tend to retrieve relevant sentences that emphasise only one or two aspects of the query; capturing all aspects of the query is problematic.

To tackle these problems, we propose a novel sentence embedding learning framework, called the Topical Sentence Embedding (TSE) model, which enhances sentence representations by incorporating multi-topic semantics for the summarization task. Gaussian distributions are used to model mixture centres in the embedding space, which capture a prior topic preference for sentence prediction. In addition, instead of training to predict words in the document, the proposed model represents a sentence by predicting the next sentence, jointly training on the words in the current sentence and the topic of the sentence.

The rest of this paper is organized as follows. Section 2 summarizes the basic embedding models and summarization systems. Section 3 introduces the new summarization framework; in particular, Section 3.1 proposes the novel TSE model. Section 4 reports the experimental results and the corresponding analysis. Finally, we conclude the paper.

2 Background and Related Work

We first introduce the Word2Vec and PV models to review the basic framework for training embedding models for words and sentences.

Word2Vec:
The basic assumption behind Word2Vec [MCCD13] is that words that co-occur have similar representations in the semantic space. To this end, a sliding window is moved over the input text stream; the central word is the target word and the others are its contexts. The Word2Vec method contains two models: CBOW and Skip-gram. CBOW predicts the target word from the context words in the sliding window. The objective of CBOW is to maximize the average log probability

    L = \frac{1}{D} \sum_{i=1}^{D} \log \Pr(w_i \mid C; W)                                (1)

where w_i is the target word, C is the word context, W is the word matrix, and D is the corpus size. In contrast, Skip-gram predicts the context words given the target word; we omit the details of this approach here.
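To make the CBOW objective in Eq. (1) concrete, the following minimal Python/NumPy sketch scores a target word against the average of its context vectors with a log-softmax. The matrices, sizes, and word indices are illustrative stand-ins, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, dim = 1000, 64
    W_in = rng.normal(scale=0.1, size=(vocab_size, dim))    # input word matrix W
    W_out = rng.normal(scale=0.1, size=(vocab_size, dim))   # output (prediction) matrix

    def cbow_log_prob(target_id, context_ids):
        """Return log Pr(w_target | C; W) for one sliding-window position."""
        h = W_in[context_ids].mean(axis=0)         # average of context word vectors
        scores = W_out @ h                         # one score per vocabulary word
        m = scores.max()                           # numerically stable log-softmax
        log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
        return log_probs[target_id]

    # Maximizing Eq. (1) means raising this value for observed (context, target) pairs.
    print(cbow_log_prob(target_id=42, context_ids=[3, 7, 11, 19]))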
Paragraph Vector (PV):
PV [LM14] is an unsupervised algorithm that learns fixed-length semantic representations for variable-length texts, following the same prediction task as Word2Vec. The only change is that the input is a concatenated vector constructed from W and S, where S is a sentence matrix, instead of W alone. The PV model is a strong sentence model and is widely applied to learning representations for sequential data.

Work on extractive summarization spans a large range of approaches. Most existing systems [Gal06, YGVS07] use a ranking model and select the sentences with the highest scores to form the summary. However, multi-document texts often describe one central topic together with several sub-topics, which cannot be covered by a ranking model alone. We therefore focus on how to rank sentences while also accounting for topic coverage.

A variety of features have been defined to measure relevance, including TF-IDF cosine similarity [NVM06, YGVS07], cue words [LH00], topic themes [HL05], and WordNet similarity [OLLL11]. However, these features usually lack a mechanism for deep semantic understanding and thus fail to meet the query need. Since Mikolov et al. [MCCD13] proposed an efficient word embedding method, there has been a surge of work [LM14, LLCS15] on embedding models that capture linguistic regularities. Embedding models for words and sentences [KMTD14, KNY15, YP15, CLW+15], such as DocEmb and CNNLM, have also advanced summarization from the perspective of computing semantic relevance. However, the aforementioned methods usually reward semantic similarity without considering topic coverage, which fails to meet the needs of summarization.

Topic-based methods have proved successful for summarization. Parveen et al. [PRS15] proposed an approach based on a weighted graphical representation of documents obtained by topic modeling. Gupta et al. [GNJ07] measured topic concentration in a direct manner: a sentence was considered relevant to the query if it contained at least one word from the query. However, these works assume that the documents related to the query discuss only one topic. Tang et al. [TYC09] proposed a unified probabilistic approach to uncover query-oriented topics, with four scoring methods to calculate the importance of each sentence in the document collection.
Wang et al. [WLZD08] proposed a multi-document summarization framework (SNMF) based on sentence-level semantic analysis and symmetric non-negative matrix factorization; the symmetric matrix factorization has been shown to be equivalent to normalized spectral clustering and is used to group sentences into clusters. Furthermore, several approaches that combine vector representations with topics, such as NTM [CLL+15], TWE [LLCS15], and GMNTM [YCT15], have united the benefits of semantic representation and explicit topics. This motivates us to investigate such cooperative models for summarization.

3 The Framework for Query-focused Summarization

Extracting salient sentences is the main task in this study. At the sentence level, sentence embedding and sentence ranking are used to assess sentence relevance to the user query and to extract salient summaries.

3.1 The Proposed TSE Model

Inheriting from the PV model the ability to construct a continuous semantic space, the novel architecture for learning sentence representations, called the TSE model, is shown in Figure 1.

[Figure 1: The structure of the proposed TSE model. The words w_1, ..., w_n of the current sentence, the current sentence vector, and its GMM topic vector T_s are concatenated and fed to a classifier that separates the true next sentence s (label 1) from a negative sample s* (label 0).]

Topic Vectorization by GMM

Let K be the number of topics, V the dimensionality of the vectors, and W the word dictionary. S denotes the sentence collection, and s is one of its sentences. Let vec(T_s) be the topic vector of sentence s. The vectors of sentences and words are vec(s) \in R^V and vec(w) \in R^V. The mixture weights, means, and covariance matrices are denoted \pi_k \in R, \mu_k \in R^V, and \Sigma_k \in R^{V \times V}, with \sum_{k=1}^{K} \pi_k = 1. The parameters of the GMM are collectively written \lambda = \{\pi_k, \mu_k, \Sigma_k\}, k = 1, ..., K. Given these parameters, we use

    P(x \mid \lambda) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)                                (2)

to represent the probability distribution for sampling a vector x from the GMM.

Subsequently, we can infer the posterior probability distribution over topics. For each sentence s, the posterior distribution of its topic is

    q(z_s = k) = \frac{\pi_k \, N(vec(s) \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(vec(s) \mid \mu_j, \Sigma_j)}                                (3)

Based on this distribution, the topic of sentence s can be vectorized as vec(T_s) = [q(z_s = 1), q(z_s = 2), ..., q(z_s = K)].
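As an illustration of Eqs. (2)-(3) (not the authors' implementation), the sketch below fits a K-component Gaussian mixture to sentence vectors with scikit-learn and reads off the posterior q(z_s = k) as the topic vector vec(T_s); the sentence vectors here are random stand-ins.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    sentence_vectors = rng.normal(size=(500, 64))   # 500 sentences, V = 64 (stand-in data)

    K = 20
    gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
    gmm.fit(sentence_vectors)                        # estimates pi_k, mu_k, Sigma_k

    # Each row is vec(T_s) = [q(z_s = 1), ..., q(z_s = K)] as in Eq. (3).
    topic_vectors = gmm.predict_proba(sentence_vectors)
    print(topic_vectors.shape, topic_vectors[0].sum())   # (500, 20), rows sum to 1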
Generative Sentence Embedding

The assumption behind TSE is that sentences are coherent and associated with their neighbours. Consequently, we model a sentence as a prediction task based on the semantic structure of the previous sentence, where the semantics are represented by combining the sentence topic, the sentence representation, and the sentence content. The Negative Sampling (NEG) method of [MCCD13] is an efficient approximation method, so we adopt a similar estimation scheme in our model.

Definition 1 (Label l^s). A label of a sentence \tilde{s} is 1 or 0: the label of the positive sample is 1 and the labels of negative samples are 0. For all \tilde{s} \in S,

    l^s(\tilde{s}) = \begin{cases} 1, & \tilde{s} = s \\ 0, & \tilde{s} \neq s \end{cases}                                (4)

Let s' be the current sentence and X_s the concatenation of the information about the current sentence used to predict the next sentence s:

    X_s = vec(T_{s'}) \oplus vec(s') \oplus vec(w_1) \oplus \cdots \oplus vec(w_m)

That is, the input combines the topic vector, the sentence embedding, and the word content of the current sentence.

Given the collection S, we now show how to learn the representations of sentences and topics. In this paper we concentrate on exploiting the latent relationship between sentences, so the target sentence s is predicted purely from the information of the previous sentence, namely X_s. The objective of TSE is therefore to maximize the probability

    G = \prod_{s \in S} g(s) = \prod_{s \in S} \prod_{u \in \{s\} \cup s^-} p(u \mid X_s)                                (5)

where s^- denotes the set of sampled negative sentences for s. Instead of using a softmax as the prediction probability, we directly use its negative sampling approximation, so the prediction objective of sentence s is g(s) = \prod_{u \in \{s\} \cup s^-} p(u \mid X_s), with

    p(u \mid X_s) = \begin{cases} \sigma(X_s^T \theta^u), & l^s(u) = 1 \\ 1 - \sigma(X_s^T \theta^u), & l^s(u) = 0 \end{cases}                                (6)

or, written as a single expression,

    p(u \mid X_s) = [\sigma(X_s^T \theta^u)]^{l^s(u)} \cdot [1 - \sigma(X_s^T \theta^u)]^{1 - l^s(u)}                                (7)
where \sigma(x) = 1 / (1 + \exp(-x)) and \theta^u is the parameter vector used to score X_s.

Taking the log-likelihood, the objective function is defined as

    L = \sum_{s \in S} \sum_{u \in \{s\} \cup s^-} \Big( l^s(u) \log[\sigma(X_s^T \theta^u)] + (1 - l^s(u)) \log[1 - \sigma(X_s^T \theta^u)] \Big)                                (8)

where the negative samples s^- are drawn from a noise distribution N(S) over the sentence collection (Definition 1), and the number of negative samples n is set to 10 empirically. For convenience of estimation, we write the objective for a single pair (s, u) as

    L(s, u) = l^s(u) \log[\sigma(X_s^T \theta^u)] + [1 - l^s(u)] \log[1 - \sigma(X_s^T \theta^u)]                                (9)

Parameter Estimation

The parameters \{\lambda, \theta^u, X_s\}, where \lambda = \{\pi_k, \mu_k, \Sigma_k\}, are estimated jointly by maximizing the likelihood of the objective function, using a two-phase iteration. Given \{\theta^u, X_s\}, stochastic gradient descent (SGD) is used to update the parameters of the GMM; given \lambda, the gradient of \theta^u is calculated by back-propagation based on the objective in Eq. 9.
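The per-pair objective in Eq. (9) can be sketched as follows, assuming hypothetical small dimensions; X_s stands in for the concatenated topic, sentence, and word vectors, and theta_pos / theta_negs for the parameter vectors of the true next sentence and of the n = 10 sampled negatives.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def neg_sampling_objective(X_s, theta_pos, theta_negs):
        """Sum of Eq. (9) over the positive pair (label 1) and the negative pairs (label 0)."""
        obj = np.log(sigmoid(X_s @ theta_pos))                   # l^s(u) = 1 term
        for theta_u in theta_negs:                               # l^s(u) = 0 terms
            obj += np.log(1.0 - sigmoid(X_s @ theta_u))
        return obj                                               # maximized during training

    rng = np.random.default_rng(0)
    d = 192                                  # illustrative size of the concatenation X_s
    X_s = rng.normal(scale=0.1, size=d)
    print(neg_sampling_objective(X_s, rng.normal(scale=0.1, size=d),
                                 rng.normal(scale=0.1, size=(10, d))))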

3.2 Sentence Ranking

Sentence ranking aims to measure the relevance of sentences with respect to the query. In this paper, relevance ranking relies primarily on cosine similarity between semantic vectors [KMTD14], which is a promising measure of relatedness for summarization, complemented by surface statistics (i.e., the TF-IDF score [NVM06]). In summary, the ranking score of a sentence s is

    Score(s) = \alpha \sum_{t=1}^{n_w} TFIDF(w_t) + \beta \, sim(vec(s), vec(Q)) + \gamma \, sim(vec(T_s), vec(T_Q))                                (10)

where Q is the query, n_w is the number of words in s, sim(\cdot) is the similarity function (cosine similarity in this paper), and \alpha, \beta and \gamma are parameters of the summarization system.
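The ranking step of Eq. (10) can be sketched as below; the TF-IDF values, vectors, and the weights alpha, beta, gamma are placeholders chosen for illustration rather than tuned values from the paper.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def rank_score(tfidf_per_word, vec_s, vec_q, topic_s, topic_q,
                   alpha=0.2, beta=0.5, gamma=0.3):
        """Eq. (10): weighted sum of TF-IDF, sentence-query similarity, topic-query similarity."""
        return (alpha * float(np.sum(tfidf_per_word))
                + beta * cosine(vec_s, vec_q)
                + gamma * cosine(topic_s, topic_q))

    rng = np.random.default_rng(0)
    score = rank_score(tfidf_per_word=[0.12, 0.08, 0.21],   # one value per word in the sentence
                       vec_s=rng.normal(size=64), vec_q=rng.normal(size=64),
                       topic_s=rng.random(20), topic_q=rng.random(20))
    print(score)   # candidate sentences are ranked by this score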
4 Experiments

In this section, we present experiments that evaluate the performance of our method on query-focused multi-document summarization.

4.1 Dataset and Evaluation Metrics

In this study, we use the standard summarization benchmarks DUC2005 and DUC2006 (http://duc.nist.gov/data.html) for evaluation. DUC2005 contains 50 query-oriented summarization tasks; for each query, a relevant document cluster of 25-50 documents is assumed to be "retrieved". DUC2006 also contains 50 query-oriented summarization tasks, with 25 documents per query. The task is to generate a summary from the document cluster that answers the query (in DUC, the query is also called a "narrative" or "topic"). The length of a result summary is limited to 250 words.

We conducted evaluations with the ROUGE [LH03] metrics, which assess the quality of a summary by counting the number of overlapping units, such as n-grams, between the system summary and the reference summaries. ROUGE-N is essentially an n-gram recall measure.
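For reference, ROUGE-N recall can be sketched as the fraction of reference n-grams that also appear in the system summary; this toy function only illustrates the measure and does not replace the official ROUGE toolkit used for the reported scores.

    from collections import Counter

    def rouge_n_recall(system_tokens, reference_tokens, n=2):
        """Overlapping n-gram count divided by the number of n-grams in the reference."""
        def ngrams(tokens):
            return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        sys_ngrams, ref_ngrams = ngrams(system_tokens), ngrams(reference_tokens)
        overlap = sum(min(count, sys_ngrams[g]) for g, count in ref_ngrams.items())
        return overlap / max(sum(ref_ngrams.values()), 1)

    print(rouge_n_recall("the dam project made good progress".split(),
                         "the project made slow progress".split(), n=2))   # 0.25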
4.2 Baseline Models and Settings

We compare the TSE model with several query-focused summarization methods.

• TF-IDF: scores words and sentences with TF-IDF [NVM06].

• Lead: takes the first sentences one by one from the documents in the collection, with documents ordered randomly. It is often used as an official DUC baseline.

• LDA: uses Latent Dirichlet Allocation [BNJ03] to learn a topic model; after the topic model is learned, the maximum score is given to words sharing a topic with the query. The reader can refer to [TYC09] for details.

• SNMF: a topic-biased summarization system [WLZD08] that uses symmetric non-negative matrix factorization to cluster sentences and then selects summary sentences with broad coverage from the clusters.

• Word2Vec: word vectors are learned with the Word2Vec models [MCCD13, MSC+13]; a sentence representation is the average of the embeddings of its words.

• PV: PV [LM14] learns sentence vectors building on the Word2Vec model. We use the same parameters as in our approach to calculate sentence scores.

• TWE: TWE [LLCS15] uses LDA to refine the Skip-gram model and learns topical word embeddings from words and their topics; a sentence representation is the average of its word vectors.
Table 1: Overall ROUGE evaluation (%) of different models for DUC2005 and DUC2006

                    DUC2005                DUC2006
    Method      ROUGE-1  ROUGE-2      ROUGE-1  ROUGE-2
    LEAD          29.71     4.69        32.61     5.71
    TF-IDF        33.56     5.20        35.93     6.53
    Avg-DUC       34.34     6.02        37.95     7.54
    SNMF          35.00     6.04        37.14     7.52
    Word2Vec      34.59     5.48        36.33     6.34
    PV            35.41     6.14        37.52     7.41
    DocEmb        30.59     4.69        32.77     5.61
    LDA           31.70     5.33        33.07     6.02
    TWE           35.05     6.06        37.58     6.52
    TSE           36.28     6.53        37.96     7.56
    Impr           2.46     6.35         0.03     0.27

Table 2: Influence analysis of each factor for the TSE summarization, evaluated on DUC2005

    TF-IDF   sen sim   topic     ROUGE-1   ROUGE-2   ratio 1   ratio 2
      ×         √        √        35.54      6.37     2.04%     2.45%
      √         ×        √        34.88      5.99     3.86%     8.27%
      √         √        ×        35.92      6.47     0.99%     0.91%
Note that all baselines are run within the same unsupervised query-focused summarization framework as the proposed system.

The learning rate η is set to 0.05 and gradually reduced to 0.0001 as training converges. The word2vec model is additionally trained on the English Gigaword Fifth Edition (https://catalog.ldc.upenn.edu/LDC2011T07), with the dimension set to 256. The dimension of PV is set to 128 and that of TWE to 64, the same as for the proposed TSE model.

4.3 Experimental Results and Discussion

In this subsection we report the experimental results and analysis. Table 1 shows the overall summarization performance of the proposed model and the baselines. Our approach gives the best summaries in terms of ROUGE over both benchmark datasets, which demonstrates the strength of the proposed summarization model. Impr denotes the relative improvement over the best of the nine baselines; the proposed TSE sentence embedding consistently outperforms the baselines, by 0.03% to 6.35%.

The experimental results validate that exploiting sentence similarity and topic information improves overall performance, but they do not by themselves isolate the impact of the designed sentence similarity measure. Hence, we keep the algorithmic framework fixed and remove one group of features at a time from the sentence ranking, to investigate the importance of each element, as shown in Table 2. We report the relative margin by which the full TSE exceeds each ablated variant, denoted ratio 1 for ROUGE-1 and ratio 2 for ROUGE-2 (e.g., removing sentence similarity gives ratio 1 = (36.28 - 34.88) / 36.28 ≈ 3.86%). Since ratio 1 is 3.86% and ratio 2 reaches 8.27% when sentence similarity is removed, sentence similarity computed with our proposed sentence embedding plays the dominant role in the summaries. By contrast, there is still room for improvement in how topics are utilized for summarization.

5 Conclusion

This work proposes a novel sentence embedding model that incorporates sentence coherence and topic characteristics into the learning process. It automatically generates distributed representations for sentences and assigns sentences semantically meaningful topics. We conduct extensive experiments on the DUC query-focused summarization datasets; exploiting the proposed TSE for sentence ranking, the system achieves competitive performance. A promising future direction is to strengthen topic optimization during sentence learning: with the assistance of semantic topics, we could extract sentence-based salient topic representations directly as summaries.

Acknowledgments

This work is supported by the National Basic Research Program of China (973 Program, Grant No. 2013CB329303), the National Natural Science Foundation of China (Grant No. 61602036), and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016007).
References

[BNJ03] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.

[CLL+15] Ziqiang Cao, Sujian Li, Yang Liu, Wenjie Li, and Heng Ji. A novel neural topic model and its supervised extension. In Proceedings of AAAI'15, pages 2210–2216, 2015.

[CLW+15] Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, and Hsin-Hsi Chen. Leveraging word embeddings for spoken document summarization. 2015.

[Gal06] Michel Galley. A skip-chain conditional random field for ranking meeting utterances by importance. In Proceedings of EMNLP'06, 2006.

[GNJ07] Surabhi Gupta, Ani Nenkova, and Dan Jurafsky. Measuring importance and query relevance in topic-focused multi-document summarization. 2007.

[HL05] Sanda Harabagiu and Finley Lacatusu. Topic themes for multi-document summarization. In Proceedings of SIGIR'05, pages 202–209, 2005.

[KMTD14] Mikael Kågebäck, Olof Mogren, Nina Tahmasebi, and Devdatt Dubhashi. Extractive summarization using continuous vector space models. In Proceedings of EACL'14, 2014.

[KNY15] Hayato Kobayashi, Masaki Noguchi, and Taichi Yatsuka. Summarization based on embedding distributions. In Proceedings of EMNLP'15, 2015.

[LH00] Chin-Yew Lin and Eduard Hovy. The automated acquisition of topic signatures for text summarization. In Proceedings of COLING'00, pages 495–501, 2000.

[LH03] Chin-Yew Lin and Eduard Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of ACL'03, 2003.

[LLCS15] Yang Liu, Zhiyuan Liu, Tat-Seng Chua, and Maosong Sun. Topical word embeddings. In Proceedings of AAAI'15, 2015.

[LM14] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of ICML'14, pages 1188–1196, 2014.

[MCCD13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[MSC+13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119, 2013.

[NVM06] Ani Nenkova, Lucy Vanderwende, and Kathleen McKeown. A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization. In Proceedings of SIGIR'06, pages 573–580, 2006.

[OLLL11] You Ouyang, Wenjie Li, Sujian Li, and Qin Lu. Applying regression models to query-focused multi-document summarization. Information Processing & Management, 2011.

[PRS15] Daraksha Parveen, Hans-Martin Ramsl, and Michael Strube. Topical coherence for graph-based extractive summarization. In Proceedings of EMNLP'15, pages 1949–1954, 2015.

[TYC09] Jie Tang, Limin Yao, and Dewei Chen. Multi-topic based query-oriented summarization. In Proceedings of SDM'09, pages 1147–1158, 2009.

[WLZD08] Dingding Wang, Tao Li, Shenghuo Zhu, and Chris Ding. Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In Proceedings of SIGIR'08, pages 307–314, 2008.

[YCT15] Min Yang, Tianyi Cui, and Wenting Tu. Ordering-sensitive and semantic-aware topic modeling. In Proceedings of AAAI'15, 2015.

[YGVS07] Wen-tau Yih, Joshua Goodman, Lucy Vanderwende, and Hisami Suzuki. Multi-document summarization by maximizing informative content-words. In Proceedings of IJCAI'07, pages 1776–1782, 2007.

[YP15] Wenpeng Yin and Yulong Pei. Optimizing sentence modeling and selection for document summarization. In Proceedings of IJCAI'15, 2015.