<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>CitationAS: A Summary Generation Tool Based on Clustering of Retrieved Citation Content</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jie Wang</string-name>
          <email>wangjie1342@qq.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shutian Ma</string-name>
          <email>mashutian0608@hotmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chengzhi Zhang</string-name>
          <email>zhangcz@njust.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Management, Nanjing University of Science and Technology</institution>
          ,
          <addr-line>Nanjing, China, 210094</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University)</institution>
          ,
          <addr-line>Fuzhou, China, 350108</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Usually, if researchers want to understand the research status of a field, they need to browse a great number of related academic publications. To work more efficiently, automatic document summarization can be applied to take a quick look at specific scientific topics. In this paper, we focus on summary generation from citation content. We build an automatic tool named CitationAS, whose three core components are clustering algorithms, cluster label generation, and important sentence extraction. In the experiments, we use bisecting K-means, Lingo and STC to cluster retrieved citation content. Then Word2Vec, WordNet and their combination are applied to generate cluster labels. Next, we employ two methods, TF-IDF and MMR, to extract the important sentences that are used to generate summaries. Finally, we adopt a gold standard to evaluate the summaries obtained from CitationAS. According to the evaluations, we find the best label generation method for each clustering algorithm. We also discover that the combination of Word2Vec and WordNet does not perform as well as using them separately with the three clustering algorithms. The combination of the Lingo algorithm, the Word2Vec label generation method and the TF-IDF sentence extraction approach acquires the highest summary quality. * Corresponding author: Chengzhi Zhang.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Conference Topic</title>
      <p>Text mining and information extraction</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>Currently, the quantity of electronic academic publications has reached a massive level, and challenges
show up when people want to investigate the research status quo of a field (Liu, 2013): (1)
When searching in academic databases (e.g., CNKI, http://www.cnki.net) or search engines (e.g., Google
Scholar, http://scholar.google.com/), users are given relevant, ranked results that contain much redundant
information, both within a platform and across platforms. (2) Although manually written literature
reviews can help researchers learn quickly about a new field, such summaries are few in
number, and their long production cycle inevitably leads to a time lag.
Therefore, tools and systems are urgently needed that automatically generate a comprehensive,
detailed and accurate summary for given topic words (Nenkova &amp; McKeown,
2011). At the same time, such tools and systems should also help researchers retrieve relevant
information in real time.</p>
      <p>
        Obviously, an automatic summary tool can deal with the problems mentioned above. When
applying such tools, how to choose the data for summary generation is another challenge. Firstly,
if all literature content is used to generate a summary, the system cost increases, and
unimportant, redundant content might be included. Secondly, if we only use abstracts for
summary generation, there will be information loss compared with using the full text. Hence,
citation content can be chosen as the dataset, for the following main reasons: (1) Citation content is
not only consistent with the original abstract, but also provides more concepts, such as entities
and experimental methods
        <xref ref-type="bibr" rid="ref3">(Divoli, Nakov &amp; Hearst, 2012)</xref>
        , and even retains some original
information from the cited articles. (2) Since citation content reflects an author’s analysis and
summarization of other articles, it has objectivity and diversity (Elkiss, Shen, Fader, States &amp;
Radev, 2008). Some researchers have applied citation content to generate summaries. For
example, Tandon and Jain (2012) generated structured summaries by classifying citation
content into one or more classes.
        <xref ref-type="bibr" rid="ref1">Cohan and Goharian (2015)</xref>
        first grouped citation content and its
context, then ranked the sentences within each group, and finally selected sentences
for the summary. Yang et al. (2016) utilized key phrases, spectral clustering and an ILP optimization
framework to generate summaries.
      </p>
      <p>In this paper, we use citation content for automatic document summarization and apply
clustering algorithms to build an automatic summary generation tool named CitationAS
(http://117.89.118.178:8001/CitationAS/). The
main contributions include: (1) We build a demonstration website which can automatically generate a
summary for a given topic; (2) We optimize a search results clustering engine, Carrot2
(http://project.carrot2.org/; Osiński &amp; Weiss, 2005), in three aspects: similar cluster label merging, important sentence
extraction, and summary generation.</p>
    </sec>
    <sec id="sec-3">
      <title>Summary Generation Tool</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>In this paper, we collected about 110,000 articles in XML format from PLOS ONE
(http://journals.plos.org/plosone/) published between
2006 and 2015, covering subjects such as cell biology, chemistry, mental health, computer
science and so on. We identified citation sentences with rules that discriminate whether a
sentence contains reference marks (e.g., “[1]”, “[2]-[4]”), and then removed the XML tags.
In total, 4,339,217 citation sentences were extracted and used as the citation content for
automatic summary generation. Table 1 displays citation sentence examples.</p>
        <p>Table 1. Examples of citation sentences:
(1) Gelatin zymography was performed as described previously [27].
(2) Two studies in Drosophila subobscura found considerable differences [22], [42].
(3) Even by knockout of a single VEGF-A allele mice were unable to survive [5]–[7].
(4) The PCP signaling pathway determines planar polarity in a variety of tissues [4], [7]–[8].</p>
      </sec>
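The rule-based identification of citation sentences could be sketched as follows; the exact rules used by CitationAS are not published, so the regular expression and the function name `is_citation_sentence` are illustrative assumptions:

```python
import re

# A sentence is kept if it contains bracketed reference marks such as
# "[1]", "[22], [42]" or a range like "[2]-[4]" / "[5]–[7]" (assumed rule).
REF_MARK = re.compile(r"\[\d+\](?:\s*[–-]\s*\[\d+\])?")

def is_citation_sentence(sentence):
    """Return True if the sentence contains at least one reference mark."""
    return bool(REF_MARK.search(sentence))
```

A production rule set would also have to handle author-year citation styles and strip markup first; this sketch only covers the numeric bracket style shown in Table 1.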
      <sec id="sec-3-6">
        <title>Framework of CitationAS</title>
        <p>The framework of CitationAS is shown in Figure 1. Firstly, relevant citation sentences are
retrieved from index files according to the search terms entered in the user interface. Then we apply
clustering algorithms to group the sentences into clusters that share the same or a similar topic.
After that, we merge clusters whose labels are similar to each other. Finally, the summary
is generated from the important sentences extracted from each cluster, and the final evaluation is
carried out by volunteers.</p>
        <p>[Figure 1. Framework of CitationAS: User Interface, Retrieval Module, Clustering Module,
Cluster Label Generation, Automatic Summary Generation, and Results Evaluation.]</p>
        <p>The retrieval module builds index files over the
citation sentences and their structure information (e.g., doi, cited count, position of a sentence
and of its first word in the original article and paragraph). Our system
also applies the built-in
algorithms of Lucene (http://lucene.apache.org/) to obtain the citation sentences associated with the search terms and to score
the sentences by relevance. Finally, CitationAS ranks the results, which are used for the next
step, clustering.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Clustering module</title>
        <p>In this module, we firstly apply VSM (Yang &amp; Pedersen, 1997) to represent citation
sentences and use TF-IDF (Salton &amp; Yu, 1973) to calculate feature weights. In VSM, each
citation sentence is treated as a document and expressed as
s_i = (t_1, w_i1; …; t_j, w_ij; …; t_m, w_im), where t_j is the j-th feature item, w_ij is the
feature weight of t_j in the i-th sentence, and 1 ≤ i ≤ n, 1 ≤ j ≤ m, with m and n the
numbers of feature items and citation sentences, respectively. The TF-IDF weight is computed
via formula (1):</p>
        <p>w_ij = tf_ij * idf_j = tf_ij * log(n / df_j + 0.01)   (1)</p>
        <p>where tf_ij is the frequency of t_j in sentence s_i, and df_j is the number of sentences in
which t_j occurs.</p>
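Formula (1) can be sketched in Python as follows; the whitespace tokenization and the function name `tfidf_weights` are illustrative assumptions, not the tool's actual implementation:

```python
import math

def tfidf_weights(sentences):
    """Compute w_ij = tf_ij * log(n / df_j + 0.01) for whitespace-tokenized
    sentences, following formula (1)."""
    n = len(sentences)
    # df_j: number of sentences containing term t_j
    df = {}
    for sent in sentences:
        for term in set(sent.split()):
            df[term] = df.get(term, 0) + 1
    weights = []
    for sent in sentences:
        tokens = sent.split()
        w = {}
        for term in tokens:
            tf = tokens.count(term)            # tf_ij
            w[term] = tf * math.log(n / df[term] + 0.01)
        weights.append(w)
    return weights

w = tfidf_weights(["data mining method", "data mining approach", "gene expression data"])
# 'mining' occurs in 2 of the 3 sentences, so its idf factor is log(3/2 + 0.01)
```

The +0.01 smoothing term keeps the weight of a term that appears in every sentence slightly above zero, so such terms are not discarded entirely.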
        <p>Next, bisecting K-means, Lingo and STC, built into Carrot2, are used to cluster the citation
sentences respectively. Since VSM represents documents in a high-dimensional space, which
hurts the efficiency of the clustering algorithms, we adopt the NMF algorithm (Lee, 2000) to reduce
the dimensionality. The algorithm decomposes the term-document matrix into non-negative
factors: for a non-negative matrix V_{m×n}, we need to find non-negative matrices W_{m×k}
and H_{k×n} that satisfy formula (2):</p>
        <p>V_{m×n} ≈ W_{m×k} × H_{k×n}   (2)</p>
        <p>where W_{m×k} is the base matrix, H_{k×n} is the coefficient matrix, and k is the number of new
feature items. When k is less than m, we can replace V_{m×n} with H_{k×n} to achieve dimensionality
reduction.</p>
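A minimal standalone NMF sketch using Lee &amp; Seung's multiplicative updates is shown below; CitationAS relies on Carrot2's built-in implementation, so this version is only illustrative:

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Factor a non-negative term-document matrix V (m x n) into W (m x k)
    and H (k x n) with V ~= W @ H, via multiplicative updates (Lee, 2000)."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Alternating non-negative updates; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)  # small for this rank-2 matrix
```

The k columns of H then serve as the reduced representation of the n citation sentences, as described above.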
        <p>In bisecting K-means, we use the coefficient matrix to calculate the similarity between each citation
sentence and the clustering centroids, and each sentence is assigned to the most similar cluster. The label
of each cluster consists of individual words, namely the three feature items with the greatest weights in the
term-document matrix.</p>
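The bisecting strategy can be sketched as follows, assuming sentences are represented as rows of a matrix (e.g., the transposed coefficient matrix) and cosine similarity is used for assignment; the function names are illustrative:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def bisecting_kmeans(X, k, iters=20, seed=0):
    """Bisecting K-means sketch: repeatedly split the largest cluster with
    2-means until k clusters remain. X holds one row per sentence."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # Pick the largest cluster and bisect it.
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        cent = X[rng.choice(members, 2, replace=False)].astype(float)
        for _ in range(iters):
            sims = np.array([[cosine(X[m], c) for c in cent] for m in members])
            assign = sims.argmax(axis=1)
            for j in range(2):
                if np.any(assign == j):
                    cent[j] = X[members[assign == j]].mean(axis=0)
        clusters += [members[assign == 0], members[assign == 1]]
    return clusters
```

Because each sentence ends up in exactly one cluster, this is the hard (non-overlapping) behavior contrasted with Lingo and STC below.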
        <p>The Lingo algorithm firstly extracts key phrases using the suffix sorting array and the longest
common prefix array. Then it builds a term-phrase matrix based on the key phrases, where
feature weights are calculated by TF-IDF. Thirdly, it constructs base vectors from the
term-phrase matrix and the base matrix obtained through NMF. Finally, each base vector is
matched with corresponding words or phrases to form one cluster label, and sentences containing a label’s
words are assigned to the corresponding cluster.</p>
        <p>The STC algorithm (Zamir &amp; Etzioni, 1999) is based on the Generalized Suffix Tree, which
recognizes key words and phrases occurring more than once in the citation sentences. Each
such word or phrase is used to form one base cluster. Two base clusters may contain many
of the same citation sentences while their cluster labels differ, so we
merge these base clusters into final clusters in order to reduce the overlap rate of citation
sentences between clusters.</p>
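The merging step could look like the following sketch; the paper does not give the exact criterion, so the overlap ratio (intersection over the smaller base cluster) and the threshold value are assumptions:

```python
def merge_base_clusters(base_clusters, threshold=0.5):
    """Merge STC base clusters (label -> set of sentence ids) whose sentence
    overlap ratio exceeds `threshold`, reducing overlap between final clusters."""
    merged = []  # list of (labels, sentence-id set) pairs
    for label, sents in base_clusters.items():
        for labels, group in merged:
            # Assumed ratio: shared sentences relative to the smaller cluster.
            overlap = len(group & sents) / min(len(group), len(sents))
            if overlap > threshold:
                labels.append(label)
                group |= sents
                break
        else:
            merged.append(([label], set(sents)))
    return merged
```

Each merged group keeps all contributing labels, so a final cluster can still be described by several related phrases.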
        <p>Among the three algorithms, Lingo and STC share two characteristics: they both
create overlapping clusters, meaning one document can be assigned to more than one cluster,
and their cluster labels may contain phrases. In contrast, bisecting K-means is a non-overlapping
clustering algorithm, and the words in its generated cluster labels may not correspond
to all of a cluster’s documents.</p>
      </sec>
      <sec id="sec-3-8">
        <title>Cluster label generation</title>
        <p>It is possible that some cluster labels are semantically similar to each other, for example,
labels like ‘data mining method’ and ‘data mining approach’ for the search terms ‘data mining’. In
order to improve experimental accuracy, similar cluster labels are merged in the experiments. We
apply three methods to calculate the semantic similarity between labels, based on Word2Vec
(Mikolov, Le &amp; Sutskever, 2013) and WordNet (Fellbaum &amp; Miller, 1998).</p>
        <p>(1) Similarity Computation Based on Word2Vec</p>
        <p>Word2Vec is a statistical language model trained on a corpus. It applies a neural network to obtain
word vectors, which can be used to compute the similarity between words. Given a phrase p
made up of words a, b and c, we represent the phrase by the average of its word vectors,
v_p = (v_a + v_b + v_c) / n, where n is the number of words in p. Finally, we use the cosine
value of the two phrase vectors to compute the similarity between phrases, shown as formula (3):</p>
        <p>Sim_w2v(p1, p2) = cos(v_p1, v_p2)   (3)</p>
        <p>(2) Similarity Computation Based on WordNet</p>
        <p>WordNet is a semantic dictionary that organizes words in a classification tree, so the semantic
similarity between two words can be calculated from the path between them in the tree, shown as
formula (4), where Len(w1, w2) denotes the shortest path between the words in the tree:</p>
        <p>Sim_wn(w1, w2) = 1 / (Len(w1, w2) + 1)   (4)</p>
        <p>The similarity between phrases is then calculated with formula (5), averaging the word
similarities Sim_wn(w1, w2) over all word pairs, where n_p1 and n_p2 are the numbers of words in
phrases p1 and p2:</p>
        <p>Sim_wn(p1, p2) = (1 / (n_p1 * n_p2)) * Σ_{w1 ∈ p1} Σ_{w2 ∈ p2} Sim_wn(w1, w2)   (5)</p>
        <p>(3) Similarity Computation Based on the Combination of Word2Vec and WordNet</p>
        <p>We linearly combine Word2Vec and WordNet to obtain a new similarity calculation
method, shown as formula (6), where λ is a weight that we set to 0.5:</p>
        <p>Sim(p1, p2) = λ Sim_w2v(p1, p2) + (1 − λ) Sim_wn(p1, p2)   (6)</p>
      </sec>
      <sec id="sec-3-9">
        <title>Automatic summary generation</title>
        <p>Clusters are sorted according to their size and each cluster is taken as a paragraph in the
final summary. To choose important citation sentences from each cluster, we design two
methods to measure sentence scores.</p>
        <p>(1) TF-IDF based Sentences Extraction</p>
        <p>Since each citation sentence is represented in the term-document matrix, we can obtain a
sentence weight. For the sentence s_i = (t_1, w_i1; …; t_j, w_ij; …; t_m, w_im), the weight is
computed via formula (7):</p>
        <p>weight_i = Σ_{j=1}^{m} w_ij / m   (7)</p>
        <p>Thereby, we rank the citation sentences in each cluster by weight, and the sentences with
higher weights are used as summary sentences.</p>
        <p>(2) MMR based Sentences Extraction</p>
        <p>The MMR method (Carbonell &amp; Goldstein, 1998) considers both the similarity of a candidate
sentence to the search terms and its redundancy with respect to the sentences already in the summary,
shown as formula (8):</p>
        <p>MMR = argmax_{s_i ∈ C − S} [λ Sim(s_i, q) − (1 − λ) max_{s_j ∈ S} Sim(s_i, s_j)]   (8)</p>
        <p>where C denotes the set of citation sentences in a cluster and S denotes the set of summary
sentences, so s_i ∈ C − S denotes a sentence not yet selected for the summary; s_i is the
current citation sentence and q is the search terms. λ is a parameter, generally set to 0.7.</p>
        <p>This method first selects the sentence with the maximum score as a summary sentence from the
candidate sentence set, then recalculates the MMR values of the remaining sentences. When the
candidate sentence set is empty, the algorithm ends.</p>
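The MMR selection loop can be sketched with precomputed similarities; `mmr_select` and its inputs are illustrative names, and the similarities would in practice come from the sentence representations described above:

```python
def mmr_select(sim_to_query, sim_matrix, lam=0.7):
    """MMR-based sentence ranking sketch: repeatedly pick the sentence
    maximizing lam * Sim(s, q) - (1 - lam) * max Sim(s, s') over the
    already selected sentences s', until no candidates remain."""
    candidates = set(range(len(sim_to_query)))
    selected = []
    while candidates:
        def mmr(i):
            redundancy = max((sim_matrix[i][j] for j in selected), default=0.0)
            return lam * sim_to_query[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

# A near-duplicate of an already selected sentence is demoted below a less
# relevant but novel one.
order = mmr_select([0.9, 0.85, 0.5],
                   [[1.0, 0.95, 0.0], [0.95, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

With a 20% compression ratio, only the top fifth of this ranking would enter the summary paragraph.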
      </sec>
      <sec id="sec-3-10">
        <title>User interface of CitationAS</title>
        <p>As shown in Figure 2, users can input search terms and set parameters to obtain a summary.
The parameters (in the ‘Parameter setup’ area) specify the summary generation methods and the
number of citation sentences used for clustering. When users click ‘search’, the sub-topics, i.e.,
the cluster labels and their counts, appear in the summary frame. When users then click ‘All
Topics’, the automatic summary is presented on the right side, where the bold fonts are
paragraph titles and the rest is the summary content. A summary sentence’s structure
information is displayed when users hover the mouse over it.</p>
        <p>[Figure 2. User interface of CitationAS, with areas for parameter setup, sub-topics, all
topics, the summary, and structure information.]</p>
        <p>Since the summary is based on the user’s search terms in CitationAS, we choose 20
high-frequency phrases from the dataset as search terms and use them for the experiments. The
phrases are shown in Table 2. Here, the frequency refers to the number of times a phrase appears
in the citation content dataset. We divide them into ten high-frequency 2-grams and ten 3-grams.
We also find that the phrases are related to the medical field, because articles about biology and
mental health account for a large proportion of the dataset.</p>
        <p>In the cluster label generation test, we apply the Davies-Bouldin (DB) and SC clustering
indices (Fahad et al., 2014) to find the best label generation method for each clustering algorithm. The SC
index is the ratio of the clusters’ separation to their compactness. If the DB value is lower and the
SC value is higher, the clusters are more compact and further from each other. The more search
terms for which DB and SC agree, the better the clustering results obtained by
the method. Through experiments, we find that the combination of Lingo and Word2Vec gives
better clustering results on 8 search terms. When combining STC with WordNet, there are 6
such search terms, and when combining bisecting K-means with Word2Vec, a total of 9.
However, the combination of Word2Vec and WordNet does not perform well
compared with applying them separately on the three clustering algorithms; the quality of
some cluster results based on this method lies between WordNet and Word2Vec. The reason
may be that we only use a linear function with equal weights to combine them, which is too
simple to bring out their strengths. In a word, we use these methods to carry out the final
automatic summary generation experiment.
</p>
        <p>Table 2. Search phrases and their frequencies. 2-grams: cell line (37507), gene expression (37001),
amino acid (35165), transcription factor (25626), cancer cell (25605), stem cell (22567),
growth factor (17531), signaling pathway (16597), cell proliferation (14203), meta analysis (12647).
3-grams: reactive oxygen species (5160), central nervous system (4418), smooth muscle cell (3439),
protein protein interaction (3286), single nucleotide polymorphism (2535), tumor necrosis factor (2482),
genome wide association (2386), case control study (2269), false discovery rate (2209),
innate immune response (2133).</p>
        <p>In this paper, we choose 20 search terms, and each of them generates summaries with 6
different approaches, so 120 summaries are produced in total. The compression ratio is set to
20%, which means the final summary length equals the number of retrieved citation sentences
multiplied by 20%. We then invite 2 volunteers to evaluate the summaries manually on a
5-point scale. The evaluation standards are described in Table 3.</p>
        <p>In the evaluation process, we give the volunteers the 120 produced summaries and the
corresponding search words for each summary, but we do not reveal the generation
method behind each summary. The volunteers are asked to score each paragraph in a
summary, from which we compute the average score of each summary. Since each summary is produced
by one method, we can then calculate the average score of each method. To abbreviate the selected
summary generation approaches, we omit Word2Vec and WordNet in Table 4 and Figure
3; for example, the method Lingo-Word2Vec-TF-IDF is written as Lingo-TF-IDF.</p>
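The per-method averaging could look like the following sketch; the data layout (method name mapped to pooled paragraph scores) is hypothetical, chosen only to illustrate the computation:

```python
def method_averages(scores):
    """Average score per method. `scores` maps a method name to the list of
    paragraph scores pooled over that method's summaries (hypothetical layout;
    the paper averages per summary first, then per method)."""
    return {m: round(sum(v) / len(v), 2) for m, v in scores.items()}

ratings = {
    "Lingo-TF-IDF": [3.2, 3.0, 2.9],
    "Lingo-MMR": [2.8, 3.1, 3.0],
}
avg = method_averages(ratings)
```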
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation standards</title>
      <p>Table 3. Evaluation standards (5-point scale):</p>
      <p>Score 5: Sentences are very smooth. Paragraphs and summaries are very comprehensive,
contain very little redundancy and fully reflect the retrieval topics. The logical
structure of the summary is reasonable.</p>
      <p>Score 4: Sentences are relatively smooth. Paragraphs and summaries are relatively
comprehensive, contain relatively little redundancy and relatively well reflect the retrieval
topics. The logical structure of the summary is relatively reasonable.</p>
      <p>Score 3: Sentences are basically smooth. Paragraphs and summaries are basically
comprehensive, contain a certain amount of redundancy and basically reflect the retrieval topics.
The logical structure of the summary is basically reasonable.</p>
      <p>Score 2: Sentences are not smooth enough. Paragraphs and summaries are not
comprehensive, contain relatively high redundancy and do not sufficiently reflect the retrieval
topics. The logical structure of the summary is confusing.</p>
      <p>Score 1: The smoothness of the sentences is very poor. Paragraphs and summaries are far
from comprehensive, contain very high redundancy and cannot reflect the retrieval
topics. There is no logical structure in the summary.</p>
      <p>We rank the six methods according to the average score of each method, shown in Table 4. We
find that the rankings of STC-WordNet-MMR, bisecting K-means-Word2Vec-TF-IDF and
bisecting K-means-Word2Vec-MMR are the same for the two volunteers. Both think that
summary quality is poor with the bisecting K-means algorithm, especially the combination of
bisecting K-means, Word2Vec and TF-IDF. The reason may be that
bisecting K-means is a hard clustering algorithm, so each sentence must belong to exactly one cluster. Some
sentences in the same cluster may not fit the cluster’s topic, and the cluster labels may also
fail to reflect the topic of the citation sentences in the cluster. The volunteers give different
rankings for the rest of the methods, which indicates that each of these approaches has its own
advantages and disadvantages.</p>
      <p>In order to make a comprehensive analysis of the six methods, we average the scores of the two
volunteers. As illustrated in Figure 3, the scores obtained by the 6 methods are all close to 3, indicating
that the generated summaries are comprehensive. Among them, the combination of Lingo, Word2Vec
and TF-IDF acquires the highest summary quality, 3.07. For both TF-IDF and
MMR, the summary quality obtained with the combination of Lingo and Word2Vec is higher.
The reason may be that the Lingo algorithm uses the abstract matrix and the longest common prefix
array when obtaining cluster labels, so it can get more meaningful labels. In addition,
a citation sentence is assigned to the cluster containing the corresponding label, instead of
by calculating the similarity between the sentence and a cluster centroid. This may be one of the reasons
why the TF-IDF method yields a better summary here. Compared to TF-IDF, we also find that
summary quality is higher with MMR when using the combination of bisecting K-means and
Word2Vec. The bisecting K-means algorithm divides citation sentences according to the similarity
between cluster centroids and sentences, and MMR likewise considers the similarity between
citation sentences, whereas TF-IDF ranks sentences only by their weight. The summary quality
obtained by the combination of STC, WordNet and either TF-IDF or MMR is almost the same, which
indicates that the sentence selection approach does not have much impact on summary quality
with this clustering algorithm.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper, we establish an automatic summary generation tool named CitationAS. Our
tool mainly contains three components. The first is the clustering algorithms, including bisecting
K-means, Lingo and STC. The second is the cluster label generation methods: Word2Vec,
WordNet and their combination. The last is the automatic summary generation approaches,
TF-IDF and MMR. Citation sentences are used as the summary generation data.
Through experiments, we choose the best label generation approach for each clustering
algorithm at the semantic level, and then use them in automatic summary generation. We
find that the combination of Word2Vec and WordNet does not improve system performance
compared with using them separately. Finally, the automatic summaries obtained by the 6 methods are
comprehensive, which means that the sentences are basically smooth and the summary content is
basically comprehensive and reflects the retrieval topic, but contains redundancy. For soft clustering,
such as Lingo and STC, the quality of the summary obtained by TF-IDF may be better. The
summaries generated by CitationAS may not completely reflect the topic, but people can refer to them.</p>
      <p>In future work, we will apply ontologies to calculate the semantic similarity between labels and
use deep learning to improve the quality of the generated summaries. We will also explore new
approaches to combining WordNet and Word2Vec in order to exploit their complementary
advantages. Besides, automatic evaluation can be introduced to avoid erroneous human judgements.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is supported by Major Projects of National Social Science Fund (No. 16ZAD224), Fujian Provincial
Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201704)
and Qing Lan Project.</p>
      <p>References</p>
      <p>Elkiss, A., Shen, S., Fader, A., States, D., &amp; Radev, D. (2008). Blind men and elephants: What do
citation summaries tell us about a research article? Journal of the American Society for
Information Science and Technology, 59(1), 51-62.</p>
      <p>Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A. Y., Foufou, S. &amp; Bouras, A. (2014).</p>
      <p>A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE
Transactions on Emerging Topics in Computing, 2(3), 267-279.</p>
      <p>Fellbaum, C., &amp; Miller, G. (1998). WordNet: An Electronic Lexical Database. MIT Press.</p>
      <p>Lee, D. D. (2000). Algorithms for nonnegative matrix factorization. Advances in Neural Information
Processing Systems, 13(6), 556-562.</p>
      <p>Liu, X. (2013). Generating metadata for cyberlearning resources through information retrieval and
meta-search. Journal of the American Society for Information Science and Technology, 64(4),
771-786.</p>
      <p>Mikolov, T., Le, Q. V., &amp; Sutskever, I. (2013). Exploiting Similarities among Languages for Machine</p>
      <p>Translation. Computer Science, 1-10.</p>
      <p>Nenkova, A., &amp; McKeown, K. (2011). Automatic summarization. Meeting of the Association for</p>
      <p>Computational Linguistics, 5(3), 1-42.</p>
      <p>Osiński, S., &amp; Weiss, D. (2005). Carrot2: Design of a Flexible and Efficient Web Information
Retrieval Framework. In Proceedings of the Third International Atlantic Web Intelligence
Conference, 439-444.</p>
      <p>Salton, G., &amp; Yu, C. T. (1973). On the Construction of Effective Vocabularies for Information</p>
      <p>Retrieval. Acm Sigplan Notices, 10(1), 48-60.</p>
      <p>Tandon, N., &amp; Jain, A. (2012). Citation context sentiment analysis for structured summarization of
research papers. In Proceedings of the 35th German Conference on Artificial Intelligence, 1-5.</p>
      <p>Yang, S., Lu, W., Yang, D., Li, X., Wu, C., &amp; Wei, B. (2016). KeyphraseDS: Automatic generation of
survey by exploiting keyphrase information. Neurocomputing, 224, 58-70.</p>
      <p>Yang, Y., &amp; Pedersen, J. O. (1997). A comparative study on feature selection in text categorization.
In Proceedings of the 14th International Conference on Machine Learning, 412-420.</p>
      <p>Zamir, O., &amp; Etzioni, O. (1999). Grouper: A dynamic clustering interface to Web search results.
Computer Networks, 31(11), 1361-1374.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Cohan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Scientific article summarization using citation-context and article's discourse structure</article-title>
          .
          <source>Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <fpage>390</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Goldstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          . (
          <year>1998</year>
          ).
          <article-title>The use of MMR, diversity-based reranking for reordering documents and producing summaries</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <fpage>335</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Divoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Do Peers See More in a Paper Than Its Authors?</article-title>
          <source>Advances in Bioinformatics</source>
          ,
          <year>2012</year>
          (
          <year>2012</year>
          ),
          <fpage>750214</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>