<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RALI System Description for CL-SciSumm 2016 Shared Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bruno Malenfant</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guy Lapalme</string-name>
          <email>lapalme@iro.umontreal.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université de Montréal, CP 6128, Succ Centre-Ville</institution>
          ,
          <addr-line>Montréal, Québec</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>H3C 3J3</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Montréal, CP 6128, Succ Centre-Ville</institution>
          ,
          <addr-line>Montréal, Québec</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>H3C 3J3</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>146</fpage>
      <lpage>155</lpage>
      <abstract>
        <p>We present our approach to the CL-SciSumm 2016 shared task. We propose a technique to determine the discourse role of a sentence. We differentiate between words linked to the topic of the paper and the ones that link to the facet of the scientific discourse. Using that information, histograms are built over the training data to infer a facet for each sentence of the paper (result, method, aim, implication and hypothesis). This helps us identify the sentences best representing a citation of the same facet. We use this information to build a structured summary of the paper as an HTML page.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A researcher must read scientific papers to be able to compare them, to identify
new problems, to position a work within the current literature and to elaborate new
research propositions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>This implies reading many papers before finding the ones we are looking for. With
the growing amount of publications, this task is getting harder. It is becoming important
to have a fast way of determining the utility of a paper for our needs. A first solution
is to use web sites such as CiteSeer, arXiv, Google Scholar and Microsoft Academic
Search that provide cross reference citations to papers. Another approach is automatic
summarization of a group of scientific papers dealing with the subject.</p>
      <p>This year’s CL-SciSumm competition for summarization of computational
linguistics papers proposes a community approach to summarization; it is based on the
assumption that citances, the set of citation sentences to a reference paper, can be used
as a measure of its impact. This task implies identifying the text a citance refers to in
the reference paper and a facet (aim, result, method, implication and hypothesis) for the
referred text.</p>
      <p>We are building a system that given a topic, generates a survey of the topic from a set
of papers. That system uses citations as the primary source of information for building
an annotated summary. Our system must be able to identify the purpose/polarity/facet
of a citation. to direct the reader towards the more relevant information. The summary
is built by selecting sentences from the cited paper and the citations. This process uses a
similarity function between sentences. The resulting summaries are presented in HTML
format with their annotations and links to the original paper. The only task that is not
performed by our system is finding the text referred to by the citation. We intend to
use the information already found by our system (facet of citations and sentences) to
complete that task.</p>
      <p>We had already some experience in dealing with scientific papers and their
references, having participated to Task2 of the Semantic Publishing Challenge of
ESWC2014 (Extended Semantic Web Conference) on the extraction and characterization of
citations. A short review of previous work follows in Sect. 2. We will summarize the
task in Sect. 3 and the techniques for extracting information in Sect. 4. Finally, Sect. 5
will show our results.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Previous Work</title>
      <p>
        There has been growing attention to the information carried by citations and
their surrounding sentences (citances). These contain information useful for rhetorical
classification [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], for technical surveys [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and for measuring the impact of papers [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Qazvinian [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and Elkiss [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] showed that citations provide information that is not present in
the abstract.
      </p>
      <p>
        Since the first works of Luhn [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Edmundson [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], many researchers have
developed methods for finding the most relevant sentences of papers in order to produce abstracts
and summaries. Many metrics have been introduced to measure the relevance of parts of a
text, either using special-purpose formulas [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] or using learned weights [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
hypothesis of the CL-SciSumm task is that important sentences can be pointed out by other
papers: a citation indicates a paper considered important by the author of the citing
paper.
      </p>
      <p>
        Another domain of study over scientific papers is the classification of their
sentences. Teufel [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] identified the rhetorical status of sentences using a Bayesian classifier.
      </p>
      <p>
        To find citations inside a paper, we need to analyse the references section.
Dominique Besagni et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] developed a method using pattern recognition to extract fields
from the references, while Brett Powley and Robert Dale [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] processed citations and
references simultaneously, using information from one task to help complete the
other.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Task Description</title>
      <p>
        For this year’s competition, we were given 30 topics: 10 for training, 10 for tuning and
10 for testing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Each topic is composed of a Reference Paper (RP) and some Citing
Papers (CPs). The citing papers contain citations pointing to the RP. An annotation file
is given for each topic; that file contains information about each citation, the citation
marker and the citance.
      </p>
      <p>There are two mandatory tasks (Task 1A and Task 1B) and an optional task (Task
2)3.</p>
      <p>Task 1A: Find the part of the RP that is indicated by each citance. This will be called
the referenced text.</p>
      <sec id="sec-3-1">
        <title>3 http://wing.comp.nus.edu.sg/cl-scisumm2016/</title>
        <p>Task 1B: Once the referenced text is identified, we need to attribute a facet to it. A
facet is one of: result, method, aim, implication and hypothesis.
Task 2: Build a summary of the RP using the referenced text identified in Task
1A.</p>
        <p>Both the training and the development sets of topics contain expected results for these
tasks. The next section describes our approach; Sect. 5 shows how our system performs on the test set.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Our Approach</title>
      <p>For the first task, we have to find the referenced text and its facet. We hypothesized that
the referenced text should consist of sentences sharing the same facet as the citance. We use
that hypothesis to reduce the set of sentences to choose from for the reference. This is why we
execute Task 1B on all the sentences of the RP and on all the citances prior to Task 1A. We
now present how we determine the facet of a citance, then the facet of sentences in the
RP, and finally the referenced text.</p>
      <p>
        Task 1B: Facet Identification.
Our goal is to be able to use our system on papers from different domains without
having to retrain it. Toward that objective, our system only uses words that are not
domain specific. Patrick Drouin [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] compiled such a list of words in his
Transdisciplinary Scientific Lexicon (TSL). This lexicon comprises 1627 words such as acceptance,
gather, newly, severe... We denote membership in the lexicon by w ∈ L.
      </p>
      <p>We trained two systems, one to attribute a facet to sentences in the RP and one to
attribute a facet to citances.</p>
      <p>We determine the word distribution for each facet using a histogram, considering only
words appearing in the TSL. This computation yields, for each facet, a count of each word present
in all the referenced texts annotated with that facet. The facet with the highest score is then chosen for a given
sentence.</p>
      <p>For training our system, we extract the reference sentences from each annotation
with their assigned facet. Each sentence is tokenized using the NLTK library in Python, and
only words from the TSL are kept. Our dataset consists of pairs of a list of words with a
facet: D = [(ws_i, f_i)].</p>
      <p>We build a profile (h_f) for each facet using a histogram: for each word in the
lexicon, we compute the number of times it appears in sentences paired with a specific
facet.</p>
      <p>h_f(w, D) = Σ { cnt(w, ws_i) | (ws_i, f) ∈ D }</p>
      <p>cnt(w, ws_i) = Σ { 1 | w ∈ ws_i }</p>
      <p>When a word appears more than once in a sentence, all its occurrences are counted.</p>
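      <p>A minimal sketch of this profile construction (the token lists and facet labels below are illustrative stand-ins, not real CL-SciSumm data):</p>

```python
from collections import Counter

def build_profiles(dataset):
    """Build one word histogram h_f per facet from (words, facet) pairs."""
    profiles = {}
    for words, facet in dataset:
        # every occurrence of a word is counted, as described above
        profiles.setdefault(facet, Counter()).update(words)
    return profiles

# Toy training pairs: TSL tokens kept from annotated sentences
data = [(["result", "show"], "result"),
        (["method", "propose"], "method"),
        (["show", "improve", "show"], "result")]
profiles = build_profiles(data)
# profiles["result"]["show"] == 3 (all occurrences counted)
```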
      <p>Once the histogram is built, we use it to find the facet of a new sentence. First, we
extract from the sentence the words that are part of the lexicon, yielding the list of words
p. Then a score s_f for each facet is computed by adding the profile values of each word for
that facet. The facet with the highest score is assigned to the new sentence.</p>
      <p>s_f(p, D) = Σ_{w ∈ p} h_f(w, D)</p>
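      <p>Assigning a facet then reduces to summing profile counts and taking the argmax; a sketch, with hypothetical (untrained) profiles:</p>

```python
from collections import Counter

def best_facet(words, profiles):
    """Compute s_f(p) = sum of h_f(w) over w in p; return the argmax facet."""
    scores = {facet: sum(hist[w] for w in words)
              for facet, hist in profiles.items()}
    return max(scores, key=scores.get)

# Hypothetical facet profiles (word -> count)
profiles = {"result": Counter({"show": 2, "improve": 1}),
            "method": Counter({"propose": 1, "apply": 2})}
print(best_facet(["show", "improve"], profiles))  # -> result
```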
      <p>Looking closely at the resulting profiles, we saw that some words have a
negative effect on finding the facet. To find a better sublist of the TSL words to use,
we used a genetic algorithm over a population of lists of words.</p>
      <p>A genetic algorithm starts with an initial population (set of possible solutions) and
tries to find better solutions by applying small changes to existing solutions. In our case,
a solution is a subset of words Li of L. The initial population is built using random
subsets.</p>
      <p>To build the next generation, we use three different techniques:
1. Adding a random word to an existing solution: L′_i = L_i ∪ {w} where w ∈ (L ∖ L_i).
2. Removing a random word from an existing solution: L′_i = L_i ∖ {w} where w ∈ L_i.
3. Combining two existing solutions: L′_i = L_j ∪ L_k.</p>
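      <p>The three mutations can be sketched as follows (the lexicon and population sizes are toy values; the real system evaluates each candidate subset by cross-validation, which is omitted here):</p>

```python
import random

def next_generation(population, lexicon, size):
    """Grow a population of word subsets using the three mutations above."""
    new = list(population)
    while len(new) < size:
        op = random.choice(["add", "remove", "combine"])
        li = random.choice(population)
        if op == "add" and lexicon - li:
            new.append(li | {random.choice(sorted(lexicon - li))})   # L'_i = L_i + {w}
        elif op == "remove" and li:
            new.append(li - {random.choice(sorted(li))})             # L'_i = L_i - {w}
        elif op == "combine":
            new.append(li | random.choice(population))               # L'_i = L_j | L_k
    return new

lexicon = {"acceptance", "gather", "newly", "severe"}
population = [{"gather"}, {"newly", "severe"}]
generation = next_generation(population, lexicon, 6)
# 6 candidate subsets, all drawn from the lexicon
```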
      <p>Once enough solutions are built for the new population, each solution is evaluated using
cross-validation with the histogram. The list that performed best on the task is kept for
the next generation. We use the same technique over the dataset consisting of the citance
texts and their facets.</p>
      <p>Task 1A: Finding the Sentences Referred to by Citances.
Having determined the facet of sentences in both the RP and the citances, we are now ready
to assign referenced text to citances from the CPs. Our hypothesis is that a citance
should have the same facet as the text it refers to. We extract Q_f, the subset of sentences
from the RP that have the same facet f as a citance c_i. To choose the sentence of the RP
referred to by the citance, we look for the sentence from Q_f that is the most similar
to the citance c_i.</p>
      <p>sim_mcs(P_1, P_2) = ½ (hs(P_1, P_2) + hs(P_2, P_1))   (1)</p>
      <p>hs(P_i, P_j) = Σ_{w ∈ P_i} ms(w, P_j) · idf_w / Σ_{w ∈ P_i} idf_w   (2)</p>
      <p>ms(w, P_j) = max_{v ∈ P_j} sim_wup(w, v)   (3)</p>
      <p>
        We use the similarity function sim_mcs defined by Mihalcea, Corley and
Strapparava [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This similarity function between sentences P_1 and P_2 (Equation 1) averages
two values: the similarity from P_1 to P_2, and the similarity from P_2 to P_1. The
similarity from one sentence P_i to the other P_j is computed by first pairing each word from
the first sentence w ∈ P_i with a word in the second one v ∈ P_j. A word is paired
with the one that is the most similar to it (Equation 3). For each pair (w, v), the value
of the similarity is weighted by the inverse document frequency of the first word, idf_w
(Equation 2). The average of these weighted similarity values yields the
similarity from P_i to P_j. We use only words that are nouns, verbs, adjectives and
adverbs for this comparison; the POS tagger of NLTK was used to assign a tag to
each word. Since we believe that the domain of the paper is important for computing this
similarity, we use all such words, not only the ones that are part of the TSL.
      </p>
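      <p>Equations 1-3 can be sketched with a pluggable word-similarity function. The exact-match similarity below is a stand-in so the sketch runs without WordNet; the real system plugs in sim_wup over synsets and synset IDFs:</p>

```python
def ms(w, pj, word_sim):
    """Equation 3: best match of w among the words of the other sentence."""
    return max((word_sim(w, v) for v in pj), default=0.0)

def hs(pi, pj, idf, word_sim):
    """Equation 2: idf-weighted average of the best-pair similarities."""
    num = sum(ms(w, pj, word_sim) * idf.get(w, 1.0) for w in pi)
    den = sum(idf.get(w, 1.0) for w in pi)
    return num / den if den else 0.0

def simmcs(p1, p2, idf, word_sim):
    """Equation 1: symmetric average of the two directed similarities."""
    return 0.5 * (hs(p1, p2, idf, word_sim) + hs(p2, p1, idf, word_sim))

exact = lambda w, v: 1.0 if w == v else 0.0   # stand-in for sim_wup
print(simmcs(["cat", "sat"], ["cat", "ran"], {}, exact))  # -> 0.5
```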
      <p>
        Mihalcea et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] reported that, among the possible metrics for comparing
words, the one proposed by Zhibiao Wu and Martha Palmer [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] yielded good results
(denoted sim_wup). This metric is also available in the NLTK package. To use this
metric, we transform each word into its synonym group (synset) using WordNet. The
IDF was computed for each synset over the set of all the
documents contained in the ACL Anthology Network4.
      </p>
      <p>
        Task 2: Summarization.
Multiple-source summarization adds three problems [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]:
1. Redundancy: a paper will often be cited for the same reason over and over, resulting
in many citances having the same subject.
2. Identifying important differences between sources: our goal is to find those
citances/references that bring new and important information to the
summary.
3. Coherence: since sentences come from many sources, we want to ensure that the
summary forms a unified whole.
      </p>
      <p>
        For Task 2, we chose to use the Maximal Marginal Relevance (MMR) proposed by
Jaime G. Carbonell and Jade Goldstein [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Their technique is presented in Equation 4,
in which R is the list of possible sentences and V is the summary. They propose to use
the title of the research paper as the starting query Q.
      </p>
      <p>arg max_{s_i ∈ R ∖ V} [ λ · sim_mcs(s_i, Q) − (1 − λ) · max_{s_j ∈ V} sim_mcs(s_i, s_j) ]   (4)</p>
      <p>At each iteration, their algorithm adds a sentence s_i to V. Sentences are chosen
so that they bring new information to the summary (Points 1 and 2) while keeping
a certain amount of similarity with the query. The parameter λ must be adjusted to balance
between adding a sentence very similar to the query and a sentence very different from
the ones already in the summary V. We use the same metric (sim_mcs) as for Task 1A to
compare sentences.</p>
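      <p>A sketch of the greedy selection of Equation 4 (the Jaccard word-overlap similarity below is a stand-in for sim_mcs, and the small word budget mirrors the word limits described next):</p>

```python
def mmr_select(candidates, query, lam, sim, word_budget):
    """Greedily add the sentence maximizing Equation 4 until the word budget."""
    summary, pool, words = [], list(candidates), 0
    while pool and words < word_budget:
        def mmr(s):
            relevance = sim(s, query)
            redundancy = max((sim(s, t) for t in summary), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        pool.remove(best)
        summary.append(best)
        words += len(best)
    return summary

# Stand-in similarity: word-overlap Jaccard instead of sim_mcs
jaccard = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
cands = [["cat", "sat"], ["cat", "sat"], ["dog", "ran"]]
picked = mmr_select(cands, ["cat"], 0.5, jaccard, 4)
# the duplicate citance is skipped in favor of the novel one
```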
      <p>We divided the summarization process in two steps: adding sentences from the
citances (R = CT) and adding sentences from the paper (R = RP). In the first step,
the algorithm chooses sentences from the set of citances until it reaches 150 words. For
that part, we use λ = 0.3 to give priority to similarity with the query, trying to remove
meaningless citances. Since citances have been identified as bringing new information</p>
      <sec id="sec-4-1">
        <title>4 http://clair.eecs.umich.edu/aan/index.php</title>
        <p>not present in the original paper, we believe it is important to keep them in the summary.
Then, the summary is completed (to 250 words) using sentences chosen from the RP;
here, we use λ = 0.7. Since the sentences chosen from the RP are mostly about
the same subject, we want to give priority to sentences that are more different.</p>
        <p>The summary is built in an XML format. Each sentence is identified by its position
(the id of the paper it was extracted from, and the sid and ssid attributes from the XML
source files). The citances contain the id of the referred paper. This information
enables pointing the reader towards the corresponding paper.</p>
        <p>To help analyse the summaries, our software builds an HTML page containing the
extracted information (see Fig. 1).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>Task 1.
We present our results for facet attribution to citances and reference texts. The data
we received is divided in two parts: the training set contains 197 sentences distributed over
the citances and 247 sentences over the reference texts; the development set contains
273 sentences distributed over the citances and 330 sentences over the reference texts.
We first trained our system using the training set (T) and then retrained it using both
the training and development sets together (TD). In each case, we tested the result over both
sets. We show the results for simple training of the histogram and for training with
the genetic algorithm (gen_T) to select the list of words to consider. We also trained
our histogram without limiting it to the words of the TSL, for comparison purposes.</p>
      <p>We let the genetic algorithm run for 25 generations. Each generation started
with 1 000 lists of words; 9 000 lists were added using the proposed mutations, bringing
the number of lists to 10 000.</p>
      <p>The results of these experiments are presented in Table 1 and Table 2. We see that
training on the set T gives good results on T itself but lower results when applied to
the development set. After training on both sets TD (Training + Development), the results
over the development set rise at the expense of the results over the training set. For
citances, the genetic algorithm yields better results over the training set only; it does not
help to get better histograms. Considering that fact, we wonder whether it is possible to
obtain better results using histograms, or whether we have reached the limit of that technique.
Limiting our choice of words to the TSL did not give lower results. It remains to be tested whether
histograms built with the TSL will perform better in a domain other than computational
linguistics.</p>
      <p>Once we had identified the facets, we ran our script for finding the reference text. It
reached an F1 score of 0.095 over the training set and 0.052 over the
development set (Table 3). We reduced the search space for the referred text using the facet of
the citance. Since the identification of the facet is not perfect, this reduction might
remove a sentence we are looking for. In the future, we will have to test our approach with all
sentences instead of the reduced set, to see whether this reduction of the search space hurts
more than it helps.
Figure 1 shows the HTML interface we generated for presenting the results of our
system. The top of the page lets us choose
between the different topics that were summarised. For each topic, the left
side presents the text of each CP and of the RP, with the sentences segmented and the citances
identified. The right side contains the different summaries that our software builds (using
different values of λ) and the gold-standard summary. Each paper links to its PDF version
in the ACL Anthology5.</p>
      <p>On the left side of the top part of the figure, we see the RP divided into sentences.
On the right side, there is a summary built by choosing five sentences from the set of
citances using a λ of 0.3. These sentences were selected to be as different as possible
by the MMR algorithm. The bottom screenshot (Fig. 1) presents one of the CPs on the
left; the citances and citations are colored to be easy to identify. The third sentence from
the top was selected by the algorithm for the summaries.</p>
      <sec id="sec-5-1">
        <title>5 http://aclanthology.info/</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We presented the use of distinguishing between topic and non-topic (TSL) words for
determining the facet of sentences in a paper. This technique is useful because it lets
our system work on papers in a domain-independent way. We obtained good results with
a simple histogram. We still have to test our histograms on other domains, to see whether
they also yield good results there. Our experiments with a genetic algorithm to refine the list
of words used did not show any improvement.</p>
      <p>We also presented our interface for browsing the results of our system. That interface
presents the RP, the CPs and the summaries with links to the original papers, and helps
the reader browse through a topic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Dominique</given-names>
            <surname>Besagni</surname>
          </string-name>
          , Abdel Belaïd, and Nelly Benet :
          <article-title>A Segmentation Method for Bibliographic References by Contextual Tagging of Fields</article-title>
          .
          <source>ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition</source>
          ,
          <volume>1</volume>
          :
          <fpage>384</fpage>
          -
          <lpage>388</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Jaime G.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          , and Jade Goldstein :
          <article-title>The Use of MMR, Diversity-based Reranking for Reordering Documents and Producing Summaries</article-title>
          .
          <source>Research and Development in Information Retrieval - SIGIR</source>
          ,
          <fpage>335</fpage>
          -
          <lpage>336</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Drouin</surname>
          </string-name>
          :
          <article-title>Extracting a Bilingual Transdisciplinary Scientific Lexicon</article-title>
          .
          <source>Proceedings of eLexicography in the 21st Century : New Challenges, New Applications</source>
          . Presses universitaires de Louvain, Louvain-la-Neuve,
          <volume>7</volume>
          :
          <fpage>43</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Drouin</surname>
          </string-name>
          :
          <article-title>From a Bilingual Transdisciplinary Scientific Lexicon to Bilingual Transdisciplinary Scientific Collocations</article-title>
          .
          <source>Proceedings of the 14th EURALEX International Congress. Fryske Akademy</source>
          , Leeuwarden/Ljouwert, Pays-Bas,
          <fpage>296</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Harold P.</given-names>
            <surname>Edmundson</surname>
          </string-name>
          :
          <article-title>New Methods in Automatic Extracting</article-title>
          .
          <source>Journal of the ACM (JACM)</source>
          ,
          <volume>16</volume>
          (
          <issue>2</issue>
          ):
          <fpage>264</fpage>
          -
          <lpage>285</lpage>
          (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Elkiss</surname>
          </string-name>
          , Siwei Shen, Anthony Fader, Günes Erkan,
          <string-name>
            <given-names>David J.</given-names>
            <surname>States</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dragomir R.</given-names>
            <surname>Radev</surname>
          </string-name>
          :
          <article-title>Blind Men and Elephants: What Do Citation Summaries Tell Us About a Research Article?</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology - JASIS</source>
          ,
          <volume>59</volume>
          (
          <issue>1</issue>
          ):
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C. Lee</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kurt D.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          and Steve Lawrence :
          <article-title>CiteSeer: an Automatic Citation Indexing System</article-title>
          .
          <source>Proceedings of the Third ACM Conference on Digital Libraries</source>
          ,
          <fpage>89</fpage>
          -
          <lpage>98</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Kokil</given-names>
            <surname>Jaidka</surname>
          </string-name>
          , Christopher S.G. Khoo,
          <string-name>
            <surname>Jin-Cheon Na</surname>
          </string-name>
          , and Wee Kim Wee :
          <article-title>Deconstructing Human Literature Reviews - A Framework for Multi-Document Summarization</article-title>
          .
          <source>Proceedings of the 14th European Workshop on Natural Language Generation</source>
          ,
          <fpage>125</fpage>
          -
          <lpage>135</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Kokil</given-names>
            <surname>Jaidka</surname>
          </string-name>
          , Muthu Kumar Chandrasekaran, Sajal Rustagi, and
          <string-name>
            <surname>Min-Yen</surname>
            <given-names>Kan</given-names>
          </string-name>
          :
          <article-title>Overview of the 2nd Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2016)</article-title>
          . To appear in
          <source>the Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016)</source>
          , Newark, New Jersey, USA. (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Julian</given-names>
            <surname>Kupiec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan O.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          , and Francine Chen :
          <article-title>A Trainable Document Summarizer</article-title>
          .
          <source>Research and Development in Information Retrieval - SIGIR</source>
          ,
          <fpage>68</fpage>
          -
          <lpage>73</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Hans P.</given-names>
            <surname>Luhn</surname>
          </string-name>
          :
          <article-title>The Automatic Creation of Literature Abstracts</article-title>
          .
          <source>IBM Journal of Research and Development - IBMRD</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>159</fpage>
          -
          <lpage>165</lpage>
          (
          <year>1958</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Qiaozhu</given-names>
            <surname>Mei</surname>
          </string-name>
          , and ChengXiang Zhai :
          <article-title>Generating Impact-Based Summaries for Scientific Literature</article-title>
          .
          <article-title>Meeting of the Association for Computational Linguistics -</article-title>
          ACL,
          <fpage>816</fpage>
          -
          <lpage>824</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          , Courtney Corley, and Carlo Strapparava :
          <article-title>Corpus-based and Knowledgebased Measures of Text Semantic Similarity</article-title>
          . AAAI,
          <volume>6</volume>
          :
          <fpage>775</fpage>
          -
          <lpage>780</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Saif</surname>
            <given-names>Mohammad</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Bonnie J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          , Melissa Egan, Ahmed Hassan, Pradeep Muthukrishnan, Vahed Qazvinian,
          <string-name>
            <surname>Dragomir R. Radev</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David M.</given-names>
            <surname>Zajic</surname>
          </string-name>
          :
          <article-title>Using Citations to Generate Surveys of Scientific Paradigms</article-title>
          .
          <article-title>North American Chapter of the Association for Computational Linguistics -</article-title>
          NAACL,
          <fpage>584</fpage>
          -
          <lpage>592</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Brett</given-names>
            <surname>Powley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Robert</given-names>
            <surname>Dale</surname>
          </string-name>
          :
          <article-title>Evidence-based Information Extraction for High Accuracy Citation and Author Name Identification</article-title>
          .
          <source>RIAO '07 Large Scale Semantic Access to Content</source>
          ,
          <fpage>618</fpage>
          -
          <lpage>632</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Vahed</given-names>
            <surname>Qazvinian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dragomir R.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Saif</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bonnie J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David M.</given-names>
            <surname>Zajic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Whidby</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Moon</surname>
          </string-name>
          :
          <article-title>Generating Extractive Summaries of Scientific Paradigms</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>46</volume>
          :
          <fpage>165</fpage>
          -
          <lpage>201</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Dragomir R.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Hovy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kathleen</given-names>
            <surname>McKeown</surname>
          </string-name>
          :
          <article-title>Introduction to the Special Issue on Summarization</article-title>
          .
          <source>Computational Linguistics - Summarization</source>
          ,
          <volume>28</volume>
          (
          <issue>4</issue>
          ):
          <fpage>399</fpage>
          -
          <lpage>408</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Advaith</given-names>
            <surname>Siddharthan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Simone</given-names>
            <surname>Teufel</surname>
          </string-name>
          :
          <article-title>Whose Idea Was This, and Why Does it Matter? Attributing Scientific Work to Citations</article-title>
          .
          <source>North American Chapter of the Association for Computational Linguistics - NAACL</source>
          ,
          <fpage>316</fpage>
          -
          <lpage>323</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Simone</given-names>
            <surname>Teufel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Moens</surname>
          </string-name>
          :
          <article-title>Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status</article-title>
          .
          <source>Computational Linguistics - COLI</source>
          ,
          <volume>28</volume>
          (
          <issue>4</issue>
          ):
          <fpage>409</fpage>
          -
          <lpage>445</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Zhibiao</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Martha</given-names>
            <surname>Palmer</surname>
          </string-name>
          :
          <article-title>Verbs Semantics and Lexical Selection</article-title>
          .
          <source>ACL '94 Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics</source>
          ,
          <fpage>133</fpage>
          -
          <lpage>138</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Peter N.</given-names>
            <surname>Yianilos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kirk G.</given-names>
            <surname>Kanzelberger</surname>
          </string-name>
          :
          <article-title>The LikeIt Intelligent String Comparison Facility</article-title>
          . NEC Research Institute. (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>