<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Helium @ CL-SciSumm-19: Transfer learning for effective scientific research comprehension</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bakhtiyar Syed</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vijayasaradhi Indurthi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Balaji Vasan Srinivasan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasudeva Varma</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Adobe Research</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IIIT Hyderabad</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automatic research paper summarization is an interesting topic that has garnered significant interest in the research community in recent years. In this paper, we introduce team Helium's system description for the CL-SciSumm shared task co-located with SIGIR 2019. We specifically attempt the first task, targeting both an improved-recall system for identifying reference text spans from a given citing research paper (Task 1A) and better models for comprehension of scientific facets (Task 1B). Our architecture incorporates transfer learning by utilising a combination of pretrained embeddings which are subsequently used for building models for the given tasks. In particular, for Task 1A, we locate the related text spans referred to by the citation text by creating paired text representations, and employ pre-trained embedding mechanisms in conjunction with XGBoost, a gradient boosted decision tree algorithm, to identify textual entailment. For Task 1B, we use the same pretrained embeddings with the RAKEL algorithm for multi-label classification. Our goal is to enable better scientific research comprehension, and we believe that a new approach involving transfer learning will add value for the research community working on these tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Traditionally, summarization has been a key requirement for facilitating
easier comprehension of documents. Automatic summarization is the process of
shortening a text document with software in order to create a summary with
the major points of the original document. The CL-SciSumm 2019 Shared Task
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] facilitates advances in scientific communication summarization. It
encourages the comprehension of information in automatic scientific paper
summarization in the form of facet identification, and the use of new resources, such
as the mini-summaries written in other papers by other scholars and concept
taxonomies developed for computational linguistics.
      </p>
      <p>As the number of scientific publications increases, researchers have to spend
more time reading them. Reading complete scientific publications to understand
and fully comprehend the content is time-consuming for researchers and
enthusiasts alike. In many cases, it even drives newcomers away from a particular
field of study simply because there are not enough background articles to aid
comprehension of highly technical research papers. While the abstract of a
paper helps researchers get its big picture, it may not cover all of the paper's
important aspects. Moreover, abstracts may be biased, overstated or understated,
or may not be considered good summaries by the research community. Automatic
summarization can help capture the details and contributions of a paper
more accurately and in a less biased manner than its abstract.
Generating automatic summaries of scientific papers is hence an important
and challenging task.</p>
      <p>A citation is a reference to a published or unpublished source in the body
of a document. Many digital documents cite other relevant documents in
their bodies: news documents, legal documents, Wikipedia articles and scientific
papers all cite each other. Citations allow the reader to determine
independently whether the referenced material supports the author's argument
in the claimed way, and help the reader gauge the strength and validity of the
material the author has used. Citations of a scientific paper help in understanding
the relevant ideas and their evolution. The sentences of the citing articles
containing the citation to the publication, also known as citances, are useful in
analysing the reference publications and thereby contribute to better summarization
of scientific publications.</p>
      <p>The rest of the paper is organized as follows. We briefly describe related
work in Section 2. We describe and formulate the problem of the shared task
in Section 3. Next, our focus shifts to the methodology and the experiments in
Section 4. We conclude the paper with the inferred conclusions and
possible directions for future work in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The CL-SciSumm series of shared tasks [
        <xref ref-type="bibr" rid="ref11 ref17 ref18">11, 17, 18</xref>
        ] has garnered much attention
and attracted many contributions from the research community.
      </p>
      <p>A large number of related works exist, as this shared task has been running
since 2016. For subtask 1A, which comprises the identification of text spans
according to citations, the solution techniques can be categorized broadly into
two types: solutions based on a retrieval task and solutions based on a
classification task. The former methods formulate the problem as an information
retrieval problem: learning to rank and selecting the top item from the ranked list.</p>
      <p>
        For subtask 1A, Felber et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] created an index of the reference papers,
treated each citance as a query, and ranked the results with VSM and
BM25 models. Prasad et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] used tf-idf and LCS for the syntactic score and a
pairwise neural network ranking model to calculate a semantic relatedness score.
Cao et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] cast the problem as a ranking problem and select the first item
from the retrieved ranked list.
      </p>
      <p>
        The classification methods include works from Ma et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Cao et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
Zhang et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] and Li et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Ma et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] used four classifiers with different
features and a majority voting mechanism to vote for the final result. Cao et
al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use an SVM with features like tf-idf, named entity features and position
information of the reference sentence. Zhang et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] computed features based
on sentence-level and character-level tf-idf scores and word2vec similarity, and
then used a logistic regression classifier to decide whether a sentence should be
selected or not. Li et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] aggregated the results from several basic methods and used
majority voting for the final result. Wang et al. [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] use an Information Retrieval
model incorporating word embeddings and a domain ontology. L. Moraes et al.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use a sentence similarity method using Siamese Deep Learning Networks [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
and a Positional Language Model approach [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        For subtask 1B, which comprises the identification of the facet, almost all
the teams have formulated this as a text classification problem. Classification
methods for subtask 1B can be divided into rule-based methods and supervised
learning methods using a gamut of features, ranging from TF-IDF onward,
together with a variety of supervised machine learning algorithms. Moraes et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use SVMs,
Random Forests, Decision Trees, Multi-layer Perceptrons and the ensemble
method AdaBoost with TF-IDF features. Some teams, like Lauscher et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
have used deep learning text classification techniques like Convolutional Neural
Networks (CNNs).
      </p>
      <p>It can be observed that most researchers have formulated subtask 1A
as a retrieval task rather than as a classification task.</p>
    </sec>
    <sec id="sec-4">
      <title>Problem Definition and Formulation</title>
      <p>In this section, we define the problem formally. Our team, Helium, participated
in Task 1, comprising subtasks 1-A and 1-B specifically. We will introduce the
problem definition and then proceed to describe our formulation of the problem.</p>
      <sec id="sec-4-1">
        <title>Task Description</title>
        <p>Given: A topic consisting of a Reference Paper (RP) and Citing Papers (CPs)
that all contain citations to the RP. In each CP, the text spans (i.e., citances)
that pertain to a particular citation to the RP have been identified.</p>
        <p>Task 1-A: For each citance, identify the spans of text (cited text spans) in
the RP that most accurately reflect the citance. These are of the granularity of
a sentence fragment, a full sentence, or several consecutive sentences (no more
than 5).</p>
        <p>Task 1-B: For each cited text span, identify what facet of the paper it belongs
to, from a predefined set of facets. The predefined facets, as defined by
the organizers, are Method, Aim, Hypothesis, Implication and Results.</p>
        <p>
          We pose the problem of finding reference text spans from the citing sentences
(Task 1A) as a sentence-pair classification problem. The closest problems in the
literature are the Paraphrase Detection problem [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and the Entailment Detection
problem [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>For Task 1B, since there are multiple instances in the dataset of two or more
labels for each citance text within the CP, we formulate the problem as a
multi-label classification problem.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Dataset Creation</title>
        <p>The task organizers provided us with 1018 Reference Papers (RPs) and their
corresponding Citing Papers (CPs), along with their citations to the particular
RP. Each citation consists of a Reference Offset; there may be one or more
Reference Offsets for a particular citation. These offsets point to the sentence
IDs in the RP to which the citation refers.</p>
        <p>For our task, we simplify the dataset to create pairs for each citance text in
the CP. Each citance text in the CP is paired with the reference text
(corresponding to the reference offset). Each pair was labeled with a 1/0 binary variable to
indicate whether the reference text is entailed with respect to
the citance text. From the 2018 training set, we collect 180,685 such pairs.
Similarly, we were able to collect 3,398,218 such pairs from the newly released
2019 dataset. As we pose Task 1B of scientific facet comprehension as a
multi-label classification task, we build the dataset from the existing labels available to
us. Since the existing labels are only available for the 2018 dataset, we restrict
ourselves to the 752 cleaned citance texts from the CPs for our prediction task.</p>
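        <p>As a minimal illustration of this pairing step (the sentences, offsets and data structures below are made up for illustration, not the shared task's actual file format):

```python
# Hypothetical sketch of pair construction for Task 1A: each citance text is
# paired with every sentence of the reference paper (RP), labeled 1 only at
# the annotated reference offsets.
rp_sentences = {1: "We propose a parser.",
                2: "Results improve by two points.",
                3: "We use beam search."}
citances = [
    {"text": "Smith et al. propose a parser.", "ref_offsets": [1]},
    {"text": "Their system uses beam search.", "ref_offsets": [3]},
]

pairs = []
for citance in citances:
    for sid, sentence in rp_sentences.items():
        label = 1 if sid in citance["ref_offsets"] else 0
        pairs.append((citance["text"], sentence, label))
# 6 pairs in total, of which 2 are labeled positive
```
        </p>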
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Methodology and Experiments</title>
      <p>In this section, we go over the critical aspects of our team's system submitted
to the shared task. We first provide an overview of pre-trained text representations,
along with how we set the stage for the transfer learning process. Next, we look
at each of the sub-tasks in detail and the algorithms used for the final predictions
on the given dataset.</p>
      <sec id="sec-5-1">
        <title>Pre-trained text representations and Transfer Learning</title>
        <p>
          Pre-trained text representations have been used successfully for a number of
natural language processing (NLP) tasks [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Unsurprisingly, the performance of
various NLP tasks has improved as a result of using pre-trained representations,
most often as the embedding layer in the first step of a deep neural network
architecture [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Word embeddings have been widely used in modern NLP
applications as they provide a ready-to-use, simple vector representation of words.
They capture the semantic properties of words and the linguistic relationships
between them. These word embeddings have improved the performance of many
downstream tasks across many domains, like text classification, machine
comprehension etc. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Multiple ways of generating word embeddings exist, such as
the Neural Probabilistic Language Model [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], Word2Vec [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], GloVe [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], and more
recently ELMo [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. These word embeddings rely on the distributional linguistic
hypothesis. They differ in the way they capture the meaning of words and the
way they are trained. Each word embedding captures a different set of semantic
attributes, which may or may not be captured by other word embeddings. In
general, it is difficult to predict the relative performance of these word embeddings
on downstream tasks. The choice of which word embeddings should be used for
a given downstream task depends on experimentation and evaluation.
        </p>
        <p>
          While word embeddings can produce representations for words which
capture their linguistic properties and semantics, the idea of
representing sentences as vectors is an important and open research problem [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
Finding a universal representation of a sentence which works with a variety of
downstream tasks is the major goal of many sentence embedding techniques.
A common approach to obtaining a sentence representation from word
embeddings is the simple and naive one of taking the arithmetic mean of
the embeddings of the words present in the sentence. Smooth inverse frequency,
which uses weighted averages and modifies them using Singular Value
Decomposition (SVD), has been a strong contender as a baseline over the traditional
averaging technique [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Other sentence embedding techniques include p-means [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ],
InferSent [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], SkipThought [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and the Universal Sentence Encoder [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
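        <p>A minimal sketch of the simple averaging baseline described above (the tiny vocabulary and its random vectors are made up for illustration):

```python
# Mean of word vectors as a sentence embedding (the naive averaging baseline).
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in
         "the citing paper refers to this method".split()}

def sentence_embedding(sentence):
    # Arithmetic mean of the embeddings of the in-vocabulary words.
    vecs = [vocab[w] for w in sentence.split() if w in vocab]
    return np.mean(vecs, axis=0)

emb = sentence_embedding("this method refers to the paper")
# emb is an 8-dimensional vector
```
        </p>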
        <p>
          In particular, for our task, we use these pretrained sentence encoders to get
dense vector representations for each of the citance/reference texts, and we
then use these embeddings as features for building models for the downstream
classification tasks. Our team's system focuses primarily on the use of the Universal
Sentence Encoder [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] to get the text representations, which are then used in the
transfer learning mechanism with other machine learning algorithms as we show
below.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Finding Reference Text Correspondence (Task 1A)</title>
        <p>As reported in Section 3, we treat Task 1A as a sentence-pair classification
problem where a (reference text, citance text) pair is classified as the
positive class only if the reference text of the RP accurately reflects the citance
text of the CP, and as the negative class otherwise. We were able to construct
180,685 such pairs from the 2018 version of the dataset. Similarly, we were able
to construct 3,398,218 pairs of citance-reference text from the 2019 dataset. As
our pipeline suggests, we go through the following steps:
1. Obtain pre-trained dense sentence representations for the citance text and
reference text separately for each pair using the Universal Sentence Encoder.
2. Once the pre-trained representations are available, construct the features for
the transfer learning phase in the pipeline. These features are constructed
by element-wise subtraction of the 512-dimensional dense vector
representations. We hypothesise that this step carries forward better features
for the next step in the pipeline.
3. After obtaining the 512-dimensional output from the above step, we use a
variant of the gradient boosted decision tree algorithm for the binary
classification task. Specifically, we employ eXtreme Gradient Boosting (XGBoost).</p>
        <p>Gradient Boosting utilises the principles of Gradient Descent and Boosting
to form the core of its algorithm. Boosting is essentially an
ensemble of weak learners in which misclassified records are given greater weight
(boosted) so that later models predict them correctly. These weak learners
are then combined through a linear combination to produce a single strong
learner. With eXtreme Gradient Boosting (XGBoost), we take advantage of
the following features of the tree-based boosting algorithm (see
https://www.kdnuggets.com/2017/10/xgboost-concise-technical-overview.html):
- Approximation for split-finding.
- Column block for parallel learning.
- Sparsity-awareness.
- Cache-aware access.
- Out-of-core computation.
- Regularized learning objective.
- Speed.
Since XGBoost is also widely used in competitive machine learning contests
(e.g., on www.kaggle.com), we specifically chose it as the successor to the
pretrained representation in the transfer learning pipeline.
4. During the prediction phase, for each citance text of each CP,
we select the corresponding RP's reference texts and rank the top 5
candidates by their probability of entailment. These
probabilities are readily available when XGBoost is used to classify the
instances. We finally select the top 5 ranked reference texts from the RP for
the corresponding citance text and reflect this in our system's submission.
Due to limitations on compute memory requirements, we do not run
cross-validation over the entire dataset; rather, we select a random subset of 500
instances with label 1 (entailment) and similarly a random subset
of 500 with label 0. Our results for subtask 1A are detailed in Table 1.</p>
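        <p>The four steps above can be sketched end-to-end as follows. Random vectors stand in for the 512-dimensional Universal Sentence Encoder outputs, and scikit-learn's GradientBoostingClassifier stands in for XGBoost, so this is an illustrative sketch rather than our exact implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n_pairs, dim = 200, 32  # the real system uses 512-dimensional USE vectors

# Step 1: stand-in embeddings for the citance and reference texts of each pair.
citance_emb = rng.normal(size=(n_pairs, dim))
reference_emb = rng.normal(size=(n_pairs, dim))
labels = rng.integers(0, 2, size=n_pairs)  # 1 = reference reflects citance

# Step 2: element-wise subtraction of the paired representations.
features = citance_emb - reference_emb

# Step 3: gradient boosted trees for the binary classification task.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(features, labels)

# Step 4: rank candidate reference texts by entailment probability, keep top 5.
probs = clf.predict_proba(features)[:, 1]
top5 = np.argsort(probs)[::-1][:5]
```
        </p>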
        <p>Table 1. Results for subtask 1A (Precision / Recall / F-1):
Reference text accurately reflects citance (Label 1): 80.75 / 64.60 / 71.78
Reference text does not reflect citance (Label 0): 70.50 / 84.60 / 76.91</p>
        <p>For Task 1B, we have 752 instances of citance text corresponding to the 2018
dataset, many of which carry more than one label. There are 5 possible
scientific facets as annotated: Method, Aim, Hypothesis, Implication and Results.
We seek to build a system which can detect one or more labels for the
given citance text at hand. Formally, multi-label classification is the problem of
finding a model that maps inputs x to binary vectors y (assigning a value of 0
or 1 to each element (label) of y). In our case, y is a 5-element vector, one
element for each of the scientific facets.</p>
        <p>Since the number of samples in the dataset is small, deep learning techniques
do not perform well. We thus aim to take advantage of ensemble-based methods
for the multi-label classification problem. Traditionally, multi-label classification
problems have been tackled by problem transformation, i.e., either treating them
as binary classification problems, by building a separate classifier for each label,
or as a multi-class classification problem, by creating one classifier for every
label combination present in the training set (also known as the label powerset
transformation). Instead, we use an ensemble of classifiers.
Specifically, we use the RAkEL algorithm, which employs a random
k-label subsets approach with multiple label-powerset classifiers, each
trained on a random subset of the original labels for the
instances (in our case, 5). Finally, a voting mechanism is employed to find the
correct labels. The RAndom k-labELsets (RAkEL) algorithm constructs each
member of the ensemble by considering a small random subset of labels and
learning a single-label classifier for the prediction of each element in the powerset
of this subset. In this way, the algorithm aims to take into account
label correlations using single-label classifiers that are applied to subtasks with a
manageable number of labels and an adequate number of examples per label. The
authors of the RAkEL algorithm show that their experimental
results on common multilabel domains involving protein, document and scene
classification achieve better performance compared to popular
multilabel classification approaches.</p>
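        <p>A compact sketch of the random k-labelsets idea on synthetic data (decision trees stand in for the base classifier, and all arrays below are made up for illustration; this is not our exact implementation):

```python
import numpy as np
from itertools import combinations
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n, dim, n_labels, k = 300, 16, 5, 3  # 5 facets, random label subsets of size 3

X = rng.normal(size=(n, dim))
Y = rng.integers(0, 2, size=(n, n_labels))  # multi-label targets

votes = np.zeros((n, n_labels))
counts = np.zeros(n_labels)
subsets = [list(s) for s in combinations(range(n_labels), k)]
rng.shuffle(subsets)
for subset in subsets[:6]:  # an ensemble of 6 random k-labelsets
    # Label powerset on the subset: each label combination becomes one class.
    classes = Y[:, subset] @ (2 ** np.arange(k))
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, classes)
    pred = clf.predict(X).astype(int)
    # Decode the predicted powerset class back into per-label votes.
    for j, lab in enumerate(subset):
        votes[:, lab] += (pred // (2 ** j)) % 2
        counts[lab] += 1
# Majority voting: a label is predicted if over half its classifiers agree.
Y_pred = (votes / np.maximum(counts, 1)) > 0.5
```
        </p>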
        <p>For scientific facet comprehension, we first construct dense vector
representations from the pre-trained Universal Sentence Encoder for the given citance
texts from the citing papers (CPs) and perform 5-fold cross-validation on the 752
instances in the dataset. We report precision, recall, F-1 and accuracy scores for
the same. To combat the harsh metrics (since we are dealing with a multi-label
problem and accuracies can be unsurprisingly low), we also report the Hamming
loss.</p>
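        <p>The Hamming loss counts the fraction of individual label slots that are predicted incorrectly; a toy example (with made-up label vectors over the 5 facets):

```python
# Hamming loss on toy multi-label predictions: the fraction of label slots
# that disagree with the ground truth.
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 1, 1],
                   [0, 0, 0, 0, 0]])
loss = hamming_loss(y_true, y_pred)  # 2 wrong slots out of 10 -> 0.2
```
        </p>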
        <p>The average Hamming loss is 0.222, whereas the average accuracy is 43.29%.
The statistics for precision, recall and F-1 for each of the scientific facets are
reported in Table 2.</p>
        <p>Table 2. Scientific Facet: Precision / Recall / F-1</p>
        <p>Method: 82.38 / 61.68 / 70.45</p>
        <p>Aim: 14.64 / 38.02 / 20.79</p>
        <p>Results: 28.19 / 42.89 / 33.94</p>
        <p>Hypothesis: 11.65 / 41.33 / 17.33</p>
        <p>Implication: 15.46 / 17.37 / 15.26</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>The results of our experiments lead us to believe that transfer learning can
pave the way for better scientific comprehension, as a
first step towards building automated scientific research summarization systems.
At the same time, techniques like utilizing pretrained word and sentence
embeddings can help build systems for better understanding of different scientific
facets and can aid in effective segmentation of research papers for further
processing. One shortcoming is that the precision and recall scores for
most scientific facets other than the Method facet are low. This is because
of a high class imbalance in the data, with the majority class being the Method
facet; cross-validation effectively tends to remove some of the labels for which
we have little data, and tends to give lower performance metrics across some of
the other classes. This is a possible direction for future
improvement. We did notice even lower performance for
some of the facets with a few other learning algorithms, which we omit in
the interest of space.</p>
      <p>A few future directions include training sentence embedding mechanisms from
scratch on scientific research papers from particular domains, to see whether this
increases performance across scientific comprehension tasks. A strong emphasis
can also be laid on learning domain-specific features. This is a promising area and
we believe such explorations will benefit the scientific community
in aiding automated scientific summarization.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
          </string-name>
          , T.:
          <article-title>A simple but tough-to-beat baseline for sentence embeddings (</article-title>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducharme</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jauvin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A neural probabilistic language model</article-title>
          .
          <source>Journal of machine learning research 3(Feb)</source>
          ,
          <volume>1137</volume>
          –
          <fpage>1155</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Camacho-Collados</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilehvar</surname>
          </string-name>
          , M.T.:
          <article-title>From word to sense embeddings: A survey on vector representations of meaning</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>63</volume>
          ,
          <volume>743</volume>
          –
          <fpage>788</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : PolyU at CL-SciSumm
          <year>2016</year>
          .
          <source>In: Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)</source>
          . pp.
          <volume>132</volume>
          –
          <issue>138</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
          </string-name>
          , S.y.,
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Limtiaco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>John</surname>
          </string-name>
          , R.S.,
          <string-name>
            <surname>Constant</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guajardo-Cespedes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>Universal sentence encoder</article-title>
          . arXiv preprint arXiv:
          <year>1803</year>
          .
          <volume>11175</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yasunaga</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitag</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
          </string-name>
          , M.Y.:
          <article-title>Overview and results: CL-SciSumm shared task</article-title>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>arXiv preprint arXiv:1705.02364</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dagan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Recognizing textual entailment: Rational, evaluation and approaches – erratum</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>16</volume>
          (
          <issue>1</issue>
          ),
          <fpage>105</fpage>
          –
          <lpage>105</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>De Moraes</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karimi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          :
          <article-title>University of Houston @ CL-SciSumm 2018</article-title>
          .
          <source>In: BIRNDL@ SIGIR</source>
          . pp.
          <fpage>142</fpage>
          –
          <lpage>149</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Felber</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Query generation strategies for cl-scisumm 2017 shared task</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rustagi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          :
          <article-title>Overview of the CL-SciSumm 2016 shared task</article-title>
          .
          <source>In: Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)</source>
          . pp.
          <fpage>93</fpage>
          –
          <lpage>102</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urtasun</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fidler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Skip-thought vectors</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>3294</fpage>
          –
          <lpage>3302</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lauscher</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glavas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eckert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Citation-based summarization of scientific articles using semantic textual similarity</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>CIST@CLSciSumm-17: Multiple features based citation linkage, classification and summarization</article-title>
          .
          <source>In: BIRNDL@ SIGIR (2)</source>
          . pp.
          <fpage>43</fpage>
          –
          <lpage>54</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Positional language models for information retrieval</article-title>
          .
          <source>In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>299</fpage>
          –
          <lpage>306</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>NJUST @ CLSciSumm-18</article-title>
          .
          <source>In: BIRNDL@ SIGIR</source>
          . pp.
          <fpage>114</fpage>
          –
          <lpage>129</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mayr</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Report on the 2nd joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL 2017)</article-title>
          .
          <source>In: ACM SIGIR Forum</source>
          . vol.
          <volume>51</volume>
          , pp.
          <fpage>107</fpage>
          –
          <lpage>113</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mayr</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Report on the 3rd joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL 2018)</article-title>
          .
          <source>In: ACM SIGIR Forum</source>
          . vol.
          <volume>52</volume>
          , pp.
          <fpage>105</fpage>
          –
          <lpage>110</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puhrsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Advances in pretraining distributed word representations</article-title>
          .
          <source>arXiv preprint arXiv:1712.09405</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>3111</volume>
          {
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thyagarajan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Siamese recurrent architectures for learning sentence similarity</article-title>
          .
          <source>In: Thirtieth AAAI Conference on Artificial Intelligence</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>arXiv preprint arXiv:1802.05365</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Wing-nus at cl-scisumm 2017: Learning from syntactic and semantic similarity for citation contextualization</article-title>
          .
          <source>In: BIRNDL@ SIGIR (2)</source>
          . pp.
          <fpage>26</fpage>
          –
          <lpage>32</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25. Ruckle,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Eger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Peyrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gurevych</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Concatenated p-mean word embeddings as universal cross-lingual sentence representations</article-title>
          .
          <source>arXiv preprint arXiv:1803.01400</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Dynamic pooling and unfolding recursive autoencoders for paraphrase detection</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>801</fpage>
          –
          <lpage>809</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Turian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ratinov</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Word representations: a simple and general method for semi-supervised learning</article-title>
          .
          <source>In: Proceedings of the 48th annual meeting of the association for computational linguistics</source>
          . pp.
          <fpage>384</fpage>
          –
          <lpage>394</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>NUDT @ CLSciSumm-18</article-title>
          .
          <source>In: BIRNDL@ SIGIR</source>
          . pp.
          <fpage>102</fpage>
          –
          <lpage>113</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PKU @ CLSciSumm-17: Citation contextualization</article-title>
          .
          <source>In: BIRNDL@ SIGIR (2)</source>
          . pp.
          <fpage>86</fpage>
          –
          <lpage>93</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>