<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards dataset creation and establishing baselines for sentence-level neural clinical paraphrase generation and simplification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viraj Adduru</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadid A. Hasan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joey Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuan Ling</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivek Datla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kathy Lee</string-name>
          <email>kathy.lee_1@philips.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ashequl Qadir</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oladimeji Farri</string-name>
          <email>dimeji.farri@philips.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Lab, Philips Research North America</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rochester Institute of Technology</institution>
          ,
          <addr-line>Rochester, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>A paraphrase is a restatement of a text that retains its meaning. Clinical paraphrasing involves restating sentences, paragraphs, or documents containing the complex vocabulary used by clinicians. Paraphrasing can produce an alternative text that is either a simpler or a more complex form of the original input text. Simplification is a form of paraphrasing in which a sentence is restated as a linguistically simpler sentence while retaining the meaning of the original. Clinical text simplification has potential applications such as simplifying clinical reports for patients towards a better understanding of their clinical conditions. Deep learning has emerged as a successful technique for various natural language understanding tasks, preconditioned on large annotated datasets. In this paper, we propose a methodology to create preliminary datasets for clinical paraphrasing and clinical text simplification to foster the training of deep learning-based clinical paraphrase generation and simplification models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>1 Introduction</title>
<p>Paraphrasing (a.k.a. paraphrase generation) is transforming
a text, which can be a word, phrase, sentence, paragraph, or
document, while retaining its meaning and content. For
example, the sentence ‘I am very well’ can be paraphrased
as ‘I am doing great’. Paraphrasing can lead to a new text
that is simpler, more complex, or at the same complexity
level as the source text. The task of paraphrasing text into a
simpler form is called simplification: the output text is a
linguistically simplified version of the input text.
Paraphrasing and simplification have numerous applications such as
document summarization, text simplification for a target
audience (e.g. children), and question answering [Madnani
and Dorr, 2010].</p>
<p>In the clinical context, health care systems and medical
knowledge-bases contain large collections of texts that are
often not comprehensible to the lay population. For
example, clinical texts like radiology reports are used by
radiologists to professionally communicate their findings to
other physicians [Qenam et al., 2017]. They contain
complex medical terminology that patients are not familiar
with. A recent study reported that allowing patients to
access their clinical notes improved their
health care process [Kosten et al., 2012]. Realizing the need
for increased inclusion of patients in their health care
process, large health care systems have allowed patients
to access their medical records [Delbanco et al., 2015].
However, these medical records contain raw, complex
clinical text intended for communication between medical
professionals. Paraphrasing or simplification of clinical text
will improve patients’ understanding of their health
conditions and thereby play an important role in connecting
patients and caregivers across the clinical continuum
towards better patient outcomes.</p>
      <p>*This work was conducted as part of an internship program at
Philips Research.</p>
<p>Traditional clinical paraphrasing and simplification
approaches use lexical methods [Kandula et al., 2010;
Pivovarov and Elhadad, 2015; Qenam et al., 2017], which
typically focus on identifying complex clinical words,
phrases, or sentences and replacing them with alternatives in
the case of paraphrasing, or with simpler versions in the case
of simplification. Lexical methods take advantage of
knowledge sources like the Unified Medical Language System
(UMLS) metathesaurus [Lindberg et al., 1993], which
contains grouped words and phrases that describe various
medical concepts. Simplification is traditionally performed by
mapping UMLS concepts to their alternatives provided in the
consumer health vocabulary (CHV) [Qenam et al., 2017].</p>
<p>Recently, paraphrase generation was cast as a
monolingual machine translation problem, resulting in the
development of data-driven methods using statistical machine
translation (SMT) [Koehn, 2010] and neural machine translation
(NMT) principles [Koehn, 2017]. SMT methods [Quirk et
al., 2004; Wubben et al., 2010; Zhao et al., 2009] model the
conditional distributions of words and phrases and replace
the phrases in the source text with the phrases that
maximize the probability of the resulting text. However,
syntactic relationships are difficult to model using SMT
methods. Monolingual NMT systems use neural network
architectures to model complex relationships by
automatically learning from large datasets containing source and
target text pairs, both belonging to the same language.
Current NMT systems for paraphrase generation or
simplification [Brad and Rebedea, 2017; Hasan et al., 2016; Prakash
et al., 2016] use sequence-to-sequence networks based on
encoder-decoder architectures. Unlike traditional methods,
NMTs do not need semantic or syntactic rules to be
explicitly defined. However, they need carefully constructed
datasets that contain sufficient information to robustly train
deep neural networks.</p>
<p>Existing clinical paraphrasing and simplification datasets
are limited to short phrases. Hasan et al. (2016) trained an
attention-based encoder-decoder model [Bahdanau et al.,
2015] using a dataset created by merging two word- and
phrase-level datasets: the paraphrase database (PPDB)
[Pavlick et al., 2015] and the UMLS metathesaurus. They
showed that their model outperformed an upper-bound
paraphrasing baseline. However, they used a phrasal dataset that
does not contain the more complex contextual knowledge of
a sentential dataset, and the ability of the network to simplify
clinical text was not explored. In contrast to paraphrasing,
simplification is a harder problem and may involve
addition, deletion, or splitting of sentences to suit the target
audience. These operations require additional knowledge
that a dataset with longer sequences like sentences or
paragraphs could provide.</p>
      <p>Other studies [Brad and Rebedea,
2017; Prakash et al., 2016] have trained encoder-decoder
architectures with attention for paraphrasing using general
domain sentence level datasets like Microsoft Common
Objects in Context (MSCOCO)
[Lin et al., 2014], Newsela
[Xu et al., 2015] and Wikianswers [Fader et al., 2013]. They
demonstrated that neural machine translation models
successfully captured the complex semantic relationships from
the general domain datasets. However, it is unclear how
these networks would perform on complex clinical text.</p>
      <p>In this paper, our aim is to pioneer the creation of parallel
(with source and target pairs) sentential datasets for clinical
paraphrase generation and simplification.</p>
<p>Web-based unstructured knowledge sources like
www.mayoclinic.com contain articles on various medical
topics. We obtain articles with matching titles from different
web-based knowledge sources and align the sentences using
various metrics to create paraphrase and simplification pairs.
Additionally, we train NMT models using the prepared clinical
datasets and present baseline performance metrics for both
clinical paraphrase generation and simplification.</p>
<p>The next section outlines our approach to creating the clinical
paraphrase generation and simplification datasets. First, we
discuss our proposed methodology for extracting sentence pairs
from web-based clinical knowledge sources. Then we
describe various metrics to align the pairs of related sentences
for dataset creation. Section 3 discusses the neural network
architectures used for establishing baselines. Sections 4 and
5 present the performance evaluation of the models, and in
Section 6 we conclude and discuss future work.</p>
    </sec>
    <sec id="sec-2">
<title>2 Approach</title>
    </sec>
    <sec id="sec-3">
<title>2.1 Paraphrase pairs from web-based resources</title>
<p>Web-based textual resources contain large collections of
articles on various medical topics related to diseases,
anatomy, treatment, symptoms, etc. These articles are often
targeted at general (non-clinician) users and are easier to
understand than the complex clinical reports written by
clinicians. We crawl articles on the same topics from two or
more web-based knowledge sources. Each sentence in a
topic (i.e. in an article) from one resource is mapped to the
sentences belonging to the same topic from another resource(s)
using a one-to-many scheme to create all possible sentence
pair combinations. These sentence pairs essentially contain
a large number of unrelated pairs, from which meaningful
paraphrasing pairs are identified.</p>
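        <p>The one-to-many cross-pairing scheme above can be sketched as a Cartesian product over the sentences of two topic-matched articles (a minimal Python sketch; the helper name and toy sentences are illustrative, not from the paper):</p>

```python
from itertools import product

def candidate_pairs(article_a, article_b):
    """Cross-pair every sentence of one article with every sentence of a
    topic-matched article from another source (illustrative helper)."""
    return list(product(article_a, article_b))

# Two toy "articles" on the same topic from different sources.
a = ["Smallpox has no approved drug.", "It is caused by the variola virus."]
b = ["No cure or treatment for smallpox exists."]
pairs = candidate_pairs(a, b)  # 2 x 1 = 2 candidate pairs, filtered later
```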
<p>Manual identification of the relevant paraphrase pairs is a
tedious task, as the sentence pair combinations (as discussed
above) contain a large number (in the millions) of unrelated
sentence pairs. Therefore, we use an automated approach to
identify the paraphrase pairs from the sentence pair
combinations. Our method is similar to the approach of [Zhu et
al., 2010], who use a TF-IDF metric [Nelken and Shieber, 2006]
to align sentences between the Wikipedia and
SimpleWikipedia knowledge sources to create sentence pairs for
the text simplification task. However, some studies, e.g. Xu
et al., 2015, reveal the noisy nature of such datasets, which
motivated us to explore various textual similarity/distance
metrics instead of relying on a single metric for sentence
alignment. Our intuition is that the strengths of a collection
of diverse metrics may be useful for better sentence
alignment. In addition to various existing metrics, we train a
neural paraphrase identification model to estimate a similarity
score between two sentences, which is also used as a
supplementary sentence alignment metric.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Sentence alignment</title>
      <p>Paraphrase pairs can be identified by computing various
sentence similarity/distance metrics between the two
sentences in a pair. Various character-level and word-level
metrics that we used are described below.</p>
      <sec id="sec-4-1">
        <title>Levenshtein distance</title>
<p>Levenshtein distance [Levenshtein, 1966] is defined as the
minimum number of string operations, consisting of
additions, deletions, and substitutions of symbols, that are
necessary to transform one string into another. Normalized
Levenshtein distance (LDN) is computed by dividing the
number of string operations required by the length of the longer
string. Character- or word-level LDN is calculated by
treating characters or words as symbols respectively:

LDN = N / max(n, m)    (1)

where N is the minimum number of string operations to
transform a text x to y or vice versa, and n and m are the
number of symbols in the texts x and y respectively.</p>
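        <p>Equation (1) can be implemented directly with the standard dynamic program; passing strings gives the character-level metric and passing token lists gives the word-level metric (a self-contained sketch, not the authors' code):</p>

```python
def levenshtein(x, y):
    """Minimum number of additions, deletions, and substitutions needed
    to transform sequence x into sequence y (standard DP)."""
    n, m = len(x), len(y)
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # addition
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[m]

def ldn(x, y):
    """Normalized Levenshtein distance, Eq. (1): N / max(n, m)."""
    longest = max(len(x), len(y))
    return levenshtein(x, y) / longest if longest else 0.0

char_ldn = ldn("kitten", "sitting")                                  # 3 / 7
word_ldn = ldn("I am very well".split(), "I am doing great".split())
```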
      </sec>
      <sec id="sec-4-2">
        <title>Damerau-Levenshtein distance</title>
<p>Damerau-Levenshtein distance [Damerau, 1964] is similar
to LDN and is defined as the minimum number of string
operations needed to transform one string into the other. In
addition to the string operations in Levenshtein distance,
Damerau-Levenshtein distance further includes transposition
of two adjacent symbols. Normalized Damerau-Levenshtein
distance (DLDN) is calculated by dividing the number of
string operations by the number of symbols in the longer
string.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Optimal string alignment distance</title>
        <p>Optimal string alignment distance [Herranz et al., 2011] is a
variant of DLDN but under a restriction that no substring is
edited more than once. The normalized form is computed
similarly as in DLDN.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Jaro-Winkler distance</title>
<p>Jaro-Winkler distance (JWD) [Winkler, 1990] computes the
distance between two strings, where the substitution of two
close symbols is considered more important than the
substitution of two symbols that are far from each other. The
Jaro-Winkler distance JWD is given by:

JWD = d_j + k p (1 - d_j)    (2)

where k is the length of the common prefix at the start of
the string, up to 4 symbols, p is a constant usually set to
0.1, and d_j is the Jaro distance given by:

d_j = (1/3) (q/n + q/m + (q - t)/q),  with d_j = 0 if q = 0    (3)

where q is the number of matching symbols between the
two texts x and y with lengths n and m respectively, and t is
half of the number of transpositions. Jaro-Winkler distance
is a normalized quantity ranging from 0 to 1.</p>
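        <p>Equations (2) and (3) can be sketched as follows; the sliding match window and the prefix boost follow the standard Jaro-Winkler definition, which the paper does not spell out in detail:</p>

```python
def jaro(x, y):
    """Jaro similarity, Eq. (3): count symbols matching within a sliding
    window, then penalize by half the number of transpositions."""
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 1.0
    window = max(max(n, m) // 2 - 1, 0)
    x_match, y_match = [False] * n, [False] * m
    q = 0
    for i in range(n):
        for j in range(max(0, i - window), min(m, i + window + 1)):
            if not y_match[j] and x[i] == y[j]:
                x_match[i] = y_match[j] = True
                q += 1
                break
    if q == 0:
        return 0.0
    t, j = 0, 0
    for i in range(n):  # count transpositions among matched symbols
        if x_match[i]:
            while not y_match[j]:
                j += 1
            if x[i] != y[j]:
                t += 1
            j += 1
    t //= 2
    return (q / n + q / m + (q - t) / q) / 3

def jaro_winkler(x, y, p=0.1):
    """Eq. (2): boost the Jaro score by the common prefix k (at most 4)."""
    dj = jaro(x, y)
    k = 0
    for a, b in zip(x[:4], y[:4]):
        if a != b:
            break
        k += 1
    return dj + k * p * (1 - dj)
```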
      </sec>
      <sec id="sec-4-5">
        <title>Longest common subsequence</title>
<p>Longest common subsequence distance (LCSD)
[Bakkelund, 2009] is computed using the following
equation:

LCSD = 1 - |LCS(x, y)| / max(n, m)    (4)

where LCS(x, y) is the longest subsequence common to the
strings x and y with lengths n and m respectively.</p>
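        <p>Equation (4) reduces to an LCS dynamic program plus a normalization (a minimal sketch):</p>

```python
def lcs_len(x, y):
    """Length of the longest common subsequence of x and y (standard DP)."""
    n, m = len(x), len(y)
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[m]

def lcsd(x, y):
    """LCS distance, Eq. (4): 1 - |LCS(x, y)| / max(n, m)."""
    longest = max(len(x), len(y))
    return 1 - lcs_len(x, y) / longest if longest else 0.0
```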
      </sec>
      <sec id="sec-4-6">
        <title>Jaccard similarity</title>
        <p>Jaccard similarity is calculated as the ratio of the
intersection to the union of the items in the two strings.</p>
      </sec>
      <sec id="sec-4-7">
        <title>Sorensen similarity</title>
        <p>Sorensen similarity (also called the Sorensen-Dice coefficient)
[Sørensen, 1948] is similar to Jaccard similarity; it is
computed as the ratio of twice the number of common items
(the intersection) to the sum of the number of items in the two
strings.</p>
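        <p>Both set-overlap metrics are one-liners over token sets (a sketch with toy sentences):</p>

```python
def jaccard(x_tokens, y_tokens):
    """Jaccard similarity: |intersection| / |union| of the token sets."""
    a, b = set(x_tokens), set(y_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def sorensen(x_tokens, y_tokens):
    """Sorensen-Dice coefficient: 2 * |intersection| / (|a| + |b|)."""
    a, b = set(x_tokens), set(y_tokens)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

s1 = "no drug is approved for smallpox".split()
s2 = "no cure for smallpox exists".split()
```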
<p>All of the above metrics are used in their normalized forms
(values between 0 and 1). These metrics calculate the
similarity/distance between the sentence pairs using
character- or word-level overlap and the pattern of their
occurrences in the sentences. However, these metrics do not consider
the presence of concepts (e.g. words or phrases) that are
paraphrased using a different vocabulary (e.g. ‘glioma’ can
be paraphrased with its synonym ‘brain tumor’), and they also
do not perform well for sentences that differ by only a few
words yet contradict each other. Therefore, we need a
similarity metric that can consider complex semantic
relationships between the concepts represented in the sentences.
Deep neural network architectures with recurrent neural
networks (RNNs) and convolutional neural networks (CNNs)
have so far demonstrated state-of-the-art performance
[Conneau et al., 2017] in learning semantic associations
between sentences. Therefore, deep learning-based
systems are increasingly being used for advanced natural
language inference tasks like paraphrase identification and
textual entailment [Ghaeini et al., 2018], which motivated us
to create a neural paraphrase identification model for the
purpose of supplementing our sentence similarity measures
for better sentence pair alignment.</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>2.3 Paraphrase identification metric</title>
      <p>Neural paraphrase identification can be stated as a binary
classification task in which a neural network model
estimates the probability that the two sentences are paraphrases.
This estimated probability can be used as a similarity metric
to align the sentence pairs.</p>
      <sec id="sec-5-1">
        <title>Neural paraphrase identification</title>
<p>The network consists of stacked bidirectional long short-term
memory (BiLSTM) layers in a Siamese architecture
[Dadashov et al., 2017] (Figure 1). Each arm of the Siamese
network consists of three stacked BiLSTM layers. The
outputs of the final BiLSTM layers of both arms are
concatenated and fed into a dense layer with ReLU activation,
followed by a second dense layer with a sigmoid activation
function. We use a depth of 300 for all the BiLSTM layers
and the dense layers. The maximum sequence length of the
BiLSTM layers is set to 30. The words in the input
sentences are embedded using Word2Vec embeddings pre-trained
on the Google News corpus.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Hybrid dataset for paraphrase identification</title>
<p>Our paraphrase identification model is trained using a
hybrid corpus created by merging two paraphrase corpora:
Quora question pairs and Paralex question pairs. The Quora
question pair corpus [Iyer et al., 2017] consists of 404,289
question pairs, with 149,263 paraphrase pairs and 255,027
non-paraphrase pairs. The Paralex dataset [Fader et al.,
2013] consists of 35,692,309 question pairs, where all the
question pairs are paraphrases of each other.</p>
      </sec>
      <sec id="sec-5-3">
        <title>N-gram distance</title>
<p>An n-gram is a contiguous sequence of n items from a given
sample of text. N-gram distance [Kondrak, 2005] is
similar to computing LCS, but in this case the symbols are
n-grams. We used n = 4 in this paper.</p>
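        <p>A simple reading of this description, LCS where the symbols are character 4-grams, can be sketched as follows (an illustrative simplification, not a faithful reimplementation of Kondrak's metric):</p>

```python
def ngrams(seq, n=4):
    """Contiguous n-grams of a sequence (characters or tokens)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def lcs_len(x, y):
    """Standard LCS dynamic program over arbitrary symbols."""
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = (prev[j - 1] + 1 if x[i - 1] == y[j - 1]
                      else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def ngram_distance(x, y, n=4):
    """LCS-style distance where the symbols are n-grams (n = 4 here,
    as used in the paper)."""
    gx, gy = ngrams(x, n), ngrams(y, n)
    if not gx and not gy:
        return 0.0
    return 1 - lcs_len(gx, gy) / max(len(gx), len(gy))
```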
      </sec>
      <sec id="sec-5-4">
        <title>Cosine similarity</title>
<p>Cosine similarity between two strings is computed as the
cosine of the angle between the vector representations of the
two strings (x and y):

cos(x, y) = (x · y) / (|x| |y|)    (5)</p>
        <p>The Paralex dataset is unbalanced, as it does not contain any
non-paraphrase pairs. After merging the sentence pairs from
both corpora, we have 35,692,309 sentence pairs, with
35,437,283 paraphrase pairs and only 255,027 non-paraphrase
pairs. To balance the dataset, we identify the list of unique
questions, randomly select two questions from this list, and
add the pair to the merged corpus as a non-paraphrase pair if
the pair does not already exist. Non-paraphrase pairs are
created until the non-paraphrase and paraphrase pairs are
equal in number, resulting in a balanced dataset of 70
million pairs.</p>
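        <p>The balancing step can be sketched as rejection sampling of random question pairs (toy data; the paper does not give implementation details, and this assumes that randomly paired questions are almost never true paraphrases):</p>

```python
import random

def balance_with_random_negatives(paraphrase_pairs, num_needed, seed=0):
    """Sample random question pairs as non-paraphrases until the classes
    balance, skipping pairs that already exist (illustrative sketch)."""
    rng = random.Random(seed)
    questions = sorted({q for pair in paraphrase_pairs for q in pair})
    existing = {frozenset(p) for p in paraphrase_pairs}
    negatives = set()
    while len(negatives) < num_needed:
        a, b = rng.sample(questions, 2)
        pair = frozenset((a, b))
        if pair not in existing and pair not in negatives:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]

pos = [("how do i learn python", "best way to learn python"),
       ("what causes rain", "why does it rain")]
neg = balance_with_random_negatives(pos, num_needed=2)
```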
      </sec>
      <sec id="sec-5-5">
        <title>Training</title>
<p>The dataset is preprocessed by removing punctuation,
normalizing case, and applying standard tokenization.
The tokens are embedded using Word2Vec embeddings
pre-trained on the Google News corpus [Mikolov et al., 2013].
Words that are not found in the pre-trained vocabulary are
embedded with a zero vector representing an UNK token.
Longer sentences (&gt; 30 words) are truncated, and shorter
sentences (&lt; 30 words) are padded with UNK tokens. As the
sentences are in a bidirectional relationship to each other,
the training pairs are swapped to increase the dataset size.
The dataset is split into 80%, 10% and 10% for training,
validation and testing respectively.</p>
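        <p>The preprocessing above, truncation and padding to 30 tokens with out-of-vocabulary words mapped to UNK, can be sketched as follows (the regex-based tokenizer is an assumption; the paper only says "standard tokenization"):</p>

```python
import re

MAX_LEN = 30
UNK = "UNK"

def preprocess(sentence, vocab):
    """Lowercase, strip punctuation, tokenize, map OOV words to UNK,
    then truncate/pad to a fixed length of 30 tokens."""
    tokens = re.sub(r"[^\w\s]", " ", sentence.lower()).split()
    tokens = [t if t in vocab else UNK for t in tokens]
    tokens = tokens[:MAX_LEN]                  # truncate long sentences
    tokens += [UNK] * (MAX_LEN - len(tokens))  # pad short sentences
    return tokens

vocab = {"no", "cure", "for", "smallpox", "exists"}
out = preprocess("No cure, or treatment, for smallpox exists!", vocab)
```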
        <p>The paraphrase identification model is trained using
Adam optimizer [Kingma and Ba, 2014] with Nesterov
momentum [Nesterov, 1983] to optimize a binary cross entropy
loss. The update direction is calculated using a batch size of
512. We utilize early stopping using validation error with
patience of 3 epochs to prevent overfitting.</p>
<p>The network trains for 18 epochs, at 22 minutes per
epoch, before early stopping. The validation accuracy of
our model is 95% and the test accuracy is 97%.</p>
      </sec>
      <sec id="sec-5-6">
        <title>Paraphrase identification model for sentence alignment</title>
<p>The probability score from our paraphrase identification
model for the predicted class is used along with the word-
and character-level similarity/distance metrics to calculate a
mean similarity score. Note that all normalized distance
metrics are converted into similarity metrics by subtracting
the corresponding score from 1, thereby obtaining 12
different similarity metrics. The mean similarity score is the
average of these metrics:

s_mean = (1/12) Σ_i s_i

where s_i is the i-th similarity metric for a sentence pair.</p>
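        <p>Given the 12 metric scores for a sentence pair, the mean similarity score is a plain average after flipping distances into similarities (a sketch; the two-list split below is illustrative only):</p>

```python
def mean_similarity(distance_scores, similarity_scores):
    """Convert normalized distances (in [0, 1]) to similarities via
    1 - d, then average them together with the similarity metrics
    (including the paraphrase identification probability)."""
    sims = [1 - d for d in distance_scores] + list(similarity_scores)
    return sum(sims) / len(sims)

# Illustrative values only; the paper combines 12 metrics in total.
score = mean_similarity([0.2, 0.4], [0.9, 0.7])
```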
      </sec>
      <sec id="sec-5-7">
        <title>Wikipedia and SimpleWikipedia</title>
<p>SimpleWikipedia contains simplified versions of pages from
the original Wikipedia. However, the text in the
corresponding documents is unaligned (no sentence-to-sentence
matching). Pairing sentences from Wikipedia with those in
SimpleWikipedia leads to a parallel corpus for the simplification
dataset, as the latter mostly contains simplified versions of
the former. In the case of paraphrase generation, however,
the resulting pairs can be swapped, as paraphrasing applies in
both directions. The swapping also helps augment the
dataset. We create a parallel corpus using 164 matched titles
from Wikipedia from clinically relevant categories such as
anatomy, brain, disease, medical condition, etc. Sentences
from each of the 164 Wikipedia documents are paired with all
the sentences from the documents with identical titles from
SimpleWikipedia. Thus, we obtain 818,520 sentence pairs, for
which we compute similarity scores as discussed in the
previous subsections. We finally obtain 1,491 related sentence
pairs after thresholding the mean similarity score, and we
name this parallel corpus WikiSWiki.</p>
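        <p>The alignment-by-thresholding procedure can be sketched end to end with a toy overlap metric (the metric and threshold below are placeholders, not the paper's actual values):</p>

```python
from itertools import product

def align(doc_a, doc_b, metrics, threshold):
    """Keep cross-document sentence pairs whose mean similarity over
    all metrics meets a threshold (illustrative sketch)."""
    kept = []
    for s1, s2 in product(doc_a, doc_b):
        mean = sum(m(s1, s2) for m in metrics) / len(metrics)
        if mean >= threshold:
            kept.append((s1, s2, mean))
    return kept

# Toy metric: word overlap relative to the longer sentence.
def overlap(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa), len(wb))

wiki = ["smallpox is caused by the variola virus", "the sky is blue"]
swiki = ["the variola virus causes smallpox"]
pairs = align(wiki, swiki, [overlap], threshold=0.5)
```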
      </sec>
      <sec id="sec-5-8">
        <title>Mayoclinic</title>
<p>Mayoclinic contains pages for 48 of the 164 titles
identified from Wikipedia. Unique sentences from WikiSWiki
are paired with the sentences obtained from the pages with
matched titles from Mayoclinic, and similarity scores are
computed. Using the same thresholds as above, 3,203 sentence
pairs are selected. These pairs are added to the WikiSWiki
corpus to form a corpus containing 4,694 sentence pairs; we
name it WikiSwikiMayo.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>2.5 Simplification dataset</title>
<p>WikiSWiki is a simplification corpus, as it mostly
contains sentences mapped to their simpler forms. However, its
small number of sentence pairs may be insufficient for
training the network to learn the complex relationships required
for clinical text simplification. Therefore, we use additional
web-based knowledge sources to increase the dataset size.
www.webmd.com (webmd) and www.medicinenet.com
(medicinenet) are clinical knowledge sources similar to
Mayoclinic. Through manual inspection, we found that webmd
contains simpler sentences than medicinenet in many of the
topics we examined, which is reasonable as medicinenet
content is curated by clinicians. Therefore, we use them as
additional knowledge sources to create our simplification dataset.
For the 164 topics from the WikiSWiki dataset, we perform a
Google search with ‘webmd’ and ‘medicinenet’ as additional
search terms. The search returns 61,314 sentences from
webmd and medicinenet for all 164 topics. Sentences from
medicinenet are paired with the sentences from SimpleWikipedia
and webmd from the articles with matched titles. Sentences
from Wikipedia articles are paired with sentences from
webmd separately, as they are already paired with
SimpleWikipedia. We obtain 714,608 new sentence pairs,
resulting in 1,002 final pairs after computing similarity scores and
thresholding. These sentence pairs are merged with the
WikiSWiki dataset to create the monolingual clinical
simplification dataset containing 2,493 sentence pairs. Although our
final corpus contains a small number of sentence pairs, our
main contribution in this paper is to introduce an automated
method to create sentence pairs from web-based knowledge
sources, towards creating a large clinical simplification
corpus in the future.</p>
    </sec>
    <sec id="sec-7">
<title>3 Paraphrase generation and simplification</title>
    </sec>
    <sec id="sec-8">
<title>3.1 Model</title>
      <p>Sequence-to-sequence models using encoder-decoder
architecture with attention [Vinyals et al., 2015] (Figure 2) are
trained for both paraphrase generation and simplification
tasks. The encoder and decoder are made of three stacked
RNN layers using BiLSTM cells and LSTM cells
respectively. We use a cell depth of 1024 for all the layers in the
encoder and the decoder. The maximum sequence length is
set to 50. The sentences are preprocessed, and the words are
encoded using one-hot vector encoding. The outputs of the
decoder are projected onto the output vocabulary space
using a dense layer with a softmax activation function.</p>
    </sec>
    <sec id="sec-9">
<title>3.2 Training</title>
      <p>The network parameters are optimized by minimizing a
sampled softmax loss function. The gradients are truncated
by limiting the global norm to 1. The network is trained
using mini-batch gradient descent algorithm with batch size
of 128. An initial learning rate of 0.5 is used with a decay of
0.99 for every step. The training set is shuffled for every
epoch. The networks are trained using 80% of the sentence
pairs and validated on 10% and tested on 10%. Both models
are developed using Tensorflow, version 1.2, and two Tesla
K20 GPUs.</p>
<p>For paraphrase generation, the network is trained using
the WikiSwikiMayo corpus containing 4,694 sentence pairs.
The source and target sentences are swapped, as
paraphrasing is bidirectional, thereby doubling the number of
sentence pairs to 9,388. The dataset is divided into training,
validation and test sets. Training pairs that contain
sentences from the source side of the test set are removed to
prevent data leakage; the same is done for the validation
set. This ensures that any sentence occurs as a
source sentence in exactly one of the sets (training, test or
validation). The number of sentence pairs in the training, test
and validation datasets are 6,095, 611 and 611 respectively.
The paraphrase generation network is trained for 10,000
steps with a batch size of 128 samples per step.</p>
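        <p>The leak-free 80/10/10 split described above can be sketched as follows (the exact filtering order is an assumption; the paper does not specify it):</p>

```python
import random

def leak_free_split(pairs, seed=0):
    """Split (source, target) pairs 80/10/10, then drop training and
    validation pairs whose source sentence also appears as a source in
    an earlier split, so each source occurs in exactly one split."""
    rng = random.Random(seed)
    pairs = pairs[:]
    rng.shuffle(pairs)
    n = len(pairs)
    test, val, train = pairs[:n // 10], pairs[n // 10:n // 5], pairs[n // 5:]
    test_src = {s for s, _ in test}
    val = [p for p in val if p[0] not in test_src]
    val_src = {s for s, _ in val}
    train = [p for p in train
             if p[0] not in test_src and p[0] not in val_src]
    return train, val, test

pairs = [(f"src {i % 20}", f"tgt {i}") for i in range(100)]
train, val, test = leak_free_split(pairs)
```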
<p>The simplification corpus containing 2,493 sentence pairs
is used to train the simplification network. Unlike
paraphrase generation, the source and target vocabularies are
created separately, as they differ for text simplification. As
simplification is a unidirectional task, we do not use data
swapping. We prevent data leakage using the same procedure
as in paraphrase generation while splitting the data. The
training, test and validation sets contain 1,918, 187 and 187
sentence pairs respectively. The simplification network is
trained for 3,500 steps.</p>
    </sec>
    <sec id="sec-10">
<title>4 Evaluation metrics</title>
      <p>BLEU [Papineni et al., 2002], METEOR [Banerjee and
Lavie, 2005] and translation error rate (TER) [Snover et al.,
2006] are used to evaluate our models. These metrics are
shown to correlate with human judgements for evaluating
paraphrase generation models [Wubben et al., 2010]. BLEU
looks for exact string matching using n-gram overlaps to
evaluate the similarity between two sentences. METEOR
uses WordNet to obtain synonymously related words to
evaluate sentence similarity. Higher BLEU and METEOR
scores indicate higher similarity. The TER score measures the
number of edits necessary to transform the source sentence
into the target; a lower TER score indicates higher similarity.</p>
    </sec>
    <sec id="sec-11">
<title>5 Results and discussion</title>
    </sec>
    <sec id="sec-12">
      <title>5.1 Sentence alignment</title>
<p>Table 1 presents a few examples of the aligned sentence
pairs for both clinical paraphrase generation and
simplification.</p>
      <p>Clinical Paraphrase Generation.
Example 1 (Good). S1: No drug is currently approved for the
treatment of smallpox. S2: No cure or treatment for smallpox exists.
Example 2 (Acceptable). S1: Worldwide, breast cancer is the most
common invasive cancer in women. S2: After skin cancer, breast
cancer is the most common cancer diagnosed in women in the
United States.
Example 3 (Bad). S1: Gallbladder cancer is a rare type of cancer
which forms in the gallbladder. S2: At this stage, gallbladder cancer
is confined to the inner layers of the gallbladder.</p>
      <p>Clinical Text Simplification.
Example 1 (Good). S1: In Western cultures, ingestion of or exposure
to peanuts, wheat, nuts, certain types of seafood like shellfish, milk,
and eggs are the most prevalent causes. S2: In the Western world,
the most common causes are eating or touching peanuts, wheat,
tree nuts, shellfish, milk, and eggs.
Example 2 (Acceptable). S1: Together the bones in the body form
the skeleton. S2: The bones are the framework of the body.
Example 3 (Bad). S1: There are two major types of diabetes, called
type 1 and type 2. S2: There are other kinds of diabetes, like
diabetes insipidus.</p>
<p>In Table 1, for both the paraphrase generation and text
simplification tasks, although the mean similarity score is
similar across all the examples, there is large variability in
the quality of the sentence pairs. This means there is an
overlap between the distributions of the mean similarity
score for paraphrase pairs and non-paraphrase pairs.
Therefore, selecting a minimum threshold below 0.5 introduces
more non-paraphrase pairs into the dataset, while selecting a
threshold above 0.5 loses a large number of pairs that are
paraphrases. One desirable approach is to train a linear
regression or other multi-variate machine learning model to
classify the paraphrase pairs using all the computed
similarity metrics. However, training such machine learning
systems requires ground-truth data and is therefore outside the
scope of this paper.</p>
      <p>Our paraphrase identification system uses a vocabulary
from the Google News corpus dataset. The words that are
not present in this vocabulary are assigned the UNK token.
Therefore, the neural paraphrase identification network is
not sensitive when two semantically similar sentences refer
to different objects. However, this problem is minimized in
our case because we pair sentences from pages belonging
to the same topic. Furthermore, using other similarity
metrics that are based on word matching helps in overcoming
this problem in cases where the paraphrase identification
metric is insensitive. Visual inspection of the selected
sentence pairs confirmed that this holds for the majority of
pairs in both datasets.</p>
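      <p>The UNK effect and the word-overlap fallback can be illustrated
with a small sketch (assumed behaviour, not the authors'
implementation; the toy vocabulary is hypothetical):</p>

```python
# Words outside the pretrained vocabulary are mapped to UNK, which makes
# an embedding-based model blind to the objects being discussed; a
# surface word-overlap metric (Jaccard here) remains sensitive to them.
# The vocabulary below is a toy stand-in for the Google News vocabulary.

VOCAB = {"dengue", "is", "a", "disease", "lung", "cancer", "spreads"}

def to_model_input(tokens, vocab=VOCAB):
    """Replace out-of-vocabulary tokens with the UNK placeholder."""
    return [t if t in vocab else "UNK" for t in tokens]

def jaccard(tokens1, tokens2):
    """Word-overlap similarity computed on the raw tokens."""
    s1, s2 = set(tokens1), set(tokens2)
    return len(s1 & s2) / len(s1 | s2)

s1 = "dengue is a disease".split()
s2 = "metastasis is a process".split()
print(to_model_input(s2))         # ['UNK', 'is', 'a', 'UNK'] -- objects erased
print(round(jaccard(s1, s2), 2))  # 0.33 -- surface metric still separates them
```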
    </sec>
    <sec id="sec-13">
      <title>Paraphrase generation and simplification</title>
      <p>Average quality scores on the test sets for the clinical
paraphrase generation and text simplification models are
presented in Table 2. These scores serve as baselines for
clinical paraphrase generation and text simplification on the
datasets we have created. The quality metrics are lower for
clinical text simplification than for paraphrase generation.
This is expected: in paraphrase generation many words from the
source sentence can be retained in the output, whereas
simplification involves complex transformations that produce
different words in the resulting sentence, which lowers
overlap-based quality scores. Further human evaluation is
required to better rate the performance of the simplification
model.</p>
      <p>Table 2. Clinical Paraphrase Generation: BLEU 9.4±0.5,
METEOR 15.1±0.3, TER 108.7±1.5. Clinical Text Simplification:
BLEU 9.9±1.6, METEOR 10.6±0.8, TER 97.7±2.9.</p>
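      <p>As a rough illustration of the overlap-based scoring behind
Table 2, the following is a compact BLEU-style sketch (unigram and
bigram precision with a brevity penalty; reference BLEU
implementations use up to 4-grams and toolkit-specific smoothing):</p>

```python
# Compact BLEU-style score: clipped n-gram precision (n = 1, 2) combined
# by a geometric mean, scaled by a brevity penalty for short outputs.
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # clipped counts: each candidate n-gram credits at most its
        # frequency in the reference
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(1, sum(cand.values())))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty: < 1 only when the candidate is shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

src = "the most common causes are eating peanuts".split()
ref = "the most prevalent causes are ingesting peanuts".split()
print(round(bleu(src, ref), 3))  # 0.488
```

      <p>Because such metrics reward token overlap, heavy rewording, as
in simplification, lowers the score even when meaning is preserved.</p>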
      <p>A few example outputs of the clinical paraphrase
generation and simplification systems are presented in Table 3.
The examples show that both the paraphrase generation and
simplification models retained knowledge of the overall topic
in the generated sentences. Example 2 for both models shows
that, although the topic of the generated sentence matches the
source, the sentence is not a paraphrase or a simplification,
respectively, because the context of the resulting sentence
differs from that of the source. This may be caused by a
failure in sentence alignment while creating the datasets,
which shows that the paraphrase identification model and the
similarity metrics were not sufficient to pair the sentences
accurately. In particular, the paraphrase identification model,
trained on general-domain question pairs, may not generalize
well to identifying paraphrase pairs in clinical text. A
possible solution is transfer learning: fine-tuning the
paraphrase identification network on a subset of human-rated
clinical paraphrases.</p>
      <p>Our datasets consist of a small number of sentence pairs
(a few thousand each) and may not be sufficient for neural
network models to learn complex clinical concepts.
Furthermore, we use only 164 medical topics from Wikipedia
for this work. Improving paraphrase identification and
including more knowledge sources and topics will yield larger
and better training datasets. Moreover, many paired sentences
contain additional information that their counterpart does not.
For example:</p>
      <p>Source: “It isn’t clear why some people get asthma and
others don’t, but it’s probably due to a combination of
environmental and genetic factors”.</p>
      <p>Target: “Asthma is thought to be caused by a
combination of genetic and environmental factors”.</p>
      <p>Removing the additional text in the first part of the
source sentence would improve training, as the neural network
could focus on the important text. The unwanted text in this
example is easily removed because it is clearly separated from
the rest of the sentence; however, in many sentences the
unwanted text is not easily separable. Moreover, manually
removing unwanted text from thousands (if not millions) of
sentences is impractical. Automated methods are needed to
remove unwanted text during sentence alignment, which would
help to create cleaner datasets.</p>
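      <p>One possible automated cleaner is sketched below; the
comma-based trimming heuristic and the character-level similarity
are illustrative assumptions, not the method proposed in this paper:</p>

```python
# Hypothetical trimming heuristic: when a source sentence has clauses
# split by commas, keep the suffix that matches the target best.
# difflib's ratio stands in for whatever alignment similarity an
# automated cleaner might use.
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def trim_source(source, target):
    """Drop a comma-separated prefix if the remainder matches the target better."""
    best = source
    for i, ch in enumerate(source):
        if ch == ",":
            tail = source[i + 1:].strip()
            if tail and similarity(tail, target) > similarity(best, target):
                best = tail
    return best

src = ("It isn't clear why some people get asthma and others don't, "
       "but it's probably due to a combination of environmental and genetic factors")
tgt = "Asthma is thought to be caused by a combination of genetic and environmental factors"
print(trim_source(src, tgt))
```

      <p>On the asthma example above, such a heuristic can strip the
unrelated leading clause; sentences whose unwanted text is not
comma-delimited would need stronger methods.</p>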
      <p>Previous research has found that existing simplification
datasets built from Wikipedia-like knowledge sources are
noisy [Xu et al., 2015], as these sources were not created
with a specific objective in mind. However, task-specific
datasets for clinical paraphrase generation and simplification
do not exist as of this writing. We therefore approached the
creation of such datasets using web-based knowledge sources,
and we hope this work serves as a starting point towards
automated approaches for creating task-specific datasets from
unstructured knowledge sources.</p>
    </sec>
    <sec id="sec-14">
      <title>Conclusion and future work</title>
      <p>This paper presents preliminary work on an automated
methodology for creating clinical paraphrase generation and
simplification datasets. We use web-based knowledge sources
and automatically align sentence pairs from matching topics to
create the datasets. These datasets are then used to train
sequence-to-sequence models leveraging an encoder-decoder
architecture with attention for paraphrase generation and
simplification. Further research on improved string similarity
metrics is required to identify similar sentence pairs more
accurately and create cleaner datasets. In future work, we will
include more knowledge sources and topics to create larger
datasets and use automated methods to remove unrelated or
unwanted text from the paired sentences.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bahdanau et al.,
          <year>2015</year>
          ] Bahdanau,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          ,.
          <article-title>Neural Machine Translation By Jointly Learning To Align and Translate</article-title>
          , in: ICLR. pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Bakkelund</source>
          , 2009] Bakkelund,
          <string-name>
            <surname>D.</surname>
          </string-name>
          ,.
          <article-title>An LCS-based string metric</article-title>
          . University of Oslo, Oslo, Norway,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Banerjee and Lavie</source>
          , 2005] Banerjee,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Lavie</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          ,. METEOR:
          <article-title>An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments</article-title>
          , in: ACL. pp.
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Brad and Rebedea</source>
          , 2017] Brad,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Rebedea</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          ,.
          <article-title>Neural Paraphrase Generation using Transfer Learning</article-title>
          , in: INLG. pp.
          <fpage>257</fpage>
          -
          <lpage>261</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Conneau et al.,
          <year>2017</year>
          ] Conneau,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <surname>H.</surname>
          </string-name>
          ,.
          <article-title>Supervised Learning of Universal Sentence Representations from Natural Language Inference Data</article-title>
          , in: CoRR.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dadashov et al.,
          <year>2017</year>
          ] Dadashov,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Sakshuwong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          ,.
          <source>Quora Question Duplication 1-9</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Damerau</source>
          , 1964] Damerau,
          <string-name>
            <surname>F.J.</surname>
          </string-name>
          ,.
          <article-title>A Technique for Computer Detection and Correction of Spelling Errors</article-title>
          .
          <source>Commun. ACM 7</source>
          ,
          <fpage>171</fpage>
          -
          <lpage>176</lpage>
          ,
          <year>1964</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Delbanco et al.,
          <year>2015</year>
          ] Delbanco,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Darer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.</given-names>
            ,
            <surname>Elmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.G.</given-names>
            ,
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <surname>H.J.</surname>
          </string-name>
          ,. Open Notes: Doctors and Patients Signing On.
          <source>Ann. Intern. Med</source>
          .
          <volume>153</volume>
          ,
          <fpage>121</fpage>
          -
          <lpage>126</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Fader et al., 2013] Fader, A., Zettlemoyer, L., Etzioni, O.,.
          <article-title>Paraphrase-Driven Learning for Open Question Answering</article-title>
          , in: ACL. pp.
          <fpage>1608</fpage>
          -
          <lpage>1618</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Ghaeini et al.,
          <year>2018</year>
          ] Ghaeini,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          et al.
          <article-title>DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference</article-title>
          ,
          <source>in: NAACL HTL</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Hasan et al.,
          <year>2016</year>
          ] Hasan,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          et al.
          <article-title>Neural Clinical Paraphrase Generation with Attention</article-title>
          ,
          <source>in: CNLP Workshop</source>
          . pp.
          <fpage>42</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Herranz et al.,
          <year>2011</year>
          ] Herranz,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Nin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Sole</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          ,.
          <article-title>Optimal Symbol Alignment Distance: A New Distance for Sequences of Symbols</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>23</volume>
          ,
          <fpage>1541</fpage>
          -
          <lpage>1554</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Iyer et al.,
          <year>2017</year>
          ] Iyer,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Dandekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Csernai</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          ,.
          <source>Quora question pair dataset [WWW Document]</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Kandula et al.,
          <year>2010</year>
          ] Kandula,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Curtis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Zeng-Treitler</surname>
          </string-name>
          ,
          <string-name>
            <surname>Q.</surname>
          </string-name>
          ,.
          <article-title>A semantic and syntactic text simplification tool for health content</article-title>
          .,
          <source>in: AMIA</source>
          . pp.
          <fpage>366</fpage>
          -
          <lpage>70</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Kingma and Ba</source>
          , 2014] Kingma,
          <string-name>
            <given-names>D.P.</given-names>
            ,
            <surname>Ba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
          </string-name>
          ,.
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          , in: ICLR. pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>[Koehn</source>
          , 2017] Koehn,
          <string-name>
            <surname>P.</surname>
          </string-name>
          ,.
          <source>Neural Machine Translation. CoRR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Koehn</source>
          , 2010] Koehn,
          <string-name>
            <surname>P.</surname>
          </string-name>
          ,.
          <source>Statistical Machine Translation</source>
          , 1st ed. Cambridge University Press, NY, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Kondrak</source>
          , 2005] Kondrak,
          <string-name>
            <surname>G.</surname>
          </string-name>
          ,.
          <article-title>N-gram similarity and distance</article-title>
          .
          <source>SPIR 115-126</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Kosten et al.,
          <year>2012</year>
          ] Kosten,
          <string-name>
            <given-names>T.R.</given-names>
            ,
            <surname>Domingo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.B.</given-names>
            ,
            <surname>Shorter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Orson</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          et al.
          <article-title>Inviting Patients to Read Their Doctors' Notes: A Quasi- experimental Study and a Look Ahead</article-title>
          .
          <source>Ann. Intern. Med</source>
          .
          <volume>157</volume>
          ,
          <fpage>461</fpage>
          -
          <lpage>470</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>[Levenshtein</source>
          , 1966] Levenshtein,
          <string-name>
            <given-names>V.</given-names>
          </string-name>
          ,.
          <article-title>Binary Codes Capable of Correcting Deletions, Insertions, and Reversals</article-title>
          .
          <source>Sov. Phys. Dokl</source>
          .
          <volume>10</volume>
          ,
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          ,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Lin et al.,
          <year>2014</year>
          ] Lin,
          <string-name>
            <given-names>T.Y.</given-names>
            ,
            <surname>Maire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Belongie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Hays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ramanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Zitnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.L.</given-names>
          </string-name>
          ,.
          <article-title>Microsoft COCO: Common objects in context</article-title>
          , in: ECCV. pp.
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Lindberg et al.,
          <year>1993</year>
          ] Lindberg,
          <string-name>
            <given-names>D.A.</given-names>
            ,
            <surname>Humphreys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.L.</given-names>
            ,
            <surname>McCray</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.T.</surname>
          </string-name>
          ,.
          <article-title>The Unified Medical Language System</article-title>
          .
          <source>Methods Inf. Med</source>
          .
          <volume>32</volume>
          ,
          <fpage>281</fpage>
          -
          <lpage>291</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Shieber and Nelken, 2006]
          <string-name>
            <surname>Shieber</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nelken</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,.
          <article-title>Towards robust context-sensitive sentence alignment for monolingual corpora</article-title>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>[Madnani and Dorr</source>
          , 2010] Madnani,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Dorr</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.J.</surname>
          </string-name>
          ,.
          <article-title>Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods</article-title>
          .
          <source>Comput. Linguist</source>
          .
          <volume>36</volume>
          ,
          <fpage>341</fpage>
          -
          <lpage>387</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Mikolov et al.,
          <year>2013</year>
          ] Mikolov,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          ,.
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>NIPS 1-9</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>[Nesterov</source>
          , 1983] Nesterov,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          ,.
          <article-title>A method for unconstrained convex minimization problem with the rate of convergence o(1/k^2)</article-title>
          .
          <source>Dokl. AN USSR 269</source>
          ,
          <fpage>543</fpage>
          -
          <lpage>547</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Papineni et al.,
          <year>2002</year>
          ] Papineni,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          ,. BLEU:
          <article-title>a method for automatic evaluation of machine translation</article-title>
          ,
          <source>in: ACL</source>
          . pp.
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Pavlick et al.,
          <year>2015</year>
          ] Pavlick,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Rastogi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ganitkevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Durme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Van</surname>
          </string-name>
          ,
          <string-name>
            <surname>Callison-Burch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,.
          <source>PPDB 2</source>
          .
          <article-title>0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification</article-title>
          .
          <source>ACL 425-430</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>[Pivovarov and Elhadad</source>
          , 2015] Pivovarov,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Elhadad</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          ,.
          <article-title>Automated methods for the summarization of electronic health records</article-title>
          .
          <source>J. Am. Med</source>
          . Informatics Assoc.
          <volume>22</volume>
          ,
          <fpage>938</fpage>
          -
          <lpage>947</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Prakash et al.,
          <year>2016</year>
          ] Prakash,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Qadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Farri</surname>
          </string-name>
          ,
          <string-name>
            <surname>O.</surname>
          </string-name>
          ,.
          <article-title>Neural Paraphrase Generation with Stacked Residual LSTM Networks</article-title>
          , in: COLING. pp.
          <fpage>2923</fpage>
          -
          <lpage>2934</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Qenam et al.,
          <year>2017</year>
          ] Qenam,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kim</surname>
          </string-name>
          , T.Y.,
          <string-name>
            <surname>Carroll</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogarth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,.
          <article-title>Text Simplification Using Consumer Health Vocabulary to Generate Patient-Centered Radiology Reporting: Translation and Evaluation</article-title>
          .
          <source>J. Med. Internet Res</source>
          .
          <volume>19</volume>
          ,
          <issue>e417</issue>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Quirk et al.,
          <year>2004</year>
          ] Quirk,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Brockett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Dolan</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          ,.
          <article-title>Monolingual Machine Translation for Paraphrase Generation</article-title>
          , in: ACL,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [Snover et al.,
          <year>2006</year>
          ]
          <string-name>
            <surname>Snover</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorr</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Micciulla</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makhoul</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          .
          <article-title>A Study of Translation Edit Rate with Targeted Human Annotation</article-title>
          , in: AMTA. pp.
          <fpage>223</fpage>
          -
          <lpage>231</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [Sørensen,
          <year>1948</year>
          ]
          <string-name>
            <surname>Sørensen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          .
          <article-title>A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons</article-title>
          .
          <source>Biol. Skr. 5</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>1948</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [Vinyals et al.,
          <year>2015</year>
          ]
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          .
          <article-title>Grammar as a Foreign Language</article-title>
          , in: NIPS,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [Winkler,
          <year>1990</year>
          ]
          <string-name>
            <surname>Winkler</surname>
            ,
            <given-names>W.E.</given-names>
          </string-name>
          .
          <article-title>String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage</article-title>
          , in: ASA. pp.
          <fpage>354</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [Wubben et al.,
          <year>2010</year>
          ]
          <string-name>
            <surname>Wubben</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van den Bosch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krahmer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          .
          <article-title>Paraphrase Generation As Monolingual Translation: Data and Evaluation</article-title>
          , in: INLG, INLG '10. pp.
          <fpage>203</fpage>
          -
          <lpage>207</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [Xu et al.,
          <year>2015</year>
          ]
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callison-Burch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoles</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          .
          <article-title>Problems in Current Text Simplification Research: New Data Can Help</article-title>
          , in: TACL. pp.
          <fpage>283</fpage>
          -
          <lpage>297</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [Zhao et al.,
          <year>2009</year>
          ]
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          .
          <article-title>Application-driven statistical paraphrase generation</article-title>
          , in: ACL. pp.
          <fpage>834</fpage>
          -
          <lpage>842</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [Zhu et al.,
          <year>2010</year>
          ]
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernhard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          .
          <article-title>A Monolingual Tree-based Translation Model for Sentence Simplification</article-title>
          , in: COLING. pp.
          <fpage>1353</fpage>
          -
          <lpage>1361</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>