<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Keyphrase Generation with GANs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>ioni[</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Moh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>rlo T</string-name>
          <email>carlo.tasso@uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Udine</institution>
          ,
          <addr-line>Udine</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Keyphrase Generation is the task of predicting keyphrases: short text sequences that convey the main semantic meaning of a document. In this paper, we introduce a keyphrase generation approach that makes use of a Generative Adversarial Network (GAN) architecture. In our system, the Generator produces a sequence of keyphrases for an input document. The Discriminator, in turn, tries to distinguish between machine-generated and human-curated keyphrases. We propose a novel Discriminator architecture based on a BERT pretrained model fine-tuned for Sequence Classification. We train our proposed architecture using only a small subset of the standard available training dataset, amounting to less than 1% of the total, achieving a high level of data efficiency. The resulting model is evaluated on five public datasets, obtaining competitive and promising results with respect to four state-of-the-art generative models.</p>
      </abstract>
      <kwd-group>
        <kwd>Keyphrase Generation</kwd>
        <kwd>GAN</kwd>
        <kwd>Reinforcement Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A keyphrase is a sequence of words that summarizes the content of a whole
document and expresses its core concepts. High quality keyphrases (KPs) can
facilitate the understanding of a document, and they are used to provide and
retrieve information regarding the whole document at a high level. The worldwide
growth of digital libraries has made the task of automatic KP prediction
both useful and necessary. There are many areas in which such a capability can be
usefully applied, such as text summarization [<xref ref-type="bibr" rid="ref34">34</xref>], opinion mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], document
clustering [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], information retrieval [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and text categorization [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>KPs can be either present or absent. Present KPs are exact substrings of the
document and can be extracted from its text, while absent KPs are sequences of
words which do not appear in the text, but can be abstracted from its contents.
They are also referred to as extractive and abstractive KPs, respectively. The
research community has made great efforts in the task of predicting KPs so
far, and the proposed solutions rely on two main approaches:
1. extraction of sequences of words from the document (automatic extractive
methods); 2. generation of words and phrases related to the document (automatic
abstractive methods).</p>
      <p>
        Extractive approaches are only able to deal with present KPs [
        <xref ref-type="bibr" rid="ref18 ref29 ref33">29, 18, 33</xref>
        ]: the
greatest drawback in this case is that the predicted KPs are tied to the written
content of the source document rather than to its semantic meaning.
      </p>
      <p>
        The other main approach is the abstractive one, which has been introduced
to address the limitations of the extractive approaches and to better mimic the
way humans assign KPs to a given document. Despite being a more recent
line of research, there are a good number of studies tackling this problem [
        <xref ref-type="bibr" rid="ref19 ref3 ref4">19,
3, 4</xref>
        ]. Abstractive methods are designed to produce sets of KPs which are not
strictly tied to the words of the source text. In principle this approach can be
used to predict both absent and present KPs from a given source text.
Generative models are best suited for abstractive approaches. Many examples can
be found in the literature in which such models are used, mainly leveraging
the Encoder-Decoder framework [
        <xref ref-type="bibr" rid="ref19 ref3">19, 3</xref>
        ]. This architecture works by compressing
the contents of the input (e.g. the text document) into a hidden representation
using an Encoder module. The same representation is then decompressed by
the Decoder module, which returns the desired output (e.g. a sequence of KPs).
Recently, Generative Adversarial Networks (GANs) have been introduced in the text
generation task [<xref ref-type="bibr" rid="ref31">31</xref>], and in particular in keyphrase generation [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. GANs are
based on an architecture that simultaneously trains two models: a generative
model that captures the data distribution, and a discriminative model that
estimates the probability that a sample came from the (real) training data rather
than from the generator. The aforementioned approaches rely on large
datasets for training, with a high consumption of computational
resources.
      </p>
      <p>In this paper we introduce a new GAN architecture for keyphrase generation
with a focus on data efficiency: the aim is to use only a small subset of the
training data and still achieve reasonably good results. The main contribution of
our approach is the introduction of a novel Discriminator model based on BERT
that is able to distinguish between human and machine-generated keyphrases
by leveraging BERT's powerful language model. A Reinforcement Learning
strategy is used in our architecture to overcome the problems caused by
the direct application of GANs to text generation. Our architecture achieves
competitive results using less than 1% of the training samples used by
previous approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>Automatic Keyphrase Extraction</title>
        <p>
          Many different extractive approaches have been proposed in the literature, but
most of them consist of the following two steps. Firstly, a set of KP candidates
is extracted; the number of candidates, chosen using heuristic methods, usually
exceeds the number of correct ones.
Secondly, a ranking algorithm is used to give a score to each candidate based on
the source text. This whole process can be performed either in a supervised or
unsupervised fashion. Supervised methods treat this task as binary
classification [
          <xref ref-type="bibr" rid="ref21 ref26">26, 21</xref>
          ], giving positive scores to the correct candidates in
the list. Unsupervised methods aim to find central nodes of the text graph [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
or detect phrases from topical clusters [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>There are also other studies that differ from the previously described pipeline.
For example, the authors in [<xref ref-type="bibr" rid="ref33">33</xref>] applied an alignment model to learn the
conversion from the source text to target KPs. Also, recurrent neural networks have
been used to build sequence labeling models to extract KPs from tweets [<xref ref-type="bibr" rid="ref33">33</xref>].</p>
      </sec>
      <sec id="sec-2-2">
        <title>Automatic Keyphrase Generation</title>
        <p>
          Abstractive methods represent an important approach which is gaining
growing attention, as they make it possible to generate results that are more in line with
human expectations. Sequence-to-sequence (Seq2seq) models have shown great success
in keyphrase generation and can generate human-like results. They are based
on the Encoder-Decoder framework, where the Encoder generates a semantic
representation of the source text and the Decoder is responsible for
generating target texts based on such a semantic representation. CopyRNN [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] was the
first Encoder-Decoder model designed specifically for keyphrase
generation; it incorporates an attention mechanism. CorrRNN [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was introduced
later and focused on capturing the correlation between KPs. TG-Net [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] exploits
the information given by the title to learn a better representation of the input
documents. Chen et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] leveraged extractive models to improve the
performance of the (abstractive) keyphrase generation one. Ye et al. [<xref ref-type="bibr" rid="ref30">30</xref>] proposed a
semi-supervised approach considering a limited training dataset to improve
performance. All the previous approaches use the beam search algorithm to
generate a large number of KPs from which to choose the k best ones as final
predictions. CatSeq and CatSeqD [<xref ref-type="bibr" rid="ref32">32</xref>] were the first two recurrent generative
models with the ability to predict the appropriate number of KPs for
each document (instead of predicting a fixed number of KPs for each sample).
CatSeq introduced several novelties. Firstly, an orthogonal regularization module
to prevent the model from predicting the same word after generating the KP
separator token. Secondly, semantic coverage, a self-supervised technique with
the aim of enhancing the semantic content of the predictions.
        </p>
        <p>
          Reinforcement Learning has been used in a wide range of text generation
tasks [
          <xref ref-type="bibr" rid="ref22 ref28">28, 22</xref>
          ]. The generative models CatSeq, CatSeqD, CorrRNN and
TG-Net have been improved by applying a Reinforcement Learning (RL) approach
with adaptive reward to produce their improved versions catSeq-2RF1,
catSeqD-2RF1, catSeqCorr-2RF1 and catSeqTG-2RF1 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], the authors propose a
keyphrase generation approach using Generative Adversarial Networks (GANs)
conditioned on scientific documents. The architecture is composed of a CatSeq
model as Generator and a hierarchical attention-based model as Discriminator.
This was the first attempt to apply GANs to the keyphrase generation task. This
approach showed improvements in the generation of abstractive KPs,
but no significant improvements for extractive KPs.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The proposed approach</title>
      <p>The novelty of the approach presented in this paper is two-fold: first, we
introduce a BERT Discriminator as part of a Generative Adversarial Networks
(GAN) architecture for the keyphrase generation task; and second, we train our
system with only a small amount of the available data to pursue data efficiency.</p>
      <p>A general overview of the implemented system is given in Figure 1. It is
based on two main components: a state-of-the-art Generator that relies on the
Encoder-Decoder model and is able to generate a list of KPs for a given input
text, and the new BERT-based Discriminator that is trained to separate the true
KPs from the fake ones by giving them a score: the higher the score, the more
likely the keyphrase list is to be real.</p>
      <p>To overcome the well-known differentiability problems that arise when
employing GAN architectures for text generation, the Reinforcement Learning (RL)
paradigm is adopted for training the system [<xref ref-type="bibr" rid="ref31">31</xref>].</p>
      <sec id="sec-3-1">
        <title>Formal Problem Definition</title>
        <p>A source document x and the related list of M ground-truth keyphrases y =
(y_1, y_2, ..., y_M) (True KPs) are represented by the pair (x, y). Both x and y_i
are sequences of words:
x = (x_1, x_2, ..., x_L)
y_i = (y_{1,i}, y_{2,i}, ..., y_{K_i,i})
where L and K_i are the number of words of x and of its i-th KP, respectively.</p>
        <p>A keyphrase generation model predicts a set of keyphrases ŷ =
(ŷ_1, ŷ_2, ..., ŷ_N) (Fake KPs) with the aim of reproducing the true ones, so that
ŷ ≡ y.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Details of the System</title>
        <p>
          Generator The task of the Generator G is to take a source document x and
generate the sequence of predicted KPs ŷ. For our system we chose catSeq [<xref ref-type="bibr" rid="ref32">32</xref>],
which is based on CopyRNN [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], a generative model optimized for KP
generation. It introduces the ability to predict a sequence of KPs, obtained by
concatenating the target KPs separated by a special token. In this way the
training schema moves from one-to-many to one-to-seq, and the system can be
trained to generate a variable number of KPs. It also employs the Copy
Mechanism [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] to deal with long-tail words, the less frequent words in the
vocabulary of the input samples. These are removed to gain efficiency during
training but, being frequently very specific to the topic of the document, they
may be part of KPs. The Copy Mechanism employs positional attention to
score the words surrounding the removed ones, recovering
the best-scoring ones. The implementation relies on a bidirectional Gated Recurrent
Unit (GRU) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for the encoder, and a forward GRU for the decoder.
        </p>
        <p>[Fig. 1. Overview of the system: the Generator produces fake samples (x, ŷ) that, together
with the true samples (x, y), are scored by the Discriminator; the regression score
(high = real KP, low = fake KP) is fed back as a reward in the Reinforcement Learning
cycle. The Discriminator processes the input [Document, KP 1, ..., KP M] through
Input Preparation, BERT Modelling, Output Aggregation, and a Regression Layer.]</p>
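        <p>For reference, here is a minimal PyTorch skeleton of the catSeq-style Encoder-Decoder backbone described above; this is only a sketch: attention, the Copy Mechanism, and the KP-separator handling are omitted, and the dimensions are illustrative assumptions.</p>
        <preformat>
import torch
import torch.nn as nn

class Seq2SeqBackbone(nn.Module):
    """Bidirectional GRU encoder + forward GRU decoder (teacher forcing)."""

    def __init__(self, vocab_size=50000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, bidirectional=True,
                              batch_first=True)
        self.decoder = nn.GRU(emb_dim, 2 * hidden_dim, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.embed(src_ids))
        # Concatenate the final forward/backward states to seed the decoder.
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(tgt_ids), h0)
        return self.out(dec_out)   # logits over the generator vocabulary
        </preformat>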
        <p>Discriminator The Discriminator D is basically a binary classifier whose aim
is to separate the true samples (x, y) from the fake ones (x, ŷ). It performs this
task by computing a regression score for each sample, assigning a high value to
samples deemed real and a low value to the others.</p>
        <p>
          We introduce a novel BERT-based model for our Discriminator. BERT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
is based on the Transformer architecture, and since its introduction it has achieved
state-of-the-art results in many Natural Language Processing tasks. Our
implementation is based on a BERT pretrained model, fine-tuned for Sequence
Classification. The input samples are processed in four steps (see Figure 1):
1. Input Preparation. Input pairs (x, y) are first lower-cased and tokenized,
then the tokens are concatenated together in the form
        </p>
        <p>[CLS]&lt;x&gt;[SEP]&lt;y1&gt;&lt;;&gt;...&lt;;&gt;&lt;yn&gt;[SEP]
where &lt;x&gt; and &lt;yi&gt; are the sequences of tokens of the input document and of
the i-th KP; [CLS] is the BERT special token marking the start of the sequence;
[SEP] is the BERT special token for marking the end of the sequence, also used
to separate the input document from the list of related KPs; and the semicolon
&lt;;&gt; is the KP separator.</p>
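        <p>As a concrete illustration of this preparation step, the following minimal sketch builds the input with the HuggingFace tokenizer; the function name and the exact spacing around the semicolon separator are illustrative assumptions.</p>
        <preformat>
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def prepare_input(document, keyphrases):
    # Join the KPs with the semicolon used as KP separator
    # (the whitespace around ';' is an assumption).
    kp_sequence = " ; ".join(keyphrases)
    # The uncased tokenizer lower-cases the text, inserts [CLS] at the start
    # and [SEP] between the two segments and at the end, yielding
    # [CLS]&lt;x&gt;[SEP]&lt;y1&gt;;...;&lt;yn&gt;[SEP].
    return tokenizer(
        document,
        kp_sequence,
        truncation=True,
        max_length=384,   # input sequences are trimmed to 384 tokens
        return_tensors="pt",
    )
        </preformat>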
        <p>2. BERT Modelling. The prepared input sequence is passed through the 12
consecutive Encoder blocks of the pretrained BERT model. Pretrained weights
act as initialization, and are optimized during training. Since BERT processing
is positional, each input token is mapped to its corresponding output.</p>
        <p>
          3. Output Aggregation. Output tokens are averaged together to obtain an
embedding of the whole input sequence. Note that, generally, when using a
BERT-based model, the output of the [CLS] token is considered as the sentence
embedding. Nevertheless, we average all the output tokens, as this aggregated
value has proven to be a better estimate of the semantic content of the input
(see also [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]).
        </p>
        <p>4. Regression Layer. The sentence embedding is passed through a dense layer
that evaluates a regression score. This score is used to perform the classification
of the input samples: the higher the score, the more probable it is that the sample is
a real one. The same score is also used as the Reward given by the Discriminator
to the Generator in the Reinforcement Learning schema.</p>
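        <p>A minimal PyTorch sketch of steps 2-4, assuming a mean-pooling head on top of the HuggingFace BertModel; the class and variable names are illustrative.</p>
        <preformat>
import torch.nn as nn
from transformers import BertModel

class BertDiscriminator(nn.Module):
    """Sketch of the Discriminator: BERT encoding, token averaging, regression."""

    def __init__(self):
        super().__init__()
        # bert-base-uncased: 12 layers, 12 attention heads, hidden size 768.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        # 2. BERT Modelling: one output vector per input token.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # 3. Output Aggregation: average all token outputs (not just [CLS]),
        #    masking out padding positions.
        mask = attention_mask.unsqueeze(-1).float()
        embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        # 4. Regression Layer: a dense layer yields the score / reward.
        return self.regressor(embedding).squeeze(-1)
        </preformat>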
      </sec>
      <sec id="sec-3-3">
        <title>Reinforcement Learning with Policy Gradient</title>
        <p>
          We follow the Reinforcement Learning paradigm to train the system, as proposed in [
          <xref ref-type="bibr" rid="ref31 ref24">31, 24</xref>
          ].
        </p>
        <p>In detail, we consider the Generator G as an agent whose action a at step t is
to generate a word ŷ_t which is part of the set ŷ of predicted KPs for the document
x. Action a is performed following the policy π(ŷ_t | s_t, x, θ), which represents the
probability of sampling ŷ_t given the state s_t = (ŷ_1, ..., ŷ_{t−1}), the sequence of
words generated up to step t − 1. The policy is differentiable with respect to
the parameters θ of G. As the agent G generates the predicted list of KPs, the
Discriminator D, which plays the role of the environment, evaluates them and
gives back a reward:</p>
        <p>R(ŷ) = r(ŷ_T | s_T) = D(ŷ | x)   (1)</p>
        <p>where r(ŷ_t | s_t) is the expected accumulative reward at step t and T denotes
the number of steps needed to generate the whole prediction ŷ. The aim of the agent G is
to maximize the function J(θ), defined as the expected value of the final reward
under the probability distribution given by the policy π:</p>
        <p>J(θ) = E_π[R(ŷ)] = Σ_ŷ r(ŷ | s) · π(ŷ | s, x, θ)   (2)</p>
        <p>
          The gradient of J(θ) is evaluated by means of the policy gradient theorem
and the REINFORCE algorithm [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]:
        </p>
        <p>∇J(θ) = E_π[ Σ_{t=1}^{T} r(ŷ_t | s_t) · ∇ log π(ŷ_t | s_t, x, θ) ]   (3)</p>
        <p>The expectation E_π in Equation 3 can be approximated by sampling ŷ ∼ π(· | x, θ).
Then, defining the loss function of G as L(θ) = −J(θ), an estimator of its gradient is:</p>
        <p>∇L(θ) ≈ − Σ_t (r(ŷ_t | s_t) − b_t) · ∇ log π(ŷ_t | s_t, x, θ)   (4)</p>
        <p>
          A regularization term b_t has been introduced as the expected accumulative
reward evaluated on a greedily decoded sequence of predictions, as suggested in
[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Its aim is two-fold: to lower the variance of the process, and to favour
predictions that obtain a higher reward than the greedy decoded sequence.
        </p>
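        <p>The estimator in Equation 4 can be written down in a few lines; the sketch below assumes a single terminal reward from D spread over the sampled sequence, with the greedy-decoded reward as baseline b.</p>
        <preformat>
def generator_rl_loss(log_probs, sample_reward, greedy_reward):
    """Sketch of the self-critical policy-gradient loss of Equation 4.

    log_probs:     1-D tensor of log pi(y_t | s_t, x, theta) over the sampled KPs
    sample_reward: D(y_sampled | x), the Discriminator score of the sample
    greedy_reward: b, the score of the greedy-decoded sequence (baseline)
    """
    advantage = sample_reward - greedy_reward  # r - b, constant over steps here
    # Minimizing this loss maximizes J(theta) = E[R(y)].
    return -(advantage * log_probs).sum()
        </preformat>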
        <p>GAN Training Training is an iterative process in which G and D are trained
separately; see Algorithm 1.</p>
        <p>At the first step, an initial version G0 of the Generator is trained using the
Maximum Likelihood Estimation (MLE) loss. Its predictions (x, ŷ), together with the
ground truth (x, y), are used to train the first version of the Discriminator D0
with the Mean Squared Error (MSE) loss. The regression scores evaluated by D0 are
then employed in the training of the next Generator G1, using the RL optimization
as defined in Section 3.2. All the subsequent Generators Gi are trained in the same
way by means of the rewards given by Di−1. Discriminators are always trained
with the MSE loss. g-steps and d-steps refer to the updating iterations during G and
D training.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Algorithm 1: GAN training</title>
        <preformat>
Data: Samples (x, y)
Pre-train G0 with MLE loss; generate ŷ0 = G0(x);
Pre-train D0 with MSE loss; evaluate D0(y) and D0(ŷ0);
while Di(ŷ) ≪ Di(y) do
    i = i + 1;
    for g-steps do
        Generate predictions: ŷ = Gi(x);
        Evaluate rewards: R = Di−1(ŷ);
        Update Gi with Policy Gradient RL, maximizing R;
    end
    for d-steps do
        Generate predictions: ŷ = Gi(x);
        Evaluate Di(y) and Di(ŷ);
        Update Di with MSE loss;
    end
end
G is evaluated on the test datasets.
        </preformat>
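        <p>A compact procedural sketch of Algorithm 1 follows; the objects and their helper methods are illustrative assumptions, and batching as well as the exact stopping test are omitted.</p>
        <preformat>
def gan_training(G, D, data, g_steps, d_steps):
    G.pretrain_mle(data)         # pre-train G0 with MLE loss
    D.pretrain_mse(data, G)      # pre-train D0 with MSE loss on real/fake pairs

    # Iterate while D still scores fake samples well below real ones.
    while D.mean_score(G.sample(data)) &lt; D.mean_score(data.true_pairs()):
        for _ in range(g_steps):
            fake = G.sample(data)       # y_hat = G_i(x)
            rewards = D.score(fake)     # R = D_{i-1}(y_hat)
            G.policy_gradient_step(fake, rewards)
        for _ in range(d_steps):
            fake = G.sample(data)
            D.mse_step(data.true_pairs(), fake)  # update D_i with MSE loss
        </preformat>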
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Datasets</title>
        <p>Five well-known datasets widely used in the literature have been considered in this
work:</p>
        <p>
          KP20k [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] 567,830 titles and abstracts from computer science papers;
of them, 20,000 samples are usually employed for testing, another 20,000 for
validation, and the remaining 527,830 for training. This is the only
dataset used for training. In our data-efficient approach we use only
2,000 of the &gt;500,000 training samples.
        </p>
        <p>
          Inspec [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] 2,000 abstracts from the disciplines Computers and Control, and
Information Technology; 500 of these samples are used for testing.
        </p>
        <p>
          Krapivin [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] 500 articles about computer science. Since no hint is given by
the authors on how to split testing data, the first 400 samples in alphabetical
order are taken for testing.
        </p>
        <p>
          NUS [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] 211 papers selected from scientific publications; used for testing.
        </p>
        <p>
          Semeval2010 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] 288 articles from the ACM Digital Library, of which
100 are used for testing.
        </p>
        <p>
          Some statistics about the test samples are given in Table 1. Procedures that are
standard protocol in KP generation are applied to the data (see for example [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]):
all duplicate documents are removed from the training set; for each document the
sequence of KPs is given in order of appearance; digits are replaced with the
&lt;digit&gt; token; out-of-vocabulary words are replaced with the &lt;unk&gt; token.
        </p>
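        <p>A sketch of this preprocessing is given below; the exact tokenization rule is not specified above, so the regex used here is an assumption.</p>
        <preformat>
import re

def preprocess(text, vocab):
    """Replace digits with &lt;digit&gt; and out-of-vocabulary words with &lt;unk&gt;."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())  # tokenization rule is an assumption
    out = []
    for tok in tokens:
        if tok.isdigit():
            out.append("&lt;digit&gt;")
        elif tok in vocab:
            out.append(tok)
        else:
            out.append("&lt;unk&gt;")
    return out
        </preformat>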
        <p>
          The vocabulary of the generator VG consists of the 50,000 most frequent
words in the training dataset. The vocabulary of the discriminator VD is the
one of the pretrained BERT base uncased (English version): it contains 30,522
words and chunks of words (called wordpieces) that can be composed to form all the
possible inflections (e.g., 'hexahedral' is tokenized as 'he', '##xa', '##hedral').
Optimization of the generator G is performed with Adam [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. G0 is trained with
MLE loss and a batch size of 12; the following Gi are trained with RL optimization
and batch size of 32. Optimization of the discriminator D is performed with
AdamW optimizer [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. It is trained with MSE loss and a batch size of 3. For the
Discriminator, we refer to the BERT implementation provided by huggingface
[
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] (https://github.com/huggingface/transformers). The model is bert-base-uncased, with 12 layers, 12 attention heads, and
a hidden size of 768. Input sequences are trimmed to 384 tokens. The model is
fine-tuned for Sequence Classification with one label (regression). Training and
testing run on a PC with a Titan RTX GPU (24 GiB).
        </p>
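        <p>Under these settings, the Discriminator can be instantiated directly from the library; a minimal sketch follows (the learning rate is an assumption, as it is not reported above).</p>
        <preformat>
import torch
from transformers import BertForSequenceClassification

# num_labels=1 selects the one-label (regression) Sequence Classification head.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumption
loss_fn = torch.nn.MSELoss()  # Discriminators are trained with MSE loss
        </preformat>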
      </sec>
      <sec id="sec-4-2">
        <title>Comparative Results</title>
        <p>
          The system has been trained with 2,000 samples randomly extracted from the
KP20k training dataset, and then evaluated on the five baseline datasets
described in Section 4.1. A comparison has been carried out with four
state-of-the-art approaches, namely catSeqD [<xref ref-type="bibr" rid="ref32">32</xref>], catSeqCorr-2RF1 and catSeqTG-2RF1
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and GAN [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Results are reported in terms of F1 score: F1@5 is evaluated
over the top 5 highest-scoring KPs, while F1@M takes into account all the
predictions. Results are shown in Table 2.
        </p>
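        <p>For reference, a minimal sketch of the two scores, assuming exact string matching between predicted and target KPs (evaluation protocols usually also apply stemming, which is omitted here).</p>
        <preformat>
def f1_at_k(predicted, relevant, k=None):
    """F1@5 with k=5; F1@M with k=None (all predictions are kept)."""
    preds = predicted[:k] if k is not None else predicted
    tp = sum(1 for kp in preds if kp in relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(preds)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)
        </preformat>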
        <p>Our approach achieves competitive results with respect to the above-mentioned
models. In particular, it is by far the best on Inspec, in both F1@M and
F1@5 scores. It also performs very well on Semeval2010, where we match the
best F1@M score and are close to the best F1@5. Note that Semeval2010 is
the smallest of the five testing datasets and contains the smallest number of
target KPs. This makes it a very difficult test set to perform well on.</p>
        <p>We point out that our approach shows good performance in the F1@5 score.
Since the F1@5 score is evaluated on the 5 best predicted KPs, we argue
that our approach is able to generate good-quality KPs.</p>
        <p>A final consideration has to be made about Equation 3: the expectation
of the policy function is evaluated using only complete sequences ŷ, and this
causes relatively large oscillations in ∇J(θ), inducing instability in the
training process [<xref ref-type="bibr" rid="ref31">31</xref>]. Our system has shown great efficiency in dealing with
this problem, even in a training scenario characterized by the scarce availability
of data. In fact, we observed a quick convergence of the training, obtaining the best
results at the second iteration of the Generator (G2). We attribute this quick
convergence to the strength of the language model embedded in the architecture.</p>
        <p>Ranking Analysis In order to better analyze our method of data-efficient
training, we performed a comparison in terms of ranking metrics between our
system (trained on 2,000 samples) and the original catSeq model (trained on the
whole training set). Two evaluation measures have been used: the Mean Average
Precision (MAP) and the normalized Discounted Cumulative Gain (nDCG).</p>
        <p>MAP is defined as the mean of the average precision scores evaluated for
each set of predicted KPs. It is a measure of the proportion of relevant KPs
among the predicted ones. nDCG is a measure of the usefulness, or gain, of a
document based on its position, or rank, in the result list. It is widely used in
information retrieval, specifically in web search and related tasks. For both
metrics, the higher the scores, the better the agreement between the
predicted KPs and the real ones.</p>
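        <p>A minimal sketch of both measures with binary relevance (an assumption consistent with the definitions above):</p>
        <preformat>
import math

def average_precision(predicted, relevant):
    hits, score = 0, 0.0
    for rank, kp in enumerate(predicted, start=1):
        if kp in relevant:
            hits += 1
            score += hits / rank        # precision at each relevant hit
    return score / max(len(relevant), 1)  # MAP is the mean over all documents

def ndcg(predicted, relevant):
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, kp in enumerate(predicted, start=1) if kp in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), len(predicted)) + 1))
    return dcg / ideal if ideal else 0.0
        </preformat>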
        <p>Results are shown in Table 3. For four out of five datasets and for both
measures, our approach achieves better or nearly the same results as the
baseline catSeq, clearly showing the strength of our method.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper we presented a system for Keyphrase Generation using a GAN
architecture with Reinforcement Learning. Thanks to the characteristics of our
approach, we have been able to train the system in a data-efficient way using
only a small fraction of the available data. We tested it on five baseline datasets,
achieving results that are competitive with some state-of-the-art generative
models. To the best of our knowledge, this is the first attempt to train such a complex
architecture for the demanding task of Keyphrase Generation in a scenario in
which only a small amount of data is available.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berend</surname>
          </string-name>
          , G.:
          <article-title>Opinion Expression Mining by Exploiting Keyphrase Extraction</article-title>
          . In: IJCNLP (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>King</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards</article-title>
          . In: ACL (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Keyphrase Generation with Correlation Constraints</article-title>
          . In: EMNLP (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bing</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>King</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction</article-title>
          . In:
          <string-name>
            <surname>NAACL-HLT</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>King</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          :
          <article-title>Title-Guided Encoding for Keyphrase Generation</article-title>
          . In: AAAI (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cho</surname>
            , K., van Merriënboer,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          . In: EMNLP (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          . In:
          <string-name>
            <surname>NAACL-HLT</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>V.O.K.</given-names>
          </string-name>
          :
          <article-title>Incorporating Copying Mechanism in Sequenceto-Sequence Learning</article-title>
          . In: ACL (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hammouda</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matute</surname>
            ,
            <given-names>D.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamel</surname>
            ,
            <given-names>M.S.:</given-names>
          </string-name>
          <article-title>CorePhrase: Keyphrase Extraction for Document Clustering</article-title>
          . In: MLDM (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hulth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Improved Automatic Keyword Extraction Given More Linguistic Knowledge</article-title>
          . In: EMNLP (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hulth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Megyesi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Study on Automatically Extracted Keywords in Text Categorization</article-title>
          . In: ACL (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staveley</surname>
            ,
            <given-names>M.S.:</given-names>
          </string-name>
          <article-title>Phrasier: A System for Interactive Document Retrieval Using Keyphrases</article-title>
          . In: SIGIR (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medelyan</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
          </string-name>
          , T.:
          <article-title>SemEval-2010 Task 5 : Automatic Keyphrase Extraction from Scientific Articles</article-title>
          . In: Workshop on Semantic Evaluation (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
          </string-name>
          , J.:
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          . In: ICLR (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Krapivin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Autaeu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchese</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Large Dataset for Keyphrases Extraction</article-title>
          .
          <source>Technical Report DISI-09-055</source>
          , University of Trento (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Clustering to Find Exemplar Terms for Keyphrase Extraction</article-title>
          . In: EMNLP (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Fixing Weight Decay Regularization in Adam</article-title>
          .
          <source>ICLR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ostendorf</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajishirzi</surname>
          </string-name>
          , H.:
          <article-title>Scientific Information Extraction with Semi-supervised Neural Tagging</article-title>
          . In: EMNLP (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Han,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          :
          <article-title>Deep Keyphrase Generation</article-title>
          . In: ACL (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarau</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>TextRank: Bringing Order into Text</article-title>
          . In: EMNLP (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Nguyen</surname>
          </string-name>
          , T.D.,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Keyphrase Extraction in Scientific Publications</article-title>
          . In: ICADL (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaremba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Sequence Level Training with Recurrent Neural Networks</article-title>
          . In: ICLR (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Rennie</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcheret</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mroueh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ross</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Self-Critical Sequence Training for Image Captioning</article-title>
          . In: CVPR (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Swaminathan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , H.,
          <string-name>
            <surname>Mahata</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gosangi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          :
          <article-title>Keyphrase Generation for Scientific Articles using GANs</article-title>
          . In: AAAI (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning</article-title>
          .
          <source>Machine Learning</source>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paynter</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutwin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nevill-Manning</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          :
          <article-title>KEA: Practical Automatic Keyphrase Extraction</article-title>
          . In: ACM (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delangue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cistac</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rault</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funtowicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brew</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>HuggingFace's Transformers: State-of-the-art Natural Language Processing</article-title>
          . ArXiv:abs/1910.03771 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norouzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macherey</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krikun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macherey</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klingner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gouws</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kato</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurian</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patil</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riesa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudnick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation</article-title>
          .
          <source>CoRR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Ye</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Semi-Supervised Learning for Neural Keyphrase Generation</article-title>
          . In: EMNLP (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>30. Ye, H., Wang, L.: Semi-Supervised Learning for Neural Keyphrase Generation. arXiv preprint arXiv:1808.06773 (2018)</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>31. Yu, L., Zhang, W., Wang, J., Yu, Y.: SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In: AAAI (2016)</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>32. Yuan, X., Wang, T., Meng, R., Thaker, K., He, D., Trischler, A.: Generating Diverse Numbers of Diverse Keyphrases. ArXiv:abs/1810.05241 (2018)</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>33. Zhang, Q., Wang, Y., Gong, Y., Huang, X.: Keyphrase Extraction Using Deep Recurrent Neural Networks on Twitter. In: EMNLP (2016)</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>34. Zhang, Y., Zincir-Heywood, A.N., Milios, E.E.: World Wide Web site summarization. Web Intelligence and Agent Systems (2004)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>